Optimal Income Taxation in the Presence of Tax Evasion: Expected Utility Versus Prospect Theory

The predictions of expected utility theory (EUT) applied to tax evasion are flawed on two counts: (i) they are quantitatively wrong by orders of magnitude, and (ii) higher taxation is predicted to lower evasion, which is at variance with the evidence. An emerging literature in behavioral economics, most notably based on prospect theory (PT), has shown that behavioral approaches are much better at explaining tax evasion. We extend this literature to incorporate issues of optimal taxation. As a benchmark for a successful theory, we require that it should explain, jointly, the facts on the tax rate, the tax gap and the level of government expenditure. We find that when taxpayers use EUT (respectively, PT) and the optimal tax is derived from a social welfare function that also uses EUT (respectively, PT), the calibration results are completely at odds with the facts. However, when taxpayers use PT but the social welfare function uses standard EUT, there is a very close match between the predictions and the facts. This has important implications for context-dependent preferences and also for the newly emerging literature on liberalism versus paternalism in behavioral economics.


Introduction
Issues of tax evasion are extremely important for all countries. Losses to society from tax evasion are huge. For the USA, for example, based on the most recent data, the tax gap is of the order of $300 billion per year (Slemrod, 2007). An important feature of the existing analysis of tax evasion is that it has largely been carried out in an expected utility theory (EUT) framework.
Recent research points to several serious problems in using an EUT approach to tax evasion. Dhami and al-Nowaihi (2007) apply Kahneman and Tversky's (1979) prospect theory (PT) to the tax evasion decision facing a taxpayer. They show that while EUT gives the correct qualitative results for the effects of the probability of detection and the penalty rate, there are several problems. First, EUT predicts that under reasonable attitudes to risk, namely, non-increasing absolute risk aversion, the taxpayer evades less as the tax rate goes up. The implication is that tax evasion will be at a minimum when the tax rate is 100 percent. This result, due to Yitzhaki (1974), is contradicted by the bulk of empirical evidence. Second, at existing penalty rates and detection probabilities, the quantitative predictions of EUT on the extent of tax evasion are wrong by a factor of about 100. On the other hand, PT gives the correct quantitative and qualitative results [4]. Given the magnitudes involved, the misleading welfare consequences of applying EUT to an analysis of tax evasion are potentially very large. Dhami and al-Nowaihi (2007), however, treat the tax rate as exogenous. In this paper, we extend the analysis of Dhami and al-Nowaihi (2007) by asking what the optimal income tax should be when taxpayers use PT to make their tax evasion decision. This is a substantially more difficult question because the appropriate welfare criterion under behavioral economics is, as yet, unsettled.

A brief description of the model
Our framework of analysis is as follows. We consider a model where the government levies taxes to finance public provision of goods and services. Individuals can choose to evade a fraction of their income. The government audits a fraction of the tax returns. If a taxpayer is caught evading, he pays back the tax owed plus a penalty. Individuals gain utility from both private and public consumption. The government chooses the optimal tax rate, given society's preference between private and public expenditure, and taking the subsequent tax evasion behavior of taxpayers into account. In this simple framework, we assess the relative success of EUT and PT. We find that PT far outperforms EUT, and the analysis also highlights the importance of recognizing that preferences are context dependent.

Brief literature review
The literature on endogenous evasion and optimal taxation is fairly limited and, without exception, uses EUT. There are three main strands of the literature. In the first strand, associated with Cremer and Gahvari (1993) and Boadway, Marchand and Pestieau (1994), the problem is to find the optimal taxes set by a benevolent planner in the presence of tax avoidance, rather than tax evasion. So, by incurring a cost, the taxpayer can ensure that evasion is never discovered by the tax authorities. This in itself is an interesting problem and allows one to make a case for commodity taxation: because tax on labour income can be avoided while commodity taxes cannot, it is efficient to have some commodity taxes [5]. However, this leaves open the relation between tax evasion and optimal taxation.
The second strand is exemplified in the work of Cremer and Gahvari (1996), Marhuenda and Ortuno-Ortin (1997) and Chandar and Wilde (1998). Here the approach is to choose, simultaneously, the optimal tax and the enforcement structure in order to induce truthful reporting of income. In equilibrium, there is no tax evasion. However, if eliminating evasion completely is not possible, then the question of designing an optimal tax structure remains unanswered.
[4] ... experimental evidence shows that obligatory advance tax payments reduce tax evasion, a fact that can be explained by PT but not by EUT; see Elffers and Hessing (1997) and Yaniv (1999).
[5] In the absence of tax evasion it is known, from the results of Stiglitz (1972, 1976), that commodity taxes are redundant if income taxes are available. However, if tax bases are measured with error, then a case can be made for commodity taxes even in the absence of tax evasion; see Dhami and al-Nowaihi (2006).
The third strand is given by the recent work of Richter and Boadway (2006), to which our work is most closely connected. They explicitly model the tax evasion decision in the standard Allingham and Sandmo (1972) model. They consider a representative household; hence, the focus is entirely on efficiency issues. The central question, as in the first strand, is the optimal mix of income and consumption taxes when the ease of evading different types of taxes is different. The welfare objective of the government is to maximize its total revenues (arising from income taxes, consumption taxes and penalties on those caught evading) subject to a taxpayer participation constraint. The main disadvantage of the income tax is that, by inviting tax evasion, it exposes the risk-averse taxpayer to income risk. While the consumption tax cannot be evaded, its main disadvantage is that it distorts relative prices of goods. Hence, optimal taxation arises from a tradeoff between imposing income risk on the taxpayer and creating tax distortions.
In the context of our paper, the main criticisms of the existing literature are as follows. First, the behavioral approach to tax evasion shows that an EUT based analysis of tax evasion is seriously misleading. However, the existing literature is based entirely on the EUT framework. Second, tax evasion is a very real phenomenon in both developing and developed countries. The mechanism design approach to evasion, which essentially searches for optimal tax/penalty schemes to completely eliminate evasion, does not seem suitable. Third, the cost of risk to the taxpayer from the risky activity of tax evasion needs to be explicitly modelled. Fourth, one needs to explore welfare criteria other than revenue maximization in designing optimal tax schemes.

Welfare analysis under behavioral economics
Under uncertainty, when decision makers have expected utility preferences, the Pareto frontier can be found, subject to the standard regularity conditions, by a benevolent utilitarian planner who maximizes a weighted sum of the individuals' expected utilities.
Suppose now that individuals have prospect theory preferences (which we discuss in detail in section 4). Under these preferences, decision makers (1) overweight small probabilities, (2) evaluate gains and losses relative to a reference point, (3) are loss-averse and (4) have distinct risk preferences depending on whether they are in the domain of gains or losses, relative to the reference point [7]. Should a social planner respect the PT preferences of an individual? Or should the planner disregard these preferences and evaluate the well-being of society on the basis of expected utility? These questions take us into relatively uncharted territory in economics. There is no clear consensus on the approach to be taken, but we illustrate some well known views on this matter below.
Tversky and Kahneman (1986, abstract) state that "no theory of choice can be both normatively adequate and descriptively correct". Their argument is as follows. Since invariance and dominance are always obeyed when their application is transparent, they should be essential features of any normative theory. However, because of bounded rationality, they are often violated when their application is not transparent. A descriptively adequate theory must take account of this.
The modern literature on merit goods has stressed the bounded rationality of decision makers. Should policy makers respect boundedly rational decisions or try to alter them? This is the theme of recent work by Camerer et al. (2003). These authors advocate the case for asymmetric paternalism. Essentially this is an attempt to steer boundedly rational people in the direction of avoiding costly mistakes while, at the same time, distorting the decisions of rational people as little as possible.
In the words of the authors: "And a variety of researchers have shown that people exhibit systematic mis-predictions about the costs and benefits of choices - for example, the degree of loss aversion exhibited in people's choices seems inconsistent with their actual experiences of gains and losses. It is such errors - apparent violations of rationality - that can justify the need for paternalistic policies to help people make better decisions and come closer to behaving in their own best interest." [8]
Several other forms of paternalism are advocated in the literature. Benjamin and Laibson (2003) introduce the concept of benign paternalism. The idea here is to encourage the individual to undertake socially desirable actions, without violating individual liberty to make the decision. The idea is made operational by introducing small hurdles in the way of harmful individual choices in order to shepherd them in a better direction [9]. Jolls et al. (1998) coin the term anti-antipaternalism, which rejects the idea of pure libertarianism. O'Donoghue and Rabin (2003) advocate the idea of optimal paternalism, which takes account of all costs and benefits of paternalism. They reject the view that interventions should be minimal. In their words: "In some instances, even seemingly large deviations from the policy that is optimal for fully rational economic agents would not cause severe harm to those agents. In such cases, even a small probability of people making errors can have dramatic effects for optimal policy."
[7] ... example, Kahneman and Tversky (2000, section 9). Thus all utilities in this paper are decision utilities.
[8] Camerer et al. (2003) illustrate the usefulness of their approach for boundedly rational individuals in several contexts, such as setting default options which encourage savings behavior and protect insurance rights, framing of contracts to include seemingly irrelevant information, disclosure issues, etc.
[9] For instance, gamblers are asked to choose up-front a level of liquidity which cannot be subsequently exceeded when they might be in a tempted state. Gamblers without subsequent self control problems will not need to be disciplined, so they have no problems with setting liquidity limits, while gamblers with subsequent self control problems will clearly benefit from these up-front limits.
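The four features of prospect theory preferences listed in this section (overweighting of small probabilities, reference dependence, loss aversion, and different risk attitudes over gains and losses) can be illustrated with a minimal sketch. The functional forms are assumptions made here for illustration, not necessarily those used later in the paper: a power value function with the Tversky and Kahneman (1992) estimates beta = 0.88 and theta = 2.25, and a Prelec (1998) weighting function with an illustrative alpha = 0.65.

```python
import math

def pt_value(x, beta=0.88, theta=2.25):
    """Value of an outcome x measured relative to the reference point (x = 0).

    beta and theta are illustrative (assumed) parameter values.
    """
    if x >= 0:
        return x ** beta              # concave over gains
    return -theta * (-x) ** beta      # convex and steeper over losses (loss aversion)

def prelec_weight(p, alpha=0.65):
    """Prelec probability weighting, w(p) = exp(-(-ln p)^alpha); alpha is assumed."""
    if p <= 0.0:
        return 0.0
    if p >= 1.0:
        return 1.0
    return math.exp(-((-math.log(p)) ** alpha))

if __name__ == "__main__":
    print("w(0.01) =", prelec_weight(0.01))   # larger than 0.01: small probabilities are overweighted
    print("v(100)  =", pt_value(100.0))
    print("v(-100) =", pt_value(-100.0))      # larger in magnitude than v(100): loss aversion
```

With these assumed parameters, a sure gain of 50 is valued above a 50:50 chance of 100 (risk aversion over gains), while a sure loss of 50 is valued below a 50:50 chance of losing 100 (risk seeking over losses).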

Liberalism, paternalism and context dependence of preferences
In section 1.3 we have illustrated the increasing appeal of paternalism (as opposed to liberalism) in conducting welfare analysis when individuals have bounded rationality. However, non-paternalism [10] is a fundamental principle of liberalism: individuals are the best judges of their own welfare. It is also a fundamental assumption of social choice theory, welfare economics and mechanism design [11]. In its weakest and most general form, non-paternalism is expressed by the Bergson-Samuelson social welfare function, W = W(u_1(x), ..., u_I(x)) (1.1), where x is a social state, I is the number of individuals in society and u_i(x) is the utility of individual i as seen by that individual. A special case of (1.1) is the constant elasticity form (1.2), where the government exhibits a degree of paternalism in that (1) it may give different weights to different individuals and (2) it may exhibit a degree of inequality aversion different from that exhibited by an individual (as captured by u_i(x)). Further specialization of (1.2) gives the two forms we shall use in this paper, (1.3) and (1.4) [12]. Form (1.4) may be viewed as the most liberal of the above, as the government gives equal weight to all individuals and does not modify any of the individual utility functions. A number of axiomatizations lead to (1.4) [13].
[10] Also known as welfarism or individualism.
[11] See, for example, chapters 21, 22 and 23 in Mas-Colell, Whinston and Green (1995).
[12] We prefer the term representative agent to dictator. First, individual i may be different from the decision maker. Second, even if individual i is the decision maker, he/she might be selected by a democratic process, say an election.
[13] For example, Harsanyi (1953, 1955, 1975) ...
We shall argue that the evidence on tax evasion is best explained by the following combination. We use PT to model the tax evasion decision and derive the government budget constraint. The government then chooses the tax rate that maximizes social welfare.
Although the government recognizes that the tax evasion decision is best described by PT, when it chooses the tax rate it assumes that society's preference over private and public provision, the u_i(x) in (1.3) or (1.4), is given by standard utility theory. We could defend this methodology in either of two ways. The first is more traditional, while the second relies on emerging empirical evidence. Our results do not hinge on which interpretation we adopt. Readers of different persuasions will have a preference for one or the other.
1. The government, when choosing the tax rate, behaves paternalistically, i.e., the government assumes that it has better knowledge of the true welfare of individuals than the individuals themselves.
2. Paternalism can have negative connotations (see our second quote at the beginning of the paper). Hence, our own preference is for an alternative view that draws inspiration from a very large body of empirical evidence that has been generated in behavioral economics. In particular, the evidence suggests, very clearly, that individuals do not have a complete preference ordering over all states and that preferences are heavily context dependent. Alternative contexts can arise, for instance, from the framing of choices [14]. Context dependent preferences do not go away once professionals are presented with choices that they must make on a regular basis [15]. The mental accounting literature pioneered by Richard Thaler is also suggestive of context dependent preferences [16]. Individuals, when making a private consumption decision, might act so as to maximize their selfish interest. But in a separate role (context), say, as part of the government, as a school governor or as a voter, they could act so as to maximize some notion of public well-being. Many examples can be given.
1. Individuals might send their own children to private schools (self interest) but could at the same time vote for more funding to government run schools in local or national elections (public interest).
[13] ... with unit comparability, non-paternalism, the strong Pareto principle and anonymity (or symmetry). Maskin (1978) derived (1.4) from cardinal measurability, non-paternalism, the strong Pareto principle, anonymity, separability of indifferent households, continuity and strong equity. Other axiomatizations lead to other special forms of (1.1). For a review see, for example, chapters 5 and 6 of Boadway and Bruce (1984).
[14] The following is just one example out of hundreds described in Kahneman and Tversky (2000). It is problems 9 and 10 from Quattrone and Tversky (1988). In a survey, 64% of respondents thought that an increase in inflation from 12% to 17% was acceptable if it led to a reduction in unemployment from 10% to 5%. However, only 46% of the respondents thought that exactly the same increase in inflation (from 12% to 17%) was acceptable if it increased employment from 90% to 95%.
[15] In a well known example, Kahneman and Tversky find, in the context of medical decisions, that the choice between various programs depends on whether the choices are posed in terms of lives saved or lives lost (see page 5 in Kahneman and Tversky (2000)).
[16] See Part 4 of the book by Kahneman and Tversky (2000).
2. In making their labor supply decision, individuals might trade off personal utility from consumption against disutility from labor supply. But the same individual, in his/her voting choice, might vote for redistributive policies that exhibit public concern (and not purely private self interest).
3. An individual, when buying an air ticket, might also buy travel insurance, thus exhibiting risk averse behavior. But, the same individual, when he reaches his holiday destination, may visit a gambling casino and exhibit risk loving behavior there.
Using this view, we can assume that individuals, when taking their tax evasion decision, exhibit behavior described by PT. But when individuals express their preferences over private and public provision (through, say, surveys, referenda, elections, etc.), these preferences are described by standard utility theory. This could be formalised by allowing the utility function, U(z; ·), to depend on a context variable, z. So in our case, for example, z = 1 when an individual is deciding how much tax to evade but z = 2 when the same individual is voting for a tax system.
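As a purely illustrative sketch of a context-dependent utility function U(z; ·), the code below switches between a prospect theory value function (z = 1, tax evasion) and logarithmic utility (z = 2, voting). The functional forms, the parameter values beta = 0.88 and theta = 2.25, and the names `TAX_EVASION`, `VOTING` and `utility` are all assumptions introduced here, not taken from the paper.

```python
import math

TAX_EVASION = 1   # z = 1: deciding how much tax to evade
VOTING = 2        # z = 2: voting for a tax system

def utility(z, outcome, reference=0.0, beta=0.88, theta=2.25):
    """Hypothetical context-dependent utility U(z; outcome)."""
    if z == TAX_EVASION:
        # Prospect theory: outcomes are coded as gains or losses
        # relative to a reference point, and losses loom larger.
        gain = outcome - reference
        if gain >= 0:
            return gain ** beta
        return -theta * (-gain) ** beta
    if z == VOTING:
        # Standard utility theory: utility over final consumption levels.
        return math.log(outcome)
    raise ValueError("unknown context z")

if __name__ == "__main__":
    # Same monetary outcome, different evaluations in the two contexts.
    print(utility(TAX_EVASION, 90.0, reference=100.0))
    print(utility(VOTING, 90.0))
```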

Results and schematic outline
The results are as follows. Under both EUT and PT, taxpayers evade less as the audit probability and the penalty rate increase. Under PT, additionally, one gets the plausible result that taxpayers evade more if the tax rate increases. Using EUT to model both tax evasion and government behavior is unable to jointly account for the evidence on actual tax gaps and government expenditure. Given actual penalty rates and audit probabilities, if consumer preferences over private versus public consumption yield observed government expenditures, then EUT predicts far too big a tax gap. On the other hand, if consumer preferences over private and public consumption yield observed tax gaps, then EUT predicts far too much government expenditure. Using PT to model both tax evasion and government behavior gives economically absurd results. By contrast, using PT to model the tax evasion decision and standard utility theory to explain government behavior, we have no difficulty reconciling observed tax gaps with observed government expenditures, at plausible tax rates.
Section 2 describes the basic model. Sections 3 and 4 consider the tax evasion decision on the basis of EUT and PT, respectively. Sections 5 and 6 derive the resulting optimal tax rates under EUT and PT, respectively. Section 7 compares the success of PT in explaining tax evasion with that of EUT. Finally, section 8 summarizes and concludes.

The Model
We consider an economy consisting of a continuum of consumer-taxpayers located on the unit square [0, 1] × [0, 1]. Pretax income is exogenous and is given by the density function Y(x, s) ≥ 0, x ∈ [0, 1], s ∈ [0, 1], where s (explained in more detail below) captures the stigma faced by a tax evader when caught and x is purely a label that helps locate an individual taxpayer. The government levies tax at the constant rate t ∈ [0, 1] on declared income and uses its tax revenue to finance the public provision of goods and services whose aggregate monetary value is G. The consumer-taxpayer located at (x, s) declares income D(t, G, x, s) dx ds, with D(t, G, x, s) ∈ [0, Y(x, s)]. Thus D(t, G, x, s) is the density of declared income. Subsequent to the filing of tax returns, an exogenous fraction p ∈ (0, 1) of the taxpayers are audited, and the audit reveals the true taxable income. If caught, the dishonest taxpayer must pay the outstanding tax liabilities, t[Y(x, s) - D(t, G, x, s)] dx ds, and a penalty proportional to these unpaid taxes, at a constant and strictly positive penalty rate. The density of tax revenue, T, is then given by (2.1). Clearly, D and T depend on p and the penalty rate as well as on t, G, x and s.
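A sketch of the revenue density implied by this description, writing \lambda for the penalty rate (the symbol \lambda is an assumption adopted here for illustration). Every taxpayer pays tax on declared income; with probability p an audit additionally recovers the unpaid tax and the penalty on it:

```latex
T(t,G,x,s) \;=\; t\,D(t,G,x,s) \;+\; p\,(1+\lambda)\,t\,\bigl[\,Y(x,s) - D(t,G,x,s)\,\bigr]
```

Under this reading, revenue per taxpayer is increasing in declared income whenever p(1+\lambda) < 1, that is, whenever evasion has a positive expected monetary return for the taxpayer.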

Sequence of moves
The sequence of moves is as follows.
1. The tax authority announces the tax rate, t, the audit probability, p, the penalty rate, and the monetary value of the provision of publicly provided goods/services, G.
2. Taxpayers make the decision to either report full income or evade a fraction of it, given t, p, the penalty rate and G.
3. The government audits a fraction p of the returns, and dishonest taxpayers must pay the tax due on their unreported income plus a penalty proportional to it.
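The three moves above can be sketched as a small simulation. This is an illustration under stated assumptions, not the paper's calibration: every taxpayer is assumed to declare the same share of income, stigma is ignored because it does not affect revenue, and the function names and parameter values are hypothetical.

```python
import random

def simulate_revenue(incomes, declared_share, t, p, penalty_rate, seed=0):
    """Average tax revenue per taxpayer for one realisation of the audits."""
    rng = random.Random(seed)          # seeded so the run is reproducible
    total = 0.0
    for income in incomes:
        # Move 2: the taxpayer declares part of his income and pays tax on it.
        declared = declared_share * income
        paid = t * declared
        # Move 3: with probability p the return is audited; the evaded tax
        # is recovered together with a penalty proportional to it.
        if rng.random() < p:
            evaded_tax = t * (income - declared)
            paid += (1.0 + penalty_rate) * evaded_tax
        total += paid
    return total / len(incomes)

def expected_revenue(income, declared_share, t, p, penalty_rate):
    """Expected revenue from one taxpayer, before the audit lottery is resolved."""
    declared = declared_share * income
    return t * declared + p * (1.0 + penalty_rate) * t * (income - declared)

if __name__ == "__main__":
    # Move 1: the tax authority announces t, p and the penalty rate (assumed values).
    incomes = [100.0] * 100_000
    print(simulate_revenue(incomes, 0.5, t=0.3, p=0.03, penalty_rate=0.5))
    print(expected_revenue(100.0, 0.5, t=0.3, p=0.03, penalty_rate=0.5))
```

With many taxpayers, the simulated average settles close to the expected value, which is why the model can treat aggregate revenue as deterministic.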

Exogenous and endogenous variables
The exogenous variables of the model are the probability of an audit, p (which, here, is the same as the probability of detection), the penalty rate, and the density function of income, Y. When considering tax evasion under prospect theory, we shall introduce and explain further exogenous variables. The endogenous variables of the model are the tax rate, t, the density function of declared income, D, and the tax revenue density function, T. When the consumer-taxpayer makes his tax evasion decision, he takes as exogenous the model parameters, his location, (x, s), the tax rate, t, and the level of public provision, G.
Notation: To simplify notation, we shall suppress reference to the model parameters and drop the infinitesimal quantities, dx and ds, when such omission is not likely to lead to confusion.

Government tax revenue
We assume that Y is integrable [19]. It will follow from optimizing behavior of consumer-taxpayers that D is integrable. Hence, T is also integrable. Let S be a measurable [20] subset of the unit square. Then the aggregate pretax income of all consumer-taxpayers in S, Y(S), is the integral of the income density, Y(x, s), over S, and the tax revenue collected from them, T(t, G, S), is the integral of the tax revenue density over S. In particular, when S is the whole unit square, which has measure one, Y is both total income and average income. Likewise, T(t, G) is both total and average tax revenue. We assume that Y > 0.
[19] Integrability can be interpreted in the sense of either Riemann or Lebesgue; it does not matter for our purposes which interpretation is chosen. However, the Lebesgue integral is more general than that of Riemann, in that it is defined for wider classes of functions and domains of integration. It is also associated with powerful convergence theorems. A particularly clear introduction is Bartle (1966).
[20] Measurable in the sense of Borel or Lebesgue. Any set of interest to us will be measurable in both senses.
Note that if S is of measure zero then Y(S) = 0 and, hence, also T(t, G, S) = 0. In particular, the tax revenue collected from any single individual, (x, s), is zero. This ensures that we can consistently assume that when a taxpayer decides how much income to declare, he can take public provision, G, as given. Thus the government's tax revenue, T(t, G), comes from three sources:
1. taxes on declared income,
2. taxes recovered from those caught evading,
3. penalties paid by those caught evading.
From (2.4) and (2.5), we see that total tax revenue can be written in the slightly simpler form (2.6).
It will be useful to distinguish between those who are unable to evade tax and those who can evade but choose not to. Therefore, assume taxes are deducted at source for a fraction, ω ∈ [0, 1], of the population, so these taxpayers cannot evade (if ω = 1, then nobody can evade). We make the following two simplifying assumptions.
A1. The opportunity to evade taxes does not depend on the taxpayer's income. Thus, Y(x, s) does not depend on ω. Hence, the tax revenue collected from the consumer-taxpayers in S is given by an expression that simplifies to (2.9), and total tax revenue becomes (2.10).
A2. Stigma is unrelated to income. Although different people suffer different rates of stigma, we do not know of any strong evidence that this is related to income, e.g., we do not know of any strong evidence that rich persons suffer higher, or lower, rates of stigma than poor persons, on account of their income [22]. Our second simplifying assumption is, therefore, that the income density function does not depend on the stigma rate. The independence of income, stigma and the ability to evade will give rise to simple expressions for total tax revenue and aggregate utility.

Behavior of government and consumer-taxpayers
The tax authority moves first, making an announcement of the tax rate, t, the audit probability, p, the penalty rate, and the total monetary value of all publicly provided goods and services, G. Given t, p, the penalty rate and G, the taxpayer then makes the decision to either report full income (D = Y) or evade a fraction of it (D < Y). Let Y_NC be the after-tax income of the taxpayer if he is not caught; then Y_NC = Y - tD (2.12). If evasion is discovered, the taxpayer also suffers some stigma, whose monetary value is s(Y - D), where s ∈ [0, 1] is the stigma rate on evaded income. As in Gordon (1989) and Besley and Coate (1992), such stigma enters linearly, as a monetary equivalent, into the payoff in that state of the world [23]. His after-tax income is then Y_C, given by (2.13). The government spends the total tax revenue, T(t, G), given by (2.6), on providing goods and services under the balanced budget constraint G = T(t, G) (2.14).
Consumers derive utility from private consumption and the publicly provided goods and services. However, when making the decision on how much income to declare, a consumer takes the publicly provided goods and services as given. Thus we have a free-rider problem: each consumer derives utility from public provision, but hopes others will pay for it. The government chooses the tax rate, t, so as to maximize social welfare, taking into account the utility individuals derive from private and public consumption and the effect of the tax rate on tax evasion and, hence, on tax revenue.
[22] See Slemrod (2007, p. 30) for a review of this.
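The incomes in the two states of the world can be written out as a sketch consistent with the verbal description, with \lambda denoting the penalty rate (an assumed symbol, adopted here for illustration). If not caught, the taxpayer pays tax only on declared income; if caught, he additionally pays the evaded tax, the penalty on it, and the monetary equivalent of stigma:

```latex
\begin{align*}
Y_{NC} &= Y - tD, \\
Y_{C}  &= Y - tD - (1+\lambda)\,t\,(Y-D) - s\,(Y-D) \\
       &= (1-t)\,Y - (s+\lambda t)\,(Y-D).
\end{align*}
```

The second line for Y_C follows by collecting terms: Y - tD - t(Y-D) = (1-t)Y, which leaves the penalty and stigma terms, both proportional to evaded income Y - D.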
We shall consider five regimes, summarized in Table-I below. We use the shorthand notation CARA for constant absolute risk aversion and log for logarithmic utility. We now describe the five regimes more fully. The description will rely heavily on the arguments developed in the introduction; see in particular section 1.4.
1. Regime EUT: The second row of Table-I describes regime EUT. In this regime, consumer preferences over private provision, C, and public provision, G, are given by the tractable form (2.15). Expected utility is then given by (2.16), where Y_C and Y_NC are given by (2.13) and (2.12), respectively. Maximizing (2.16) gives the density function, D, of declared income and, hence, government tax revenue T(t, G). Thus, the consumer-taxpayer has a complete transitive set of preferences: the same utility function describes both tax evasion and preference between private and public provision. The government uses the same utility function (2.15) to maximize the utility of a representative consumer with average income, Y, who does not evade. In other words, we do not need to invoke context dependence or paternalism to justify this regime. The details are given in section 3, below. This regime turns out to perform poorly in jointly explaining the tax rate, the tax gap and government expenditure.
[23] Although a natural interpretation of stigma might include factors such as the tug on one's conscience and loss of face among family and community etc., other interpretations of stigma are possible. These might include, in an appropriately specified dynamic game, the reputational costs that impinge on current and future earnings. For a more detailed discussion of stigma in the context of tax evasion as well as a more general formulation, see Dhami and al-Nowaihi (2007).
2. Regime PT1: The third row of Table-I describes regime PT1. In this regime, consumer preferences over private and public provision are given by the same utility function (2.15) as in regime EUT. However, when making the tax evasion decision, consumer preferences are not given by expected utility (2.16) but by prospect theory (PT), as described in section 4, below. Thus, the consumer-taxpayer has context-dependent preferences (see the introduction): the utility function which describes tax evasion behavior is different from that which gives preferences between private and public provision. The government uses the same utility function (2.15), as in regime EUT, to maximize the utility of a representative consumer with average income, Y, who does not evade. The details are given in section 4, below. As discussed in the introduction, one could invoke paternalism or incomplete preferences to justify this approach. In contrast to regime EUT, regime PT1 successfully explains the tax rate, the tax gap and the level of government expenditure.
3. Regime PT2: The utility function (2.15) is additively separable over private and public provision. However, in actual practice, there would seem to be strong complementarities between the two. For example, utility derived from private car ownership heavily depends on the quality of publicly provided roads. We would like to investigate the consequences of recognizing this complementarity. The fourth row of Table-I describes regime PT2. In this regime, consumer preferences over private and public provision are given by the utility function (2.17). An interpretation of this model is that public provision, G, has value only insofar as it facilitates private consumption, C. When making the tax evasion decision, consumer preferences are given by prospect theory, as in regime PT1. Thus, the consumer-taxpayer, again, exhibits context-dependent preferences. The government uses the utility function (2.17) to maximize the utility of a representative consumer with average income, Y, who does not evade. The details are given in section 4, below. The results are very similar to those of regime PT1.
4. Regime PT3: The fifth row of Table-I describes regime PT3. In this regime consumer preferences over private and public provision are given by the utility function (2.18). When making the tax evasion decision, consumer preferences are given by prospect theory, as in regimes PT1 and PT2. Thus, the consumer-taxpayer, again, exhibits context-dependent preferences. The government uses the utility function (2.18) to maximize the sum (in the form of an integral) of the utilities of all the consumers in the state in which they do not evade. The details are given in section 4, below. Note that regime PT3 differs from regimes EUT and PT1 in two ways: the coefficient of relative risk aversion can now take any value (not just 1), and the government now maximizes the sum of all utilities, not that of a representative consumer. Yet the results are very close to those of regimes PT1 and PT2, and regime PT3 successfully explains the tax rate, tax gap and government expenditure.
5. Regime PT4: The last row of Table-I describes regime PT4. In this regime, when taking the tax evasion decision, consumer preferences are given by prospect theory, as in regimes PT1, PT2 and PT3. However, the government also uses these same prospect theory preferences to maximize the sum of the utilities of all the consumers. The details are given in section 4, below. As in regime EUT, we do not need to invoke either context dependence or paternalism. The results turn out not only to be empirically incorrect (as in regime EUT) but also economically absurd.
To summarize, in the context of the simple model of this paper, EUT cannot reconcile the observed tax rate, tax gap and the level of government expenditure. At the other extreme, modelling both the tax evasion decision and government behavior using PT gives absurd results. The best results are obtained by using PT to model tax evasion behavior but standard utility theory to model government behavior. The key is to recognize the context dependence of preferences.

The Taxpayer's Tax Evasion Problem: Expected Utility Theory
Consider a consumer-taxpayer located at (x, s) who derives utility, U(C, G), from private consumption, C, and the level of public provision, G. Given his income, Y(x, s), the level of public provision, G, and the values of the parameters t, p, $\lambda$ (the penalty rate), and s, the consumer chooses the amount of income to declare, D. If he is not caught (with probability 1 - p), then his disposable income, and hence private consumption, is $Y_{NC}$. However, if he is caught (with probability p), then his disposable income, and hence private consumption, is $Y_C$. His expected utility is thus
$$EU = (1-p)\,U(Y_{NC}, G) + p\,U(Y_C, G), \qquad (3.1)$$
where $Y_C$ and $Y_{NC}$ are given by (2.13) and (2.12), respectively. 25

The Yitzhaki result
Eliminate D from (2.13) and (2.12) to get (3.2). For the special case of no stigma, s = 0, (3.2) reduces to
$$\frac{1}{1+\lambda}\,Y_C + \frac{\lambda}{1+\lambda}\,Y_{NC} = (1-t)\,Y, \qquad (3.3)$$
where $\lambda$ is the penalty rate. We may view the problem as choosing $Y_C$ and $Y_{NC}$ so as to maximize expected utility (3.1) subject to the budget constraint (3.3), given income $(1-t)Y$ and prices $\frac{1}{1+\lambda}$ and $\frac{\lambda}{1+\lambda}$. Since prices do not depend on the tax rate, t, an increase in the tax rate has a pure income effect. Making the plausible assumption of constant or declining absolute risk aversion, we get that an increase in the tax rate reduces tax evasion. 26 In the more general case with stigma, (3.2), a change in the tax rate will have both income and substitution effects. However, simulations with plausible functional forms and parameter values indicate that, in the presence of stigma, an increase in the tax rate still causes a decline in evasion under EUT. 27 This result, obtained by Yitzhaki (1974), is rejected by the bulk of experimental, econometric and survey evidence. 28
25 Obviously, EU also depends on D, G, x, s, t, p and the other parameters of the model. We have omitted reference to these to reduce the burden of notation.
26 For a formal proof see, for example, Dhami and al-Nowaihi (2007, Proposition 2). Along with constant or declining absolute risk aversion, we also need the assumptions 0
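The income-effect argument above can be illustrated numerically. The following is only a sketch under stated assumptions: it takes logarithmic utility (which has declining absolute risk aversion), no stigma, and assumed payoff forms for the caught and not-caught states that are not shown in this section. The function name `evaded_income_log_utility` and the symbol `lam` for the penalty rate are ours.

```python
def evaded_income_log_utility(t, Y=100.0, p=0.015, lam=0.5):
    """Evaded income E = Y - D maximizing expected log utility when s = 0.

    Assumed payoffs (illustrative, not taken from this section):
      not caught: Y_NC = (1 - t) * Y + t * E
      caught:     Y_C  = (1 - t) * Y - lam * t * E
    """
    if not 0.0 < t < 1.0:
        raise ValueError("tax rate must lie strictly between 0 and 1")
    # First order condition (1 - p) * Y_C = p * lam * Y_NC, solved for E.
    interior = (1.0 - t) * Y * (1.0 - p - p * lam) / (lam * t)
    # Declared income must satisfy 0 <= D <= Y, so clip E to [0, Y].
    return min(max(interior, 0.0), Y)


for t in (0.5, 0.7, 0.8, 0.9):
    print(f"t = {t:.1f}: evaded income = {evaded_income_log_utility(t):.2f}")
```

Under these assumptions evaded income falls as the tax rate rises, which is the Yitzhaki pattern that the evidence rejects.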

Tax evasion under EUT
As described above in section 2.4, in regime EUT, where the taxpayer uses EUT, the preferences over private and public goods consumption are given by (2.15) and expected utility is given by (2.16). Note that expected utility (2.16) differs across consumer-taxpayers only in so far as they differ in income, Y(x, s), and stigma, s. Differentiating (2.16) twice with respect to D, using (2.12) and (2.13), gives the concavity condition (3.6), $\partial^2 EU / \partial D^2 < 0$. If t > 0, then (3.6) holds everywhere. However, if t = 0, then $\partial^2 EU / \partial D^2 = 0$ at s = 0. Hence, (3.6) holds almost everywhere. Since EU is continuous on the compact interval $0 \le D \le Y(x, s)$, a maximum, $D^*(t, G, x, s)$, exists. This maximum is unique except at t = s = 0. The first order conditions for a maximum are given by (3.7). Let $s_1$ and $s_2$ be the stigma thresholds defined in (3.8) and (3.9), respectively; 29 the optimal declaration is then given by (3.10). Thus, taxpayers who would suffer low stigma if caught (the interval $0 \le s \le s_1$) hide all their income. Taxpayers who would suffer moderate stigma if caught (the interval $s_1 < s < s_2$) hide some but not all income. Taxpayers who would suffer high stigma if caught (the interval $s_2 \le s \le 1$) declare all their income. Also note that declared income, D, does not depend on the level of public provision, G. The latter result follows from the facts that utility (2.15) is additively separable in private and public consumption and that the consumer-taxpayer takes public provision as given when deciding how much income to declare.
29 For p > 0 and $0 \le t \le 1$, it can be shown that $s_1 \le \frac{1-p}{p}\,t - \lambda t$, where $\lambda$ is the penalty rate.

Tax revenue under EUT
Starting from (3.10) and invoking the assumption that the income distribution, Y(x, s), is independent of stigma, s, we arrive at (3.14). Substituting (3.14) into (2.10) gives total tax revenue. Imposing the government budget constraint, T(t, G(t)) = G(t), then gives total public expenditure (which is also per capita public expenditure).

The Taxpayer's Tax Evasion Problem: Prospect Theory
The preferences of the taxpayer under prospect theory are more complicated. Those familiar with prospect theory should just skim the material in this section, skipping directly to (4.9) for the value function under PT. For those unfamiliar with PT, we provide a self-contained treatment below. The basic building blocks of prospect theory can be heuristically explained as follows. 30 Prospect theory distinguishes between two phases in decision making: an editing phase, followed by an evaluation phase. In the editing phase, a complex problem is first simplified to facilitate decision making. In the evaluation phase, the highest value prospect is chosen. During the editing phase outcomes are coded as gains or losses relative to a reference point. The reference point is usually, but not necessarily, the status quo. 31 While there is no general theory of the editing phase, prospect theory has a very precise theory of the evaluation phase. Suppose that a consumer faces a lottery (or prospect) with several possible outcomes. First, each outcome in the prospect is assigned a number, using a utility function. This number is positive if the outcome has been coded as a gain relative to the reference point, and negative if it has been coded as a loss (the reference point having been arrived at in the editing phase). The utility function under prospect theory has the following properties: continuity, monotonicity, reference dependence, declining sensitivity and loss aversion. Continuity and monotonicity are as in EUT.
Unlike EUT, where the carriers of utility are final levels of wealth (or incomes or consumption levels or commodities), under prospect theory the carriers of utility are gains and losses relative to the reference point. Declining sensitivity means that the utility function is concave in the domain of gains and convex in the domain of losses. Loss aversion is based on the idea that losses are more salient than gains. Given an amount of money, y > 0, and a utility function, v(y) (to be specified precisely below), loss aversion implies that $-v(y) > v(-y)$; that is, a loss of y lowers utility by more than a gain of y raises it.
Finally, the utilities of each outcome in a prospect are aggregated using decision weights into a value function. These decision weights are non-linear functions of the cumulative probabilities, with probabilities in the two domains (gains and losses) cumulated separately. The probability weighting function used for the domain of gains need not be the same as that for losses. Decision weights are not probabilities and do not, necessarily, add up to one (unlike EUT). However, if all outcomes are in the domain of gains, or if all are in the domain of losses, then decision weights do add up to one and can be interpreted as probabilities. Agents facing uncertain situations overweight small probabilities but underweight large ones. 32 In choosing among several prospects, an individual using PT chooses the one that gives the highest number for the value function. We now provide a more formal treatment of the building blocks of prospect theory.
30 There is a substantial body of evidence in support of these building blocks of prospect theory, as well as a mounting number of successful applications in economics; see, for instance, the collection of papers in Kahneman and Tversky (2000) and in Camerer et al. (2004).
31 Also in the editing phase it is decided which low probability events to ignore and which high probability events to treat as certain.
32 In Kahneman and Tversky (1979) decision weights are transformed probabilities. This proved unsatis-

Utility of an outcome under PT
Since the consumer takes public provision as given when deciding how much tax to evade, let us assume that public provision is ignored in the editing phase. Let the reference private consumption of the taxpayer be R. Then private consumption relative to the reference point is $X_i = Y_i - R$, for i = C, NC. As in Kahneman and Tversky (1979) and Tversky and Kahneman (1992), the utility, $v(X_i)$, associated with an outcome $X_i$ is given by
$$v(X_i) = \begin{cases} X_i^{\beta} & \text{if } X_i \ge 0, \\ -\theta\,(-X_i)^{\beta} & \text{if } X_i < 0, \end{cases}$$
where $0 < \beta < 1$ governs declining sensitivity and $\theta > 1$ is the parameter of loss aversion; the latter ensures that a loss is more salient than a gain of equal monetary value. Based on experimental evidence, Tversky and Kahneman (1992) suggest that $\beta \simeq 0.88$ and $\theta \simeq 2.25$. 33
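The shape of this utility function is easy to inspect numerically. The sketch below assumes the power form described by Tversky and Kahneman (1992), with an exponent of 0.88 and a loss aversion coefficient of 2.25; the names `pt_utility`, `beta` and `theta` are our labels for illustration.

```python
def pt_utility(x, beta=0.88, theta=2.25):
    """Utility of an outcome x measured relative to the reference point.

    Gains (x >= 0) are valued as x**beta; losses are valued as
    -theta * (-x)**beta, so losses loom larger than equal-sized gains.
    """
    if x >= 0:
        return x ** beta
    return -theta * ((-x) ** beta)


gain = pt_utility(100.0)
loss = pt_utility(-100.0)
print(f"v(100) = {gain:.2f}, v(-100) = {loss:.2f}")
```

With these values a loss is weighted 2.25 times as heavily as a gain of the same size, and doubling a gain less than doubles its utility (declining sensitivity).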

The reference point under PT
Although prospect theory does not provide sufficient guidance to determine the reference point in each possible situation, there is often a plausible candidate for a reference point. Indeed, specifying a suitable reference point is often essential for a successful application of prospect theory. As in Dhami and al-Nowaihi (2007), we take the legal after-tax income (which is also the level of private consumption expenditure in the absence of tax evasion) as the reference point in this paper. 34

The decision problem under PT
Then, using (2.12), (2.13) and recalling that $0 \le D \le Y$, we get $Y_C \le R \le Y_{NC}$. Hence, the taxpayer is in the domain of losses if caught but in the domain of gains if not caught. Let V be the taxpayer's value function and let $w^+$ and $w^-$ be her probability weighting functions for the domains of gains and losses, respectively. 35 Then, according to prospect theory (PT), the taxpayer maximizes
$$V = w^-(p)\,v(Y_C - R) + w^+(1-p)\,v(Y_{NC} - R). \qquad (4.6)$$
Comparing (4.6) with the analogous expression (3.1) for expected utility theory, we see the following differences. First, the carriers of utility in PT are gains and losses relative to the reference point rather than final levels. Second, one uses decision weights in PT to aggregate outcomes while one uses objective probabilities under expected utility theory. Third, the level of public provision, G, is present in (3.1) but absent from (4.6). This is because, in EUT, a decision maker has a complete preference relation over all outcomes.
35 By a probability weighting function we mean a strictly increasing function w from [0, 1] onto [0, 1]. Note that a probability weighting function, w, has a unique inverse, $w^{-1}$, from [0, 1] onto [0, 1], and that $w^{-1}$ is strictly increasing. Furthermore, it follows that w and $w^{-1}$ are continuous and must satisfy $w(0) = w^{-1}(0) = 0$ and $w(1) = w^{-1}(1) = 1$. For an example of a probability weighting function, see below.

The probability weighting function
Empirical evidence is widely consistent with an inverted S shaped form for the weighting function; see for example Kahneman and Tversky (1979), Tversky and Kahneman (1992) and Prelec (1998). Denoting by p the cumulative probability, Prelec (1998) derives the following weighting function (see Figure 4.2). 36 By the Prelec function we mean the probability weighting function w from [0, 1] onto [0, 1] given by 37
$$w(0) = 0, \qquad w(p) = e^{-(-\ln p)^{\alpha}} \quad \text{for } 0 < p \le 1,$$
where $0 < \alpha \le 1$. The smaller $\alpha$ is, the more the overweighting of small probabilities, and $w(p) \to p$ as $\alpha \to 1$. Thus, the weights approach objective probabilities as $\alpha \to 1$. 38 Figure 4.2 plots the Prelec function for $\alpha = 0.225$. 39
36 There are several advantages in using the Prelec weighting function, relative to the others suggested in the literature. First, it has an inverted S shape which is consistent with experimental evidence. Second, it is based on axiomatic foundations; see Prelec (1998), Luce (2001) and al-Nowaihi and . Third, it has the same form for gains and losses.
37 To quote from Prelec (1998, last line of Appendix A): "Empirically ... one observes $w^+(p) = w^-(p)$." Therefore, in our calibration exercises, we shall take $w^+(p) = w^-(p)$.
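To see what the inverted S shape means in practice, the one-parameter Prelec function can be evaluated directly. This is a minimal sketch; the function name `prelec_weight` is ours, and the default `alpha=0.225` is simply the value quoted for Figure 4.2.

```python
import math


def prelec_weight(p, alpha=0.225):
    """One-parameter Prelec weight exp(-(-ln p)**alpha) for a probability p."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be a probability in [0, 1]")
    if p == 0.0:
        return 0.0  # limit of the formula as p -> 0
    if p == 1.0:
        return 1.0
    return math.exp(-((-math.log(p)) ** alpha))


for p in (0.015, 0.5, 0.985):
    print(f"p = {p:.3f}: w(p) = {prelec_weight(p):.3f}")
```

Small probabilities such as an audit probability of 0.015 receive a weight well above p, large probabilities receive a weight below p, and setting `alpha=1.0` returns the objective probability.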

Tax evasion under PT
Substituting (4.8) in (4.7) we get (4.9), which can then be written in an equivalent form. The solution to the tax evasion problem under PT, when the probability of detection is fixed, is a bang-bang (or corner) solution. We would argue that the bang-bang solution seems descriptive of several forms of tax evasion which take the form of hiding certain activities completely from the tax authorities while fully declaring other sources. For instance, an academic might not report income arising from an invited but paid lecture. A school teacher might not report tuition income for after-school lessons. A householder might pay cash to a builder for a minor extension of the house. Line item reporting of tax returns might further encourage this behavior. This bang-bang implication for the reporting of taxable income can also be drawn from the experimental results of Pudney et al. (2000). Slemrod and Yitzhaki (2002) find, based on TCMP data for 1988, that "the voluntary reporting percentage was 99.5% for wages and salaries, but only 41.4% for self-employment income". Additional support comes from the behavior of non-profit organizations whose profits from activities unrelated to their primary tax exempt purpose are subject to federal and state tax. The reporting behavior of such organizations is also suggestive of the bang-bang solution; see Omer and Yetman (2002). 40
Assuming non-increasing absolute risk aversion, EUT predicts (Yitzhaki, 1974) that individuals evade less income as the tax rate increases. On the other hand, PT predicts the more factual result that tax evasion increases with an increase in the tax rate. This is formally stated in Proposition 1 below. The proof of this, and other results, can be found in Dhami and al-Nowaihi (2007). Consider the more general case of an endogenous probability of detection, p(D), such that p(D) is continuously differentiable and $p'(D) \le 0$. In this case declared income can have an interior solution and varies continuously with the exogenous parameters. The comparative static results in this section, including, in particular, the explanation of the Yitzhaki puzzle, can also be demonstrated for the more general case. 41 However, the general case is not very conducive to calibration exercises, which are the main method we use for distinguishing among alternative theories. 42
38 There is no reason to suppose that different uncertain situations should have the same $\alpha$; hence, prospect theory does not impose an exact value on $\alpha$. For instance, people might overweight the probability of dying in an air crash far more than that of dying in a car accident.
39 Based on experimental data, Prelec (1998) estimates $\alpha \simeq 0.65$. However, Bernasconi (1998) argues that, because of ambiguity aversion, taxpayers 'in the wild' would exhibit more overweighting of low probabilities than in the laboratory. Bernasconi (1998) reports that, while actual probabilities of audits are in the range 0.01 to 0.03, an average of USA taxpayers' assessments of the audit probability is 0.09. Taking a central value, this gives the value $\alpha = 0.225$ that we use.
40 Dhami and al-Nowaihi (2007) introduce an endogenous probability of detection, p(D), such that p(D) is continuously differentiable and $p'(D) \le 0$, i.e., the taxpayer is more likely to be caught if s(he) evades more. This induces enough curvature in the model for interior solutions. However, this does not alter the comparative static results.
41 For the general case, Dhami and al-Nowaihi (2007, Proposition 4) prove the following: (a) At a regular interior optimum, tax evasion is strictly decreasing in the punishment rate, $\lambda$, the stigma rate, s, and the coefficient of loss aversion, $\theta$. However, tax evasion is strictly increasing in the tax rate, t. (b) At an optimum on the boundary (D = 0 or D = Y), tax evasion is non-increasing in the punishment rate, $\lambda$, the stigma rate, s, and the coefficient of loss aversion, $\theta$. Tax evasion is non-decreasing in the tax rate, t.
42 Even in macroeconomics, where calibration is most prevalent, despite the non-linearities it is usually the simplified log-linearized version of the model that is used in the calibration exercises.

Optimal tax under EUT (regime EUT)
We have, so far, considered the tax evasion decision of taxpayers when they follow EUT and PT, respectively, and also the associated tax revenues under each form of preferences. We now analyze the optimal tax decision of a central planner under EUT. The objective function of the government is explained in section 2.4 above, for the case of regime EUT. It chooses the tax rate so as to maximize (5.1), the utility of a representative consumer with average income, Y, in the state where he does not evade. 43
[...] (2002) and Andreoni et al. (1998) we can infer the value $\omega = 0.4$. While the general view seems to be that stigma costs from evasion are low (see, for example, Brooks (2001)), we are not aware of the exact magnitudes. Some evidence is available from the stigma costs that arise from claiming welfare benefits. For Britain, Pudney et al. (2002) find that the total stigma costs (which they define as stigma, hassle, search costs, etc.) range from about 0.1 to 0.2. We do not know the appropriate value of the parameter that determines society's preference for public compared to private consumption.
Calibration Strategy: Our strategy is as follows. For values in the interval (0, 1) of the parameter measuring society's preference for public provision, we compute the optimal tax rate, the resulting level of tax revenue, G(t), and the tax gap ratio, $\frac{tY - G(t)}{G(t)}$, where the denominator is total tax revenue and the numerator (the tax gap) is the difference between what is theoretically owed in taxes (tY) and what is actually collected in taxes (G(t)). For the USA, the tax gap ratio for 1992 was 0.222 and for 1998 it was 0.199 (Americans for Fair Taxation). For the USA, total government tax revenue as a percentage of GDP was 32% for 2004 (Laurin, 2006). We use the normalization Y = 100.
Calibration Results: The calibration results are tabulated in Table-II. From Table-II we see that as society's desire for public provision (measured by the preference parameter) increases, so does the optimal tax rate, t, and government tax revenue, G(t). This result is exactly what is expected. However, tax evasion as measured by the tax gap ratio, $\frac{tY - G(t)}{G(t)}$, drops very dramatically with an increase in the tax rate, from the very high value of 1.28 to the very low value of 0.04. Thus tax evasion can be practically eliminated by a tax rate of almost 100%. This is the Yitzhaki puzzle under EUT described above.
From the fourth row we see that, at all tax rates, $s_2 = 1$ (see (3.9), (3.10)), i.e., all taxpayers who can evade, evade at least some tax. From the first column, we see that $s_1 = 0.73$ (see (3.8), (3.10)), i.e., of taxpayers who can evade, 73% evade all taxes. This steadily declines as the tax rate increases, so that for tax rates in excess of 73%, all taxpayers pay at least some tax.
From the first highlighted column, we see that at the tax rate t = 0.52, government tax revenue is similar to actual values (G(0.52) = 31.71%, compared to the actual value of 32%). However, the predicted tax gap ratio is far too high ($\frac{0.52Y - G(0.52)}{G(0.52)} = 0.64$, compared to the actual value of about 0.2). From the second highlighted column, we see that the tax rate t = 0.77 gives a tax gap ratio close to that observed ($\frac{0.77Y - G(0.77)}{G(0.77)} = 0.2$). However, the same tax rate gives a total tax revenue (G(t) = 64.1%) that is about twice the observed value. To summarize, our simple EUT optimal tax model with tax evasion can either get tax revenue right or the tax gap right but not both.
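The tax gap ratio used in the calibration is simple enough to check by hand. The sketch below (the function name `tax_gap_ratio` is ours, not the paper's) recomputes it for the two highlighted columns quoted above, using the normalization Y = 100.

```python
def tax_gap_ratio(t, avg_income, revenue):
    """Tax gap ratio (t*Y - G) / G: uncollected tax relative to collected tax."""
    if revenue <= 0:
        raise ValueError("revenue must be positive")
    owed = t * avg_income  # what is theoretically owed in taxes
    return (owed - revenue) / revenue


Y = 100.0
print(round(tax_gap_ratio(0.52, Y, 31.71), 2))  # first highlighted column
print(round(tax_gap_ratio(0.77, Y, 64.1), 2))   # second highlighted column
```

The two ratios come out at about 0.64 and 0.2, matching the figures reported in the text.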
We do not think that this result is merely the consequence of using a simple model; rather, it appears to be a problem inherent in the EUT approach to tax evasion. We give some arguments in support of this:
1. (Risk aversion) The use of logarithmic utility entails a coefficient of relative risk aversion of 1, which is rather low. What results can we expect when a government assigns, in its social welfare function, a higher value for the coefficient of risk aversion? Bernasconi (1998) reports that coefficients of relative risk aversion lie in the range 1 to 2. However, Skinner and Slemrod (1985) report that a coefficient of relative risk aversion of 70 is needed to square the extent of tax evasion under EUT with the evidence. 44 Our simulations (see Section 7) support this, suggesting that increasing the coefficient of relative risk aversion to 2 would not make much difference.

2. (More general preferences) The simulations (see Section 7) also suggest that the additively separable form of the utility function for the representative taxpayer is not the problem, nor is the assumption of a representative consumer used to evaluate social welfare.
3. (Stigma) Increasing stigma cost would reduce tax evasion but, as discussed above, our assumed level of stigma appears already to be on the high side.

4. (Other factors) How about realistic features of the tax system that we have not included, for example, forms of taxes other than income taxes, fraudulent claims of benefits and the costs of enforcing tax compliance? In each of these cases we expect the paradox to reappear: at a tax rate that would generate the observed government tax revenue, EUT predicts too much evasion. Because of the Yitzhaki (1974) result, an increase in the tax rate that would reduce evasion to observed magnitudes would simultaneously increase tax revenue to well above observed values. Modelling labour supply has its own problems. 45
44 Mankiw and Zeldes (1991) calculated that a coefficient of relative risk aversion of 30 implies that the certainty equivalent of the prospect "win 50,000 or 100,000, each with probability 0.5" is 51,209.
45 A large number of studies show a small negative aggregate labour supply elasticity of income (see, for example, Pencavel (1986) and Killingsworth and Heckman (1986)). Clearly this would compound rather than solve the difficulties faced by EUT in explaining tax evasion. Many studies also show a small positive aggregate labour supply elasticity and some show a large positive elasticity. Many problems of applying EUT to labour supply are discussed in Bewley (1999).
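The certainty equivalent quoted from Mankiw and Zeldes (1991) can be checked with a few lines of arithmetic. The sketch below assumes the standard constant relative risk aversion (power) utility; the function name `crra_certainty_equivalent` is ours.

```python
import math


def crra_certainty_equivalent(outcomes, probabilities, rra):
    """Certainty equivalent of a lottery under constant relative risk aversion."""
    if abs(rra - 1.0) < 1e-12:
        # Logarithmic utility: the certainty equivalent is the geometric mean.
        return math.exp(sum(p * math.log(x) for x, p in zip(outcomes, probabilities)))
    # Power utility x**(1 - rra) / (1 - rra); the 1/(1 - rra) factor cancels.
    expected_power = sum(p * x ** (1.0 - rra) for x, p in zip(outcomes, probabilities))
    return expected_power ** (1.0 / (1.0 - rra))


ce = crra_certainty_equivalent([50_000.0, 100_000.0], [0.5, 0.5], rra=30.0)
print(f"certainty equivalent: {ce:,.0f}")
```

With a coefficient of 30 the certainty equivalent is close to 51,209, barely above the worse outcome, which illustrates how extreme the risk aversion required under EUT is.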

Optimal tax under PT (regimes PT1 to PT4)
We now consider the four regimes, PT1 to PT4, in all of which the taxpayer's evasion decision is modelled using prospect theory.

Regime PT1
Here the tax evasion decision is modelled using PT, as in section 4 above. In particular, tax revenues are now given by (4.16). However, the government's objective is still to maximize (5.1), the utility of a representative consumer-taxpayer with average income who does not evade. But now the relevant budget constraint is (4.16). It is routine, though tedious, to show that this optimization problem has a unique maximum, and that the resulting optimal tax rate is in the interior of the interval [0, 1]. Hence, it can be found by solving the first order condition $U'(t) = 0$. Some simple algebra shows that this is equivalent to solving
$$\gamma\,(1-t)\,\frac{\partial G(t)}{\partial t} = (1-\gamma)\,G(t),$$
where $\gamma$ denotes the weight on public provision in (5.1). The solution is as follows. Let $\phi$ denote the coefficient from the PT evasion solution of section 4 that determines who evades. If $\omega = 1$, then no one can evade. If $\phi \le 0$, then all those who can evade decide not to. If $\phi\gamma \ge 1$, then all those who can evade do so. In all these cases the optimal tax rate is $t = \gamma$.
However, the more empirically relevant case is when $\omega < 1$ and $0 < \phi\gamma < 1$, where $\phi$ is the evasion coefficient from section 4 and $\gamma$ is the weight on public provision in (5.1). Here some taxpayers can evade but choose not to. In this case the first order condition gives a quadratic equation in t. The optimum is the solution with the negative square root.
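The interior case for regime PT1 can be sketched numerically. Everything in this block is an illustration under assumptions that are not shown in this section: (i) a taxpayer who can evade hides all income when stigma s is below `phi * t`, with `phi` built from the Prelec weights, loss aversion and the penalty rate; (ii) stigma is uniform on [0, 1]; (iii) revenue is full tax from compliers plus expected tax and penalty from evaders; and (iv) utility is logarithmic with weight `gamma` on public provision. The revenue form was chosen to be consistent with the coefficients of the quartic (6.5); all function and variable names are ours.

```python
import math


def prelec(p, alpha):
    """Prelec weight exp(-(-ln p)**alpha) for 0 < p < 1."""
    return math.exp(-((-math.log(p)) ** alpha))


def evasion_coefficient(p, lam, theta, beta, alpha):
    """Assumed cutoff coefficient: evade when stigma s < phi * t."""
    ratio = prelec(1.0 - p, alpha) / (theta * prelec(p, alpha))
    return ratio ** (1.0 / beta) - lam


def revenue(t, Y, omega, p, lam, phi):
    """Assumed revenue: evaders pay tax plus penalty only when caught."""
    share_evading = min(max(phi * t, 0.0), 1.0)
    lost_per_evader = 1.0 - p * (1.0 + lam)
    return t * Y * (1.0 - (1.0 - omega) * lost_per_evader * share_evading)


def optimal_tax_pt1(gamma, omega, p, lam, phi):
    """Smaller root of a(1+gamma)t^2 - (1+2a*gamma)t + gamma = 0 (interior case)."""
    a = (1.0 - omega) * (1.0 - p * (1.0 + lam)) * phi
    b = 1.0 + 2.0 * a * gamma
    disc = b * b - 4.0 * a * gamma * (1.0 + gamma)
    return (b - math.sqrt(disc)) / (2.0 * a * (1.0 + gamma))


Y, omega, p, lam = 100.0, 0.4, 0.015, 0.5
phi = evasion_coefficient(p, lam, theta=2.25, beta=0.88, alpha=0.225)
t_star = optimal_tax_pt1(0.44, omega, p, lam, phi)
G = revenue(t_star, Y, omega, p, lam, phi)
gap = (t_star * Y - G) / G
print(f"phi = {phi:.3f}, t* = {t_star:.3f}, G = {G:.2f}, gap ratio = {gap:.3f}")
```

Under these assumptions, a preference weight of 0.44 yields revenue and a tax gap ratio close to the values reported for regime PT1 in the comparison section (about 32.28 and 0.198), which is some reassurance that the assumed forms are in the right neighbourhood.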

Regime PT2
The utility function (5.1) is additively separable over private and public provision. However, in actual practice, there are strong complementarities between the two. To take account of this, the utility function is now given by (2.17), as explained in section 2.4 under the regime PT2. In this model, public provision has value only insofar as it facilitates private consumption. As with regime PT1, the government chooses the tax rate, t, so as to maximize the utility of a representative consumer with average income who does not evade, but now with the utility function (2.17), where G(t) is given by the budget constraint (4.16). The condition $0 < \frac{B}{G(t)} < (1-t)Y$, together with (4.16), guarantees that any optimum must be an interior point. Hence it must satisfy the first order condition $U'(t) = 0$. Some simple algebra shows that this is equivalent to solving $B\,\frac{\partial G}{\partial t} = Y G^2$. Using (4.16), the solution is as follows. Let $\phi$ denote the coefficient from the PT evasion solution of section 4 that determines who evades, and let $\lambda$ denote the penalty rate. If $\omega = 1$, then no one can evade. If $\phi \le 0$, then all those who can evade decide not to. In both cases the optimal tax rate is $t = \frac{\sqrt{B}}{Y}$. For the case $\phi t \ge 1$, all those who can evade do evade. However, the most empirically relevant case is when $0 < \phi t < 1$. Here some taxpayers can evade but choose not to. In this case the first order condition gives the quartic equation in t:
$$(1-\omega)^2 \left(1 - p(1+\lambda)\right)^2 \phi^2 \psi\, t^4 - 2(1-\omega)\left(1 - p(1+\lambda)\right)\phi\,\psi\, t^3 + \psi\, t^2 + 2(1-\omega)\left(1 - p(1+\lambda)\right)\phi\, t - 1 = 0, \quad 0 < \phi t < 1, \quad \psi = \frac{Y^2}{B}. \qquad (6.5)$$
(6.5) has, of course, four roots. However, in simulations we always found the optimum to be the only root in the interval (0, 1).
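As a worked check of the benchmark case in which nobody evades, the first order condition can be solved in two steps. The only assumption added here is that, without evasion, revenue is simply G(t) = tY.

```latex
\begin{align*}
B\,\frac{\partial G}{\partial t} = Y\,G^{2}, \quad G(t) = tY
  &\;\Longrightarrow\; B\,Y = Y\,(tY)^{2} \\
  &\;\Longrightarrow\; t = \frac{\sqrt{B}}{Y}.
\end{align*}
```

With the normalization Y = 100 and the value B = 1539 used in the calibration section, this no-evasion benchmark would be roughly t = 0.39. It is only a reference point, since the calibrated regime involves some evasion.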

Regime PT3
So far, we have considered a representative consumer. An alternative welfare criterion is for the government to maximize the sum (or average) of utilities. For this we need to specify an income distribution. We choose the gamma distribution because of its tractability and because it gives quite a good representation of the income distribution for middle incomes (which are the most relevant for determining aggregate tax revenue: the poor do not have much income and the rich are few in number); see, for example, Cowell (2000, p. 146). The Appendix gives all we need to know about the gamma distribution.
As explained in section 2.4 under the regime PT3, we generalize regime PT1 in two ways. Society's preference over private and public provision is given by the more general utility function (2.18). Instead of a representative consumer, the government now chooses the tax rate, t, so as to maximize the sum (or average) of utilities, i.e., the social welfare function, W(t), is now given by (6.6), where G(t) is given by (4.16). Thus the government chooses the tax rate, t, so as to maximize society's total (or average) utility in the state where no one evades tax. However, the government does allow for the effect on tax revenue, G(t), of tax evasion as described by PT. (6.6) simplifies to (6.7). The combination of the power form (2.18) for utility with the gamma distribution for income (10.1) leads to very tractable math. In particular, we get (6.8). From (6.7) and (6.8), we get (6.9).
From (4.16) and (6.9), we obtain social welfare as a function of the tax rate. Let $\phi$ denote the coefficient from the PT evasion solution of section 4 that determines who evades. When $\phi \le 0$, no taxpayer evades. When $\phi t \ge 1$, all those who can evade, do evade. However, the empirically more relevant case is when $0 < \phi t < 1$, where some who can evade do so but others who can evade choose not to. For this case, the first order condition, $W'(t) = 0$, gives the non-linear equation (6.12). Equation (6.12) can be easily solved numerically, given values for the parameters.

Regime PT4
Finally, we consider a regime where tax evasion is described by PT and, also, the government chooses the tax rate, t, so as to maximize the sum (or average) of the PT utilities of the consumer-taxpayers. Thus, social welfare is now given by (6.13). Substituting from (4.12) into (6.13), we get (6.14). From (4.13) and (6.14), we get (6.15). Invoking our assumption of independence of stigma and income we get, from (6.15), (6.16). From (4.11) and (6.16), we obtain social welfare as a function of the tax rate. We shall consider only the case in which some taxpayers who can evade do so, but others who can evade do not, since this is the empirically relevant one. It can be checked that the optimal tax depends on the sign of a single composite term. If this term is negative, then welfare is strictly decreasing in t. Hence, the optimal tax rate is zero. This result is not surprising at all: since public provision does not figure in the utility functions, increasing the tax rate reduces private consumption without any gain. If the term is positive, then welfare is strictly increasing in the tax rate. Thus, the optimal tax rate is t = 1 and the 'optimum' is for the government to confiscate all the wealth of those who cannot evade or will not evade. This causes a huge loss in private consumption, but with no gain, since public provision is not in the utility functions.
The reason for this absurd result is that taxation shifts the reference point down. Those who cannot, or will not, evade are always at their reference point, so always have zero utility relative to that. Those who evade, and are not caught, have a huge increase in utility, measured relative to their reference point. But this huge increase in relative utility has no real welfare significance. Surprisingly, it is this, the most absurd of cases, that is picked up by our simulations (see next section). Hence, the message is that inferring preferences revealed in an inappropriate context can have disastrous policy effects.

Optimal Taxation: PT and EUT compared
We now compare the relative success of PT and EUT in explaining tax evasion. We use the same parameter values as before, when taxpayers had expected utility preferences: p = 0.015 for the probability of detection, $\lambda = 0.5$ for the penalty rate, and $\omega = 0.4$. We take the values $\beta = 0.88$ for the exponent of the utility function and $\theta = 2.25$ for the coefficient of loss aversion from Tversky and Kahneman (1992). We adopt the Prelec probability weighting function, $w^+(p) = w^-(p) = e^{-(-\ln p)^{\alpha}}$, with the value $\alpha = 0.225$ implied by the data reported by Bernasconi (1998). The results are tabulated below, where na stands for 'not applicable'. The first two rows are reproduced from Table-II, which gave the results of calibrating regime EUT. In that regime, consumer-taxpayers are expected utility maximizers and use the same utility function to decide how much income to declare and also to express their preferences over private and public provision. The government uses this same utility function to determine the optimal tax rate. From the first row, we see that a public-provision preference parameter of 0.4 gives about the correct level of total government tax revenue, G(t), for the USA. However, the tax gap ratio, $\frac{tY - G(t)}{G(t)}$, is far too high, being about three times the correct value. On the other hand, from the second row, we see that a preference parameter of 0.65 gives about the correct tax gap ratio for the USA. But then total government tax revenue is far too high, being about twice what it should be. The reason is that, at observed audit probabilities and penalty and stigma rates, evasion is very attractive to an expected utility maximizer. Hence evasion is too high at observed levels of taxation. However, because of the Yitzhaki effect, increasing the tax rate reduces evasion. So, to get observed evasion rates, taxes have to be too high.
Rows 3 to 7 give the results when tax evasion is described by PT. Row 3 gives the results for regime PT1. For PT1, consumer-taxpayers' evasion decisions are described by PT. However, their preferences over private and public consumption are given by the same utility function as in regime EUT. 46 The government chooses the tax rate so as to maximize the same welfare function as in regime EUT. However, the government takes into account the effect on tax revenues as predicted by PT.
Row 3 of the table shows that when consumer preferences over private and public consumption (a public-provision preference parameter of 0.44) give a tax revenue (G(t) = 32.28) close to what is observed (about 32), we automatically get a tax gap ratio ($\frac{tY - G(t)}{G(t)} = 0.198$) close to what is observed (0.2). This suggests that PT provides the correct model for tax evasion.
The welfare function used in regime PT1 is additively separable over private and public consumption. By contrast, the welfare function used in regime PT2 exhibits strong complementarity between the two. From row 4, we see that when preferences between private consumption and public provision (B = 1539) give a tax revenue (G(t) = 32.14) close to the observed value (32), they also give a value for the tax gap ratio ($\frac{tY - G(t)}{G(t)} = 0.197$) close to what is observed (0.2). Thus PT successfully explains tax evasion with two very different welfare functions.
For regime PT3, the criterion is to maximize aggregate (or average) utility, rather than that of the representative consumer of regimes PT1 and PT2. Also, regime PT3 generalizes PT1 in that it allows any (constant) coefficient of relative risk aversion, while in PT1 this coefficient was 1. In fact, when the coefficient equals 1, PT3 reduces to PT1. Since 1 is at the low end of the range (between one and two) reported by Bernasconi (1998), in row 5 we set the coefficient to 2 to test the high end. A public-provision preference parameter of 0.45 then results in approximately the correct level of tax revenue (G(t) = 32.2) and gives a nearly correct tax gap ratio (0.197). Row 6 reports the results for setting the coefficient of relative risk aversion to 0.12, which is consistent with the value of 0.88 for the exponent of the utility function reported by Tversky and Kahneman (1992). 47 A preference parameter of 0.58 gives a near correct value for tax revenue (32.4) and, again, a near correct value for the tax gap ratio (0.199).
Using PT to model the tax evasion decision works for the following reasons. By taking the reference point to be the legal after-tax income, the taxpayer is in the domain of gains if not caught but in the domain of losses if caught. This allows loss aversion, overweighting of the small probability of detection and underweighting of the high probability of non-detection to considerably increase the deterrence effect of punishment, despite the low values of the detection probability p and the penalty rate (and the mild convexity of the value function for losses). This facilitates the derivation of the correct government budget constraint. By contrast, under EUT we get the wrong government budget constraint.
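The mechanism described above can be sketched numerically. The example below is only an illustration under stated assumptions: it uses the Tversky and Kahneman (1992) power value function and probability weighting function with their published median parameter estimates (0.88, 2.25, 0.61 for gains, 0.69 for losses), which need not be the functional forms or calibration used in this paper; the EUT benchmark is simplified to a risk-neutral taxpayer; and the detection probability and penalty per unit of evaded tax are hypothetical values. All function names are introduced here for illustration.

```python
def tk_weight(p, gamma):
    """Tversky-Kahneman (1992) probability weighting function."""
    num = p ** gamma
    return num / (num + (1.0 - p) ** gamma) ** (1.0 / gamma)


def pt_value(x, alpha=0.88, loss_aversion=2.25):
    """Power value function over outcomes relative to the reference point."""
    if x >= 0:
        return x ** alpha
    return -loss_aversion * (-x) ** alpha


def eut_gain_per_unit(p, penalty):
    """Risk-neutral expected gain from evading one unit of tax (simplified EUT)."""
    return (1.0 - p) * 1.0 - p * penalty


def pt_gain_per_unit(p, penalty):
    """PT valuation of the same gamble, reference point = legal after-tax income.

    Not caught: gain of 1 (domain of gains, weight on 1 - p).
    Caught: loss of `penalty` (domain of losses, weight on p).
    """
    return (tk_weight(1.0 - p, 0.61) * pt_value(1.0)
            + tk_weight(p, 0.69) * pt_value(-penalty))


p, penalty = 0.03, 1.0  # hypothetical values, not the paper's calibration
print("decision weight on detection:", tk_weight(p, 0.69))
print("decision weight on non-detection:", tk_weight(1.0 - p, 0.61))
print("EUT gain per unit evaded:", eut_gain_per_unit(p, penalty))
print("PT gain per unit evaded:", pt_gain_per_unit(p, penalty))
```

With these inputs the decision weight on detection exceeds the objective probability, the weight on non-detection falls below it, and the PT valuation of evasion is lower than the risk-neutral expected gain, which is the direction of the deterrence effect described in the text.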
A fundamental principle of liberalism is that people are the best judges of their own welfare. However, the regimes PT1-PT3 use one set of preferences (PT) to model the tax evasion decision but another (standard utility theory) to describe preferences over private and public consumption. Furthermore, these regimes are successful in accounting for the observed facts on the tax gap and government expenditure.
This conclusion is further reinforced by the results of regime PT4. Regime PT4 uses PT to explain tax evasion (as in PT1-PT3), but also uses PT as a welfare criterion. Row 7 shows that PT4 gives wildly wrong values for all the variables. Worse, it gives the economically absurd result that when neither consumers nor government care about public provision, then the optimal tax rate is 100%. This absurd result comes about because PT4 works by pushing down the reference point, so giving very high relative utility. But that does not correspond to any sensible measure of welfare. Thus, although PT gives the correct budget constraint in PT4, it also gives a completely wrong welfare criterion. The message is clear: inferring preferences from the wrong context can have disastrous policy effects.

Summary
A fundamental assumption of neoclassical economics is that decision makers have a complete transitive ordering over all possible outcomes. Research over the last 60 years has shown that this is not valid, not even as a rough approximation. With a simple general equilibrium model of optimal taxation in the presence of tax evasion, we were able to account well for the observed magnitudes. The key was recognizing that preferences are context dependent. In our successful regimes, we assumed that consumer-taxpayers behave according to prospect theory when they are considering how much income to declare, but that the government, when maximizing social welfare on their behalf, uses standard utility theory. However, when either expected utility on its own or prospect theory on its own is used to model both tax evasion and social welfare, the calibrated results are not in conformity with the data.
At observed audit probabilities and penalty and stigma rates, tax evasion is very attractive to an expected utility maximizer. Hence expected utility theory (EUT) predicts levels of evasion that are too high at observed levels of taxation. However, because of the Yitzhaki effect, increasing the tax rate reduces evasion. So, to get observed evasion rates, taxes have to be too high.
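A minimal sketch of this argument follows, under assumptions that are simpler than the paper's model: logarithmic utility, a fine proportional to the evaded tax and paid on top of it, and no stigma cost. Under those assumptions the first-order condition gives a closed form for the share of income concealed, which is clipped to the unit interval. The function name, the symbol `s` for the fine rate, and the numerical values of `p` and `s` are hypothetical and are not taken from the paper's calibration.

```python
def eut_evaded_share(t, p, s):
    """Share of income concealed by a log-utility expected utility maximizer.

    t: tax rate in (0, 1)
    p: probability of detection
    s: fine per unit of evaded tax, paid in addition to the evaded tax

    With income W and concealed income E, income is W(1 - t) + tE if not
    caught and W(1 - t) - s*t*E if caught. Maximizing expected log income
    gives E / W = (1 - t) * (1 - p * (1 + s)) / (s * t), clipped to [0, 1].
    """
    interior = (1.0 - t) * (1.0 - p * (1.0 + s)) / (s * t)
    return min(1.0, max(0.0, interior))


# Hypothetical detection probability and fine rate.
for t in (0.3, 0.5, 0.7, 0.8, 0.9):
    print(t, round(eut_evaded_share(t, p=0.03, s=0.5), 3))
```

In this sketch the taxpayer conceals all income at moderate tax rates, and the concealed share only starts to fall once the tax rate is very high, which mirrors both parts of the argument: evasion is too attractive at observed tax rates, and it declines as the tax rate rises.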
Under prospect theory (PT), taking the reference point to be the legal after-tax income ensures that the taxpayer is always in the domain of losses if caught and always in the domain of gains if not caught. Loss aversion, overweighting of low probabilities and the concavity of the value function for losses then ensure that punishment hurts more under PT than under EUT. Because of underweighting of high probabilities, the prospect of not being caught is less attractive under PT than under EUT. Hence, the deterrence effect of punishment is much stronger under PT. Thus, using PT to model tax evasion produces the observed levels of tax evasion.
Because of bounded rationality, consumer-taxpayers have to simplify considerably, concentrating on the salient features of the problem at hand. Hence, we cannot take their tax evasion behavior as revealing their preferences over private versus public consumption. Thus while PT gives the correct government budget constraint, it gives a completely wrong welfare criterion.
A fundamental principle of liberalism is that people are the best judges of their own welfare. Neoclassical economics recognizes that applying this principle is problematic; governments have to take account of imperfections such as externalities, market failures and monopoly power. PT highlights additional issues, but this is not necessarily a reason to restrict liberalism. On the contrary, by providing a descriptively more correct theory, PT can potentially enhance the applicability of liberal principles.

Acknowledgements
We are grateful for comments during invited seminar presentations at Cambridge, Warwick and Keele. In particular we would like to thank Joel Slemrod and Toke Aidt for comments and J. Barkley Rosser Jr. for the encouragement to write this paper.

Appendix: The Gamma Distribution
Most of the definitions below can also be found in Cowell (2000). The gamma distribution is given by The Gini index for the gamma distribution is given by: X a 1 e X dX dY 1. (10.5) The simplest gamma distribution that has the correct shape is: