Reexamining the Consumption-Wealth Relationship: The Role of Model Uncertainty

In their influential work on the consumption-wealth relationship, Lettau and Ludvigson found that while consumption responds to permanent changes in wealth in the expected manner, most changes in wealth are transitory with no effect on consumption. We investigate the robustness of these results to model uncertainty using Bayesian model averaging. We find that there is model uncertainty with regard to the number of cointegrating vectors, the form of deterministic components, lag length, and whether the cointegrating residuals affect consumption and income directly. Whether this uncertainty has important implications depends on the researcher's attitude toward the economic theory used by Lettau and Ludvigson. If we work with their exact model, our findings are very similar. However, if we work with a broader set of models, we find that the exact magnitude of the role of permanent shocks is difficult to estimate precisely. Thus, although some support exists for the view that the role of permanent shocks is small, we cannot rule out the possibility that they have a substantive effect on consumption.


Introduction
Textbook wisdom suggests that the size of the wealth effect (i.e. the change in consumption induced by a $1 increase in wealth) should be approximately 5 cents.[1] However, recent events which caused large changes in wealth without commensurately large changes in consumption (e.g. the stock market crash of October 1987 and the rise and fall of stock prices associated with the "New Economy" of the 1990s) have raised the issue of whether the wealth effect really is this large. This uncertainty over the magnitude of the wealth effect has stimulated a large amount of recent research. An important paper, Lettau and Ludvigson (2004) (LL hereafter), presents empirical evidence which attempts to reconcile the apparent conflict between textbook wisdom and recent experience. Using cointegration techniques, LL estimate permanent and transitory components of the fluctuations in wealth. They find that consumption does respond to permanent changes in wealth in the textbook fashion, but does not respond to transitory changes in wealth. However, most of the fluctuations in wealth are transitory. In essence, the "Dotcom bubble" had little effect on consumption since its effects on wealth were correctly anticipated to be transitory.

Like many empirical conclusions in economics, LL's are based on the estimated properties of a single econometric model, selected after a long and careful specification search. For many years, economists and statisticians have been worried about such a strategy, which treats the final selected model as though it were the true model and ignores the evidence from other models. For instance, Leamer (1978) presents a persuasive argument for basing empirical conclusions on all possible models under consideration, instead of selecting one model through a data-driven specification search. In the statistical literature, Draper (1995) and Hodges (1987) make similar calls for model uncertainty to be reflected in policy analysis.
Macroeconomic contributions such as Cogley and Sargent (2003), Fernandez, Ley and Steel (2001) and Sala-i-Martin, Doppelhofer and Miller (2004) have convinced many of the importance of model uncertainty for empirical practice. Most such papers adopt a Bayesian perspective since it treats models as random variables and, thus, averaging across models can be done in a logical and coherent manner.[2]

[1] Lettau and Ludvigson (2004) present results for a wide sample of macroeconomics textbooks indicating support for values in this region.
[2] Papers which investigate the use of non-Bayesian model averaging techniques in macroeconomic applications include Garratt, Lee, Pesaran and Shin (2003), Min and Zellner (1993) and Sala-i-Martin (1997). An example of a non-Bayesian theoretical discussion of model averaging is Hjort and Claeskens (2003).

The basic idea of Bayesian model averaging can be explained quite simply. Suppose the researcher is entertaining $R$ possible models, denoted by $M_1, \ldots, M_R$, to learn about an unknown feature of the economy, $\theta$ (e.g. a variance decomposition or an impulse response to a particular shock). If we treat $\theta$ and $M_r$ as random variables, the rules of conditional expectation imply that:

$$E(\theta \mid \text{Data}) = \sum_{r=1}^{R} E(\theta \mid \text{Data}, M_r)\, p(M_r \mid \text{Data}). \tag{1.1}$$

Thus, overall point estimates, $E(\theta \mid \text{Data})$, should be an average of point estimates in individual models, $E(\theta \mid \text{Data}, M_r)$.[3] The weights in the average are the posterior model probabilities, $p(M_r \mid \text{Data})$. There are many other justifications for using model averaging, ranging from a desire to avoid the well-known pre-test problem [e.g. Poirier (1995), pages 519-523] to formal justifications of such an approach in decision theoretic contexts [e.g. Min and Zellner (1993) or Raftery, Madigan and Hoeting (1997)].
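The averaging in (1.1) can be illustrated with a minimal Python sketch. The function name `bma_point_estimate` and all numbers below are hypothetical and chosen purely for illustration; they are not results from this paper.

```python
def bma_point_estimate(model_means, model_probs):
    """Weighted average of per-model posterior means, as in equation (1.1).

    model_means[r] plays the role of E(theta | Data, M_r) and
    model_probs[r] plays the role of p(M_r | Data).
    """
    if len(model_means) != len(model_probs):
        raise ValueError("need one posterior mean per model probability")
    if abs(sum(model_probs) - 1.0) > 1e-9:
        raise ValueError("posterior model probabilities must sum to one")
    return sum(p * m for p, m in zip(model_probs, model_means))


# Hypothetical inputs: three models with different implied shares of
# permanent shocks and different posterior model probabilities.
example_means = [0.10, 0.40, 1.00]
example_probs = [0.5, 0.3, 0.2]
bma_mean = bma_point_estimate(example_means, example_probs)
```

Note that the averaged point estimate need not be close to the estimate from any single model, which is one reason the full averaged posterior, and not only its mean, is of interest.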
Given the importance for macroeconomic policy of LL's results and the large and growing literature using a similar specification [see Julliard (2005)], it is crucial to investigate how robust their results are with respect to plausible changes in model assumptions. The Bayesian framework is a logical one in which to approach this problem, and this is what we do in this paper. Using Bayesian reference prior methods for cointegrated models developed in a series of papers [Strachan (2003), Strachan and Inder (2004) and Strachan and van Dijk (2003)], we carry out a Bayesian model averaging exercise with the data used in LL, where different models are defined depending on the number of cointegrating vectors, form of deterministic trends, lag length and whether the cointegrating residuals affect consumption and income directly. We find that many different models receive appreciable support from the data and, thus, substantial model uncertainty exists. As a result, we find that the exact magnitude of the role of permanent shocks is hard to estimate precisely. Thus, although some support exists for the view that their role is small, we cannot rule out the possibility that they have a substantive role to play. In particular, the posterior distributions of key variance decompositions are found to be multimodal and relatively non-informative. However, when we consider only the single model used by LL, with the cointegrating residual allowed to directly affect only wealth (and not consumption or income), our results are similar to theirs.

Cointegration and the Consumption-Wealth Relationship
LL, using derivations from an earlier paper [Lettau and Ludvigson (2001)], base their empirical analysis of the consumption-wealth relationship on an accounting relationship based on the budget constraint linking tomorrow's wealth to today's wealth, consumption and the rate of return on wealth. Because wealth contains a human capital component it is not directly observable. Accordingly, LL log-linearize the budget constraint relationship, assume consumption growth and the returns on human capital and asset wealth are stationary and relate the nonstationary component of human capital to labor income. They end up with a model where log consumption, $c_t$, log asset wealth, $a_t$, and log labor income, $y_t$, should be cointegrated. LL focus on the case of a single cointegrating relationship they call "cay", $c_t - \beta_a a_t - \beta_y y_t$. That is, the cointegrating vector should be $(1, -\beta_a, -\beta_y)'$ and there should be no deterministic trend in the cointegrating residual (although there may be an intercept). As LL show, this implies that there are two permanent shocks and one transitory shock driving the joint fluctuations in consumption, labor income and wealth. Their accounting argument does not rule out the case of two cointegrating relationships, with the implication of one permanent and two transitory shocks and once again no deterministic trends in the cointegrating residuals. Note that in this latter case "cay" is still stationary. In theory, the budget constraint should also imply $\beta_a + \beta_y = 1$ but, given that only a fraction of total consumption based on nondurables and services is observable, we should get $0 \le \beta_a + \beta_y \le 1$.
An important methodological issue is how to include the theoretical information outlined in the previous paragraph in the statistical model. From a Bayesian point of view, this can be thought of as prior information, either about the model space (e.g. the model with one cointegrating vector involving no deterministic trend in the cointegrating residual is more plausible a priori) or about the parameter space (e.g. the region $0 \le \beta_a + \beta_y \le 1$ is more plausible a priori). In this paper, we use a Bayesian approach to re-examine the conclusions of LL about the consumption-wealth relationship. In this re-examination, we view Bayesian methods as a tool that allows us to investigate the relative roles of economic theory, the data and the model selection procedure in empirical conclusions. In particular, in the re-examination we conduct, there are substantial economic theory reasons to adopt a number (but not all) of the model specification choices used by LL. What we wish to do is separate these a priori beliefs from the information in the observed data and their data-driven model specification choices.
This is not because we believe that strong theory-driven prior beliefs in favor of certain models are necessarily bad. On the contrary, we believe that such prior information is a necessary but not sufficient condition for obtaining robust empirical conclusions. However, it is important to understand the relative contributions of a priori beliefs, data-driven modeling choices and the data itself.
These distinctions can be made clear by describing our class of econometric models. This class is a standard one as outlined in, e.g., Johansen (1995). If we let $x_t$ be an $n \times 1$ vector (i.e. in our case $n = 3$ and $x_t = (c_t, a_t, y_t)'$), then a Vector Error Correction model (VECM) can be written as:

$$\Delta x_t = \alpha \beta' x_{t-1} + \Gamma(L) \Delta x_{t-1} + \Phi d_t + \varepsilon_t, \tag{2.1}$$

where $\alpha$ and $\beta$ are $n \times r$ matrices with $0 \le r \le n$ being the number of cointegrating relationships.[4] $\Gamma(L)$ is a matrix polynomial of degree $p$ in the lag operator and $d_t$ is the deterministic term (to be defined shortly). The framework described in (2.1) defines a set of models which differ in the number of cointegrating vectors ($r$), lag length ($p$) and the specification of deterministic terms. With regards to the latter, a deterministic trend in the cointegrating residuals ($\beta' x_{t-1}$) has very different implications than a deterministic trend in the levels of the series. Accordingly, following Johansen (1995, Section 5.7), we decompose $\Phi d_t$ into these two different parts as:

$$\Phi d_t = \alpha \mu_1 d_t + \alpha_\perp \mu_2 d_t,$$

where $\mu_1 = (\alpha' \alpha)^{-1} \alpha' \Phi$ and $\mu_2 = (\alpha_\perp' \alpha_\perp)^{-1} \alpha_\perp' \Phi$ (and $\alpha_\perp$ is orthogonal to $\alpha$). With this transformation, $\mu_1$ reflects deterministic terms in the cointegrating residual while $\mu_2$ reflects those in $x_t$. Within this framework we consider possibilities for the deterministic term from the set $d_t = (1, t)$, $d_t = (1)$ and $d_t = (0)$ (i.e. no deterministic terms). Note that we do not rule out including a linear trend in the cointegrating residual as this may be reasonable in a finite sample if the consumption-wealth ratio is changing over time and, indeed, support for such a specification is found by Lee (2002). Accordingly we consider the five possibilities defined in the following table.[5]

Limiting the number of variables affected by the cointegrating residuals (in econometric language these are referred to as weak exogeneity restrictions) is another issue that is commonly considered in this literature. In some of their results, LL impose weak exogeneity of $c_t$ and $y_t$ in the cointegrating relationship.
That is, they set the coefficients in $\alpha$ corresponding to the equations for $c_t$ and $y_t$ to be zero. They argue, following Gonzalo and Ng (2001), that this will allow for more stable estimates of the permanent-transitory decomposition. The instability is produced by the sensitivity of $\alpha_\perp$ to estimation error in $\alpha$. We use an alternative technique, due to Centoni and Cubadda (2003), that does not require the construction of $\alpha_\perp$ to measure the share of permanent shocks in the variance decomposition.[6] This form of weak exogeneity has the important policy implication that transitory shocks have no immediate effect on $c_t$ and $y_t$ and, at longer horizons, the effect of transitory shocks enters through the asset channel's effect on consumption and labor income. For example, if this form of weak exogeneity is imposed and the lags of wealth growth in the consumption and labor income equations are all close to zero, then nearly all of the adjustment to the transitory shock will be in wealth.
We have now defined a set of models (i.e. a model space) which vary in their number of cointegrating vectors ($r$), lag length ($p$), deterministic terms ($d$) and whether or not weak exogeneity is imposed. Economic theory has strong implications for some aspects of model choice, but not others. In particular, it says that we should have $r = 1$ or $2$ and the cointegrating residual should not have a trend in it (i.e. $d > 2$). However, economic theory says nothing about the deterministic term in $x_t$, nor about lag length nor about weak exogeneity. One can imagine three different reactions to this state of affairs. Firstly, one could simply argue that theory should be ignored and we should let the data speak. This argues for the consideration of all possible models and a non-informative prior over model space. Secondly, one could argue that theory should be imposed dogmatically and, thus, only models with $r = 1$ or $2$ and $d > 2$ should be considered. This argues for a non-informative prior over the set of models consistent with theory. Thirdly, one could argue that all models should be considered, but that more weight should be attached to models which are consistent with theory. This argues for an informative (but not dogmatic) prior over model space. In our empirical work, we discuss all three of these strategies.
Our econometric methods, which are largely the same as those developed in Strachan (2003), Strachan and Inder (2004) and Strachan and van Dijk (2003), are described in the Appendix. Here we sketch the basic ideas. The key elements we require are a posterior for the parameters and a posterior model probability for each model. In terms of (1.1), to obtain posterior properties of a function of interest $\theta$ (e.g. a variance decomposition), we need $p(\theta \mid \text{Data})$, which depends on $p(M_r \mid \text{Data})$ and $p(\theta \mid \text{Data}, M_r)$. Properties of the latter two densities can be obtained from the posterior simulators for each model described in the Appendix.
Both $p(M_r \mid \text{Data})$ and $p(\theta \mid \text{Data}, M_r)$ depend on the likelihood function (defined by equation 2.1, assuming errors are Normally distributed), a prior over the parameter space and a prior over model space. The prior over model space will be discussed in the next section. With regards to the prior over parameter space we adopt three different approaches. First, we use a standard noninformative prior but use the Schwarz criterion (BIC) to approximate $p(M_r \mid \text{Data})$.[7] Secondly, we use a standard noninformative prior and the fractional Bayes factor approach of O'Hagan (1995). Thirdly, we use an informative shrinkage prior similar in spirit to the commonly-used shrinkage prior of Doan, Litterman and Sims (1984). However, in order to be as objective as possible, we treat the shrinkage parameter as an unknown parameter updated in a data-based fashion.[8] For the prior over the cointegration space, we choose the reference prior described in Strachan and Inder (2004).[9] This prior is Uniform over cointegration space and, thus, noninformative in the sense described in Strachan and Inder (2004).
[7] The difference in BIC between two models approximates the log of the Bayes factor comparing the two models (up to the scale factor implied by the BIC convention used). This relationship can be used to approximate $p(M_r \mid \text{Data})$.
[8] In Bayesian jargon, we use a hierarchical prior for the shrinkage parameter.
[9] A reference prior is one which can be used automatically, not requiring subjective prior input from the researcher. For the purposes of model comparison, care must be taken in designing a reference prior to ensure that all models receive equal treatment. In the Bayesian cointegration literature, there has been a long discussion as to what makes a good reference prior. This is not the place to summarize the issues raised in this debate. Suffice it to note that the approach used in this paper surmounts many problems of earlier approaches and shares many similarities with other popular reference priors such as those developed in Kleibergen and Paap (2002) and Villani (2004).

In addition, we present results averaged over the three priors. Such results can be motivated by noting that a Bayesian model involves both a likelihood and a prior. Interpreted in this way, our empirical work involves a huge set of models defined not only by likelihood assumptions (e.g. number of cointegrating vectors), but also by prior assumptions, and it makes sense to present a grand average over all such models.[10] Thus, we are using a wide variety of objective and reference Bayesian approaches to model comparison. We argue such an approach is crucial in order to investigate the prior robustness of any empirical results. Precise details are provided in the Appendix.
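As an illustration of the first approach, the following Python sketch turns BIC values into approximate posterior model probabilities. It assumes the common convention BIC = -2 log L + k log T, under which the weights are proportional to the prior times exp(-BIC/2); the function name and the BIC values are hypothetical, and the procedure actually used in this paper is the one described in the Appendix.

```python
import math


def posterior_model_probs_from_bic(bics, prior_probs=None):
    """Approximate p(M_r | Data) from BIC values (assumed convention:
    BIC = -2 log L + k log T, so smaller is better)."""
    n = len(bics)
    if prior_probs is None:
        prior_probs = [1.0 / n] * n  # equal prior weight on every model
    if any(p <= 0 for p in prior_probs):
        raise ValueError("prior model probabilities must be positive")
    # Work in logs and subtract the maximum for numerical stability.
    log_w = [math.log(p) - 0.5 * b for b, p in zip(bics, prior_probs)]
    top = max(log_w)
    weights = [math.exp(lw - top) for lw in log_w]
    total = sum(weights)
    return [w / total for w in weights]


# Hypothetical BIC values for three competing models.
example_bics = [1000.0, 1002.0, 1010.0]
probs = posterior_model_probs_from_bic(example_bics)
```

Because only differences in BIC matter, a gap of 2 between two models corresponds to posterior odds of roughly e to 1 under equal prior weights.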

Empirical Results
Our empirical results are based on the same data as used in LL and the reader is referred there for precise details. Briefly, our data run from 1951Q4 through 2003Q1[11] and contain data on $c_t$, which is the log of real per capita expenditures on nondurables and services (excluding shoes and clothing); $a_t$, which is the log of a measure of real per capita household net worth (including all financial and household wealth as well as consumer durables); and $y_t$, which is the log of after-tax labor income.

Model Comparison Results
In this section, we present results relating to the posterior model probabilities, $p(M_r \mid \text{Data})$, for our 80 different models (or $3 \times 80$ if our three different classes of prior are interpreted as defining different model classes). As described in the previous section, models are defined by the number of cointegrating vectors ($r = 0, 1, 2, 3$), the number of lags ($p = 0, 1, 2, 3$) and the treatment of deterministic terms as outlined in Table 1 ($d = 1, \ldots, 5$).[12] For comparison, remember that LL chose $r = 1$, $d = 3$ and $p = 1$. As an aside, it is worth noting that, in their Table B.1, LL select $p = 1$ using BIC (and their variance decompositions are based on $p = 1$ and $r = 1$), but they mention that AIC selects $p = 0$. Most of their statistical tests indicate $r = 1$. However, for $p = 1$, the Johansen L-max and trace tests in their Table B.1 both select $r = 0$. Hence, even in LL's non-Bayesian procedure there does seem to be model uncertainty over both the number of cointegrating vectors and lag length.[13]

Given the large number of models and three different approaches to model comparison (i.e. based on BIC, fractional Bayes factors and the shrinkage prior), we do not present results for every model. Instead, the various sub-panels of Table 2 present results relating to the number of cointegrating vectors (integrating over deterministic terms and lag length), lag length (integrating over $r$ and $d$) and deterministic trends (integrating over $r$ and $p$) for our three different priors. In this table, we do not attach any extra weight to models consistent with economic theory, but simply allocate equal prior weight to each model.
With regards to the shrinkage prior, as described in the Appendix, this requires the choice of a shrinkage parameter. Exact details are given in the Appendix, but note here that we use a relatively diffuse hierarchical prior for this parameter (i.e. we are averaging over different values of the shrinkage parameter, where the weights in the averaging are data-based).

With regards to cointegrating rank, BIC offers fairly strong support for the LL choice of $r = 1$. However, even BIC indicates an 11.9% chance that cointegration does not occur. Results using our other two priors are even more interesting, with under 30% chance of a single cointegrating vector, but appreciable weight allocated to two cointegrating vectors. The fractional Bayes factor approach even indicates appreciable support for $r = 3$.
This latter finding, along with the appreciable support for $d = 2$, indicates support for the series being stationary with linear deterministic trends in their levels.
With regards to the issue of deterministic trends, there is also a high degree of uncertainty. Certainly, all of our approaches indicate substantial support for d = 3 or 4 (the choice of LL and a closely related choice), but the BIC and fractional Bayes factor approaches attach substantial weight to the (economically non-intuitive) cases where there is a time trend in the cointegrating residual.
As we will discuss in more detail below, LL's finding of one cointegrating vector is crucial to their transitory-permanent decomposition which underlies many of their key empirical results. If all series have unit roots, but cointegration does not occur (i.e. $r = 0$), then all shocks are permanent and their finding that transitory shocks to wealth are very important cannot be recovered. In contrast, if $r = 3$ then all shocks are, by definition, temporary. Furthermore, we expect that models with $r = 1$ should yield results similar to those found by LL, but there is no reason to think $r = 2$ will necessarily do so. When we do our Bayesian Model Averaging (BMA) exercise in the following section, we are averaging over all these choices so a wide variety of findings are possible.
The previous results are all based on a non-informative prior over model space. That is, each of our 80 models received an a priori weight of $1/80$. Given that economic theory suggests that cointegration is present ($r = 1$ or $2$) and $d > 2$ (i.e. theory suggests that there is no linear trend in the consumption-wealth ratio as, in the long run, such a trend would imply values of this ratio outside the interval $[0, 1]$), the researcher may wish to attach more prior weight to these models. For the sake of brevity we do not present results for this case, but stress that such an approach is possible. For instance, if the researcher thought that models consistent with economic theory should receive twice the weight of other models, then she should attach prior weight of $2/104$ to the 24 models consistent with economic theory and $1/104$ to the 56 models which are not. Results in Table 2 could be adjusted appropriately.
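The model counts and prior weights quoted above can be checked with a short Python sketch. It enumerates the model space as described in the text ($r$ and $p$ each in 0 to 3, $d$ in 1 to 5) and gives theory-consistent models twice the weight of the others; the variable and function names are illustrative only.

```python
from fractions import Fraction
from itertools import product

ranks = [0, 1, 2, 3]            # number of cointegrating vectors, r
lags = [0, 1, 2, 3]             # lag length, p
det_terms = [1, 2, 3, 4, 5]     # deterministic-term specification, d

models = list(product(ranks, lags, det_terms))  # tuples (r, p, d)


def consistent_with_theory(r, p, d):
    # Theory: cointegration is present (r = 1 or 2) and there is no
    # trend in the cointegrating residual (d > 2); p is unrestricted.
    return r in (1, 2) and d > 2


n_consistent = sum(consistent_with_theory(*m) for m in models)

# Unnormalized weights: 2 for theory-consistent models, 1 otherwise.
raw = {m: (2 if consistent_with_theory(*m) else 1) for m in models}
total = sum(raw.values())
prior = {m: Fraction(w, total) for m, w in raw.items()}
```

Here `total` equals 2 x 24 + 1 x 56 = 104, which is where the denominators $2/104$ and $1/104$ come from. Using `Fraction` keeps the weights exact, so they sum to one without rounding error.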
A third strategy a researcher might take is to impose economic theory on the model and work only with the set of models that are consistent with it. If we do this, by attaching equal prior weight to all models with $r = 1$ or $2$ and $d > 2$ (remaining agnostic over lag length) and omitting all other models, we obtain the results in Table 3.
These results are more consistent with the findings of LL. Note, however, that with the exception of the BIC, there still is substantial probability attached to two cointegrating vectors and deterministic terms different from the ones used by LL. We will see whether these differences have substantial economic implications in the next section on variance decompositions.

When doing variance decompositions, researchers sometimes impose weak exogeneity restrictions to improve estimation accuracy. In the present application, the weak exogeneity assumption of most interest (which is imposed by LL in some of their results) is that the coefficients in $\alpha$ corresponding to the equations for $c_t$ and $y_t$ are zero. This hypothesis only makes sense when cointegration occurs and $r = 1$. When $r = 0$ these coefficients do not enter the model and when $r = 2$ cointegration theory implies that at most one variable can be weakly exogenous. Accordingly, we note that the probability of this weak exogeneity restriction, conditional on $r = 1$, is 0.398, 0.641 and 1.000 using the BIC, fractional Bayes factor and shrinkage prior approaches, respectively.
Thus, there is some uncertainty about whether this is a reasonable restriction to impose.
Another hypothesis of interest is whether $0 \le \beta_a + \beta_y \le 1$. This hypothesis only makes sense when cointegration occurs with $r = 1$. As an example of our findings for this relationship, we find $p(0 \le \beta_a + \beta_y \le 1 \mid \text{Data}, r = 1)$ to be 0.614 using the shrinkage prior approach. Thus, there is substantial uncertainty over whether this implication of economic theory holds.

Variance Decompositions
We have established above that there is a great deal of uncertainty about which model is appropriate for this data set. Of course, if all of the plausible models have similar implications for quantities of interest to economists, then the fact that model uncertainty is substantial would be unimportant. Accordingly, in this section we investigate the consequences of model uncertainty for functions of interest to economic policymakers. For the sake of brevity, we focus on variance decompositions. LL present a wider battery of results, but their variance decompositions are at the heart of their story. Briefly, cointegration restrictions allow us to decompose $x_t$ into permanent and transitory components. We can then measure the role each of these plays in the model.
A common way of doing this is to calculate the fraction of the total variance of the forecast error at horizon $h$ which is attributable to the permanent and transitory components. LL provide more discussion. Note that, with three variables, we have three innovations driving the model (e.g. with $r = 1$, two of the innovations are permanent and one transitory). Following LL, we combine permanent shocks together (or transitory shocks together if $r = 2$).
The key finding of LL is that transitory shocks dominate changes in wealth. Hence, although we will mention a range of findings, our main focus will be on this relationship. Furthermore, since our variance decompositions are measured as fractions of the total forecast error variance, the results for permanent and transitory shocks will sum to one. Accordingly, we only present variance decompositions for the permanent component. Finally, empirical results are qualitatively similar at all forecast horizons. Hence, we only present results for $h = 1$.

Table 4 summarizes the posterior properties of the variance decomposition. We present the posterior mean and median (two commonly-used point estimates) as well as two measures of the dispersion of the posterior.
These are a 50% Highest Posterior Density Interval (HPDI), which is defined as the shortest interval containing 50% of the posterior probability, and an interquartile range (IQR): the 25th and 75th percentiles of the posterior.[14] At first glance, the reader may find the numbers in this table confusing. After all, for well-behaved distributions like the Normal, the mean and the median are the same and the 50% HPDI and the interquartile range should be as well. Clearly they are not. In fact, the HPDIs are often discontinuous, made up of two or more disjoint intervals. This property is due to the fact that BMA involves averaging different distributions: the result can be a highly multimodal distribution. Note also that the model with $r = 0$ implies all shocks are permanent. Hence, the permanent component's share of the forecast error variance is one by definition. This will cause a spike in the posterior distribution at 1. If models with $r = 0$ receive appreciable weight, this point can appear in HPDIs. So, for instance, the 50% HPDI for wealth using the BIC approach contains two intervals: one close to zero and one close to one. If one were to use this HPDI, one could conclude that permanent shocks either play a negligible role in explaining fluctuations in wealth or they play the dominant role! Interestingly, it is the shrinkage prior approach, which is the most "subjective" one we adopt (in the sense that we need to select a prior for the shrinkage parameter), which yields the most regular and well-behaved results.

[14] We calculate these by creating a histogram defined over a grid and then using its properties to derive HPDIs and IQRs.
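To see how a disjoint HPDI can arise, the following Python sketch computes a histogram-based highest-density region in the spirit of the grid approach mentioned above. It is a minimal illustration under stated assumptions: the function name, the bin count and the artificial draws (a cluster near zero plus a spike at one, mimicking models with $r = 0$) are hypothetical and are not the paper's data or exact algorithm.

```python
def histogram_hpdi(draws, n_bins=20, mass=0.5, lo=0.0, hi=1.0):
    """Highest-density region from a histogram on an equally spaced grid.

    Bins are added from tallest to shortest until they hold at least
    `mass` of the draws; adjacent chosen bins are merged into intervals.
    """
    width = (hi - lo) / n_bins
    counts = [0] * n_bins
    for x in draws:
        idx = int((x - lo) / width)
        counts[min(max(idx, 0), n_bins - 1)] += 1  # clamp to the grid

    target = mass * len(draws)
    by_height = sorted(range(n_bins), key=lambda i: counts[i], reverse=True)
    chosen, covered = [], 0
    for i in by_height:
        if covered >= target:
            break
        chosen.append(i)
        covered += counts[i]

    runs = []  # merge consecutive bin indices into [first, last] runs
    for i in sorted(chosen):
        if runs and runs[-1][1] == i - 1:
            runs[-1][1] = i
        else:
            runs.append([i, i])
    return [(lo + a * width, lo + (b + 1) * width) for a, b in runs]


# Artificial "posterior" for a permanent-shock share: mass near zero
# plus a spike at one.
draws = [0.02] * 35 + [0.07] * 25 + [1.0] * 40
intervals = histogram_hpdi(draws)
posterior_mean = sum(draws) / len(draws)
```

With these artificial draws the 50% region consists of two separate intervals, one at each end of the unit interval, and the mean of the draws falls in neither of them.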
A key conclusion from Table 4 is that it is hard to make a definitive conclusion about the relative role of transitory and permanent shocks in driving fluctuations in wealth (or any of the variables). We are presenting intervals which contain 50% of the posterior probability (rather than the more usual 95%). But even these are very wide, indicating that the data is not very informative about the relative roles of permanent and transitory shocks. A related conclusion is that presenting point estimates such as posterior means can be very misleading.
Indeed there are many cases where the posterior mean is not even included in the HPDI!

The point estimates in Table 4 are not that dissimilar to those presented in LL (see their Table 2). In general, these indicate (as found in LL) that permanent shocks have a relatively small role to play in driving fluctuations in wealth, but have a larger role in driving fluctuations in income and consumption. However, given the multimodal nature of the posterior and the fact that appreciable probability is allocated to regions inconsistent with this story, it is risky indeed to place much store in point estimates.
This finding can be seen more clearly in Figure 1 which, for the wealth variable, plots the entire posterior distributions which were used to create Table 4. Although there is appreciable support for the view that transitory shocks dominate fluctuations in wealth, this support is far from being overwhelming.
For the consumption and income variables, the posteriors also exhibit substantial dispersion and multimodality. However, with the exception of the fractional Bayes factor approach, there is more of a consensus that it is the permanent shocks that are playing the predominant role. Such a finding is consistent with LL.

Figure 2, analogous to Figure 1, plots the posterior of a particular variance decomposition of interest. It can be seen that imposing the restrictions of economic theory only slightly alters the posterior distributions of the variance decompositions. Even when we are ruling out $r = 0$ or $3$, we find that models with $r = 1$ or $2$ still attach appreciable support to regions of the parameter space which are inconsistent with LL's story. For instance, they allow for substantial probability that it is permanent shocks to wealth which are playing the dominant role. A priori, one might have expected that the inclusion/exclusion of models with $r = 0$ (which imply all shocks are permanent) would be crucial. Table 5 indicates that this is not so. In fact, using the fractional Bayes factor approach, imposing economic theory actually implies slightly less support for the "transitory shocks are predominant in driving wealth" story (e.g. the probability in the interval $[0, 0.5]$ drops from 0.739 to 0.642 for this case). This arises since we are also ruling out $r = 3$ (which implies all shocks are transitory).

To provide evidence that the conflict between our results and LL's is due to our treatment of model uncertainty, and not due to the use of Bayesian methods within a given model, Table 6 presents results for the model used by LL with $r = 1$, $d = 3$ and $p = 1$. We average across models with and without the weak exogeneity restriction imposed. For $r = 1$, $d = 3$ and $p = 1$, the probability of weak exogeneity holding is quite high for all approaches (i.e. 0.981, 0.789 and 1.000 for our three approaches).
Thus, results in Table 6 are quite close to those we found using the single model $r = 1$, $d = 3$ and $p = 1$ with weak exogeneity imposed (i.e. the model which LL favor).
For the sake of brevity, we only present results for the wealth variable. For comparison, LL present (see their [...] Table 6 indicates a very slightly lower variance decomposition, but overall our results are quite similar. In short, in the context of this model, we are recovering LL's findings that only a very small part of the fluctuations in wealth are driven by permanent innovations and that this result is fairly precisely estimated.

Figure 3 offers a deeper investigation into exactly which of LL's specification choices are crucial. This figure plots, for the wealth variable, the posteriors of the share of the forecast error variance due to the permanent component for LL's model ($r = 1$, $d = 3$ and $p = 1$) without exogeneity imposed and averaged over the models with and without exogeneity. For brevity, we only plot the posterior averaged over our three approaches. The posterior which averages over models with and without exogeneity is well-behaved (this is the posterior which underlies the results in Table 6). However, when we do not impose weak exogeneity, the posterior alters dramatically. In particular, we now observe a much flatter posterior which allocates more weight near the point most inconsistent with LL's story. Clearly imposition of weak exogeneity is having a huge effect on the posterior.[15]

Before completing the empirical work, we had thought that $r$ would be the key model aspect that was important for results. After all, models with $r = 0$ imply that all shocks are permanent. Surely this aspect could account for why graphs like Figure 1 had a spike of probability near 1. However, although choice of cointegrating rank is important, it turns out that the imposition of weak exogeneity is even more important.
After all, Figure 2 rules out the case r = 0 and it still exhibits many of the properties of Figure 1. Even the single model of LL, without weak exogeneity imposed, exhibits some such properties. It is only when we set r = 1 and impose weak exogeneity that we obtain results that are fully consistent with the story that permanent shocks have little role to play in driving wealth. An examination of the formulae underlying the permanent/transitory split and variance decomposition offers a possible explanation as to why weak exogeneity might be so crucial. The variance decompositions are complicated nonlinear transformations of the VECM parameters. In particular, the parameters relating to weak exogeneity (i.e. $\alpha$) often appear in the denominators of fractions determining the long run effects of shocks. If these are imprecisely estimated, then they can have a substantial impact on the variance decompositions. Potter (2001) offers a detailed discussion of this and related points.
It would be desirable to consider every one of our models with and without this weak exogeneity restriction imposed. However, as discussed in the previous section, the particular weak exogeneity restriction used by LL is only possible in models with r = 1. So a strategy of simply doubling the number of models under consideration (with/without weak exogeneity imposed) is not possible. It would be possible to work with our existing set of models plus an additional 20 models with r = 1 with weak exogeneity imposed. If we do this, our results are pulled more towards those of LL. However, since appreciable posterior weight is attached to models without LL's weak exogeneity restrictions imposed (i.e. for r = 0, 2, 3), we still obtain posteriors of our variance decompositions which are quite dispersed, often multi-modal and attach at least some weight to the boundaries. In short, it is hard to see how we can precisely estimate variance decompositions in this data set without making strong assumptions that are not fully supported by the data.
As one final piece of evidence relating to the importance of weak exogeneity, we obtained variance decompositions averaged over the set of models consistent with economic theory (i.e. d > 2 and r = 1 and 2) with weak exogeneity restrictions imposed throughout. For models with r = 1 we impose the same weak exogeneity restriction as above. For r = 2 we impose a weak exogeneity restriction where only consumption is weakly exogenous in the cointegrating relationship. This can be interpreted as imposing a form of the permanent income hypothesis, and there is appreciable support for this hypothesis in the data set (e.g. for models with r = 2 and d = 3 the probability of this restriction holding varies from 0.674 to 1.000 across our various approaches and lag lengths). Thus, for r = 1 and 2, we are imposing different weak exogeneity restrictions (which some econometricians may object to). When we do this, we recover results quite similar to LL's.
There are many other features that an empirical researcher may be interested in (e.g. other sorts of variance decompositions, long-horizon regressions, etc.). However, for the sake of brevity, we end our empirical results here, having established our basic point: that model uncertainty is an important issue in this data set and that empirical conclusions depend on the way this issue is treated.

Conclusions
In this paper we have re-examined evidence relating to the consumption-wealth relationship using a variety of different approaches to Bayesian model averaging. We document a great deal of uncertainty with regard to the number of cointegrating vectors, the lag length, and the form of deterministic trends in a commonly-used data set. As a result, we find that the exact magnitude of the role of permanent shocks is hard to estimate precisely.
Thus, although some support exists for the view that their role is small, we cannot rule out the possibility that they have a substantive role to play. In particular, the posterior distributions of key variance decompositions are found to be multimodal and relatively non-informative. It is only if we work with the single model used by LL that we obtain results very similar to those in LL. Within that single model, the restriction used by LL that the cointegrating residual enters only the wealth equation turns out to be of great importance in producing the result that most fluctuations in wealth are transitory.
In addition to making a contribution to the empirical literature on the wealth effect, we have presented a broader methodological argument that the conclusions of any empirical exercise should depend on economic theory, the data and the treatment of model uncertainty. We view Bayesian methods as a tool that allows us to investigate the relative roles of these three aspects. That is, Bayesian methods allow us to separate a priori beliefs about economic theory from the information in the observed data and their data-driven model specification choices.

References
Brennan, M. and Xia, Y. (2004). "Tay's as good as cay," Finance Research Letters, forthcoming.
Strachan, R. W. and Inder, B. (2004). "Bayesian analysis of the error correction model," Journal of Econometrics, forthcoming.
Strachan, R. W. and van Dijk, H. K. (2004). "The value of structural information in the VAR model," Econometric Institute Report EI 2004-23, Erasmus University Rotterdam.
Villani, M. (2004). "Bayes reference analysis of cointegration," Econometric Theory, forthcoming.

We have summed over all models in the denominator in (A.1). In this paper, we estimate the marginal likelihoods using three methods. First, we adopt the fractional Bayes factor approach of O'Hagan (1995). Second, we use the Bayesian information criterion (BIC) of Schwarz (1978). Finally, we use Markov Chain Monte Carlo (MCMC) in the context of a certain reference prior described below. Here we provide a brief description of the first two, which provide asymptotic approximations to $m_r(y)$ without requiring specification of a prior distribution for the parameters. The remainder of the appendix outlines the third approach.
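As a rough illustration of how a BIC-based approximation to the marginal likelihood translates into posterior model probabilities, the following Python sketch may help. It is not the code used in this paper: the function name, the equal prior model probabilities and all numerical inputs are hypothetical, and it relies only on the standard Schwarz approximation in which the log marginal likelihood is approximated by minus one half of the BIC.

```python
import numpy as np

def bic_model_probabilities(log_lik, n_params, n_obs, prior=None):
    """Approximate posterior model probabilities from BIC values.

    Uses the Schwarz approximation log m(y) ~ -BIC / 2 and normalises by
    summing over all models (the denominator of the model-probability formula).
    """
    log_lik = np.asarray(log_lik, dtype=float)
    n_params = np.asarray(n_params, dtype=float)
    bic = -2.0 * log_lik + n_params * np.log(n_obs)
    log_marg = -0.5 * bic
    if prior is None:
        # ASSUMPTION: equal prior probability for every model.
        prior = np.full(log_lik.shape, 1.0 / log_lik.size)
    log_post = log_marg + np.log(prior)
    log_post -= log_post.max()  # guard against underflow before exponentiating
    weights = np.exp(log_post)
    return weights / weights.sum()

# Hypothetical maximised log-likelihoods and parameter counts for three models.
log_lik = [-512.3, -508.9, -508.1]
n_params = [12, 15, 21]
probs = bic_model_probabilities(log_lik, n_params, n_obs=200)
print(probs)
```

In this made-up example the most parsimonious model receives the largest weight, because its smaller penalty term more than offsets its slightly lower likelihood.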
The fractional Bayes factor is a particular form of partial Bayes factor proposed by O'Hagan (1995). As a means of obtaining estimates of Bayes factors with weak or improper priors, the partial Bayes factor uses a fraction of the sample (or a training sample) to produce a posterior which is then used as a prior for the remainder of the sample. One arbitrary feature of this method is the choice of particular subsample to use as [...]
As a closed form expression (conditional upon $\beta$) exists for the outer integral with respect to the remaining parameters, we can perform this integral analytically. It is the inner integral with respect to $\beta$ for which we have no closed form expression and so resort to MCMC methods. We use the approach proposed in Strachan and Inder (2004) and Strachan and van Dijk (2004), in which a Uniform prior is placed upon the (cointegrating) space of $\beta$ and $\beta$ is specified as a semiorthogonal matrix such that $\beta'\beta = I_r$. This specification ensures the posterior will be proper for all models considered, and a simple, efficient sampling scheme can be specified to draw $\beta$. Further details are provided below and in the survey paper Koop, Strachan, van Dijk and Villani (2004).

The Prior
We use the following commonly used noninformative prior for [...] This is an improper prior but will cause no problem for computation of the Bayes factor as this same prior is employed for all models. Kass and Raftery (1995) [...] Jeffreys' prior, it is that part of Jeffreys' prior related to [...] and widely accepted as a non-informative prior for [...].
We next specify a Normal prior for [...] as N(0, [...] I), where [...] is a hyperparameter in the variance which controls how diffuse the prior is. It is this hyperparameter which motivates our terminology "shrinkage" prior.
As we did not wish to select a particular value for [...] but wanted to keep the prior for [...] diffuse, we gave [...] a Gamma distribution $g(n_1, n_2)$ in which moderately diffuse values for $n_1$ and $n_2$ are chosen ($n_1 = 3$, $n_2 = 4$).
This prior places 99% of the prior mass for [...] in the interval (0.45, 19.9), ensuring the prior for [...] will cover a wide range of plausible values.
The prior for $\beta$ is implied by a Uniform prior over the cointegration space. As is frequently stated in empirical cointegration studies, the elements of $\beta$ are not identified and so restrictions need to be imposed to permit their estimation. The only information we can obtain from likelihood based analysis of a VECM is information on the location of the cointegrating space, or the space spanned by $\beta$. As argued in Villani (2004) and Strachan and Inder (2004), this is not a limitation since it is the cointegrating space that is the object of interest and estimates of $\beta$ simply provide an estimate of this space. We wish to be non-informative about the cointegrating space and so follow the approach in Strachan and Inder (2004) by specifying $\beta$ to be semiorthogonal such that $\beta'\beta = I_r$, so that $\beta$ has a compact support, and place a Uniform prior on this support. Alternative specifications may be used to express ignorance about the cointegrating space, but this is the only one proposed to date that ensures we can obtain proper posterior distributions for all of the models we consider.

Posterior Computation
The prior specification above is also used in Strachan and van Dijk (2004) for a range of model averaging exercises. They show that the marginal density for $\beta$, d, p and r is $p(\beta, d, p, r \mid y) \propto g_{d,p,r}\, k(\beta)$, where $g_{d,p,r}$ is a function of the data and of d, p and r, and where [...] $D_0$ and $D_1$ are data-based quantities [see Strachan and van Dijk (2004) for precise definitions]. The integral $m = \int k(\beta)\, d\beta$ does not have a closed form; therefore, to obtain the marginal likelihood for a model we use MCMC methods to calculate m. To do this, we require both a series of draws of $\beta$ from its posterior distribution, $p(\beta \mid d, p, r, y)$, and a method of using these draws to estimate m.
We use a random walk Metropolis-Hastings algorithm to obtain 80,000 draws of $\beta$ from $p(\beta \mid d, p, r, y)$ (after a burn-in of 10,000 draws) in which the candidate distribution is the matrix angular central Gaussian distribution, $G(\cdot)$, with density $g(\beta)$. This distribution was first proposed by Chikuse (1990) and the derivation of the specific form used in this paper is the same as that given in Strachan and Inder (2004). To obtain a draw from this candidate involves the following steps. Assume we have a current draw of $\beta$ and therefore the $n \times (n - r)$ matrix lying in the orthogonal complement of $\beta$, $\beta_\perp$. Denote these draws respectively as $\beta^{(i)}$ and $\beta_\perp^{(i)}$. Then iterate over the following steps.
Draw an $n \times r$ matrix $Z = \{z_{ij}\}$ from the multivariate standard Normal such that each $z_{ij} \sim N(0, 1)$ and is independent of all other $z_{ij}$. Construct $X = PZ$. Decompose X into the product of a semiorthogonal matrix, written here as $\beta^*$ with $\beta^{*\prime}\beta^* = I_r$, and a full rank upper triangular matrix.
The matrix $\beta^*$ is a candidate draw from $G(\cdot)$ which is 'located' at the previous value $\beta^{(i)}$. Accept $\beta^*$ and set $\beta^{(i+1)} = \beta^*$ with probability $\min\{ p(\beta^*)\, g(\beta^{(i)}) / [p(\beta^{(i)})\, g(\beta^*)],\ 1 \}$; else set $\beta^{(i+1)} = \beta^{(i)}$. The candidate density g has only one parameter value to be decided. Suggestions for how to choose this value are given in Strachan and Inder (2004). We use a value of 0.6, as we find this keeps the tails thin but still allows reasonable dispersion.
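The candidate-generation step described above (draw Z, form X = PZ, then factor X into a semiorthogonal matrix times an upper triangular matrix) can be sketched in Python with NumPy. This is only an illustrative sketch, not the Gauss code used for the paper. In particular, the text here does not define P, so the form used below, built from the current draw and its orthogonal complement with a scalar `tau`, is an assumption, as are all function names and the choice to tie the value 0.6 to `tau`. The accept/reject step is omitted because it requires the posterior kernel.

```python
import numpy as np

def orthogonal_complement(beta):
    """Return an n x (n - r) orthonormal basis for the complement of span(beta)."""
    n, r = beta.shape
    q, _ = np.linalg.qr(beta, mode="complete")  # q is an n x n orthogonal matrix
    return q[:, r:]

def draw_candidate(beta, tau, rng):
    """One candidate draw located at the current semiorthogonal beta."""
    n, r = beta.shape
    beta_perp = orthogonal_complement(beta)
    # ASSUMPTION: P = beta beta' + tau * beta_perp beta_perp' (not given in the text).
    P = beta @ beta.T + tau * (beta_perp @ beta_perp.T)
    Z = rng.standard_normal((n, r))  # each z_ij ~ N(0, 1), mutually independent
    X = P @ Z
    # Reduced QR: Q is n x r with Q'Q = I_r, R is r x r upper triangular.
    Q, R = np.linalg.qr(X)
    return Q, R

rng = np.random.default_rng(0)
n, r = 3, 1
# Hypothetical starting value: a random semiorthogonal n x r matrix.
beta0, _ = np.linalg.qr(rng.standard_normal((n, r)))
beta_perp0 = orthogonal_complement(beta0)
candidate, upper = draw_candidate(beta0, tau=0.6, rng=rng)
```

The QR factors are unique only up to the signs of the columns, which does not matter here because only the space spanned by the candidate is of interest.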
To estimate m we take the approach proposed by Gelfand and Dey (1994) and use a proper distribution on the support of $\beta$, draws from the posterior for $\beta$ and the posterior kernel $k(\beta)$. If $g(\beta)$ is a proper density, we may obtain the estimate of m from $\frac{1}{m} = \int \frac{g(\beta)}{k(\beta)} \frac{k(\beta)}{m}\, d\beta$. As we have a sequence of draws $\beta^{(i)}$, $i = 1, \ldots, J$, from the distribution with density $k(\beta)/m$, we can estimate $1/m$ by the sample average $\frac{1}{J} \sum_{i=1}^{J} g(\beta^{(i)}) / k(\beta^{(i)})$. Our choice of $g(\beta)$, as the notation suggests, is again the matrix angular central Gaussian distribution, $G(\cdot)$, with density $g(\beta)$. However, the location is fixed at the value of $\beta^{(i)}$ that gave the highest value of $g(\beta)$ over the burn-in sample. All computations were performed using Gauss 3.5.
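To make the Gelfand and Dey (1994) identity concrete, the following Python sketch applies it to a deliberately simple one-dimensional toy problem in which the true normalising constant is known. Everything in it is hypothetical: the kernel is three times a standard Normal density (so m = 3), the draws come directly from the Normal rather than from an MCMC sampler, and the proper density g is a Normal with thinner tails. It is meant only to show the mechanics of averaging g/k over posterior draws, not to reproduce the matrix-valued computation in this paper.

```python
import numpy as np

def normal_pdf(x, sd):
    """Density of a N(0, sd^2) random variable evaluated at x."""
    return np.exp(-0.5 * (x / sd) ** 2) / (sd * np.sqrt(2.0 * np.pi))

def gelfand_dey(draws, kernel, g):
    """Estimate m = integral of kernel using 1/m ~ mean of g(draw) / kernel(draw)."""
    inv_m = np.mean(g(draws) / kernel(draws))
    return 1.0 / inv_m

TRUE_M = 3.0  # hypothetical normalising constant of the toy kernel

def kernel(x):
    # Unnormalised posterior kernel: TRUE_M times a standard Normal density.
    return TRUE_M * normal_pdf(x, 1.0)

def g(x):
    # Proper density with thinner tails than the posterior, so g/k stays bounded.
    return normal_pdf(x, 0.5)

rng = np.random.default_rng(0)
draws = rng.standard_normal(20_000)  # draws from the density kernel / TRUE_M
m_hat = gelfand_dey(draws, kernel, g)
print(m_hat)
```

Choosing g with thinner tails than the posterior keeps the ratio g/k bounded, which is what makes the sample average well behaved.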
A different approach was taken to estimate the posterior probabilities for the models with weak exogeneity imposed. We use the draws $\beta^{(i)}$ from the distribution with kernel $k(\beta)$ for the model without weak exogeneity imposed and denote the kernel for the same model with weak exogeneity imposed as $k_x(\beta)$. The marginal likelihoods for these models are respectively $m = \int k(\beta)\, d\beta$ and $m_x = \int k_x(\beta)\, d\beta$, and the Bayes factor can then be estimated as