TIME-VARYING COEFFICIENT MODELS: A PROPOSAL FOR SELECTING THE COEFFICIENT DRIVER SETS

Coefficient drivers are observable variables that feed into time-varying coefficients (TVCs) and explain at least part of their movement. To implement the TVC approach, the drivers are split into two subsets, one of which is correlated with the bias-free coefficient that we want to estimate and the other with the misspecification in the model. This split, however, can appear to be arbitrary. We provide a way of splitting the drivers that takes account of any nonlinearity that may be present in the data, with the aim of removing the arbitrary element in driver selection. We also provide an example of the practical use of our method by applying it to modeling the effect of ratings on sovereign-bond spreads.


INTRODUCTION
Time-varying-coefficient (TVC) estimation is a way of estimating the parameters of a model even when (i) the true functional form is unknown, (ii) important regressors are missing, and (iii) the included variables contain measurement errors (see note 1). Several successful applications of this technique have appeared in the recent literature, including Hall et al. (2008, 2009a, 2009b, 2012, 2013, 2015) and Swamy et al. (2015). However, an important assumption is needed to make the technique operational. This assumption concerns the choice of what are called "coefficient drivers" (formally defined later) and the separation of these drivers into subsets. This separation allows us to derive estimates of bias-free coefficients. Intuitively, coefficient drivers are a set of variables that feed into the TVCs and explain at least part of their movement. As explained in what follows, the set of drivers is split into two subsets, one of which is correlated with the bias-free coefficient that we want to estimate and the other with the misspecification in the model. This split can appear to be somewhat arbitrary (much as in the case of choosing instrumental variables) and has been a source of criticism of the TVC approach [see Hall et al. (2013)]. The problem is that different driver sets give different results, and there is very little guidance as to which set of drivers should be preferred, much as is true for instrumental variable estimation. The objective of this paper is to put forward a method for producing this split that takes account of the nonlinearity that may be present in the original data. As we argue in the following, this method provides a natural split in the driver set.

The remainder of this paper is divided into four sections. Section 2 presents a summary of the TVC approach and formally defines the concept of coefficient drivers and the need to split the drivers into two sets. Section 3 then proposes a method for determining this split. Section 4 provides an example of the practical use of the technique by applying it to modeling the effect of rating agencies' ratings on sovereign bond spreads. Section 5 concludes.

This paper was presented at the third ISCEF (Paris, April 10-12, 2014, www.iscef.com). The views expressed in this paper are the authors' own and do not necessarily represent those of their respective institutions. Address correspondence to: George S. Tavlas

TVC ESTIMATION
Here, we summarize the approach to TVC estimation that has been formalized in Swamy et al. (2010). TVC estimation proceeds from an important theorem that was first established by Swamy and Mehta (1975) and that has subsequently been confirmed by Granger (2008). This theorem states that any nonlinear functional form can be exactly represented by a model that is linear in variables, but that has time-varying coefficients. The implication of this result is that, even if we do not know the correct functional form of a relationship, we can always represent this relationship as a TVC relationship and thus estimate it. Hence, any nonlinear relationship may be stated as

$$y_t = \gamma_{0t} + \gamma_{1t} x_{1t} + \cdots + \gamma_{K-1,t}\, x_{K-1,t}, \qquad (1)$$

where $x_{1t}, \ldots, x_{K-1,t}$ are $K-1$ observed determinants of $y_t$. Consequently, this theorem leads to the result that, if we have the complete set of relevant variables with no measurement error, then by estimating a TVC model we will get reasonable estimates of the true partial derivatives of the dependent variable with respect to each of the independent variables given the unknown nonlinear functional form. Another theorem that is used to derive (1) should also be stated.

THEOREM.
A necessary condition for a model to be true is that its coefficients and error term, having the correct functional forms, are unique.
To explain this theorem, we need to introduce some appropriate terminology. The term "true model" is taken to mean the original real-world relationship underlying (1), which does not permit spurious relations (see note 2). A compressed real-world relationship is taken to mean the original real-world relationship compressed to have fewer regressors than the original relationship. The included regressors are those that appear in both the original and the compressed relationships. Omitted regressors are those that appear in the original relationship but not in the compressed relationship. No relevant regressors, including relevant preexisting conditions, are omitted from the original relationship, and hence it does not have an error term.

The coefficients and error term of the compressed real-world relationship are unique if they are invariant under the changes made in the real-world relationships between each omitted regressor and the included regressors such that the equality signs in these relationships are unchanged. The unique coefficients on nonconstant regressors have the form of the sums of partial derivatives of the true value of the dependent variable with respect to the true values of included regressors and the corresponding omitted-regressors bias. The unique error term has the form of a function (with the correct functional form) of certain "sufficient sets" of omitted regressors. The coefficients cannot be unique unless they have the correct functional forms. These explanations of the theorem and its proof are given in Swamy et al. (2014).

If we allow for the fact that we do not know the full set of independent variables and that some, or perhaps all, of them may be measured with error, then the TVCs become biased (for the usual reasons). What we would like to have is some way to decompose the full set of biased TVCs into two parts: the biased component and the remaining part, which would be a bias-free true component. Although this is asking a great deal of an estimation technique, it is precisely what TVC estimation aims to provide [see Swamy et al. (2010)]. This technique builds from the Swamy and Mehta theorem, mentioned earlier, to produce such a decomposition (see note 3).

Swamy et al. (2010) show what happens to the TVCs as other forms of misspecification are added to the model. If we compress a real-world relationship, then the true TVCs get contaminated by a term that involves the relationship between the omitted and included regressors. If we also allow for measurement error, then the TVCs become further contaminated by a term that allows for the relationship between the included regressors and their measurement errors. Thus, as one might expect, the estimated TVCs are no longer the true partial derivatives of the nonlinear function. Instead, they are biased because of the effects of omitted regressors and measurement error. No estimation has been involved so far, and the statements made up to this point have exact mathematical proofs.
To make TVC estimation fully operational, we need to make two key parametric assumptions. First, we assume that the time-varying coefficients themselves are determined by a set of stochastic linear equations, which makes them a function of a set of variables we call driver (or coefficient-driver) variables. This is a relatively uncontroversial assumption. The only restrictive assumption is that the relationship is linear; this may be viewed as a good first approximation. It would be possible to generalize this assumption to any other nonlinear parametric function without great difficulty. Second, we assume that some of these drivers are correlated with the misspecification in the model and some of them are correlated with the time variation coming from the nonlinear (true) functional form. Having made this assumption, we can then simply remove the bias from the time-varying coefficients by removing the effect of the set of coefficient drivers that are correlated with the misspecification. This procedure, then, yields a reasonable set of estimates of the true partial derivatives of the unknown compressed real-world nonlinear relationship, which may then be tested by constructing t tests in the usual way. An important difference between coefficient drivers and instrumental variables is that for a valid instrument we require variables that are uncorrelated with the misspecification. Such variables often prove hard to find. For a valid driver, we need variables that are correlated with the misspecification. We would argue that this is much easier to achieve than finding instruments that are uncorrelated with the misspecification.
To formalize the idea of the coefficient drivers, we assume that each of the TVCs in (1) is generated in the following way.
Assumption 1 (Auxiliary Information). Each coefficient is linearly related to certain observable drivers plus a random error,

$$\gamma_{jt} = \pi_{j0} z_{0t} + \sum_{d=1}^{p-1} \pi_{jd} z_{dt} + \varepsilon_{jt} \quad (j = 0, 1, \ldots, K-1), \qquad (2)$$

where the $\pi$'s are fixed parameters, the $z_{dt}$ are what we call the observed coefficient drivers, and $z_{0t} = 1$. Different coefficients of (1) can be functions of different sets of coefficient drivers.
Assumption 2. For all $t$, $E[\varepsilon_t \mid z_t] = 0$, where $\varepsilon_t = (\varepsilon_{0t}, \varepsilon_{1t}, \ldots, \varepsilon_{K-1,t})'$ and $z_t = (1, z_{1t}, \ldots, z_{p-1,t})'$. The regressors and the coefficients of (1) are conditionally independent of each other given the coefficient drivers (see note 4).

These coefficient drivers alone are merely a set of variables that, to a reasonable extent, jointly explain the movement in $\gamma_{jt}$. It should be noted that there are two sources of variation in $y_t$: the regressors and the coefficients of (1) jointly explain the movement in $y_t$. Under our method, the coefficient drivers included in equation (2) have two uses. First, inserting equation (2) into equation (1) parameterizes the latter equation. Here, the issue of identification of the parameterized model (1) is important (see note 5). Second, the drivers allow us to separate the bias and bias-free components of the coefficients.

Assumption 3. The set of coefficient drivers and the constant term in equation (2) divide into three different subsets, $A_{1j}$, $A_{2j}$, and $A_{3j}$, such that the first set is correlated with any variation in the true parameter that is due to the underlying relationship being nonlinear, the second set is correlated with bias in the parameter coming from any omitted regressors, and the final set is correlated with bias coming from measurement error.
This assumption allows us to identify separately the bias-free, omitted-variables, and measurement-error bias components of the coefficients of equation (1). Assumption 3 is the key to making our procedure operational; it is the assumption that we can associate the various forms of specification biases with sets $A_{2j}$ and $A_{3j}$, which means that set $A_{1j}$ simply explains the time variation in the coefficients caused by the nonlinearity in the true functional form of the compressed real-world relationship underlying (1). If this relationship is linear, then all that will be required for set $A_{1j}$ is to contain the constant $z_{0t}$ of (2). If this relationship is nonlinear, then the bias-free components should be time-varying and the set of drivers belonging to $A_{1j}$ will explain the time variation in these components. There are essentially two sets of variables here: $A_{1j}$, which is associated with the true nonlinearity in (1), and $A_{2j}$ and $A_{3j}$, which are associated with the misspecification. For ease of notation, hereafter we will refer to $A_{1j}$ as $S_1$, and we will refer to the joint set of $A_{2j}$ and $A_{3j}$ as $S_2$.
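As a minimal sketch of how the split is used, assume that estimates of the $\pi$'s, the driver values, and the driver-equation errors are already available. The bias-free component of a coefficient is then the part of the fitted driver equation that comes from the $S_1$ drivers alone. All names and numbers below are hypothetical and are not taken from the paper.

```python
# Hypothetical illustration: separate the bias-free part of one TVC
# from its total value, given a split of the drivers into S1 and S2.

def fitted_coefficient(pi_j, z_t, keep=None):
    """Sum pi_jd * z_dt over the drivers in `keep` (all drivers if None)."""
    idx = range(len(pi_j)) if keep is None else keep
    return sum(pi_j[d] * z_t[d] for d in idx)

# Made-up values: driver 0 is the constant, driver 1 is assumed to be in S1,
# and driver 2 is assumed to be in S2 (correlated with misspecification).
pi_1 = [0.2, 0.1, 0.4]
Z = [[1.0, 2.0, 0.5],
     [1.0, 3.0, 1.5]]          # one row of driver values per period
eps_1 = [0.05, -0.05]          # driver-equation errors (assumed known here)
S1 = [0, 1]

total = [fitted_coefficient(pi_1, z) + e for z, e in zip(Z, eps_1)]
bias_free = [fitted_coefficient(pi_1, z, keep=S1) for z in Z]

print(total)      # total coefficient in each period
print(bias_free)  # component attributed to the S1 drivers only
```

The difference between the two series is the part attributed to the $S_2$ drivers and the error term.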

A SUGGESTION FOR THE CHOICE OF COEFFICIENT DRIVERS
Clearly, Assumptions 1-3 are crucial for the successful implementation of the TVC approach. As noted earlier, the split of coefficient drivers stemming from these assumptions has been a problematic part of the TVC-estimation procedure. There are, however, certain requirements that can help in selecting both the sets of variables that make a good driver set and the split into two subsets inherent under Assumption 3.

Selecting the Complete Driver Set
Consider, first, the broad requirements that a complete set of drivers should fulfill; these relate to predictive power and relevance. To explain, we again present equation (2),

$$\gamma_{jt} = \pi_{j0} z_{0t} + \sum_{d=1}^{p-1} \pi_{jd} z_{dt} + \varepsilon_{jt},$$

where $z_{0t} \equiv 1$. For this set of drivers to be a good set, the drivers must explain most of the variation in $\gamma_{jt}$. Hence, we can define an analog of the conventional $R^2$ for the estimated counterparts to these equations as follows:

$$R_j^2 = 1 - \frac{SS\varepsilon_{jt}}{SS\gamma_{jt}}, \qquad (3)$$

where $SS\varepsilon_{jt}$ and $SS\gamma_{jt}$ are the sum of squared residuals and the total variation of the dependent variable, respectively. A main difficulty with (3) is that if $E(\varepsilon_{jt}\varepsilon_{j't'}) \neq 0$ for $j \neq j'$ or $t \neq t'$ and we use a fully efficient estimator such as GLS, then (3) will have a range from $-\infty$ to 1 and it will be difficult to interpret it as the proportion of variation in $\gamma_{jt}$ explained by equation (2) [see Judge et al. (1985, pp. 31-32)]. To deal with this problem, let $\varepsilon_t = (\varepsilon_{0t}, \varepsilon_{1t}, \ldots, \varepsilon_{K-1,t})'$. Then assume that [...], [...] being a non-negative matrix. With these assumptions, the covariance matrix of $\varepsilon = (\varepsilon_1', \ldots, \varepsilon_T')'$ can be represented as $E(\varepsilon\varepsilon' \mid z_1, \ldots, z_T) = \sigma_u^2$ [...]. The elements of this covariance matrix are displayed in Chang et al. (1992). A goodness-of-fit measure for (2) that resembles Judge et al.'s (1985, p. 32) goodness measure is

[...] (4)

where (2) is written in matrix form as [...] and [...] is a weighted mean of the dependent variable of (2). Like Judge et al.'s (1985, p. 32, (2.3.16)) newly defined $R^2$, $R_\gamma^2$ in (4) also lies between 0 and 1, and its value can be interpreted as the proportion of weighted variation in $\gamma$ explained by regression (2). However, the methods of estimation of (4) and Judge et al.'s newly defined $R^2$ are not the same.
We will now show how to compute measure (4). Substitute the right-hand side of (2) for $\gamma_{jt}$ in (1). Doing this for all $j$ gives the fixed coefficient version of (1). The unknown coefficients (the $\pi$'s) and the unobserved errors (the $\varepsilon$'s) in (2) and the unknown covariance parameters in (4) can be estimated for all $j$ and $t$, using an iteratively rescaled generalized least squares (IRSGLS) method [see Chang et al. (1992, 2000)]. Inserting these estimates into (2) gives the estimates of $\gamma$ in (2) for all $j$. These estimates can be used in (4) to evaluate $R_\gamma^2$. Here, we require $R_\gamma^2$ to be as close to 1 as possible (see note 6), so that the drivers explain a large proportion of variation in the TVC. This result could, of course, be achieved simply by having a very large number of drivers. Therefore, we also require the drivers to be relevant in the sense that the $\pi_{jd}$'s are significantly different from zero. Estimation of the full TVC model produces a covariance matrix for the estimated $\pi_{jd}$, so that conventional t statistics and probability levels may be produced in the standard way.
These two conditions are closely analogous to the idea of relevance in instrumental variable estimation, where the instruments must be highly correlated with the variables being instrumented. If the R 2 γ in (4) is low, then we can infer that we have a weak set of coefficient drivers. There is, however, no requirement for the drivers to be independent of the coefficient γ jt , as there is for instruments to be independent of the error term under an IV estimation procedure. Finally, we also do not require the coefficient drivers to be independent of the errors in the model. In fact, we need to pick drivers that are highly correlated with the errors, and hence the misspecification.
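The two screening conditions can be sketched as follows. This is only an illustration under stated assumptions: it uses the simple unweighted $R^2$ analog rather than the weighted measure (4), it treats the estimated coefficient path and its fitted values as given, and the 1.96 cutoff is a conventional choice that the paper does not prescribe. All names and numbers are hypothetical.

```python
# Hypothetical illustration of the "predictive power" and "relevance" checks.

def r_squared(gamma, fitted):
    """Unweighted analog: 1 - SS(residuals) / SS(total variation)."""
    mean = sum(gamma) / len(gamma)
    ss_res = sum((g - f) ** 2 for g, f in zip(gamma, fitted))
    ss_tot = sum((g - mean) ** 2 for g in gamma)
    return 1.0 - ss_res / ss_tot

def relevant_drivers(estimates, std_errors, critical=1.96):
    """Names of drivers whose |t ratio| exceeds the (assumed) critical value."""
    return [name for name in estimates
            if abs(estimates[name] / std_errors[name]) > critical]

# Made-up estimated coefficient path and driver-equation fitted values.
gamma_hat = [1.0, 2.0, 3.0, 4.0]
fitted = [1.1, 1.9, 3.2, 3.8]
r2 = r_squared(gamma_hat, fitted)

# Made-up driver estimates and standard errors.
pi_hat = {"rate": 0.8, "pol": 0.05}
se = {"rate": 0.2, "pol": 0.1}
kept = relevant_drivers(pi_hat, se)

print(r2, kept)
```

A driver set would be regarded as weak if the fit measure is low, and individual drivers with small t ratios would be candidates for removal.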
Estimating the components of the coefficients of (1) is our next task.

Splitting the Driver Set
The more difficult issue is how to split the coefficient drivers into the two sets outlined under Assumption 3, that is, $S_1$, which contains the variables correlated with the bias-free coefficient, and $S_2$, which is the set of drivers correlated with the misspecification. The suggestion being made here is that certain drivers should be chosen to explicitly capture any nonlinearities that may exist in the compressed real-world relationship underlying (1). We will discuss later exactly how this should be done. All other driver variables would then be assumed to be associated with misspecification and should therefore be removed when obtaining the bias-free component.
The following examples should make this clear. Let us assume that the original real-world relationship underlying (1) is given by

$$y_t^* = f(x_{1t}^*, \ldots, x_{m_t t}^*). \qquad (5)$$

This function meets all relevant preexisting conditions, as some of its arguments are such that their values are automatically held constant when the partial derivatives of $y_t^*$ with respect to the included regressors other than the preexisting conditions are taken (see note 7). The true functional form of model (5) is unknown and all of its variables are unknown and unobserved. We are interested in estimating

$$\frac{\partial y_t^*}{\partial x_{jt}^*},$$

where the values of all the arguments of $f(x_{1t}^*, \ldots, x_{m_t t}^*)$ other than $x_{jt}^*$ are held constant.
To understand how the split of drivers may be accurately done, consider the following.

Example 1
If (5) is linear, then the S 1 set consists of just the constant z 0t ≡ 1, and all other drivers explain the biases that stem from missing regressors and measurement error.

Example 2
Suppose that (5) is a polynomial, such as a quadratic form. Consider, for simplicity, the case of only two explanatory variables. Then the original real-world relationship becomes [...] (see note 8). We are interested in treating $x_{2t}^*$ as an omitted regressor and also in estimating [...]. If the true functional form of equation (6) were unknown to us, we would need to rely on the TVC model [...]. Substituting in this TVC model the right-hand side of the second equality sign in the equation [...] is the function with the correct functional form of the "sufficient set" $\lambda_{0t}^*$ of the omitted regressor $x_{2t}^*$; it is unique and is treated as the error term. The coefficient $\alpha_{1t}^* + \alpha_{2t}^*\lambda_{1t}^*$ is unique, being the sum of the partial derivative $\alpha_{1t}^*$ in (8) and the omitted-regressor bias $\alpha_{2t}^*\lambda_{1t}^*$; $\alpha_{1t}^*$ is the bias-free component of the [...] is the compressed real-world relationship, which treats $x_{2t}^*$ as an omitted regressor. The reason that we call both this compressed equation and the original equation in (7) real-world relationships is that no approximation is followed in going from the original to the compressed form.

Place the measurement errors, $y_t = y_t^* + \nu_{0t}^*$ and $x_{1t} = x_{1t}^* + \nu_{1t}^*$, at the appropriate places in the compressed real-world relationship. Doing so gives [...], where the intercept and the coefficient of $x_{1t}$ are the same as $\gamma_{0t}$ and $\gamma_{1t}$, respectively, in the following. Clearly, because of the treatment given to $x_2$, OLS applied to the compressed equation would give biased estimates of the parameter.
The TVC model to be estimated is then given by

$$y_t = \gamma_{0t} + \gamma_{1t} x_{1t}. \qquad (9)$$

Now, if we include an explicit driver to capture the nonlinearity of (7), we will get the equations

$$\gamma_{0t} = \pi_{00} z_{0t} + \pi_{01} z_{1t} + \pi_{02} z_{2t} + \pi_{03} z_{3t} + \pi_{04} z_{4t} + \pi_{05} z_{5t} + \pi_{06} z_{6t} + \varepsilon_{0t}, \qquad (10)$$

$$\gamma_{1t} = \pi_{10} z_{0t} + \pi_{11} z_{1t} + \pi_{12} z_{2t} + \pi_{13} z_{3t} + \pi_{14} z_{4t} + \pi_{15} z_{5t} + \pi_{16} z_{6t} + \varepsilon_{1t}, \qquad (11)$$

where we need to choose $z_{1t}$, $z_{2t}$, $z_{3t}$, $z_{4t}$, $z_{5t}$, and $z_{6t}$ such that they are highly correlated with $\alpha_{0t}^*$, $\alpha_{2t}^*\lambda_{0t}^*$, $\nu_{0t}^*$, $\alpha_{1t}^*$, $\alpha_{2t}^*\lambda_{1t}^*$, and the measurement-error bias term in the coefficient of $x_{1t}$, respectively, and set $\pi_{04} = \pi_{05} = \pi_{06} = 0$ and $\pi_{11} = \pi_{12} = \pi_{13} = 0$. Note that each of the $z$'s in equations (10) and (11) need not be a single coefficient driver. For example, for $j = 0, 1$ and $h = 1, \ldots, 6$, $\pi_{jh} z_{ht}$ can be equal to $\sum_g \pi_{jhg} z_{htg}$.

Now the question is, how do we find such $z$'s when the true functional form of (7) is unknown? Had we known the true functional form of (7), we would have understood that $\alpha_{1t}^*$ is a linear function of $x_{1t}^*$, $\alpha_{2t}^*$ is a linear function of $x_{2t}^*$, $\alpha_{2t}^*\lambda_{1t}^*$ is a nonlinear function of $x_{2t}^*$ and [...], and the measurement-error bias term is a nonlinear function of all the variables [...] and the measurement error in $x_{1t}$. Suppose that we could somehow guess this and, based on this guess, we set $z_{4t} = x_{1t}$. Let us substitute the right-hand sides of equations (10) and (11) for $\gamma_{0t}$ and $\gamma_{1t}$ in equation (9), respectively. Then

$$y_t = \pi_{00} z_{0t} + \pi_{01} z_{1t} + \pi_{02} z_{2t} + \pi_{03} z_{3t} + (\pi_{10} z_{0t} + \pi_{14} z_{4t} + \pi_{15} z_{5t} + \pi_{16} z_{6t})\, x_{1t} + \varepsilon_{0t} + \varepsilon_{1t} x_{1t}. \qquad (12)$$

In this equation, the sum $\pi_{00} z_{0t} + (\pi_{10} z_{0t} + \pi_{14} z_{4t})\, x_{1t}$ gives a quadratic function in $x_{1t}$ if $z_{4t} = x_{1t}$. This quadratic function can represent the sum of the first three terms on the right-hand side of (7) well if data on $x_{1t}$ contain negligible magnitudes of measurement error. The coefficients of equation (12) can be estimated consistently by applying an IRSGLS method to (12).
With the choice $z_{4t} = x_{1t}$, the estimates of the coefficients of (12) may give better estimates of the components of the coefficients of equation (9) than other choices. If this is true, then an estimate of $\alpha_{1t}^*$ is a kernel density estimate given by $\hat{\pi}_{10} + \hat{\pi}_{14} x_{1t}$, $t = 1, \ldots, T$, where $\hat{\pi}_{10}$ and $\hat{\pi}_{14}$ are the IRSGLS estimates of $\pi_{10}$ and $\pi_{14}$, respectively. If the central tendencies of these kernel density estimates are unreasonable, then the estimates of the other terms of (11) can be added to the estimate $\hat{\pi}_{10} + \hat{\pi}_{14} z_{4t}$ (with $z_{4t} = x_{1t}$). The large sample properties of kernel density estimates are given in Lehmann (1999). It is a good idea to study the robustness of these kernel density estimates to changes in the set of coefficient drivers. Because we would not know whether the true model was quadratic, we could include higher-order polynomial terms and test their significance in the usual way to see how many polynomial terms would be needed. If the nonlinearity were not, in fact, a polynomial, then there are two possible courses of action.
1. We could include a number of polynomial terms and think of this as a Taylor series approximation to the true unknown form.
2. We could try a range of specific nonlinear forms, again testing one form against another.
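The effect of choosing the regressor itself as the $S_1$ driver can be sketched numerically. The example below is a simplified illustration, not the estimation method of the paper: it uses ordinary least squares from NumPy in place of IRSGLS, drops all misspecification drivers, and uses noise-free made-up data. The variable names are hypothetical.

```python
import numpy as np

# Hypothetical noise-free data generated from a quadratic relationship:
# y_t = 1.0 + 0.5 * x_t + 0.25 * x_t**2
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = 1.0 + 0.5 * x + 0.25 * x**2

# With z_0t = 1 and z_4t = x_1t as the only drivers, substituting the driver
# equations into y_t = gamma_0t + gamma_1t * x_1t gives a regression on
# a constant, x_1t, and x_1t * z_4t = x_1t**2.
X = np.column_stack([np.ones_like(x), x, x * x])

# OLS is an assumption made here for brevity; the paper uses IRSGLS.
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
pi00, pi10, pi14 = coef

# Estimated bias-free coefficient path: pi10_hat + pi14_hat * x_1t.
bias_free = pi10 + pi14 * x

print(coef)       # estimates of pi00, pi10, pi14
print(bias_free)  # one value per period
```

Higher-order polynomial terms would enter in the same way, as further columns such as `x**3`, whose significance could then be tested.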
With regard to the second option, the standard TVC model is able to nest a number of popular nonlinear models within a single framework, which also allows for measurement error and missing variables. This procedure is very different from other standard procedures. For example, a popular nonlinear model is the smooth transition autoregressive (STAR) model. This allows a parameter to move smoothly between two values according to a function that responds to some threshold variable. If we were estimating a TVC model such as (9) but believed the true nonlinearity followed a STAR form, then we could specify the driver equations in terms of a transition function $G(z_t, \zeta, c)$, typically a logistic function for the LSTAR model, an exponential function for the ESTAR model, or a second-order logistic function, where $z_t$ is the transition variable and $\zeta$ and $c$ are parameters [Ahmad and Lo (2014)]. The model given by (14) is more general than the standard STAR model, as it includes the drivers associated with measurement error and missing variables and will hence correct for these misspecifications. Model (14) also has a stochastic error term and thus becomes a stochastic STAR model. The split into the two subsets is again obvious, as the two terms capturing the STAR effect are clearly the set that is appropriate for $S_1$. Another interesting nonlinearity would be to use a set of combinations of simple nonlinear functions such as the log or exponential function. In this case the TVC model would begin to encompass a neural net, and the universal approximation theorems of Hal White would suggest that with sufficient complexity the model could then approximate any unknown functional form to any degree of accuracy.
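For concreteness, the transition functions mentioned above can be sketched as follows. The logistic and exponential forms are the standard textbook ones. The function `star_driver_part`, which combines a constant and a transition term into the $S_1$ part of a driver equation, is a hypothetical illustration and is not the paper's exact specification.

```python
import math

def logistic_transition(z, zeta, c):
    """LSTAR-type transition: rises smoothly from 0 to 1 as z passes c."""
    t = zeta * (z - c)
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)              # avoids overflow for very negative t
    return e / (1.0 + e)

def exponential_transition(z, zeta, c):
    """ESTAR-type transition: 0 at z = c, approaching 1 away from c."""
    return 1.0 - math.exp(-zeta * (z - c) ** 2)

def star_driver_part(pi_a, pi_b, g):
    """Hypothetical S1 part of a driver equation: constant plus transition term."""
    return pi_a + pi_b * g

# Made-up parameter values for illustration only.
g_mid = logistic_transition(2.0, 5.0, 2.0)
print(g_mid, star_driver_part(1.0, 2.0, g_mid))
```

With this form the coefficient moves between `pi_a` and `pi_a + pi_b` as the transition variable crosses the threshold `c`, with `zeta` governing the speed of the transition.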
Under certain conditions, the IRSGLS estimators of the $\pi$'s in (2) are consistent and asymptotically efficient [see Swamy et al. (2010)] (see note 9). The distributional theory underlying this estimation technique and the method for constructing inference are given in Swamy et al. (2010). However, software for this technique is not widely used, and so here we point out a way in which these models may be estimated using standard software such as EVIEWS. The preceding models are written in a form that exactly corresponds to the state space form. Although the usual interpretation of state space models is rather different from the TVC procedure being discussed here, the mathematics of the Kalman filter goes through exactly to yield minimum least-squares estimates [Harvey (1989)]. If the errors of the driver equations are not normal with a constant variance, then the Kalman filter will not yield maximum likelihood estimates; they may, however, be interpreted as quasi-maximum likelihood estimates [White (1980)], and they will be consistent, although not generally as efficient as the IRSGLS technique mentioned earlier. This means that all of these models may be estimated using the Kalman filter, which can estimate all of the parameters of the model by maximizing the quasi-likelihood function. The Kalman filter and the state space form must be linear in the state variables, but they can easily handle nonlinearities in the other variables; all of the preceding variants may be estimated in standard software such as EVIEWS. An Appendix to this paper provides the EVIEWS code that was used to estimate the example that follows.
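The state space idea can be sketched with a small scalar Kalman filter. This is an illustration under simplifying assumptions and is not the EVIEWS code of the Appendix: there is one regressor and one driver, the error on the slope coefficient is treated as an AR(1) state (the Appendix describes a moving average error instead), and the intercept error plays the role of measurement noise. All names are hypothetical.

```python
import math

def tvc_loglik(y, x, z, pi00, pi10, pi11, phi, q, r, p0=0.0):
    """Gaussian (quasi-)log-likelihood of
         y_t = pi00 + (pi10 + pi11*z_t + s_t) * x_t + e_t,   e_t ~ (0, r)
         s_t = phi * s_{t-1} + u_t,                         u_t ~ (0, q)
       computed with a scalar Kalman filter. Returns the log-likelihood and
       the filtered path of the slope coefficient."""
    a, p = 0.0, p0                # state mean and variance
    loglik, filtered = 0.0, []
    for yt, xt, zt in zip(y, x, z):
        # Prediction step for the state.
        a = phi * a
        p = phi * phi * p + q
        # One-step-ahead prediction error and its variance.
        v = yt - (pi00 + (pi10 + pi11 * zt) * xt) - xt * a
        f = xt * xt * p + r
        loglik += -0.5 * (math.log(2.0 * math.pi * f) + v * v / f)
        # Update step.
        k = p * xt / f
        a = a + k * v
        p = p - k * xt * p
        filtered.append(pi10 + pi11 * zt + a)
    return loglik, filtered

# Made-up data that the deterministic part fits exactly.
x = [1.0, 2.0, 3.0]
z = [0.5, 1.0, 1.5]
y = [1.0 + (0.5 + 0.2 * zt) * xt for xt, zt in zip(x, z)]
loglik, filtered = tvc_loglik(y, x, z, 1.0, 0.5, 0.2, phi=0.5, q=0.0, r=1.0)
print(loglik, filtered)
```

In practice the function would be passed to a numerical optimizer over the parameters, which is the step that packaged state space routines perform internally.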

AN APPLICATION
In this section, we investigate the effects of rating-agency decisions on the sovereign bond spread between Greece and Germany. The underlying hypothesis is that this relationship is highly nonlinear. For example, a decline in ratings by one notch, from, say, AA to A, will have a relatively small effect on spreads, whereas a decline from, say, B-minus to CCC will have a much larger effect on spreads. Thus, as the rating goes down, the effect on spreads becomes proportionally larger. We will be undertaking quasi-maximum likelihood estimation to illustrate the application of these techniques in standard software (EVIEWS).
The intuition underlying this mechanism is as follows. Consider a world that includes two rating agencies, X and Y. In assigning ratings to a particular sovereign, both agencies have access to essentially identical information sets composed of the (present and projected) fundamentals, including spreads, competitiveness, real growth, inflation, fiscal and external positions, and, perhaps, noneconomic variables such as measures of political stability. Suppose that, based on its assessment of the information set of a particular country, rating agency X moves to downgrade the sovereign debt of the country in question. The announcement of the downgrade will very likely trigger a rise in the sovereign's interest rate. In addition, under the ECB's collateral framework, haircuts on sovereigns rise if ratings fall to a specified (triple-B) level and are ineligible as collateral below single-B minus. For these reasons, the very action by rating agency X changes the information set of rating agency Y, because that information set now includes X's downgrade, the resulting higher interest rates, and possibly higher haircuts on collateral, lower projected growth (because of the rise in interest rates), and less sustainable fiscal balances for the country in question. Consequently, rating agency Y, which may have been content with the rating it had assigned to the sovereign in question prior to X's downgrade, may move to downgrade the sovereign's rating based on the changed information set. In this way, X's original action can precipitate a downgrade by Y, triggering self-perpetuating feedback loops between ratings and spreads.
Of course, there are many other things that might affect spreads, such as debt, deficits, relative prices, and politics. Therefore, if we examined a simple relationship between spreads and ratings, omitted variables would cause bias for a standard OLS regression.
The data used are monthly and cover the period 1998m1 to 2012m6. In cases for which the original data are quarterly, the data have been interpolated to a monthly frequency; where appropriate, variables are measured relative to the corresponding variables for Germany. As mentioned, the dependent variable, sp, is the yield spread between the 10-year benchmark government bond yield of Greece and that of Germany. Our explanatory variables are measures of macroeconomic and political fundamentals, as follows:

pol represents political stability. We use the IFO World Economic Survey Index of Political Stability. A rise in the index implies greater stability.

dgdp is real GDP growth. A relatively high rate of economic growth suggests that a country's existing debt burden will become easier to service over time.

cnewssq is "fiscal news." To capture the news (or surprise) element that has figured strongly in Greece's experience, we use real-time fiscal data. In particular, using the European Commission Spring and Autumn forecasts, we use a series of forecast revisions. For example, the revision in the Spring 2001 forecasts is the 2001 deficit/GDP ratio in the spring compared to the forecast for 2001 made in the autumn of 2000. This procedure generates a series of revisions, which, when cumulated over time, provides a cumulative fiscal news variable.

relp is relative prices. To help capture relative changes in competitiveness, we use Greece's Harmonized Index of Consumer Prices (HICP, all items index) relative to that of Germany.

debtogdp is the debt-to-GDP ratio. A higher debt burden should correspond to a higher risk of default. We include the general government consolidated gross debt-to-GDP ratio (expressed as a percentage), interpolated from a quarterly to a monthly frequency.

rate is the agencies' credit rating for Greek government debt. The ratings of three agencies are used: Fitch, Moody's, and Standard & Poor's. To capture the effects of ratings on spreads, we include ordinal ratings to allow for nonlinearities in the relationship between ratings and spreads. For example, the dummy variable triple-A takes a value of 1 for the period for which the rating was triple-A, and a value of zero otherwise. We date rating changes by identifying the agency that made the first move from one rating to another, on the assumption that the first mover would cause the subsequent reaction. In other words, if rating agency A downgraded Greece from A to BBB+ in April, say, and subsequently rating agency B downgraded Greece from A to A- in June, then the second downgrade would not register.
Our basic TVC model is

$$sp_t = \alpha_{0t} + \alpha_{1t}\, rate_t. \qquad (15)$$

Our coefficient driver equations take the form

$$\alpha_{0t} = \pi_{00} + \pi_{01}\, pol_t + \pi_{02}\, dgdp_t + \pi_{03}\, cnewssq_t + \pi_{04}\, relp_t + \pi_{05}\, debtogdp_t + \varepsilon_{0t}, \qquad (16)$$

$$\alpha_{1t} = \pi_{10} + \pi_{11}\, rate_t + \pi_{12}\, pol_t + \pi_{13}\, dgdp_t + \pi_{14}\, cnewssq_t + \pi_{15}\, relp_t + \pi_{16}\, debtogdp_t + \varepsilon_{1t}. \qquad (17)$$

In this driver set, rate gives a quadratic effect to the coefficient on ratings, allowing for a strong nonlinearity. We begin by estimating this general model, to obtain the following results (where t-statistics are in parentheses): [...] (17′)

As mentioned in Section 3, selection of drivers should be made on the basis of the explanatory power of the driver equations and the significance of the individual drivers. The coefficient of interest is $\alpha_{1t}$, that is, the effect of ratings on the 10-year spread. Effectively, we have taken the variable "rate" from the basic equation for spreads, i.e., equation (15), and used it as the driver. Then, by substituting back into the basic equation, we obtain a quadratic effect in the basic equation. Figure 1 shows the total value of this TVC, along with the bias-free effect that is given by subtracting the error term and the effect from rate and cnewssq.

FIGURE 1. Time profiles of $\alpha_{1t}$ in equation (15): bias-free coefficient and total coefficient.
Because the scaling of the two coefficients is quite different, Figure 2 shows just the bias-free coefficient. This makes the strong nonlinearity clear as ratings start to rise (that is, deteriorate). After 2008, the effect of ratings on spreads becomes increasingly powerful. We therefore have found a very strong quadratic link between ratings and spreads.

FIGURE 2. Bias-free coefficient.

CONCLUSIONS
This paper has proposed a new way of deriving the split between coefficient drivers when deriving the bias-free estimate of a coefficient within the TVC estimation framework. We have argued that, if a model with unique coefficients and error term is linear, then only the constant in the coefficient driver set should be retained. If a model with unique coefficients and error term is nonlinear, however, then an explicit set of drivers that capture the nonlinearity should be chosen. In the absence of any specific information about the precise form of the nonlinearity, this can best be achieved by using a set of polynomials in the explanatory variables. These drivers are then the only ones that should be retained when the bias-free component is derived. We have also argued that two conditions should be applied to the drivers, which we call predictive power and relevance; that is, the drivers should explain a large proportion of the movement in the TVCs and they should be statistically significant. We illustrated this process by estimating a nonlinear relationship between country-risk ratings and sovereign bond spreads for Greece. We showed that there is a highly nonlinear effect here. Finally, the procedure can be implemented using standard software such as EVIEWS. The code for the estimated model is provided in Appendix A.

NOTES
1. The development of a type of variable coefficient estimation is due to Swamy (1970, 1974). See also Tavlas (2001, 2007) and Swamy et al. (2010).

APPENDIX A

In this code, SP_gr is the spread between Greek and German ten-year government bond yields; Rate_gr is the rating on country risk, denoted as a variable scaled from 1 to 20, where 1 is equivalent to a AAA rating; pol_gr is an indicator of political unrest in Greece; cnewssq is an indicator of news about Greek GDP derived from ECB forecasts; relp_gr is the relative prices between Greece and Germany; and debtogdp_gr is the debt-to-GDP ratio for Greece.
This model is also coded to have a first-order moving average error process on the two time-varying coefficients sv1 and sv2, which is usual in TVC models.
The c(?) parameters are the usual EVIEWS notation for parameters to be estimated by maximum likelihood.