Forecasting Time Series With Complex Seasonal Patterns Using Exponential Smoothing

An innovations state space modeling framework is introduced for forecasting complex seasonal time series such as those with multiple seasonal periods, high-frequency seasonality, non-integer seasonality, and dual-calendar effects. The new framework incorporates Box–Cox transformations, Fourier representations with time varying coefficients, and ARMA error correction. Likelihood evaluation and analytical expressions for point forecasts and interval predictions under the assumption of Gaussian errors are derived, leading to a simple, comprehensive approach to forecasting complex seasonal time series. A key feature of the framework is that it relies on a new method that greatly reduces the computational burden in the maximum likelihood estimation. The modeling framework is useful for a broad range of applications, its versatility being illustrated in three empirical studies. In addition, the proposed trigonometric formulation is presented as a means of decomposing complex seasonal time series, and it is shown that this decomposition leads to the identification and extraction of seasonal components which are otherwise not apparent in the time series plot itself.


INTRODUCTION
Many time series exhibit complex seasonal patterns. Some, most commonly weekly series, have patterns with a non-integer period. Weekly U.S. finished motor gasoline products in thousands of barrels per day, as shown in Figure 1(a), has an annual seasonal pattern with period 365.25/7 ≈ 52.179.
Other series have high-frequency multiple seasonal patterns. The number of retail banking call arrivals per 5-minute interval between 7:00 a.m. and 9:05 p.m. each weekday, as depicted in Figure 1(b), has a daily seasonal pattern with period 169 and a weekly seasonal pattern with period 169 × 5 = 845. A longer version of this series might also exhibit an annual seasonal pattern. Further examples where such multiple seasonal patterns can occur include daily hospital admissions, requests for cash at ATMs, electricity and water usage, and access to computer web sites.
Yet other series may have dual-calendar seasonal effects. Daily electricity demand in Turkey over nine years from 1 January 2000 to 31 December 2008, shown in Figure 1(c), has a weekly seasonal pattern and two annual seasonal patterns: one for the Hijri calendar with a period of 354.37; and the other for the Gregorian calendar with a period of 365.25. The Islamic Hijri calendar is based on lunar cycles and is used for religious activities and related holidays. It is approximately 11 days shorter than the Gregorian calendar. The Jewish, Hindu, and Chinese calendars create similar effects that can be observed in time series affected by cultural and social events (e.g., electricity demand, water usage, and other related consumption data), and need to be accounted for in forecasting studies (Lin and Liu 2002;Riazuddin and Khan 2005). Unlike the multiple periods Alysha M. De Livera is Research Fellow, Faculty of Science, The University of Melbourne, Victoria 3010, Australia (E-mail: alyshad@unimelb.edu.au). Rob J. Hyndman is Professor, Department of Econometrics and Business Statistics, Monash University, Victoria 3800, Australia (E-mail: rob.hyndman@ monash.edu). Ralph D. Snyder is Associate Professor, Department of Econometrics and Business Statistics, Monash University, Victoria 3800, Australia (E-mail: ralph.snyder@monash.edu). The first author acknowledges the support provided by the Commonwealth Scientific and Industrial Research Organisation, Australia. The authors thank Dr. Peter Toscas from the Commonwealth Scientific and Industrial Research Organisation, Australia, the editor, the associate editor, and two referees for comments that improved the clarity and quality of the article. seen with hourly and daily data, these dual calendar effects involve non-nested seasonal periods.
Most existing time series models are designed to accommodate simple seasonal patterns with a small integer-valued period (such as 12 for monthly data or 4 for quarterly data). Important exceptions (Harvey and Koopman 1993;Harvey, Koopman, and Riani 1997;Taylor 2003Taylor , 2010bPedregal and Young 2006;Gould et al. 2008;Taylor and Snyder 2009) handle some but not all of the above complexities. Harvey, Koopman, and Riani (1997), for example, used a trigonometric approach for single seasonal time series within a traditional multiple source of error state space framework. The single source of error approach adopted in this article is similar in some respects, but admits a larger effective parameter space with the possibility of better forecasts (see Hyndman et al. 2008, chap. 13), allows for multiple nested and non-nested seasonal patterns, and handles potential nonlinearities. The articles by Pedregal and Young (2006) and Harvey and Koopman (1993) have models for double seasonal time series, but they have not been sufficiently developed for time series with more than two seasonal patterns, and are not capable of accommodating the nonlinearity found in many time series in practice. Similarly, in modeling complex seasonality, the existing exponential smoothing models (e.g., Taylor 2003Taylor , 2010bGould et al. 2008;Taylor and Snyder 2009) suffer from various weaknesses such as overparameterization, and the inability to accommodate both non-integer period and dualcalendar effects. In contrast, we introduce a new innovations state space modeling framework based on a trigonometric formulation which is capable of tackling all of these seasonal complexities. Using the time series in Figure 1, we demonstrate the versatility of the proposed approach for forecasting and decomposition.
In Section 2.1 we review the existing seasonal innovations state space models including an examination of their weaknesses, particularly in relation to complex seasonal patterns. We then introduce in Sections 2.2 and 2.3 two generalizations designed to overcome some or all of these problems, one relying on trigonometric representations for handling complex as well as the usual single seasonal patterns in a straightforward manner with fewer parameters. Section 3 contains a new method for the calculation of maximum likelihood estimators, formulas for point and interval predictions, and the description of the model selection methodology. It will be seen that the proposed estimation procedure is sufficiently general to be applied to any innovations state space model while possessing some important advantages over an existing approach. The proposed models are then applied in Section 4 to the time series from Figure 1 where it will be seen that the trigonometric formulation leads to better forecasts and may be used for decomposition. Some conclusions are drawn in Section 5.

EXPONENTIAL SMOOTHING MODELS
FOR SEASONAL DATA

Traditional Approaches
Single seasonal exponential smoothing methods, which are among the most widely used forecasting procedures in practice (Makridakis et al. 1982;Makridakis and Hibon 2000;Snyder, Koehler, and Ord 2002), have been shown to be optimal for a class of innovations state space models (Ord, Koehler, and Snyder 1997;Hyndman et al. 2002). They are therefore best studied in terms of this framework because it then admits the possibility of likelihood calculation, the derivation of consistent prediction intervals, and model selection based on information criteria. The single source of error (innovations) state space model is an alternative to its common multiple source of error analogue (Harvey 1989) but it is simpler, more robust, and has several other advantages .
The most commonly employed seasonal models in the innovations state space framework include those underlying the well-known Holt-Winters additive and multiplicative methods. Taylor (2003) extended the linear version of the Holt-Winters method to incorporate a second seasonal component as follows: where m 1 and m 2 are the periods of the seasonal cycles and d t is a white-noise random variable representing the prediction error (or disturbance). The components t and b t represent the level and trend components of the series at time t, respectively, and s (i) t represents the ith seasonal component at time t. The coefficients α, β, γ 1 , and γ 2 are the so-called smoothing parameters, and 0 , b 0 , {s 0 } are the initial state variables (or "seeds").
The parameters and seeds must be estimated, but this can be difficult when the number of seasonal components is large. This problem is partly addressed by noting that there is a redundancy when m 2 is an integer multiple of m 1 , something that seems to have previously gone unnoticed. Consider a time series {r t } consisting of repeated sequences of the constants c 1 , . . . , c m 1 , one for each season in the smaller cycle. Then the seasonal equations can be written as When these are summed, the effect of r t disappears. This suggests that the m 1 seed seasonal effects for the smaller seasonal cycle can be set to zero without constraining the problem in any way. Alternatively, each sub-season repeats itself m 2 /m 1 times within the longer seasonal pattern. We can impose the constraint that the seed seasonal effects associated with each sub-season must sum to zero. For example, the period 10:00-11:00 a.m. repeats itself seven times in a week. We can insist that the seven seasonal effects associated with this particular hour sum to zero and that this is repeated for each of the 24 hour periods in a day. Analogues of these restrictions can be developed when there are three or more seasonal patterns. Despite this correction, a large number of initial seasonal values remain to be estimated when some of the seasonal patterns have large periods, and such a model is likely to be overparameterized. For double seasonal time series Gould et al. (2008) attempted to reduce this problem by dividing the longer seasonal length into sub-seasonal cycles that have similar patterns. However, their adaptation is relatively complex and can only be used for double seasonal patterns where one seasonal length is a multiple of the other. To avoid the potentially large optimization problem, the initial states are usually approximated with various heuristics (Taylor 2003(Taylor , 2010bGould et al. 2008), a practice that does not lead to optimized seed states. We will propose an alternative estimation method, one that relies on the principle of least squares to obtain optimized seed states-see Section 3.
A further problem is that none of the approaches based on (1) can be used to handle complex seasonal patterns such as non-integer seasonality and calendar effects, or time series with non-nested seasonal patterns. One of our proposed models will allow for all these features.
The nonlinear versions of the state space models underpinning exponential smoothing, although widely used, suffer from some important weaknesses. Akram, Hyndman, and Ord (2009) showed that most nonlinear seasonal versions can be unstable, having infinite forecast variances beyond a certain forecasting horizon. For some of the multiplicative error models which do not have this flaw, Akram, Hyndman, and Ord (2009) proved that sample paths will converge almost surely to zero even when the error distribution is non-Gaussian. Furthermore, for nonlinear models, analytical results for the prediction distributions are not available.
The models used for exponential smoothing assume that the error process {d t } is serially uncorrelated. However, this may not always be the case. In an empirical study, using the Holt-Winters method for multiplicative seasonality, Chatfield (1978) showed that the error process there is correlated and can be described by an AR(1) process. Taylor (2003), in a study of electricity demand forecasting using a double-seasonal Holt-Winters multiplicative method, found a similar problem. Others such as , Reid (1975), and Gilchrist (1976) have also mentioned this issue of correlated errors, and the possibility of improving forecast accuracy by explicitly modeling it. The source of this autocorrelation may be due to features of the series not explicitly allowed for in the specification of the states. Annual seasonal effects may impact on the call center data, for example, but the limited sample size means that it cannot be explicitly modeled.

Modified Models
We now consider various modifications of the state space models for exponential smoothing to handle a wider variety of seasonal patterns, and to also deal with the problems raised above.
To avoid the problems with nonlinear models that are noted above, we restrict attention to linear homoscedastic models but allow some types of nonlinearity using Box-Cox transformations (Box and Cox 1964). This limits our approach to only positive time series, but most series of interest in practice are positive. The notation y (ω) t is used to represent Box-Cox transformed observations with the parameter ω, where y t is the observation at time t.
We can extend model (1) to include a Box-Cox transformation, ARMA errors, and T seasonal patterns as follows: where m 1 , . . . , m T denote the seasonal periods, t is the local level in period t, b is the long-run trend, b t is the short-run trend in period t, s (i) t represents the ith seasonal component at time t, d t denotes an ARMA(p, q) process, and ε t is a Gaussian whitenoise process with zero mean and constant variance σ 2 . The smoothing parameters are given by α, β, and γ i for i = 1, . . . , T. We adopt the Gardner and McKenzie (1985) damped trend with damping parameter φ, but follow the suggestion in the article by Snyder (2006) to supplement it with a long-run trend b. This change ensures that predictions of future values of the short-run trend b t converge to the long-run trend b instead of zero. The damping factor is included in the level and measurement equations as well as the trend equation for consistency with the work of Gardner and McKenzie (1985), but identical predictions are obtained (see Snyder 2006) if it is excluded from the level and measurement equations.
The BATS model is the most obvious generalization of the traditional seasonal innovations models to allow for multiple seasonal periods. However, it cannot accommodate non-integer seasonality, and it can have a very large number of states; the initial seasonal component alone contains m T nonzero states. This becomes a huge number of values for seasonal patterns with high periods.

Trigonometric Seasonal Models
In the quest for a more flexible parsimonious approach, we introduce the following trigonometric representation of seasonal components based on Fourier series (Harvey 1989;West and Harrison 1997): where γ are smoothing parameters and λ j,t , and the stochastic growth in the level of the ith seasonal component that is needed to describe the change in the seasonal component over time by s * (i) j,t . The number of harmonics required for the ith seasonal component is denoted by k i . The approach is equivalent to index seasonal approaches when k i = m i /2 for even values of m i , and when k i = (m i − 1)/2 for odd values of m i . It is anticipated that most seasonal components will require fewer harmonics, thus reducing the number of parameters to be estimated. A deterministic representation of the seasonal components (e.g., Abraham and Box 1978) can be obtained by setting the smoothing parameters equal to zero.
A new class of innovations state space models is obtained by replacing the seasonal component s (3) by the trigonometric seasonal formulation, and the measurement equation by y (ω) This class is designated by TBATS, the initial T connoting "trigonometric." To provide more details about their structure, this identifier is supplemented with relevant arguments to give the designation A TBATS model requires the estimation of 2(k 1 + k 2 + · · · + k T ) initial seasonal values, a number which is likely to be much smaller than the number of seasonal seed parameters in a BATS models. Because it relies on trigonometric functions, it can be used to model non-integer seasonal frequencies. A TBATS model should be distinguished from two other related (Proietti 2000) multiple source of error seasonal formulations presented by Hannan, Terrel, and Tuckwell (1970) and Harvey (1989). Some of the key advantages of the TBATS modeling framework are: (i) Being an innovations state space model, it admits a larger effective parameter space with the possibility of better forecasts (Hyndman et al. 2008, chap. 13); (ii) it allows for the accommodation of nested and non-nested multiple seasonal components; (iii) it handles typical nonlinear features that are often seen in real time series; (iv) it allows for any autocorrelation in the residuals to be taken into account; and (v) it involves a much simpler, yet efficient estimation procedure (see Section 3).

Innovations State Space Formulations
The above models are special cases of the linear innovations state space model (Anderson and Moore 1979) adapted here to incorporate the Box-Cox transformation to handle nonlinearities. It then has the form where w is a row vector, g is a column vector, F is a matrix, and x t is the unobserved state vector at time t.
2.4.1 TBATS Model. The state vector for the TBATS model with a nonstationary growth term can be defined as We shall also need the matrices B = γ ϕ, C = γ θ , , respectively, for j = 1, 2, . . . , k i and i = 1, . . . , T, and where denotes the direct sum of the matrices. Let τ = 2 T i=1 k i . Then the matrices for the TBATS model can be written as These matrices apply when all of the components are present in the model. When a component is omitted, the corresponding terms in the matrices must be omitted.

BATS Model.
The state space form of the BATS model can be obtained by letting s i=1Ã i , and by replacing 2k i with m i in the matrices presented above for the TBATS models.

Reduced Forms.
It is well known that linear forecasting systems have equivalent ARIMA (Box and Jenkins 1970) reduced forms, and it has been shown that the forecasts from some exponential smoothing models are identical to the forecasts from particular ARIMA models (McKenzie 1984;Chatfield and Yar 1991). The reduced forms of BATS and TBATS models can be obtained by where L is the lag operator, and θ q (L) are polynomials of length p and q, w * = (1, φ, a), g * = (α, β, γ ) , and with the corresponding parameters defined as above. (Refer to De Livera 2010b, chap. 4 for the proofs.) For BATS models with a nonstationary growth, the reduced form is then given by (6), with For TBATS models with a nonstationary growth, the reduced form is then given by (6), with One of the benefits of using this ARIMA reduced form compared to the existing innovations state space methodology is that it allows for the derivation of exact likelihood estimates without treating initial state values as additional parameters (Gardner, Harvey, and Phillips 1980;Melard 1984;Kohn and Ansley 1985). It also allows for the derivation of forecast error variance, computation of forecast intervals, and residual analysis. However, the reduced form of the TBATS model has a relatively complex ARIMA structure which is dependent on the number of terms k i chosen for the ith seasonal component, and encompasses trigonometric coefficients that rely on the frequency of each seasonal pattern. Several other advantages of state space modeling over ARIMA modeling have been described by Durbin (2001, chap. 3, sect. 3.5).

Estimation
The typical approach with linear state space models is to estimate unknown parameters like the smoothing parameters and the damping parameter using the sum of squared errors or the Gaussian likelihood (see Hyndman et al. 2008, chap. 3). In our context it is necessary to also estimate the unknown Box-Cox transformation parameter ω, and the ARMA coefficients.
The seed states of state space models are usually treated as random vectors. Given trial values of the unknown parameters, the joint steady-state distributions of stationary states are derived, and then assigned to associated seed states. Thus, for given values of φ and σ 2 , the seed short-run growth rate would be assigned an N(0, σ 2 /(1 − φ 2 )) distribution. Most states, however, are nonstationary, and they are presumed to have Gaussian distributions with arbitrarily large variances . The Kalman filter is typically used to obtain one-step-ahead prediction errors and associated variances needed for evaluating fitting criteria for given trial values of the parameters. The Kalman filter in the work of Snyder (1985b) would be appropriate for innovations state space models in particular. However, it would need to be augmented with additional equations (De Jong 1991) to handle the nonstationary states.
A simpler alternative is available in the context of innovations state space models. By conditioning on all the seed states and treating them as unknown fixed parameters, exponential smoothing can be used instead of an augmented Kalman filter to generate the one-step-ahead prediction errors needed for likelihood evaluation. In this case both the parameters and seed states are selected to maximize the resulting conditional likelihood function. If not for the different treatment of stationary states, the exponential smoothing and augmented Kalman filter approaches yield the same conditional distribution of the final state vector and so yield identical prediction distributions of future series values .
The conditional likelihood of the observed data y = (y 1 , . . . , y n ) is derived on the assumption that ε t ∼ N(0, σ 2 ). This implies that the density of the transformed series is y (ω) t ∼ N(w x t−1 , σ 2 ) so that the density of the transformed data is where ϑ is a vector containing the Box-Cox parameter, smoothing parameters, and ARMA coefficients. Therefore, the density of the original series, using the Jacobian of the Box-Cox transformation, is On concentrating out the variance σ 2 with its maximum likelihood estimateσ we obtain the log-likelihood given by log y t . (8) Substituting (7) into (8), multiplying by −2, and omitting constant terms, we get The quest is to minimize the quantity (9) to obtain maximum likelihood estimates, but the dimension of the seed states vector x 0 makes this computationally challenging. Our approach to this problem is based on the observation that ε t is a linear function of the seed vector x 0 . Thus, we show that it is possible to concentrate the seed states out of the likelihood, and so substantially reduce the dimension of the numerical optimization problem. This concentration process is the exponential smoothing analogue of de Jong's method for augmenting Kalman filter to handle seed states with infinite variances.
The innovation ε t can be eliminated from the transition equation in (5) to give x t = Dx t−1 + gy t where D = F − gw . The equation for the state, obtained by back-solving this recurrence equation to period 0, can be used in conjunction with the measurement equation to obtain whereỹ t = y (ω) t −w x t−1 ,x t = Dx t−1 +gy t , w t = Dw t−1 ,x 0 = 0, and w 0 = w (see Snyder 1985a, for the derivation). Thus, the relationship between each error and the initial state vector x 0 is linear. It can also be seen from (10) that the seed vector x 0 corresponds to a regression coefficients vector, and so it may be estimated using conventional linear least squares methods.
Thus, the problem reduces to minimizing the following with respect to ϑ : where SSE * is the optimized value of the sum of squared errors for given parameter values.
In contrast to the existing estimation procedure (Hyndman et al. 2008, chap. 14) where a heuristic scheme is employed to find the values of the initial states, our approach concentrates out the initial state values from the likelihood function, leaving only the much smaller parameter vector for optimization, a tactic in our experience that leads to better forecasts. It may also be effective in reducing computational times instead of invoking the numerical optimizer to directly estimate the seed state vector.
We can constrain the estimation to the forecastability region (Hyndman, Akram, and Archibald 2007) so that the characteristic roots of D lie within the unit circle, a concept that is equivalent to the invertibility condition for equivalent ARIMA models. The coefficients w D j−1 g are the matrix analogues of the weights in an exponentially weighted average, and this constraint ensures that their effect is to reduce the importance placed on older data. When some roots lie on the unit circle, the discounting effect is lost (although this possibility admits some important special cases). For integer period seasonality, the seasonal values can also be constrained when optimizing, so that each seasonal component sums to zero.

Prediction
The prediction distribution in the transformed space for future period n + h, given the final state vector x n and given the parameters ϑ, σ 2 , is Gaussian. The associated random variable is designated by y where c j = w F j−1 g. The prediction distribution of y n+h|n is not normal. Point forecasts and forecast intervals, however, may be obtained using the inverse Box-Cox transformation of appropriate quantiles of the distribution of y (ω) n+h|n . The point forecast obtained this way is the median, a minimum mean absolute error predictor (Pankratz and Dudley 1987;Proietti and Riani 2009). The prediction intervals retain the required probability coverage under back-transformation because the Box-Cox transformation is monotonically increasing. To simplify matters we use the common plug-in approach to forecasting. The pertinent parameters and final state are replaced by their estimates in the above formulas. This ignores the impact of estimation error, but the latter is a second-order effect in most practical contexts.

The Use of an Information Criterion.
In this article, the AIC = L * (θ,x 0 ) + 2K is used for choosing between the models, where K is the total number of parameters in ϑ plus the number of free states in x 0 , andθ andx 0 denote the estimates of ϑ and x 0 . When one of the smoothing parameters takes the boundary value 0, the value of K is reduced by 1 as the model simplifies to a special case. For example, if β = 0, then b t = b 0 for all t. Similarly, when either φ = 1 or ω = 1, the value of K is reduced by 1 in each instance to account for the resulting simplified model (e.g., when ω = 1, the model simplifies to a linear model without a Box-Cox transformation, and when φ = 1, the model reduces to a model without a damping effect in the trend component). In the applications of our article, the Nelder-Mead algorithm (Nelder and Mead 1965), which is being successfully employed in recent R packages (e.g., forecast package for R Hyndman 2010), was used for optimization, with a tolerance level of 1e-08 for such boundary values. In an empirical study, Billah, Hyndman, and Koehler (2005) indicated that information criterion approaches, such as the AIC, provide the best basis for automated model selection, relative to other methods such as prediction-validation. Alternative information criteria such as the AICc (Burnham and Anderson 2002) may also be used.

Selecting the Number of Harmonics k j in the Trigonometric Models.
The forecasts from the TBATS model depend on the number of harmonics k i used for the seasonal component i. It is impractical to consider all possible combinations in the quest for the best combination. After much experimentation we found that the following approach leads to good models and that further improvement can rarely be achieved (see De Livera 2010b, chap. 3).
De-trend the first few seasons of the transformed data using an appropriate de-trending method. In this article, we employed the method described by Hyndman et al. (2008, chap. 2). Approximate the resulting de-trended data using the linear regres- Starting with a single harmonic, gradually add harmonics, testing the significance of each one using F-tests. Let k * i be the number of significant harmonics (with p < 0.001) for the ith seasonal component. Then fit the required model to the data with k i = k * i and compute the AIC. Considering one seasonal component at a time, repeatedly fit the model to the estimation sample, gradually increasing k i but holding all other harmonics constant for each i, until the minimum AIC is achieved. This approach, based on multiple linear regression, is preferred over letting k * i = 1 for each component, as the latter was found to be unnecessarily time-consuming.

Selecting the ARMA Orders p and q for the Models.
In selecting a model, suitable values for the ARMA orders p and q must also be found. We do this using a two-step procedure. First, a suitable model with no ARMA component is selected. Then the automatic ARIMA algorithm of Hyndman and Khandakar (2008) is applied to the residuals from this model in order to determine the appropriate orders p and q (we assume the residuals are stationary). The selected model is then fitted again but with an ARMA(p, q) error component, where the ARMA coefficients are estimated jointly with the rest of the parameters. The ARMA component is only retained if the resulting model has lower AIC than the model with no ARMA component. Our subsequent work on the proposed models on a large number of real time series (De Livera 2010a) has indicated that fitting ARMA in a two-step approach yielded the best out-of-sample predictions, compared to several alternative approaches.

APPLICATIONS OF THE PROPOSED MODELS
The results obtained from the application of BATS and TBATS to the three complex time series in Figure 1 are reported in this section.
In addition, it is shown that the TBATS models can be used as a means of decomposing complex seasonal time series into trend, seasonal, and irregular components. In decomposing time series, the trigonometric approach has several important advantages over the traditional seasonal formulation. First, seasonal components obtained from the BATS model are not normalized (cf. the seasonal model of Harrison and Stevens 1976). Although normalized components may not be necessary if one is only interested in the forecasts and the prediction intervals, when the seasonal component is to be analyzed separately or used for seasonal adjustment, normalized seasonal components are required (Archibald and Koehler 2003;). Thus, BATS models have to be modified, so that the seasonal components are normalized for each time period, before using them for time series decomposition (see De Livera 2010b, chap. 5 for a normalized version of the BATS model). In contrast, the trigonometric terms in TBATS models do not require normalization, and so are more appropriate for decomposition. Second, in estimating the seasonal components using BATS, a large number of parameters are required, which often leads to noisy seasonal components. In contrast, a smoother seasonal decomposition is expected from TBATS where the smoothness of the seasonal component is controlled by the number of harmonics used. Furthermore, a BATS model cannot be used to decompose time series with noninteger seasonality and dual-calendar effects. Using TBATS models for complex seasonal time series, the overall seasonal component can be decomposed into several individual seasonal components with different frequencies. These individual seasonal components are given by s (i) t (i = 1, . . . , T) and the trend component is obtained by t . Extracting the trend and seasonal components then leaves behind a covariance stationary irregular component, denoted by d t . In particular, this approach leads to the identification and extraction of one or more seasonal components, which may not be apparent in the time series plots themselves. forecastingprinciples.com/files/T_competition_new.pdf for details). The data are observed weekly and show a strong annual seasonal pattern. The length of seasonality of the time series is m 1 = 365.25/7 ≈ 52.179. The time series exhibits an upward additive trend and an additive seasonal pattern, that is, a pattern for which the variation does not change with the level of the time series.

Application to Weekly U.S. Gasoline Data
The series, which consists of 745 observations, was split into two segments: an estimation sample period (484 observations) and a test sample (261 observations). The estimation sample was used to obtain the maximum likelihood estimates of the initial states and the smoothing parameters, and to select the appropriate number of harmonics and ARMA orders. Following the procedure for finding the number of harmonics to start with, it was found that only one harmonic was highly significant. The model was then fitted to the whole estimation sample of 484 values by minimizing the criterion equation (11). The values of the AIC decreased until k 1 = 7, and then started to increase.
Out-of-sample performance was measured by the root mean squared error (RMSE), defined as where p = 261 is the length of the test sample, n = 484 is the length of the estimation sample, and h is the length of the forecast horizon. Further analysis showed that changing the value of k 1 from 7 generated worse out-of-sample results, indicating that the use of the AIC as the criterion for this model selection procedure is a reasonable choice. In this way, the TBATS(0.9922, 1, 0, 0, {365.25/7, 7}) model was obtained. As a second step, ARMA models were fitted to the residuals with (p, q) combinations up to p = q = 5, and it was discovered that the TBATS(0.9922, 1, 0, 1, {365.25/7, 7}) model minimizes the AIC. The BATS model was considered next with m 1 = 52, and following the above procedure, it was discovered that the BATS(0.9875, 1, 0, 1, 52) model minimized the AIC. Figure 2 shows the out-of-sample RMSEs obtained for the two models, and it can be seen that the trigonometric model performs better for all lead times.
The BATS model cannot handle the non-integer periods, and so has to be rounded off to the nearest integer. It may also be overparameterized, as 52 initial seasonal values have to be estimated. Both these problems are overcome in the trigonometric formulation.
Tables 1 and 2 show the estimated parameters obtained for the TBATS and BATS models, respectively. The estimated values of 0 for β and 1 for φ imply a purely deterministic growth rate with no damping effect. The models also imply that the irregular component of the series is correlated and can be described by an ARMA(0, 1) process, and that a strong transformation is not necessary to handle nonlinearities in the series.
The decomposition of the Gasoline time series, obtained from the fitted TBATS model, is shown in Figure 3. The vertical bars at the right side of each plot represent equal heights plotted on different scales, thus providing a comparison of the size of each component. The trigonometric formulation in TBATS allows for the removal of more randomness from the seasonal component without destroying the influential bumps.

Application to Call Center Data
The call center series in Figure 1(a) consists of 10,140 observations, that is, 12 weeks of data starting from 3 March 2003 (Weinberg, Brown, and Stroud 2007). It contains a daily seasonal pattern with period 169 and a weekly seasonal pattern with period 169 * 5 = 845. The fitting sample consists of 7605 observations (9 weeks). As the trend appears to be close to zero, the growth rate b t was omitted from the models.
The post-sample forecasting accuracies of the selected BATS and TBATS models are compared in Figure 4. Again TBATS, which requires fewer parameters to be estimated, is more accurate than BATS.
The estimated parameters for the TBATS model shown in Table 1 imply that no Box-Cox transformation is necessary for this time series, and that the weekly seasonal component is more variable than the daily seasonal component. The irregular component is modeled by an ARMA(3, 1) process.
The decomposition obtained from TBATS, as shown in Figure 5, clearly exhibits strong daily and weekly seasonal components. The weekly seasonal pattern evolves considerably over time but the daily seasonal pattern is relatively stable. As is seen from the time series plot itself, the trend component is very small in magnitude compared to the seasonal components.

Application to the Turkey Electricity Demand Data
The Turkey electricity demand series shown in Figure 1(c) has a number of important features that should be reflected in the model structure. Three seasonal components with frequencies m 1 = 7, m 2 = 354.37, and m 3 = 365.25 exist in the series. The sharp drops seen in the seasonal component with period 354.37 are due to the Seker (also known as Eid ul-Fitr) and Kurban (also known as Eid al-Adha) religious holidays, which follow the Hijri calendar, while those seen in the seasonal component with frequency 365.25 are due to national holidays which follow the Gregorian calendar. Table 3 gives the dates of the holidays from the Hijri and Gregorian calendars. Seker is a three-day festival when sweets are eaten to celebrate the end of the fast of Ramadan. Kurban is a four-day festival when sacrificial sheep are slaughtered and their meat distributed to the poor. In addition, there are national holidays which follow the Gregorian calendar as shown in the table.
In this study, the series, which covers a period of 9 years, is split into two parts: a fitting sample of n = 2191 observations (6 years) and a post-sample period of p = 1096 observations (3 years). The model selection procedure was followed to give the TBATS(0.1393, 1, 3, 2, {7, 3}, {354.37, 23}, {365.25, 3}) and BATS(0.0013,1,0,0,7,354,365) models. Figure 6 shows that a better post-sample forecasting performance is again obtained from the TBATS model. The poor performance of the BATS model may be explained by its inability to capture the dual seasonal calendar effects and the large num-     ber of values that are required to be estimated. The estimated zero values for the smoothing parameters shown in Table 2 for the BATS solution suggest stable seasonal components. The Hijri seasonal component based on the TBATS solution displays a similar level of stability. However, moderate change is implied by the TBATS model in the weekly and Gregorian seasonal components. Both models required strong Box-Cox transformations in order to handle the obvious nonlinearity in the time series plot.
The decomposition of the series obtained by using the chosen TBATS model is shown in Figure 7. The first panel shows the transformed observations and the second shows the trend component. The third panel shows the weekly seasonal component with period 7, and the fifth and the sixth panels show the seasonal component based on the Hijri calendar with period 354.37 and the seasonal component based on the Gregorian calendar with period 365.25, respectively. The seasonal compo-nents shown in the fifth and sixth panels may initially appear to be mirror images. However, their combined effect, shown in the fourth panel, indicates that this is not the case. Interpreting their combined effect as the annual seasonal component would be misleading as there is no unique annual calendar in this situation: both constituent calendars are of different lengths.
The rather wiggly components of the decomposition are probably due to the use of a large number of harmonics in each seasonal component. This is necessary to capture the sharp drops seen in the time series plot. If we were to augment the stochastic seasonal component by deterministic holiday effects (given in Table 3) represented by dummy variables, the number of harmonics required might be reduced. Using a trend component, a seasonal component, and holiday dummy variables, regression was performed on the transformed y (ω)  ture the multiple seasonality with k 1 = 3 and k 2 = k 3 = 1. The estimated holiday effect was then removed from the series and the remainder was decomposed using TBATS to achieve the result shown in Figure 8, which provides a much smoother sea-sonal decomposition. Again, the sum of the Hijri seasonal component and the Gregorian seasonal component shown in the fourth panel illustrates that the Hijri and Gregorian seasonal components do not offset each other.  This analysis demonstrates the capability of our trigonometric decomposition in extracting those seasonal components which are otherwise not apparent in graphical displays.
In forecasting complex seasonal time series with such deterministic effects, both BATS and TBATS models may be extended to accommodate regressor variables, allowing additional information to be included in the models (see De Livera 2010a, chap. 7 for a detailed discussion of the BATS and TBATS models with regressor variables).

CONCLUDING REMARKS
A new state space modeling framework, based on the innovations approach, was developed for forecasting time series with complex seasonal patterns. The new approaches offer alternatives to traditional counterparts, providing several advantages and additional options. A key feature of the proposed trigonometric framework is its ability to model both linear and nonlinear time series with single seasonality, multiple seasonality, high period seasonality, non-integer seasonality, and dualcalendar effects. We are not aware of another modeling procedure that is able to forecast and decompose all these complex seasonal time series features within a single framework.
In addition, the framework consists of a new estimation procedure which is sufficiently general to be applied to any innovations state space model. By relying on maximum likelihood estimation, it avoids the ad hoc startup choices with unknown statistical properties commonly used with exponential smoothing. By incorporating the least squares criterion, it streamlines the process of obtaining the maximum likelihood estimates.
The applications of the proposed modeling framework to three complex seasonal time series demonstrated that the trigonometric models led to a better out-of-sample performance with substantially fewer values to be estimated than traditional seasonal exponential smoothing approaches (see Table 4). The trigonometric approach was also illustrated as a means of decomposing complex seasonal time series. The ability to handle such complex seasonality is a key advantage of the trigonometric approach over most traditional decomposition methods (e.g., Cleveland et al. 1990;Findley et al. 1998;Gómez and Maravall 1998) which are mainly designed to handle monthly or quarterly data.
A further advantage of the proposed framework is its adaptability. It can be altered to encompass various deterministic effects that are often seen in real-life time series. For instance, the moving holidays such as Easter and irregular holidays can be handled by incorporating dummy variables in the models (see De Livera 2010a, chap. 7), and the varying length of months can be managed by adjusting the data for trading days before modeling (see Makridakis, Wheelwright, and Hyndman 1998). In handling moving holiday effects further forecasting performance may be obtained by considering their impact on neighboring days. For a detailed discussion on moving holiday effects refer to the work of Findley and Soukup (2000). The framework can also be adapted to handle data with zero and negative values. The use of a Box-Cox transformation limits our approach to positive time series, as is often encountered in complex seasonal time series. However, the inverse hyperbolic sine transformation (Johnson 1949) can be used in its place should the need arise.
The derivation of likelihood estimates of the proposed approach relies on the assumption of a Gaussian distribution for the errors, something that is often a reasonable approximation when the level of the process is sufficiently far from the origin ). In cases where such an assumption may conflict with the underlying structure of the data generating process, our approach can be readily adapted to non-Gaussian situations. Being based on exponential smoothing where the conditioning process ensures that successive state vectors become fixed quantities, any suitable distribution can substitute for the role of the Gaussian distribution. Thus, if the innovations have a t-distribution, the prediction error form of the likelihood can be formed directly from the product of t-distributions. The analytical form of successive prediction distributions is unknown, but they can be simulated from successive t-distributions using means obtained from the application of the equations of the innovations state space model. This can be contrasted with a Kalman filter approach, which must usually be adapted in the presence of non-Gaussian distributions, to a form which necessitates the use of computationally intensive simulation methods.
The proposed frameworks can also be extended to exploit the potential inter-series dependencies of a set of related time series, providing an alternative to the existing vector exponential smoothing framework (de Silva, Hyndman, and Snyder 2007), but with several advantages (see De Livera 2010a, chap. 7).
Our experiences suggest that the use of our proposed estimation procedure for complex seasonal time series requires relatively less computational time than some of the heuristic estimation procedures employed in recent empirical studies (e.g., Taylor 2010a). Further, it should be noted that when the estimation scheme is strictly stable, in large samples, the successive powers of the matrix D in Equation (10) converge to the null matrix. Thus, in handling complex seasonal time series, the sparse structure of this matrix can be utilized for further gain in computational efficiency (see Koenker and Ng 2003).
The R code for the methods implemented in this article will be available in the forecast package for R (Hyndman 2010). [Received December 2009. Revised June 2011