Bayesian modeling and forecasting of 24-hour high-frequency volatility: A case study of the financial crisis

This paper estimates models of high-frequency index futures returns using `around-the-clock' 5-minute returns. The models incorporate the following key features: multiple persistent stochastic volatility factors, jumps in prices and volatilities, seasonal components capturing time-of-day patterns, correlations between return and volatility shocks, and announcement effects. We develop an integrated MCMC approach to estimate interday and intraday parameters and states using high-frequency data without resorting to various aggregation measures like realized volatility. We provide a case study using financial crisis data from 2007 to 2009, and use particle filters to construct likelihood functions for model comparison and out-of-sample forecasting from 2009 to 2012. We show that our approach improves realized volatility forecasts by up to 50% over existing benchmarks.


INTRODUCTION
Financial crises provide a rich information source to learn about asset price dynamics and models used to capture these dynamics. For example, the 1987 Crash and 1998 LTCM hedge fund crisis highlighted the importance of stochastic volatility (SV) and jumps, in both prices and volatility, for understanding index returns (see, e.g., Bates 2000; Duffie, Pan, and Singleton 2000; Eraker, Johannes, and Polson 2003; Todorov 2011). The recent crisis provides similar opportunities largely due to two unique features. First, unlike the 1987 or 1998 crises, which were short-lived, the recent crisis began in mid-2007 and lasted well into 2009, with aftershocks into the European debt crisis and Flash Crash in 2010. Second, structural changes in the mid-2000s led to continuous around-the-clock markets, as markets migrated from traditional floor execution during "regular" market hours to fully electronic 24-hr trading. For the first time, there is "around-the-clock" high-frequency data in a long-lasting crisis.
This article uses newly available data to study a range of important questions. What sort of models and factors are required to accurately model 24-hr high-frequency crisis returns? Do these specifications generate dynamics similar to extant ones? How useful are these models for practical applications like return distribution and volatility forecasting or trading? Answers to these questions are important for academics, policy makers, market participants, and risk managers who need to understand the structure of financial market volatility and to quantify the likelihood of potential future market movements. In particular, nearly every practical finance application (including optimal investments and trading, options/derivatives pricing, market making and market microstructure, and risk management) requires volatility forecasts.
Our case study focuses on the S&P 500 index, arguably the world's most important asset market, using S&P 500 index futures, which trade 24 hours a day from Sunday evening to Friday night. We focus on in-sample model fitting, which allows us to learn about the underlying structure of returns, and fully out-of-sample prediction, which is important for applications. We use parametric models estimated from intraday returns, something rarely attempted due to data complexities and computational burdens. Figure 1 plots intraday and interday volatility of 5-min S&P 500 futures returns from March 2007 to March 2012. Intraday volatility has complicated, periodic patterns driven by the global migration of trading and macroeconomic announcements (see, e.g., Andersen and Bollerslev 1997, 1998). Interday volatility is persistent, stochastic, and mean-reverting. Models capturing these components require multiple volatility factors, complicated shocks, and many parameters, which, in conjunction with huge volumes of high-frequency data, make parametric estimation difficult.
Due to these complexities, most researchers use nonparametric "realized volatility" (RV) methods to avoid directly modeling intraday returns by aggregating intraday data into a daily RV measure (see Barndorff-Nielsen and Shephard 2007;Andersen and Benzoni 2009, for reviews). One weakness is its nonparametric nature: RV approaches generally do not specify a full model of returns, which limits practical usefulness as there is no return distribution, just volatility estimates. Despite this weakness, RV methods are extremely useful and are a popular volatility forecasting approach.
Methodologically, we build new models with the flexibility to fit the complexities of 24-hr intraday data during the financial crisis. We develop novel MCMC algorithms to fit models in-sample and use particle filters to compute predictive distributions and volatility forecasts for out-of-sample validation. Although SV models are commonly implemented with MCMC, we know of no applications using realistic SV models and intraday data for out-of-sample validation.
We find strong in- and out-of-sample evidence for multiscale volatility with distinct "fast" and "slow" factors. The slow factor's half-life is about 25 days, similar to extant estimates from daily data. The fast factor, however, operates intradaily, with a half-life of an hour, capturing the "digestion" time of high-frequency news or liquidity events. Our models offer a significant improvement over traditional GARCH models estimated on intraday data. We find strong evidence for jumps (in prices and volatility) or fat tails generated by t-distributed return shocks. Price jumps are rather small in comparison to estimates from earlier periods or option prices, which identify jumps as large negative "crashes." This could be unique to the recent crisis or something more fundamental uncovered from newly available high-frequency data. A striking and important feature of our analysis is a strong and uniform ranking of models, both in- and out-of-sample, based on predictive likelihoods.
The ultimate test of a model is usefulness, and we consider three practical applications: volatility forecasting, tail risk management, and a trading application. We compare our models' performance to popular GARCH and RV benchmarks. In forecasting volatility, our SV models generate significantly lower forecasting errors than all competitors at all horizons. The absolute performance is striking, as we generate fully out-of-sample volatility forecasts with R²'s in excess of 70%. Our SV models perform relatively and absolutely well in a quantitative risk management application (evaluating the accuracy of value-at-risk (VaR) forecasts, essentially tail forecasting) and a simple volatility trading application. Overall, we find strong evidence for the usefulness of our models and approach in all cases.

Data
This article studies S&P 500 index futures. Two contract variants exist: the traditional "full-size" contract ($250 per index point) and the "E-mini" contract ($50 per point). E-minis trade electronically on the Chicago Mercantile Exchange's (CME's) Globex platform and initially complemented the full-sized contract, which traded in a traditional "open outcry" pit. E-mini trading volumes increased steadily before expanding rapidly in 2007 (see CME Group, Labuszewski et al. 2010) with the advent of algorithmic high-frequency trading and increased global influences. S&P 500 futures are one of the most liquid contracts in the world, limiting any microstructure effects (see, e.g., Corsi et al. 2008).
We analyze 5-min data from March 11, 2007, through March 9, 2012, consisting of 352,887 5-min observations over 1293 days. The price data is for quarterly contracts, which are converted to a "continuous contract" by rolling contracts two weeks before expiration. The first two years are used for parameter estimation and the remaining for forecasting.

Stochastic Volatility Models
We model 5-min logarithmic price returns, y_t = log(P_t/P_{t-1}), which evolve via y_t = μ + √(v_t) ε_t + J_t ξ^y_t, where P_t is the futures price, μ is the mean return, v_t is diffusive or nonjump volatility, ε_t is a standard normal shock, J_t is a jump indicator with P(J_t = 1) = κ, and ξ^y_t is the jump size. At this level, the model resembles common jump-diffusion specifications.
There is strong evidence for stochastic volatility and jumps in S&P 500 index prices from daily data (e.g., Eraker, Johannes, and Polson 2003), index option prices (Bakshi, Cao, and Chen 1997; Bates 2000; Duffie, Pan, and Singleton 2000), and intraday data (Andersen and Shephard 2009 provide a review). Estimates from options or daily returns identify large jumps or "crashes." Studies using recent high-frequency data tend to find smaller jumps, though these studies typically ignore overnight periods.
We model total volatility via a multiplicative specification: v_t = σ² X²_{t,1} X²_{t,2} S²_t A²_t, where X_{t,1} and X_{t,2} are SV processes, and S_t and A_t are seasonal and announcement components. σ is interpreted as the modal volatility (i.e., v_t = σ² when X_{t,1} = X_{t,2} = S_t = A_t = 1). The log of total diffusive variance is linear: h_t = log(v_t) = μ_h + x_{t,1} + x_{t,2} + s_t + a_t, where μ_h = log(σ²), x_{t,i} = log(X²_{t,i}), s_t = log(S²_t), and a_t = log(A²_t). Volatility evolves stochastically via x_{t+1,i} = φ_i x_{t,i} + σ_i η_{t,i} + J_t ξ^v_{t,i}, where η_{t,i} are shocks and ξ^v_{t,i} are the jumps in log-volatility (zero in specifications without volatility jumps). Notice the volatility jump times are coincident with those in returns. ρ = corr(ε_t, η_{t,2}) captures diffusive "leverage" effects via correlated shocks to returns and fast volatility. We assume a multiscale volatility specification, 0 < φ_2 < φ_1 < 1, with X_{t,1} and X_{t,2} the "slow" and "fast" volatility factors, respectively. Both factors are affected by intraday shocks, relaxing a common assumption that stochastic volatility is constant intraday (see, e.g., Andersen and Bollerslev 1997, 1998). Bedendo and Hodges (2004) suggested a specification with similar factors.
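To make the multiplicative, multiscale structure concrete, the following is a minimal simulation sketch of a two-factor log-variance model with coincident price/volatility jumps. The state equations and all parameter values are illustrative assumptions loosely based on the estimates reported later, with the seasonal and announcement components set to one:

```python
import numpy as np

# Sketch of a two-factor log-SV model with coincident price/volatility jumps
# (SVCJ-style). Parameter values are illustrative, loosely based on reported
# estimates; the exact state equations are an assumption for this sketch.
rng = np.random.default_rng(0)

T = 288 * 5                       # five days of 5-min returns
mu, sigma = 0.0, 0.059            # mean return, modal 5-min volatility (in %)
phi = np.array([0.9999, 0.94])    # slow / fast AR(1) coefficients
sig_eta = np.array([0.01, 0.10])  # factor shock volatilities
rho = -0.10                       # leverage: corr(return shock, fast shock)
kappa = 0.004                     # jump probability per 5-min period
mu_y, sig_y = 0.0, 0.202          # return jump size distribution
mu_v = 0.816                      # mean jump in log-variance

x = np.zeros((T + 1, 2))          # x[t, i] = log X_{t,i}^2
y = np.zeros(T)
for t in range(T):
    eps = rng.standard_normal()
    # correlated shock for the fast factor (leverage effect)
    eta2 = rho * eps + np.sqrt(1 - rho**2) * rng.standard_normal()
    eta1 = rng.standard_normal()
    J = rng.random() < kappa      # jump arrival, shared by price and volatility
    v_t = sigma**2 * np.exp(x[t, 0] + x[t, 1])  # seasonal/announcement = 1
    y[t] = mu + np.sqrt(v_t) * eps + J * rng.normal(mu_y, sig_y)
    x[t + 1, 0] = phi[0] * x[t, 0] + sig_eta[0] * eta1
    x[t + 1, 1] = phi[1] * x[t, 1] + sig_eta[1] * eta2 + J * mu_v

print(y.std())
```

The slow factor barely moves within the simulated week, while the fast factor and the jumps drive short-lived volatility bursts.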
The seasonal and periodic components driving the deterministic volatility patterns are modeled using cubic smoothing splines, similar to the approach in Weinberg, Brown, and Stroud (2007). The seasonal model captures the smooth U-shaped patterns during major market trading hours, as well as discontinuous changes in intraday volatility at major market opening and closing times. Formally, the seasonal component is s_t = Σ_{k=1}^{288} H_{tk} β_k, where H_{tk} is an indicator (H_{tk} = 1 if time t corresponds to period k and zero otherwise), and β_k denotes the seasonal effect at period k. Following Wahba (1978) and Kohn and Ansley (1987), we express the seasonal components in state-space form with a cubic smoothing spline prior on the coefficients, β_k. Appendix C provides the details.
Announcements are modeled in a similar manner. We assume that each announcement has a short-term impact, increasing market volatility for K = 5 periods after the news event, that is, markets digest the news in 25 min. Formally, α_{ik} denotes the announcement effect for news type i, k periods after the news release (i = 1, . . . , n; k = 1, . . . , 5). The total announcement component is a_t = Σ_{i=1}^{n} Σ_{k=1}^{5} I_{itk} α_{ik}, where I_{itk} is an indicator for news type i (I_{itk} = 1 if a news release occurred at period t − k and 0 otherwise). We consider n = 14 announcement types listed in Table 9 of the Appendix. Sunday open is treated as an announcement. As in the case of seasonals, we use cubic smoothing splines described in Appendix C to smooth the coefficients α_i = (α_{i1}, . . . , α_{i5}) for each news type i.
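The indicator construction for the seasonal and announcement components can be sketched as follows. Here `beta` and `alpha` are placeholders for the spline-smoothed coefficients, and the helper names are hypothetical:

```python
import numpy as np

# Sketch of the indicator construction for the seasonal (s_t) and announcement
# (a_t) components. beta (length 288) and alpha (n x 5) would come from the
# spline-smoothing step; here they are placeholder arrays.
n_periods, n_types, K = 288, 14, 5
beta = np.zeros(n_periods)        # seasonal effect, one per 5-min period
alpha = np.zeros((n_types, K))    # announcement effects, K lags per news type

def seasonal(t):
    """s_t: pick the effect for the time-of-day period of observation t."""
    return beta[t % n_periods]

def announcement(t, releases):
    """a_t: sum alpha[i, k-1] over news types i released k periods ago.
    `releases` maps news type i -> set of release times."""
    a = 0.0
    for i, times in releases.items():
        for k in range(1, K + 1):
            if (t - k) in times:
                a += alpha[i, k - 1]
    return a

# Example: a type-0 release at period 100 raises volatility at periods 101-105.
alpha[0] = [1.2, 0.8, 0.5, 0.3, 0.1]
rel = {0: {100}}
print([round(announcement(t, rel), 1) for t in range(100, 107)])
# -> [0.0, 1.2, 0.8, 0.5, 0.3, 0.1, 0.0]
```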
Our model applies to all 5-min intraday returns, not just to returns from "traditional" trading hours from 9:30 to 16:00. Existing papers often either ignore or simplistically correct for overnight returns. For example, Engle and Sokalska (2012), following "common practice," delete overnight returns due either to a lack of overnight data (for individual stocks) or difficulties in modeling overnight returns, which requires both periodic and announcement components. Ignoring overnight returns is problematic for 24-hr, global markets and crisis periods. For example, on October 24, 2008, S&P 500 futures fell over 6% overnight, and deleting this period would remove important information. These specifications also often ignore announcement effects.

Estimation Approach
We take a Bayesian approach and use MCMC to simulate from the joint posterior distribution p(z, β, α, θ | y^T), where z = (z_1, . . . , z_T) are the latent states, β = (β_1, . . . , β_288) are the seasonal coefficients, α = (α_1, . . . , α_n) are the announcement coefficients, θ are the remaining parameters, and y^T = (y_1, . . . , y_T) are the returns. Appendices A and D detail the priors and algorithm, respectively. We use standard conjugate priors where possible and, in all cases, proper though not strongly informative priors. Efficiently programmed in C, the MCMC algorithm makes 12,500 draws in 12-25 min using a 2.8 GHz Xeon processor for each year of 5-min returns (around 70,500 observations). Computing time is approximately linear for the sample sizes considered.
Our algorithm is highly tuned using representation and sampling "tricks." We express the model as a linear, but non-Gaussian, system and use the Carter and Kohn (1994) and Frühwirth-Schnatter (1994) forward-filtering, backward-sampling algorithm for block updating, an approach first used for SV models in Kim, Shephard, and Chib (1998). When possible, parameters and states are drawn together. Following Ansley and Kohn (1987), we express the splines as a state-space model and update in blocks. Building on the methodology of Johannes, Polson, and Stroud (2009), we use auxiliary particle filters (Pitt and Shephard 1999) to approximately sample from p(z_t | y^t, θ), with θ fixed at its posterior mean. Appendix E provides details.
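As a rough illustration of the filtering step, here is a minimal bootstrap particle filter for a stripped-down one-factor log-SV model. The paper's auxiliary particle filter handles a much richer state vector; this sketch only shows how p(z_t | y^t, θ) and the one-step predictive likelihood can be approximated by simulation:

```python
import numpy as np

# Minimal bootstrap particle filter for a one-factor log-SV model:
#   x_{t+1} = phi * x_t + sig * eta_t,   y_t ~ N(0, exp(mu_h + x_t)).
# This is a simplification of the paper's auxiliary particle filter, for
# illustration only.
def sv_particle_filter(y, phi, sig, mu_h, n_part=1000, seed=0):
    rng = np.random.default_rng(seed)
    # initialize from the stationary distribution of the AR(1) state
    x = rng.normal(0.0, sig / np.sqrt(1 - phi**2), n_part)
    loglik = 0.0
    for yt in y:
        var = np.exp(mu_h + x)
        logw = -0.5 * (np.log(2 * np.pi * var) + yt**2 / var)
        m = logw.max()
        w = np.exp(logw - m)
        loglik += m + np.log(w.mean())          # log p(y_t | y^{t-1})
        idx = rng.choice(n_part, n_part, p=w / w.sum())  # resample
        x = phi * x[idx] + sig * rng.standard_normal(n_part)  # propagate
    return loglik, x

rng = np.random.default_rng(1)
y = rng.standard_normal(200) * 0.06   # toy return series
ll, x_final = sv_particle_filter(y, phi=0.98, sig=0.1, mu_h=np.log(0.06**2))
print(ll)
```

The accumulated log predictive likelihood is exactly the quantity used later for out-of-sample model comparison.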
It is useful to contrast our intraday parametric estimation approach with that of Andersen and Bollerslev (1997, 1998), the main competitor. They model 5-min exchange rates via long-memory GARCH models with seasonal effects (see also Martens, Chang, and Taylor 2002) and use an iterative two-step procedure to first estimate daily volatility, assumed constant intraday, and then estimate the seasonal component. Engle and Sokalska (2012) estimated GARCH models on intraday returns for 2500 individual stocks with a seasonal component using third-party interday volatility estimates. By contrast, our MCMC approach simultaneously estimates all parameters and states, avoiding the need for potentially inefficient two-stage estimators and restrictive assumptions like normally distributed shocks and the absence of jumps.
Another approach aggregates intraday returns into daily RV statistics, which are used to estimate models at a daily frequency (see, e.g., Barndorff-Nielsen and Shephard 2002;Todorov 2011). We estimate the models directly on 5-min returns, without aggregation into RV, which allows us to identify intraday components and forecast at high frequencies. Hansen, Huang, and Shek (2012) introduced a hybrid model, called realized GARCH (RealGARCH), combining the tractability of daily GARCH models with the information in realized volatility. We implement these promising models and compare their performance to our SV models.
Appendix D provides algorithm details, with diagnostics in the web Appendix. The MCMC algorithms mix quite well given the large number of unknown states and parameters, although models with jumps in volatility mix more slowly than those with only diffusive volatility, and volatility-of-volatility parameters mix relatively slowly. Parameters deep in the state space (e.g., volatility of volatility) tend to traverse the parameter space more slowly, consistent with multiple layers of smoothing (see, e.g., Kim, Shephard, and Chib 1998). This does not mean that these parameters are poorly estimated; simulation evidence indicates they can be recovered accurately. The only model with any substantive concern is the SVCJ 2 model, and we thin the samples to alleviate any concerns. We have also considered significantly less informative priors, and the results do not substantially change.

Decompositions and Diagnostics
To decompose variance and to quantify relative importance, we compute the posterior mean for the total log variance and for each variance component at each time period, for example, x̄_{t,1} = E(x_{t,1} | y^T), run univariate regressions of the form h_t = α + β x̄_{t,1} + ε_t, and report R²'s for each component. We report decompositions in both log-variance and volatility units.
To quantify model fit, we would ideally use the Bayes factor, B^t_{i,j} = P(M_i | y^t)/P(M_j | y^t), where M_1, . . . , M_M denote the models, P(M_i | y^t) ∝ p(y^t | M_i) P(M_i), and p(y^t | M_i) is the marginal likelihood. Bayes factors are often called an "automated Occam's razor," as they penalize loosely parameterized models (Smith and Spiegelhalter 1980). Computing marginal likelihoods requires sequential parameter estimation, which is computationally prohibitive, so we instead report log-likelihoods and the Bayesian information criterion (BIC) statistic, which approximates the Bayes factor (Kass and Raftery 1995). The likelihood is built from p(y_{t+1} | y^t, M_i) = ∫ p(y_{t+1} | z_{t+1}, θ^(i), M_i) p(z_{t+1} | θ^(i), y^t, M_i) dz_{t+1}, where p(y_{t+1} | z_{t+1}, θ^(i), M_i) is the conditional likelihood and p(z_{t+1} | θ^(i), y^t, M_i) is the state predictive distribution. Given approximate samples from p(z_t | y^t, θ^(i), M_i), it is easy to approximately sample from the predictive distributions and likelihoods. All distributions can be computed at 5-min and lower frequencies, such as hourly or daily, via simulation. BIC asymptotically (in T) approximates the posterior probability of a given model. The dimensionality or degrees of freedom are not preset for the splines, but are determined by the degree of fitted smoothness. We compute the degrees of freedom using the state-space approach of Ansley and Kohn (1987), evaluating the degrees of freedom at each iteration of the MCMC algorithm and using the posterior mean for model comparisons. Given our sample sizes, this approximation should perform well.
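The BIC and implied log Bayes factor calculations can be sketched as follows; the numerical inputs are illustrative, not the paper's values:

```python
import numpy as np

# BIC and the implied (log) Bayes factor approximation used for model ranking.
# Degrees of freedom for the spline components are the fitted effective
# dimensions, not the raw number of knots.
def bic(loglik, dof, n_obs):
    return -2.0 * loglik + dof * np.log(n_obs)

def log_bayes_factor(bic_i, bic_j):
    # 2 log B_{ij} is approximated by BIC_j - BIC_i (Kass and Raftery 1995)
    return 0.5 * (bic_j - bic_i)

n = 141_000                        # roughly two years of 5-min returns
bic_a = bic(-250_000.0, 260, n)    # illustrative numbers only
bic_b = bic(-250_400.0, 255, n)
print(log_bayes_factor(bic_a, bic_b))   # positive values favor model A
```

A gap in log-likelihood of a few hundred swamps the degrees-of-freedom penalty at these sample sizes, which is why BIC and raw log-likelihood rank the models identically.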
For comparisons, we also estimated benchmark GARCH models, including a GARCH(1,1) model (Bollerslev 1986) and two models that incorporate asymmetry: the GJR-GARCH model of Glosten, Jagannathan, and Runkle (1993) and the EGARCH model of Nelson (1991), each with both normal and t-errors, fit as in Andersen and Bollerslev (1997). Appendix G provides details. These intraday GARCH models are common benchmarks, but do not incorporate announcements, in part because incorporating announcements would require a three-step estimation procedure. Prior research (Martens, Chang, and Taylor 2002; Martens, van Dijk, and de Pooter 2009) finds that announcement effects have a very small impact in GARCH models, and all of our conclusions carry through if we remove announcements from the SV models. Table 1 describes the models considered. We estimated single-factor models, but do not report estimates as the two-factor models always performed better in- and out-of-sample. Table 2 reports in-sample fit statistics including the degrees of freedom, log-likelihoods, and BIC statistics. To ease comparisons, Table 2 reports Bayes' factors based on the difference of BIC statistics relative to the SV 1 model, BIC_T(M_i) − BIC_T(M_SV1).

In-Sample Model Fits
Better-fitting models have higher likelihoods and lower BIC statistics, quantifying the improvement over a single-factor SV model. Degrees of freedom range from 253 to 284. This total consists of the "static" parameters, d_* (from 4 to 12), and the spline "parameters," d_s and d_a, which are less than the number of knot points (279 and 70, respectively) and determined by the spline's smoothness. More complicated models sometimes have fewer degrees of freedom than their simpler counterparts, even though they have more static parameters. The multiscale, two-factor SV models provide the best in-sample fits and, in all cases, the BIC and log-likelihood statistics lead to the same conclusion, which is not surprising given the large samples. The best-performing models, the SVt 2 and SVCJ 2 models, have leverage effects and allow for outliers, via either jumps or t-distributed shocks, which are needed to fit the fat tails of intraday returns.
The multiscale SV models provide significant improvements in fit compared to the GARCH models. In fact, the Bayes' factors indicate that a simple 1-factor SV model actually outperforms all of the GARCH models, strong evidence supporting SV. This indicates there is something fundamental about the random nature of volatility in the SV model-the extra shock in the volatility evolution-that improves the fit, which can be compared to the GARCH models in which the shocks to volatility are completely driven by return shocks.
We cannot compare likelihood-based fits to RV-based models, which typically do not specify an intraday return distribution. We also fit variants omitting seasonalities and/or announcements, which are not reported to save space. Both components are significant, though the announcement components are less important given the relatively small number of announcements per week.

West (1986) suggested monitoring model fits sequentially through time to provide an assessment of model failure, either abrupt or gradual. Figure 2(a) reports in-sample sequential Bayes' factors for each model relative to the SV 1 model, BIC_T(M_i) − BIC_T(M_SV1). Note the gradual outperformance generated by the SVCJ 2 and SVt 2 models, indicating general fit improvement rather than one generated by a very small number of observations. The relative ranking of the SV models is identical out-of-sample, confirming the in-sample results.
There is one noticeable spike on October 24, 2008, in the log Bayes' factors. This was caused by a circuit breaker locking S&P 500 futures limit down from 4:55 am to 9:30 am, which generated a number of zero returns. Exchange rules mandate that S&P futures cannot fall by more than 60 points overnight, and trading can occur at prices above, but not below, this level until 9:30. Models with fast-moving volatility were able to reduce their predictive volatility quickly, thus the relatively good fit during this event. A more complete specification would incorporate a mechanism for limit-down markets.

Table 3 summarizes the posteriors and reports inefficiency factors and acceptance probabilities (for the slowest-mixing component, σ_1) for the multiscale models. There are a number of interesting results. The SV factors correspond to a slow-moving interday factor and a rapidly moving intraday factor. Estimates of φ_1 in the best-fitting models are 0.9999, corresponding to a daily AR(1) coefficient of 0.9725 and a half-life (log 0.5/log φ_1) of almost 25 days. This is consistent with studies using daily data and time-aggregation; that is, the data provide similar inference whether sampled at intraday or daily frequencies. x_{t,2} operates at high frequencies with a 5-min AR(1) coefficient φ_2 of 0.926 to 0.958, generating a half-life of around an hour, and high volatility (σ_2 ≫ σ_1). Intuitively, there is strong evidence for rapidly dissipating high-frequency shocks to volatility. All two-factor models support an extreme form of multiscale SV that would be difficult to detect using daily data.
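The half-life arithmetic is easy to verify. Assuming 279 5-min periods per trading day (the daily horizon used later in the paper), a 5-min AR(1) coefficient φ maps to a daily coefficient φ^279:

```python
import math

# Check of the half-life arithmetic for the two volatility factors.
# The 279 periods-per-trading-day count is taken from the paper's daily horizon.
periods_per_day = 279

def daily_coef(phi):
    """Daily AR(1) coefficient implied by a 5-min coefficient phi."""
    return phi ** periods_per_day

def half_life_days(phi):
    """Half-life of the AR(1) factor, in trading days."""
    return math.log(0.5) / math.log(phi) / periods_per_day

print(round(daily_coef(0.9999), 4))        # slow factor: ~0.9725 daily
print(round(half_life_days(0.9999), 1))    # ~25 days
# fast factor (phi_2 ~ 0.94): half-life in hours
print(round(half_life_days(0.94) * periods_per_day * 5 / 60, 2))
```

This reproduces the reported slow-factor daily coefficient of 0.9725 and half-life of about 25 days, and a fast-factor half-life of roughly an hour.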

Parameter Estimates, Variance Decompositions and Sample Paths
Decompositions in Table 4 show the interday factor explains a majority of total variance; thus, the slow-moving factor is relatively more important than the fast-moving factor. The second factor explains about 7%-10% of the total variance. Table 3 reports each volatility factor's unconditional variance, τ_i² = σ_i²/(1 − φ_i²). τ_1 is more than twice as large as τ_2, driven by the near-unit-root behavior of x_{t,1}, despite x_{t,1}'s low conditional volatility.
The second volatility factor plays a crucial role as it relieves a tension present in one-factor models. The SV factor in one-factor models tries to fit both low- and high-frequency movements, ending up somewhere in between and fitting both poorly. For example, in the SVt 1 model, estimates of φ_1 are roughly 0.997, corresponding to a daily AR(1) coefficient of 0.4325 and a half-life of about 0.80 days, which is much slower than the fast factor and much faster than the slow factor in two-factor models. The two-factor specifications provide flexibility, allowing the factors to fit higher- and lower-frequency volatility fluctuations.
Estimates of ν are about 20, consistent with mild nonnormality and previous daily estimates (e.g., Chib, Nardari, and Shephard 2002; Jacquier, Polson, and Rossi 2004). Though modest, ν implies vastly higher probabilities of large shocks, some of which will occur in our massive sample. Estimates of ρ are modest, around −0.10. Identifying this parameter using RV is difficult due to various biases (see, e.g., Aït-Sahalia, Fan, and Li 2013). Time-variation in the variance components accounts for most of the nonnormality in models without jumps. Mean jump sizes, μ_y, are close to zero in the SVCJ 2 specification, and arrivals are frequent, with κ = 0.004 corresponding to a rate of 1.17 jumps per day. Return jumps are relatively large, as σ_y is much larger than the modal (nonjump) volatility; for example, σ_y = 0.202 versus σ = 0.059 in the SVCJ 2 model. Volatility jumps are quite large, with μ_v = 0.816 implying that, on average, jumps increase volatility by a factor of exp(μ_v/2) ≈ 1.5, or about 50%.
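These jump magnitudes follow directly from the reported parameters; the per-day period count used below is an assumption:

```python
import math

# Back-of-the-envelope checks on the reported jump estimates (values from the
# SVCJ_2 column; 279 periods per trading day is an assumption on my part, so
# the implied arrival rate is close to, but not exactly, the reported 1.17).
kappa = 0.004                   # jump probability per 5-min interval
jumps_per_day = kappa * 279     # expected arrivals per trading day
vol_jump_factor = math.exp(0.816 / 2)   # mean multiplicative volatility jump

print(round(jumps_per_day, 2))       # roughly one jump per day
print(round(vol_jump_factor, 2))     # volatility jumps up ~50% on average
```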
Our jump estimates are "big," as price jump volatility is about 4-8 times unconditional 5-min return volatilities. However, the sizes are relatively small when compared to estimates from older daily price data or option prices, which find rare jumps that are large and negative. Although our sample contains some of the largest index moves ever observed in U.S. history, these were not large discontinuous moves, but rather a large number of modest moves in the same direction. Thus, high-frequency data in the most recent crisis provide a different view of jumps.

Figure 3 summarizes the posterior distribution of S_t. Results are shown on the standard deviation scale, S_t = exp(β_t/2), so S_t = 1 corresponds to average 5-min volatility and S_t = 0.5 implies that volatility is roughly half its average. S_t spikes to more than 2.5 at the open and close of U.S. trading, and there is a clear "U"-shaped pattern during U.S. trading hours. S_t fluctuates by a factor of more than 5, highlighting the importance of predictable intraday volatility. Figure 4 summarizes the most important announcements for the SVCJ 2 model (the other models are similar). Volatility after Payrolls increases by six times, with the GDP, CPI, and FOMC announcements the next most important, with volatility increases of three to four times. The rate of decrease for the FOMC announcements is slower than for Payrolls, consistent with a greater digestion time.
To understand interday volatility, Figure 5 plots daily returns, daily RV, and the slow volatility σ X t,1 . Volatility spiked first

Announcement Effects
To understand higher-frequency movements, Figure 6 plots the smoothed state variables during the week of September 14, 2008, for the SVCJ 2 model, when the following happened: on September 14, Lehman Brothers filed for bankruptcy; on September 15, a large money market fund "broke the buck"; on September 16, AIG was bailed out, there was an FOMC meeting, and Bank of America announced its purchase of Merrill Lynch; and on September 18, the SEC banned short-selling of financial stocks. The Sunday overnight return was −2.75%, as markets digested the Lehman news. The model captures this move via a jump and elevated intraday and interday volatility; interday volatility was more than twice its long-run average. On September 16, an FOMC announcement generated huge volatility, with three 5-min returns greater than 1%. Despite the elevated announcement volatility, the model still needed a large jump in volatility. After the close of normal trading, there were additional volatility jumps corresponding to the Merrill Lynch merger. The large moves on September 18 were associated with rumors and the subsequent announcement of the short-selling ban on financial stocks, which drove futures roughly 100 points higher overnight.
These results show the key role played by jumps in volatility and the fast volatility factor, which capture the impact of unexpected news arrivals by temporarily increasing volatility. In the SVt 2 model, large outlier shocks generated by the t-distributed errors play a similar role in explaining these large moves. Diffusive volatility alone cannot increase rapidly enough to capture extremely large movements.

Figure 6. Prices, returns, smoothed volatility components (total volatility, slow volatility, fast volatility, volatility jumps, seasonal, and announcement components) and absolute value of the residuals during the week of September 14-19, 2008, for the SVCJ 2 model. Each panel contains posterior means, and the bands represent 95% posterior intervals. The second panel from the bottom summarizes the seasonal fits on the left-hand axis and announcements on the right.

OUT-OF-SAMPLE RESULTS AND APPLICATIONS
Although in-sample fits are important, the ultimate test is predictive and practical: how well does the model fit future data, and can the model be used for practical applications? In terms of overall predictive ability, Figure 2(b) reports out-of-sample likelihood ratios relative to the SV 1 model, which are based on the entire predictive distribution and provide an overall measure of model fit. The ranking is nearly identical to the in-sample results, and the GARCH models perform very poorly out-of-sample in fitting the entire return distribution. This is strong confirmation of model performance. In terms of applications, we consider three, described below: volatility forecasting, quantitative risk management, and a simple volatility trading example.

Volatility Forecasts
Volatility forecasting is required for nearly every financial application, as mentioned earlier, and is the gold standard for evaluating estimators and models when using intraday data (see Andersen and Benzoni 2009). We compare volatility forecasts from our SV models to a range of GARCH and nonparametric RV-based estimators. We estimate parameters as of March 2009 and forecast volatility from March 2009 to March 2012, a challenging period for three reasons: the in-sample period is shorter than the out-of-sample period; the out-of-sample period had lower volatility; and we do not update parameter estimates.
We compute model-based forecasts of realized variance, RV²_{s,τ} = Σ_{t=1}^{τ} y²_{s+t}, at hourly (τ = 12) and daily (τ = 279) horizons. The 5-min forecasts are similar to the hourly ones and are not reported. Table 5 reports forecast bias, mean-absolute forecasting errors (MAE), and R²'s from Mincer-Zarnowitz regressions of realized on forecasted volatility. The SV models outperform all competitors. Compared to intraday GARCH, the SV models provide lower bias, lower MAE, and higher R²'s. The SV models generate daily R²'s of 73%, an almost 50% improvement compared to R²'s of 47% to 57% for the GARCH specifications. This is a remarkably high level of predictability. At hourly horizons, R²'s are more than 10% higher (e.g., rising from 56%-60% to 66%). All of the SV models provide broadly similar fits, indicating that differences in log-likelihoods are largely due to tail fits. We also benchmark against the RV-based long-memory autoregressive (AR-RV) model of Andersen et al. (2003) and the realized GARCH model of Hansen, Huang, and Shek (2012). These competitors are computed only at the daily horizon, following the literature. Our SV models generate higher R²'s in every case, and the SV models' MAE and bias are generally similar or lower. The RV-based models clearly outperform the basic GARCH models.
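A sketch of the realized-variance target and the Mincer-Zarnowitz evaluation regression, on simulated data (in the paper, the forecast would be the model's predictive variance rather than a noisy copy of the target):

```python
import numpy as np

# Sketch of the realized-variance target and a Mincer-Zarnowitz forecast
# regression RV = a + b * RV_hat + e, evaluated via its R^2. Data are
# simulated for illustration.
rng = np.random.default_rng(0)

def realized_variance(returns, tau):
    """Non-overlapping sums of squared 5-min returns over tau-period windows."""
    n = len(returns) // tau
    return (returns[: n * tau].reshape(n, tau) ** 2).sum(axis=1)

def mincer_zarnowitz(rv, rv_hat):
    X = np.column_stack([np.ones_like(rv_hat), rv_hat])
    coef, *_ = np.linalg.lstsq(X, rv, rcond=None)
    resid = rv - X @ coef
    r2 = 1 - resid.var() / rv.var()
    return coef, r2

# toy example: forecasts equal true RV plus noise
r = rng.standard_normal(279 * 200) * 0.06
rv = realized_variance(r, tau=279)            # 200 days of daily RV
rv_hat = rv + 0.1 * rv.std() * rng.standard_normal(len(rv))
(a, b), r2 = mincer_zarnowitz(rv, rv_hat)
print(round(r2, 2))
```

An unbiased forecast should give a ≈ 0 and b ≈ 1, with the R² measuring forecast accuracy.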
To attach statistical significance, we run bivariate "horse-race" regressions of realized variance on two forecasts, RV_{s,τ} from a competitor model and RV^{SVCJ}_{s,τ} from the SVCJ 2 model, with coefficients b_1 and b_2, respectively (Equation (3)). Table 6 summarizes the results. Hourly, SVCJ 2 forecasts are highly significant (t-statistics greater than 50) in every case, and the competitors are insignificant in every case. The SVCJ 2 coefficients are close to, but slightly less than, one, and GARCH coefficients are near zero. Daily SVCJ 2 forecasts are also highly significant in every case, with t-statistics ranging from 12 to almost 30. Interestingly, competitor forecasts are significant in many cases, though less so than the SVCJ 2 forecasts. Economically, b_2 estimates are close to one and those for the competitor models are close to zero. There is some incremental information in some of the other models, as they are significant in a number of cases, which suggests there is additional predictability to be harvested. It would be interesting to consider an SV model that treats lagged RV as a "regressor" variable, in a manner similar to the realized GARCH model.

Table 6. Bivariate "horse-race" regressions for realized volatility using the model in Equation (3), at hourly and daily horizons. NOTE: b and t denote the estimated regression coefficients and corresponding t-statistics. AR-RV and RealGARCH models are estimated using daily data; other models are estimated using 5-min data.
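The bivariate horse-race regression amounts to the following least-squares computation. The simulated data are constructed so that the second forecast is the informative one, mirroring the reported pattern of b_2 near one and b_1 near zero:

```python
import numpy as np

# Sketch of the bivariate "horse-race" regression
#   RV = a + b1 * RV_competitor + b2 * RV_svcj + e,
# on simulated data where the second forecast is the informative one.
rng = np.random.default_rng(3)
n = 500
rv = np.abs(rng.standard_normal(n)) + 1.0          # toy realized variance
rv_svcj = rv + 0.05 * rng.standard_normal(n)       # accurate forecast
rv_comp = rv_svcj + 0.5 * rng.standard_normal(n)   # noisier version of it

X = np.column_stack([np.ones(n), rv_comp, rv_svcj])
(a, b1, b2), *_ = np.linalg.lstsq(X, rv, rcond=None)
print(round(b1, 2), round(b2, 2))   # b1 near 0, b2 near 1
```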
Overall, the results provide additional confirmation of Hansen and Lunde's (2005) important article, which finds that it is possible to outperform simple GARCH(1,1) models. Parametric SV models provide strong improvements in forecasting ability, even in challenging periods of time.

Risk Management
Quantitative risk management requires models that accurately fit distributional tails to assess the risk of extreme losses. Regulators often mandate VaR-based procedures, which are essentially real-time tail forecasts (see, e.g., Duffie and Pan 1997). VaR is the loss in value that is exceeded with probability p, essentially the (100 − p)th percentile of the predictive return distribution. Financial institutions compute VaR at daily or lower frequencies, but intraday measures are useful for market makers, high-frequency traders, and options traders. To gain intuition, Figure 7 plots realized daily returns and the 1% and 5% daily VaR for the SVCJ_2 model. VaR ranges from a low of well under 1% to a high of almost 20% during the crisis, with few noticeable violations.
To evaluate out-of-sample performance, Table 7 reports 5-min, 1-hr, and daily tail coverage probabilities at the 1%, 5%, and 10% levels, as well as a measure of total fit, $D$, which compares the ordered predictive quantiles of the model with those observed: $D = S^{-1} \sum_{s=1}^{S} |\hat{U}_{(s)} - U_s|$, where $\hat{U}_{(s)}$ are the ordered values of $\hat{U}_s = \Pr(Y_{s,\tau} \leq y_{s,\tau})$, the predictive quantile for the return at period $s$ and horizon $\tau$, and $U_s = (s - 0.5)/S$ are the quantiles of a $U(0,1)$ distribution.
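The distance metric $D$ is straightforward to compute once the predictive quantiles (PIT values) are in hand. A minimal sketch, where `u_hat` stands in for the model-implied values $\hat{U}_s$:

```python
def distance_metric(u_hat):
    """D = (1/S) * sum_s |U_hat_(s) - U_s|, with U_s = (s - 0.5)/S."""
    S = len(u_hat)
    ordered = sorted(u_hat)               # the order statistics U_hat_(s)
    uniform = [(s - 0.5) / S for s in range(1, S + 1)]
    return sum(abs(a - b) for a, b in zip(ordered, uniform)) / S

# If the PIT values coincide exactly with the Uniform(0,1) quantiles,
# the model's predictive distribution is perfectly calibrated and D = 0.
perfect = [(s - 0.5) / 4 for s in range(1, 5)]
assert distance_metric(perfect) == 0.0
```

Smaller values of $D$ indicate predictive distributions closer to correct calibration across the whole distribution, not just the tails.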
The SV models generate more stable (across metrics and horizons) and generally more accurate VaR forecasts and distributional fits than the competitors, with the SVCJ_2 model performing marginally the best. Occasionally, a competitor model performs better at one frequency and for some quantiles, but no model uniformly dominates the SV models. For example, the EGARCH-t model has the best 5-min performance, but has the worst daily performance and performs poorly in volatility forecasting. In terms of non-GARCH competitors, the AR-RV models cannot be used for VaR calculations because they do not specify a return distribution. The RealGARCH models do not provide intraday forecasts. The daily RealGARCH VaR statistics are generally on par with or slightly worse than the best performing SV models: slightly worse at the 1% level, better at the 5% level, worse at the 10% level, and worse in terms of overall fit. Overall, the multiscale SV models provide a robust and stable fit to the tails of the return distribution over all horizons, which indicates their potential usefulness for VaR-based risk management.

Figure 7. Daily returns and out-of-sample 1% and 5% Value-at-Risk (VaR) for daily returns for the SVCJ_2 model.

Table 7. Out-of-sample lower-tail coverage probabilities (1%, 5%, and 10%) and distance metrics (D) for 5-min, hourly, and daily returns.

Volatility Trading
Volatility forecasts are useful for a range of practical applications, as mentioned earlier. Documenting the economic benefits of a volatility forecasting method is quite difficult, as most applications require additional assumptions. For example, portfolio applications require expected return estimates and investor preferences, both of which are difficult to specify. This generates a joint specification problem: if, for example, a trading strategy does not work well, is it due to the volatility forecasts or the other components of the problem? Because of this, few papers analyze truly out-of-sample portfolio problems (see Johannes, Korteweg, and Polson 2014, for a review).
To highlight the economic value of our models while avoiding these complexities, we implement a mean-reverting trading rule using the VIX index and an ETF, the VXX. The VIX is a widely used index of option-implied volatility, the value of the volatility parameter in the Black-Scholes formula that equates model and market option prices. Intuitively, implied volatility and the VIX index provide a market-determined measure of current and future volatility. Like volatility itself, the VIX index is not directly tradeable, so we base our trading strategy on an ETF, the VXX, which is linked to futures on the VIX index. Our simple trading strategy is based on volatility extremes and compares volatility forecasts from various models with the VIX index. For each model, we compute 5% and 95% predictive bands for RV, either analytically (AR-RV and RealGARCH) or via simulation (for the intraday models). If the VIX index is higher than the 95% band or lower than the 5% band, we enter into mean-reversion trades in the VXX. For example, if the VIX index is above the 95% band, we buy (go long) the VXX ETF, which, in turn, will increase in value if the VIX decreases. If the VIX crosses the median forecast (which changes dramatically over time), we close the position. It is important to note that the procedure is fully out-of-sample and applied symmetrically to all models. Additionally, this trading application (1) uses a simple trading rule, (2) depends crucially on volatility forecasts, and (3) allows for a direct relative comparison of different models.
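The band-based entry and exit logic above can be sketched as a simple state update. This is a hedged illustration of the rule as described, not the authors' implementation; all inputs (VIX level, band and median values) are placeholders.

```python
def trading_signal(vix, low_band, median, high_band, position):
    """Update the VXX position: +1 long, -1 short, 0 flat.

    Enter when VIX breaches a model's 5%/95% predictive RV band;
    close the open trade once VIX crosses the median forecast.
    """
    if position == 0:
        if vix > high_band:
            return 1    # VIX unusually high vs. the model: bet on reversion
        if vix < low_band:
            return -1   # VIX unusually low vs. the model
        return 0
    # A position is closed once VIX crosses the median forecast.
    if position == 1 and vix <= median:
        return 0
    if position == -1 and vix >= median:
        return 0
    return position

# E.g., VIX at 30 with a 95% band of 28 opens a long VXX trade.
pos = trading_signal(30.0, 15.0, 20.0, 28.0, 0)
```

Applying the same rule to every model's bands gives the symmetric, fully out-of-sample comparison described in the text.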
Intuitively, the strategy tries to identify periods when option-implied information (the VIX) contrasts with model-based volatility forecasts. If the trading strategy is profitable, it implies that there is model-based information that can be used to predict future movements in the VIX index. Our sample period is particularly interesting since any market inefficiencies or predictability are likely to be magnified, and thus mean-reversion trades are a natural strategy to consider. Other research, for example, Nagel (2012), has documented the value of simple mean-reversion trades during the crisis, suggestive of strong liquidity premia or over-reaction.

Table 8 reports trade summaries for each model. Performance metrics include the average return, the return volatility, and the Sharpe ratio (average return divided by return volatility), which provides a risk-adjusted performance measure. Note first that there are many more long VXX trades (i.e., short the VIX index), which is due to the asymmetries in volatility: volatility tends to spike higher and then mean-revert rapidly. The only exception is the EGARCH model, which is strongly biased (see Table 5) and generates poor results. The other models generate positive annualized Sharpe ratios, indicative of predictive ability, but the multiscale SV models have higher Sharpe ratios than all competitor models. We also compute returns to a strategy that buys (goes long) the trading returns from the SVCJ_2 model and sells (goes short) the returns from another model. This strategy essentially removes coincident trades and focuses on trades where the models disagree. The Sharpe ratios for the long/short portfolio are always positive when short the competitor GARCH/RV models and often on par with the Sharpe ratio for the SVCJ_2 model itself. The Sharpe ratios are close to zero for portfolios that are short the other SV models, so there is little incremental predictive value between these models and the SVCJ_2 model.
Overall, this provides additional evidence for the practical utility of our modeling approach.

CONCLUSIONS
This article develops multifactor SV models of 24-hr intraday equity index returns during and after the recent financial crisis. We estimate the models directly using MCMC methods and use particle filtering methods for forecasting and model evaluation. These models, more general than any in the literature, provide significant improvements in in-sample and out-of-sample fit, using both statistical metrics and practical applications.
In terms of model properties, we find strong evidence for multiscale volatility, outliers (jumps or t-errors), periodic components capturing intraday predictability, and announcements. Importantly, based on predictive likelihoods, we find exactly the same ordering of models in- and out-of-sample, which indicates the results are robust and stable, even during the extreme volatility realized in the crisis. Out-of-sample, we find additional support for our approach based on superior volatility forecasts, VaR risk management, which captures tail prediction, and a volatility trading strategy. These results document the practical usefulness of sophisticated SV models for modeling intraday returns.

APPENDIX B: AUXILIARY MIXTURE MODEL
We update the SV states and parameters using the mixture approximation of Omori et al. (2007). Conditional on $\mu$, $J_t$, $Z^y_t$, $\lambda_t$, we transform the returns to $(y^*_t, d_t)$, where $y^*_t = \log\big((y_t - \mu - J_t Z^y_t)^2 + \text{const}\big)$, $d_t = \text{sign}(y_t - \mu - J_t Z^y_t)$, and const = 0.0001 is used to avoid logs of zeros. We then write the return equation as $y^*_t = h_t + \log(\varepsilon^2_t)$ and approximate the joint distribution of $\zeta_t = \log(\varepsilon^2_t)$ and $\eta_{t,2}$ by a mixture of 10 normals: $p(\zeta_t, \eta_{t,2}) \approx \sum_{j=1}^{10} p_j\, N(\zeta_t \mid m_j, v_j^2)\, N(\eta_{t,2} \mid d_t \rho (a^*_j + b^*_j \zeta_t), 1 - \rho^2)$, where $(p_j, m_j, v_j, a^*_j, b^*_j)$, $j = 1, \ldots, 10$, are constants specified in Omori et al. (2007). We then introduce a set of mixture indicator variables $\omega_t \in \{1, \ldots, 10\}$ for $t = 1, \ldots, T$. Conditional on the indicators, the model has a linear Gaussian state-space form, and the FFBS algorithm is used to generate the volatility states and parameters.
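The key conditional step is drawing each mixture indicator $\omega_t$ with probability proportional to $p_j\, N(y^*_t - h_t \mid m_j, v_j^2)$. A simplified sketch follows: the three-component constants below are hypothetical placeholders, not the actual 10-component table of Omori et al. (2007), and the correlation term involving $\eta_{t,2}$ is omitted for brevity.

```python
import math
import random

# Placeholder mixture constants (hypothetical, NOT Omori et al.'s table).
P = [0.3, 0.4, 0.3]      # mixture probabilities p_j
M = [-3.0, -1.0, 0.5]    # component means m_j
V = [2.0, 1.0, 0.5]      # component standard deviations v_j

def draw_indicator(y_star, h, rng=random):
    """Sample omega_t with prob. proportional to p_j * N(y*_t - h_t | m_j, v_j^2)."""
    z = y_star - h
    # Unnormalized posterior weights; the 1/sqrt(2*pi) factor cancels.
    w = [p * math.exp(-0.5 * ((z - m) / v) ** 2) / v for p, m, v in zip(P, M, V)]
    u = rng.random() * sum(w)
    cum = 0.0
    for j, wj in enumerate(w):
        cum += wj
        if u <= cum:
            return j
    return len(w) - 1
```

Given the sampled indicators, each observation's measurement error is conditionally Gaussian, which is what makes the FFBS pass through the linear state-space form possible.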
Here $p(y|\tau^2)$ is computed using the Kalman filter. If the draw is accepted, set $\tau^{2,(i+1)} = \tau^{2,(*)}$ and generate $x^{(i+1)} \sim p(x|\tau^{2,(i+1)}, y)$ using the FFBS algorithm; otherwise leave $x$ unchanged. Since $x_k = (g_k, \dot{g}_k)$, draws of the function $g = (g_1, \ldots, g_K)$ are obtained directly from $x$. The degrees of freedom of the fit is obtained by noting that the posterior mean of the function, conditional on $\tau^2$, has the form $E(g|y, \tau^2) = Ay$, where $A$ is the so-called "hat matrix." The degrees of freedom is defined as $d = \text{tr}(A)$. Following Ansley and Kohn (1987), this value is computed efficiently using a modified Kalman filter algorithm.

APPENDIX D: MCMC ALGORITHM
The joint posterior distribution for the model in Appendix A is $p(x, \lambda, J, Z, \beta, \alpha, \theta \mid y) \propto p(y \mid x, \lambda, J, Z, \beta, \alpha, \theta)\, p(x, \lambda, J, Z \mid \beta, \alpha, \theta)\, p(\beta, \alpha, \theta)$. The models were estimated using the Markov chain Monte Carlo algorithm described below. We ran the MCMC for 12,500 iterations and discarded the first 2500 as burn-in, leaving 10,000 samples for posterior inference. For the SVCJ_2 model, we ran the chain for 1,000,000 iterations after a burn-in of 25,000 iterations and retained every 100th draw, leaving 10,000 samples for inference. Diagnostic plots and tests indicated no obvious problems with convergence. The starting values were set to the prior mean or mode, although we found that the results were robust to this choice. The MCMC algorithm, followed by a description of the full conditional posterior distributions, is given below.
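The burn-in and thinning schedule above follows a standard pattern, sketched here with a trivial placeholder sweep; `one_sweep` would stand in for a full pass over all conditional draws (states, jumps, parameters), and the small run below is purely illustrative.

```python
def run_chain(one_sweep, state, n_iter, burn_in, thin):
    """Run n_iter MCMC sweeps, drop the first burn_in, keep every thin-th draw."""
    kept = []
    for i in range(n_iter):
        state = one_sweep(state)          # one full Gibbs/MH sweep
        if i >= burn_in and (i - burn_in) % thin == 0:
            kept.append(state)
    return kept

# Toy run: 120 sweeps, 20 burn-in, thinning of 10 leaves 10 retained draws.
draws = run_chain(lambda s: s + 1, 0, n_iter=120, burn_in=20, thin=10)
```

Thinning mainly reduces storage and autocorrelation in the retained draws; the posterior summaries in the paper are computed from the retained samples only.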

APPENDIX F: FORECASTING RETURNS AND REALIZED VOLATILITY
Conditional on posterior samples of the states at time $s$, $z^{(i)}_s \sim p(z_s \mid y^s)$, $i = 1, \ldots, N$, and fixed parameter values, we approximate the forecast distribution of returns and realized volatility by forward simulation. We forecast over a $\tau$-period horizon as follows. For $i = 1, \ldots, N$ and $t = 1, \ldots, \tau$, we generate $z^{(i)}_{s+t} \sim p(z_{s+t} \mid z^{(i)}_{s+t-1})$ and $y^{(i)}_{s+t} \sim p(y_{s+t} \mid z^{(i)}_{s+t})$. We then aggregate the simulated returns and squared returns to obtain samples of returns and realized volatility at horizon $\tau$. The empirical 1%, 5%, and 10% quantiles of the simulated return distribution are used to estimate Value-at-Risk, and the point prediction for RV is obtained as the posterior mean (across simulations) of $RV^{2,(i)}_{s,\tau}$.
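The forward-simulation forecast can be sketched with a toy log-volatility AR(1) standing in for the paper's full state dynamics; the parameters `PHI` and `SIG_H` and the initial state draws are hypothetical, chosen only to make the mechanics concrete.

```python
import math
import random

PHI, SIG_H = 0.98, 0.1   # toy state dynamics (hypothetical parameters)

def forecast(h0_draws, tau, rng):
    """Simulate tau steps ahead from each state draw; return (5% VaR, RV point forecast)."""
    rets, rvs = [], []
    for h in h0_draws:                                   # posterior state draws
        r_sum, rv = 0.0, 0.0
        for _ in range(tau):
            h = PHI * h + SIG_H * rng.gauss(0.0, 1.0)    # z_{s+t} ~ p(. | z_{s+t-1})
            y = math.exp(h / 2.0) * rng.gauss(0.0, 1.0)  # y_{s+t} ~ p(. | z_{s+t})
            r_sum += y                                   # horizon return
            rv += y * y                                  # realized-variance sample
        rets.append(r_sum)
        rvs.append(rv)
    rets.sort()
    var_5 = rets[int(0.05 * len(rets))]   # empirical 5% quantile -> VaR estimate
    rv_hat = sum(rvs) / len(rvs)          # posterior mean -> RV point forecast
    return var_5, rv_hat

rng = random.Random(42)
var_5, rv_hat = forecast([math.log(0.01)] * 200, tau=12, rng=rng)
```

The same simulated paths deliver both the VaR quantiles and the RV point forecast, which is why a single forward pass suffices for all the out-of-sample exercises.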

APPENDIX G: GARCH AND AR-RV MODELS
• Intraday GARCH Models with Seasonality. Let $y_t$ denote the 5-min return, $s_t$ the seasonal effect, and $\sigma_t$ the unobserved volatility at period $t$. Our intraday GARCH models assume one of the following return equations (either normal or t): Normal: $y_t = s_t \sigma_t z_t$, $z_t \sim N(0, 1)$; Student-t: $y_t = \sqrt{(\nu - 2)/\nu}\, s_t \sigma_t z_t$, $z_t \sim t_\nu(0, 1)$.
The model is estimated in two stages. We first estimate the seasonal component $s_t$ and form seasonally adjusted returns, with an additional rescaling for Student-t errors. We then use the adjusted returns to estimate the GARCH models above using maximum likelihood methods. • Daily AR-RV Models (Andersen et al. 2003). Let $x_t$ denote the daily realized variance. Following Andersen et al. (2003), we considered a number of fractional (long-memory) ARMA(p, q) models for $x_t$ and $\log x_t$ for different values of $p$ and $q$. BIC identified the best models as a fractional AR(2) for $x_t$ and a fractional AR(1) for $\log x_t$: $(1 - \phi_1 B - \phi_2 B^2)(1 - B)^d x_t = \alpha + \varepsilon_t$, $\varepsilon_t \sim N(0, 1)$, and $(1 - \phi_1 B)(1 - B)^d \log(x_t) = \alpha + \varepsilon_t$, $\varepsilon_t \sim N(0, 1)$.
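The fractional-differencing operator $(1 - B)^d$ in the AR-RV models expands into an infinite moving average in the backshift operator $B$: $(1 - B)^d = \sum_k \pi_k B^k$ with $\pi_0 = 1$ and the recursion $\pi_k = \pi_{k-1}(k - 1 - d)/k$. A short sketch of applying the (truncated) filter:

```python
def frac_diff_weights(d, n):
    """First n+1 coefficients pi_k of the expansion of (1 - B)^d."""
    w = [1.0]
    for k in range(1, n + 1):
        w.append(w[-1] * (k - 1 - d) / k)   # pi_k = pi_{k-1} * (k - 1 - d) / k
    return w

def frac_diff(x, d):
    """Apply (1 - B)^d to a series, truncating the filter at the sample start."""
    w = frac_diff_weights(d, len(x) - 1)
    return [sum(w[k] * x[t - k] for k in range(t + 1)) for t in range(len(x))]

# For d = 0.4 the leading weights are pi_1 = -0.4, pi_2 = -0.12, pi_3 = -0.064;
# d = 1 recovers ordinary first differencing.
weights = frac_diff_weights(0.4, 3)
```

For $0 < d < 0.5$ the weights decay hyperbolically rather than geometrically, which is what produces the long-memory autocorrelations that motivate these RV models.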

SUPPLEMENTARY MATERIALS
The online supplementary materials contain prior sensitivity and MCMC convergence results for the two-factor stochastic volatility models.

[Received November 2012. Revised May 2014.]