Dynamic Discrete Mixtures for High-Frequency Prices

Abstract. The tick structure of financial markets entails the discreteness of stock price changes. Based on this empirical evidence, we develop a multivariate model for discrete price changes featuring a mechanism that accounts for the large share of zero returns at high frequency. We assume that the observed price changes are independent conditional on the realization of two hidden Markov chains determining the dynamics and the distribution of the multivariate time series at hand. We study the properties of the model, which is a dynamic mixture of zero-inflated Skellam distributions. We develop an expectation-maximization algorithm with a closed-form M-step that allows us to estimate the model by maximum likelihood. In the empirical application, we study the joint distribution of the price changes of a number of assets traded on the NYSE. Particular focus is dedicated to assessing the quality of univariate and multivariate density forecasts, and the precision of the predictions of moments such as volatility and correlations. Finally, we look at the predictability of price staleness and its determinants in relation to the trading activity on financial markets.


Introduction
The last 20 years have witnessed a boost in intradaily trading activity on financial markets and, consequently, an enormous increase in the availability of stock prices observed at high frequency. On the one hand, the availability of stock prices sampled at high frequency has steered the empirical analysis of financial markets toward the use of ex-post measurements of (integrated) variance over fixed horizons (e.g., a day), see the discussion in Andersen, Bollerslev, and Diebold (2010). On the other hand, prices sampled at high frequency display a number of microstructural features that challenge the adequacy of standard specifications for the dynamics of stock prices. This opens the door to alternative model specifications for high-frequency price moves.
We contribute to this strand of literature with a new statistical framework for the analysis of high-frequency prices. In the classic framework, the prices of financial assets are typically assumed to originate from a continuous distribution with time-varying parameters, for example, with stochastic volatility, see Shephard (2005) among many others. The widely adopted assumption of a continuous underlying price process is made to increase model tractability. However, financial market regulations make stock price changes intrinsically discrete due to the minimum allowed tick size (also known as the decimalization effect). As the sampling frequency increases (e.g., at the frequency of a few seconds), price discreteness becomes the dominating feature, see the recent discussion in Rossi and Santucci de Magistris (2018). The statistical analysis of discrete processes in Z poses substantial difficulties from a methodological viewpoint, greatly complicating the underlying theory and model interpretation, see the recent contributions of Koopman, Lit, and Lucas (2017) for a discrete-time model and Shephard and Yang (2017) for a model built in continuous time. Along with their intrinsic discreteness, high-frequency prices display a number of stylized facts, such as time-varying volatility and correlations, a large and persistent share of zero variations (zeros), a feature known as price staleness (see Bandi et al. 2020a; Bandi, Pirino, and Renò 2017), and the occurrence of extreme realizations (fat-tailed distribution).
In this article, we develop a flexible multivariate integer-valued model that incorporates the main empirical features of the price changes observed at high frequency. The model builds upon a simple mechanism in which price changes result from the difference between two unobserved random variables accounting for positive and negative moves. Since the price changes can only take values on a discrete grid, these two random variables must adhere to this constraint. The Skellam distribution of Irwin (1937) and Skellam (1946), which arises from the difference between two independent Poisson random variables, provides the natural baseline framework for discrete price changes, see also Barndorff-Nielsen, Pollard, and Shephard (2012) and Koopman, Lit, and Lucas (2017). In particular, Koopman, Lit, and Lucas (2017) assumed that the price changes of individual assets traded on the NYSE are conditionally distributed as a Skellam with stochastic volatility. The resulting specification belongs to the class of nonlinear non-Gaussian state space models, for which the likelihood is not analytically available. This leads to complicated inference and nonstandard estimation procedures; Koopman, Lit, and Lucas (2017) use simulated maximum likelihood relying on the numerically accelerated importance sampling (NAIS) method of Koopman, Lucas, and Scharth (2015). An extension to the multivariate context within the framework of Koopman, Lit, and Lucas (2017) is unfeasible, since the multivariate Skellam distribution (see Bulla, Chesneau, and Kachour 2015; Akpoue and Angers 2017) is remarkably difficult to handle. (A notable application of the Skellam in the multivariate context is provided in Koopman et al. (2018), who adopt a copula specification coupled with generalized autoregressive score (GAS) dynamics; to guarantee model tractability, they impose equicorrelation (see Engle and Kelly 2012), at the cost of a very restrictive dependence structure.) Differently from previous studies, our modeling framework builds upon the idea that the observed price changes are independent conditional on the realization of unobserved discrete-valued random variables characterizing the dynamic properties of the multivariate series at hand. In other words, the model belongs to the class of hidden Markov models (HMMs), see among others Vermunt, Langeheine, and Böckenholt (1999), Bartolucci and Farcomeni (2009), Bartolucci, Farcomeni, and Pennoni (2012), and Zucchini, MacDonald, and Langrock (2017). Conditional on the latent Markovian structure, each individual asset is zero-inflated Skellam distributed and independent of the other assets. The HMM structure is made up of two independent Markov chains: one is responsible for the dynamics of the price changes and their mutual association through a hierarchical structure involving a latent Categorical random variable; the other accounts for the time-varying probability of price staleness across assets.
We investigate the probabilistic properties of the model. After marginalization of the latent variables, the distribution of the observables is a multivariate mixture featuring the stylized facts outlined above: time-varying volatilities and correlations, discreteness, fat tails, as well as a time-varying probability of zeros in excess of that implied by the baseline Skellam distribution. We show that the model has an alternative representation in terms of a single hidden Markov chain. This allows us to prove the identification of the model as well as to derive an expectation-maximization (EM) algorithm with steps available in closed form. Through the EM algorithm, maximum likelihood (ML) estimation is possible with no exceptional effort. We also derive the predictive, filtered, and smoothed distributions of the latent variables, as well as the joint predictive distribution of the price changes.
Our empirical results can be summarized as follows. The proposed modeling framework is sufficiently flexible to match the univariate and multivariate empirical distributions of high-frequency price changes and their associated moments, even when the stocks under investigation display heterogeneous tick sizes. This holds true in both low and high volatility periods, the latter being characterized by abnormal price variations, especially at the opening and closing of the trading day. The model accounts well for all the empirical features displayed by high-frequency prices, especially the large and time-varying proportion of zeros, which often occur simultaneously on multiple assets. Finally, the model features the decomposition
of the probability of zeros into three determinants that can be linked to the trading activity on financial markets: absence of news, microstructural frictions, and offsetting demand and supply. We show that these components have distinctive roles in explaining different dimensions of illiquidity, and we employ them to predict the absence of trading volume on financial markets. The article is organized as follows: Section 2 presents the model and its properties. Section 3 discusses ML inference via the EM algorithm. Section 4 outlines the empirical application, and Section 5 provides an analysis of the decomposition of staleness based on the model outcomes. Section 6 concludes. In addition, a document with supplementary material reports additional results concerning the empirical application.

The Model
Let Y_{n,t} ∈ Z be the random variable representing the price change of asset n = 1, 2, ..., N at time t = 1, 2, ..., T, and let y_{n,t} be its realization. We collect the price changes of the N assets in the N × 1 vector Y_t = (Y_{n,t}, n = 1, ..., N)' ∈ Z^N, with analogous notation for y_t = (y_{n,t}, n = 1, ..., N)'. We assume that Y_t has the following stochastic representation

Y_t = (ι − B_{t;S^κ_t}) ⊙ (X^{(1)}_{t;Z_t} − X^{(2)}_{t;Z_t}),   (1)

where ⊙ is the Hadamard product and ι is an N × 1 vector of ones. The properties of Y_t are determined by the interaction of two unobserved random components: X^{(1)}_{t;Z_t} = (X^{(1)}_{n,t;Z_t}, n = 1, ..., N)' ∈ {0, 1, 2, ...}^N and X^{(2)}_{t;Z_t} = (X^{(2)}_{n,t;Z_t}, n = 1, ..., N)' ∈ {0, 1, 2, ...}^N are N × 1 vectors of random variables associated with positive and negative discrete price moves, respectively. Both X^{(1)}_{t;Z_t} and X^{(2)}_{t;Z_t} depend on the unobserved integer-valued random variable Z_t. In turn, Z_t depends on a homogeneous first-order Markov chain, S^ω_t, following a hierarchical structure as specified below. We assume that, conditional on Z_t, both X^{(1)}_{n,t;Z_t}|Z_t and X^{(2)}_{n,t;Z_t}|Z_t are i.i.d. Poisson distributed random variables for all n = 1, ..., N and t = 1, ..., T, with intensities λ^{(1)}_{n;Z_t} and λ^{(2)}_{n;Z_t}, respectively. Intuitively, X^{(1)}_{t;Z_t} and X^{(2)}_{t;Z_t} denote the two sides of the order book for the N assets, aggregated across traders.
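To fix ideas, the following Python sketch simulates from the representation in Equation (1); the function names and array layouts (κ stored as an N × L matrix, the intensities as N × K matrices) are illustrative conventions of this sketch, not part of the article.

```python
import numpy as np

rng = np.random.default_rng(0)

def stationary(P):
    """Stationary distribution of a transition matrix P (left eigenvector)."""
    vals, vecs = np.linalg.eig(P.T)
    v = np.real(vecs[:, np.argmax(np.real(vals))])
    return v / v.sum()

def simulate_dymisk(T, Gamma_w, Gamma_k, Omega, kappa, lam1, lam2):
    """Draw Y_t = (1 - B_t) * (X1_t - X2_t) elementwise, as in Equation (1)."""
    J, K = Omega.shape
    L = Gamma_k.shape[0]
    N = kappa.shape[0]
    s_w = rng.choice(J, p=stationary(Gamma_w))   # chains start from stationarity
    s_k = rng.choice(L, p=stationary(Gamma_k))
    Y = np.zeros((T, N), dtype=int)
    for t in range(T):
        z = rng.choice(K, p=Omega[s_w])          # Z_t | S^w_t ~ Categorical
        B = rng.random(N) < kappa[:, s_k]        # Bernoulli staleness indicators
        X1 = rng.poisson(lam1[:, z])             # positive discrete moves
        X2 = rng.poisson(lam2[:, z])             # negative discrete moves
        Y[t] = np.where(B, 0, X1 - X2)
        s_w = rng.choice(J, p=Gamma_w[s_w])      # advance the two hidden chains
        s_k = rng.choice(L, p=Gamma_k[s_k])
    return Y
```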
The term B_{t;S^κ_t} = (B_{n,t;S^κ_t}, n = 1, ..., N)' ∈ {0, 1}^N is an N × 1 collection of Boolean random variables, each responsible for setting the corresponding price change to zero at time t. We can interpret B_{t;S^κ_t} as a component capturing temporary market freezing induced by illiquidity frictions (such as transaction costs), thus contributing significantly to explaining the large fraction of zero returns at high frequency; in the empirical analysis, we find that B_{t;S^κ_t} explains on average 30% of the frequency of zeros, and Section 5 sheds further light on the distinctive roles of B_t, X^{(1)}_t, and X^{(2)}_t. In particular, B_{n,t;S^κ_t}|S^κ_t are assumed to be i.i.d. Bernoulli random variables, independent across n and t. For all n = 1, ..., N, the success probability κ_{n;S^κ_t} = P(B_{n,t} = 1|S^κ_t) depends on the random variable S^κ_t, which is an unobserved homogeneous first-order Markov chain. In other words, the states of S^κ_t determine the success probabilities of B_{n,t;S^κ_t}.
The random variables S^ω_t and S^κ_t follow independent homogeneous first-order Markov chains with finite state spaces {1, ..., J} and {1, ..., L}, respectively. The transition probabilities are P(S^ω_t = j | S^ω_{t−1} = i) = γ^ω_{i,j} and P(S^κ_t = l | S^κ_{t−1} = h) = γ^κ_{h,l}; let Γ^ω and Γ^κ be the J × J and L × L transition probability matrices of S^ω_t and S^κ_t. Under the usual constraints on positiveness and summability of the transition probabilities, we have γ^c_{i,j} > 0 and Γ^c ι = ι for c = ω, κ, with ι a vector of ones of proper dimension. The initial distributions of S^ω_t and S^κ_t are indicated by δ^ω = (δ^ω_j, j = 1, ..., J)' and δ^κ = (δ^κ_l, l = 1, ..., L)', respectively, and they coincide with the limiting distributions of S^ω_t and S^κ_t; that is, the two Markov chains are stationary.
The random variable Z_t determines a second hidden layer with state space {1, ..., K}. In particular, we assume that Z_t|S^ω_t is independent of S^κ_t and follows a Categorical distribution with P(Z_t = k|S^ω_t = j) = ω_{j,k}. The parameters ω_{j,k} are collected in the J × K matrix Ω, and they are such that ω_{j,k} > 0 for all j = 1, ..., J and k = 1, ..., K, and Σ_{k=1}^{K} ω_{j,k} = 1 for j = 1, ..., J. In other words, S^ω_t determines J different compositions of the weights (ω_{j,k}) of the K pairs of Poisson random variables (X^{(1)}_{n,t;k} and X^{(2)}_{n,t;k}), whose intensities are determined by the realization of Z_t. This induces the hierarchical structure between S^ω_t and Z_t; a similar hierarchical structure was adopted in Bartolucci and Farcomeni (2015), Geweke and Amisano (2011), Maruotti (2011), and Maruotti and Rydén (2009). The dependence structure generated by the model in Equation (1) is outlined in Figure 1. The effect of S^κ_t on Y_t is rather straightforward, since it directly affects the probability of drawing zeros. For instance, changes in the frequency of price staleness, possibly associated with prolonged periods of absence of trading, can be directly linked to the variations of the Markov chain S^κ_t. The same ease of interpretation does not apply to the interaction between the chain S^ω_t and Z_t, since their roles in determining the serial and contemporaneous dependence of price changes cannot be fully disentangled. However, their hierarchical structure allows for great flexibility within a relatively parsimonious modeling setup. Indeed, conditional on the realization of S^ω_t, Z_t handles time-specific dependencies across price changes and accommodates departures from the marginal distributions assumed for each series. The next sections present the distribution and the moments of Y_t, highlighting the distinctive roles of S^κ_t, S^ω_t, and Z_t.

Distribution and Identification
The random variables B_{t;S^κ_t}, X^{(1)}_{t;Z_t}, and X^{(2)}_{t;Z_t} jointly determine the distribution of the multivariate high-frequency price moves through their dependence on S^ω_t, S^κ_t, and Z_t. After marginalization of S^ω_t, S^κ_t, Z_t, and B_t, the unconditional distribution of Y_t is (to avoid excessively heavy notation, in the following we omit the subscripts indicating the dependence on the latent variables)

P(Y_t = y_t) = Σ_{j=1}^{J} Σ_{l=1}^{L} Σ_{k=1}^{K} δ^ω_j δ^κ_l ω_{j,k} Π_{n=1}^{N} [ κ_{n;l} ψ(y_{n,t}) + (1 − κ_{n;l}) SK(y_{n,t}, λ^{(1)}_{n;k}, λ^{(2)}_{n;k}) ],   (2)

where ψ(y_{n,t}) = 1 if y_{n,t} = 0 (and ψ(y_{n,t}) = 0 elsewhere) is a Dirac mass at 0, and SK(·) denotes the probability mass function of the Skellam distribution (see Skellam 1946),

SK(y_{n,t}, λ^{(1)}_{n;k}, λ^{(2)}_{n;k}) = e^{−(λ^{(1)}_{n;k} + λ^{(2)}_{n;k})} (λ^{(1)}_{n;k}/λ^{(2)}_{n;k})^{y_{n,t}/2} I_{|y_{n,t}|}(2 (λ^{(1)}_{n;k} λ^{(2)}_{n;k})^{1/2}),

where I_{|y_{n,t}|}(·) is the modified Bessel function of the first kind. The distribution in (2) is a three-layer mixture of conditionally independent zero-inflated Skellam distributions. For this reason, we label the model a dynamic mixture of Skellam distributions (DyMiSk, henceforth).
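For illustration, the mixture in Equation (2) can be evaluated numerically with SciPy, whose scipy.stats.skellam implements the Skellam pmf; the array shapes below (kappa of size N × L, lam1 and lam2 of size N × K) are the same illustrative conventions as in the earlier sketch.

```python
import numpy as np
from scipy.stats import skellam

def zis_pmf(y, kappa, lam1, lam2):
    """Zero-inflated Skellam pmf: kappa * 1{y=0} + (1 - kappa) * SK(y)."""
    return kappa * (y == 0) + (1 - kappa) * skellam.pmf(y, lam1, lam2)

def uncond_pmf(y, delta_w, delta_k, Omega, kappa, lam1, lam2):
    """Three-layer mixture of Equation (2) for one N-dimensional observation y."""
    J, K = Omega.shape
    p = 0.0
    for j in range(J):
        for l in range(len(delta_k)):
            for k in range(K):
                comp = np.prod(zis_pmf(y, kappa[:, l], lam1[:, k], lam2[:, k]))
                p += delta_w[j] * delta_k[l] * Omega[j, k] * comp
    return p
```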
If we wish to condition on the past values of Y_t, the distribution becomes

P(Y_t = y_t | Y_{1:t−s} = y_{1:t−s}) = Σ_{j=1}^{J} Σ_{l=1}^{L} π^ω_{t|t−s;j} π^κ_{t|t−s;l} f_{j,l}(y_t), for s > 0,

with predictive probabilities

π^ω_{t|t−s;j} = Σ_{i=1}^{J} [(Γ^ω)^s]_{i,j} α^ω_{t−s;i} / Σ_{i=1}^{J} α^ω_{t−s;i},   π^κ_{t|t−s;l} = Σ_{h=1}^{L} [(Γ^κ)^s]_{h,l} α^κ_{t−s;h} / Σ_{h=1}^{L} α^κ_{t−s;h},   (3)

and state densities

f_{j,l}(y_t) = Σ_{k=1}^{K} ω_{j,k} Π_{n=1}^{N} [ κ_{n;l} ψ(y_{n,t}) + (1 − κ_{n;l}) SK(y_{n,t}, λ^{(1)}_{n;k}, λ^{(2)}_{n;k}) ],   (4)

where π^ω_{t|t−s;j} := P(S^ω_t = j|Y_{1:t−s} = y_{1:t−s}) and π^κ_{t|t−s;l} := P(S^κ_t = l|Y_{1:t−s} = y_{1:t−s}) are the predictive distributions of S^ω_t and S^κ_t in states j and l, respectively. The terms [(Γ^ω)^s]_{i,j} and [(Γ^κ)^s]_{h,l} indicate the (i,j)th and (h,l)th elements of the sth power of the matrices Γ^ω and Γ^κ, respectively. Finally, α^ω_{t;i} = P(S^ω_t = i, Y_{1:t} = y_{1:t}) and α^κ_{t;h} = P(S^κ_t = h, Y_{1:t} = y_{1:t}) are the forward probabilities delivered by the forward filtering backward smoothing (FFBS) algorithm; more details are provided in Section 3.
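In practice, the predictive probabilities in Equation (3) amount to normalizing the forward probabilities at time t − s and propagating them through the s-step transition matrix; a minimal sketch:

```python
import numpy as np

def predictive_state_probs(alpha, Gamma, s):
    """pi_{t|t-s}: normalize the forward probabilities alpha_{t-s}
    and propagate them s steps ahead through Gamma."""
    filt = alpha / alpha.sum()                      # P(S_{t-s} = i | Y_{1:t-s})
    return filt @ np.linalg.matrix_power(Gamma, s)  # P(S_t = j | Y_{1:t-s})
```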
Identification is proven under the following classic set of assumptions: (A1) S^ω_t and S^κ_t are irreducible; (A2) the rows of Ω are linearly independent; (A3) κ_{n;l_1} ≠ κ_{n;l_2} and (λ^{(1)}_{n;k_1}, λ^{(2)}_{n;k_1}) ≠ (λ^{(1)}_{n;k_2}, λ^{(2)}_{n;k_2}) for all n, l_1 ≠ l_2, and k_1 ≠ k_2. The following proposition establishes the identification of the DyMiSk.

Proposition 2.1. Under assumptions (A1)-(A3), the DyMiSk model is identified.
Proposition 2.1 follows from Theorem 1 and Proposition 2 of Gassiat et al. (2016). Specifically, consider an equivalent parameterization in which we let Γ^{ω,κ} = Γ^ω ⊗ Γ^κ be the transition probability matrix of the homogeneous stationary first-order Markov chain S^{ω,κ}_t, whose JL states are indexed by the pairs (j, l). Under (A1), S^{ω,κ}_t is irreducible, implying that the rank of Γ^{ω,κ} is full. Furthermore, under assumptions (A2) and (A3), the state densities reported in (4) are distinct. Hence, Proposition 2 and Theorem 1 of Gassiat et al. (2016) hold.

Moments
By the properties of the Skellam distribution, all moments of Y_t exist and can be recovered by marginalization of the latent variables. Let Ξ(a, b; s) = E[Y^a_{t+s} (Y^b_t)'] be the matrix of cross products at lag s ≥ 0, with generic element ξ_{n,m}(a, b; s) = E[Y^a_{n,t+s} Y^b_{m,t}]. If s > 0 and n ≠ m, ξ_{n,m}(a, b; s) is

ξ_{n,m}(a, b; s) = Σ_{j_1,j_2} Σ_{l_1,l_2} Σ_{k_1,k_2} δ^ω_{j_1} δ^κ_{l_1} [(Γ^ω)^s]_{j_1,j_2} [(Γ^κ)^s]_{l_1,l_2} ω_{j_1,k_1} ω_{j_2,k_2} (1 − κ_{m;l_1})(1 − κ_{n;l_2}) M_SK(b, λ^{(1)}_{m;k_1}, λ^{(2)}_{m;k_1}) M_SK(a, λ^{(1)}_{n;k_2}, λ^{(2)}_{n;k_2}),   (5)

where M_SK(p, λ_1, λ_2) is the pth noncentral moment of a Skellam distributed random variable with intensities λ_1 and λ_2, and M_P(q, λ) is the qth noncentral moment of a Poisson random variable with intensity λ. Equation (5) highlights the attenuation effect of κ_{m;l_1} and κ_{n;l_2} on all moments of Y_t. An increase in the probability of price staleness due to illiquidity determines a reduction in the magnitude of the moments of price changes.
In the limiting case with κ_{m;l_1} = κ_{n;l_2} = 1, Y_t is a degenerate multivariate random variable with all probability mass at zero. Furthermore, the two chains S^κ_t and S^ω_t directly affect the moments in Equation (5) through the powers of the transition matrices, (Γ^κ)^s and (Γ^ω)^s, respectively. For example, the sth autoregressive moment of the squared price variations, which measures the persistence in the volatility of the price changes, is given by Equation (5) with a = b = 2 and m = n,

ξ_{n,n}(2, 2; s) = Σ_{j_1,j_2} Σ_{l_1,l_2} Σ_{k_1,k_2} δ^ω_{j_1} δ^κ_{l_1} [(Γ^ω)^s]_{j_1,j_2} [(Γ^κ)^s]_{l_1,l_2} ω_{j_1,k_1} ω_{j_2,k_2} (1 − κ_{n;l_1})(1 − κ_{n;l_2}) M_SK(2, λ^{(1)}_{n;k_1}, λ^{(2)}_{n;k_1}) M_SK(2, λ^{(1)}_{n;k_2}, λ^{(2)}_{n;k_2}).   (6)

For s = 0, the cross moments reduce to

ξ_{n,m}(a, b; 0) = Σ_{j} Σ_{l} Σ_{k} δ^ω_j δ^κ_l ω_{j,k} (1 − κ_{n;l})(1 − κ_{m;l}) M_SK(a, λ^{(1)}_{n;k}, λ^{(2)}_{n;k}) M_SK(b, λ^{(1)}_{m;k}, λ^{(2)}_{m;k}),   (7)

and for n = m they further simplify to ξ_{n,n}(a, b; 0) = Σ_j Σ_l Σ_k δ^ω_j δ^κ_l ω_{j,k} (1 − κ_{n;l}) M_SK(a + b, λ^{(1)}_{n;k}, λ^{(2)}_{n;k}). In this case, the moments do not depend on Γ^κ and Γ^ω, and they are functions of the stationary distributions of S^κ_t and S^ω_t. The predictive moments of Y_t are computed by replacing δ^ω_j and δ^κ_l in Equation (7) with the predictive probabilities reported in Equation (3). In particular, the predictive covariance matrix is

Σ_{t|t−1} = Ξ_{t|t−1}(1, 1; 0) − μ_{t|t−1} μ'_{t|t−1},   (8)

where Ξ_{t|t−1}(1, 1; 0) collects the predictive second cross moments and μ_{t|t−1} is the vector of predictive means.
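The contemporaneous moments in Equation (7) are easy to check numerically, since scipy.stats.skellam.moment returns the noncentral Skellam moments M_SK; the sketch below uses the same illustrative array conventions as before.

```python
from scipy.stats import skellam

def xi_s0(a, b, n, m, delta_w, delta_k, Omega, kappa, lam1, lam2):
    """E[Y_{n,t}^a Y_{m,t}^b] for n != m, as in Equation (7)."""
    J, K = Omega.shape
    val = 0.0
    for j in range(J):
        for l in range(len(delta_k)):
            for k in range(K):
                w = delta_w[j] * delta_k[l] * Omega[j, k]
                mn = (1 - kappa[n, l]) * skellam.moment(a, lam1[n, k], lam2[n, k])
                mm = (1 - kappa[m, l]) * skellam.moment(b, lam1[m, k], lam2[m, k])
                val += w * mn * mm
    return val
```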

Estimation Via the EM Algorithm
We now present the EM algorithm for the computation of the ML estimator of the DyMiSk model parameters. The EM algorithm is an extension of the one proposed in Catania and Di Mari (2020) for the estimation of multivariate hierarchical Markov-switching models. Consider a sample of T observations on N price changes, collected in the N × T matrix y_{1:T} = (y_1, ..., y_T), and the series of latent random variables S^ω_{1:T}, S^κ_{1:T}, Z_{1:T}, and B_{1:T}, whose (unobserved) realizations are collected in the T × 1 vectors s^ω_{1:T}, s^κ_{1:T}, z_{1:T}, and in the N × T matrix b_{1:T}. To exploit the stochastic representation of the Skellam as the difference of two Poisson random variables, consider also the N × T matrices x^{(1)}_{1:T} and x^{(2)}_{1:T} collecting the (unobserved) realizations of X^{(1)}_{t;Z_t} and X^{(2)}_{t;Z_t}. The parameter vector θ = (Γ^ω, Γ^κ, Ω, κ, λ^{(1)}, λ^{(2)}) counts J(J − 1) free parameters for Γ^ω, L(L − 1) for Γ^κ, J(K − 1) for Ω, LN for κ, and 2KN for λ^{(1)} and λ^{(2)}. This means that model complexity in terms of the number of free parameters is quadratic in J and L, while being linear in the number of variables N and mixture components K. The complete data log-likelihood (CDLL) function of the DyMiSk, that is, the log-likelihood for the observed and unobserved random variables, follows directly from the hierarchical structure of the model; to derive the properties of the model, we assumed that the initial distributions of S^ω_t and S^κ_t coincide with the limiting distributions δ^ω and δ^κ, respectively. Unfortunately, the CDLL cannot be directly maximized due to the presence of latent quantities. The EM algorithm treats these unobserved terms as missing values and proceeds with the maximization of the expected value of the CDLL. To this end, we introduce a number of augmenting variables: u^ω_{t;j} = 1{s^ω_t = j}, u^κ_{t;l} = 1{s^κ_t = l}, v^ω_{t;i,j} = 1{s^ω_{t−1} = i, s^ω_t = j}, v^κ_{t;h,l} = 1{s^κ_{t−1} = h, s^κ_t = l}, and z_{t;j,k} = 1{s^ω_t = j, z_t = k}. The variables u^ω_{t;j}, u^κ_{t;l}, v^ω_{t;i,j}, and v^κ_{t;h,l} follow from the standard implementation of the algorithm for Markov-switching models, see McLachlan and Peel (2000), whereas the variable z_{t;j,k} (for j = 1, ..., J and k = 1, ..., K) is specific to the DyMiSk model and is related to the additional latent variable Z_t. The new variables allow us to rewrite the CDLL as

ℓ_c(θ) = Σ_j u^ω_{1;j} log δ^ω_j + Σ_{t=2}^{T} Σ_{i,j} v^ω_{t;i,j} log γ^ω_{i,j} + Σ_l u^κ_{1;l} log δ^κ_l + Σ_{t=2}^{T} Σ_{h,l} v^κ_{t;h,l} log γ^κ_{h,l} + Σ_{t=1}^{T} Σ_{j,k} z_{t;j,k} log ω_{j,k} + Σ_{t=1}^{T} Σ_{n=1}^{N} log p(y_{n,t} | z_t, s^κ_t, b_{n,t}; θ).

The EM algorithm iterates between the expectation step (E-step) and the maximization step (M-step). Given a value of the model parameters at iteration m, θ^{(m)}, the E-step consists of the evaluation of the so-called Q function, defined as Q(θ, θ^{(m)}) = E_{θ^{(m)}}[ℓ_c(θ) | Y_{1:T} = y_{1:T}], where the expectation is taken with respect to the joint distribution of the missing variables conditional on the observed variables, using the parameter values at iteration m. Exploiting the formulation of the CDLL, the Q function factorizes into separate terms for the initial distributions, the transition probabilities, the mixture weights, and the zero-inflated Skellam densities.

Table 1. Execution time (in seconds) of the EM algorithm for different combinations of T and N (J = 4, K = 4, L = 3).

T \ N      1     2     3     4     5     6     7     8     9    10
1000      65   284   120   154   223   227   233   264   271   277
2000     109   492   423   377   426   439   452   512   534   554
3000     153   700   726   600   629   650   671   761   798   830
4000     197   907  1029   824   832   862   889     …     …     …

Note: Computations have been performed on an Intel Xeon E5-2680 v2 CPU at 2.8 GHz. In all cases, the algorithm stops when the relative increment of the log-likelihood function is lower than 10^{−7}.
The E-step then requires the conditional expectations of the augmenting variables given the data, together with smoothed expectations of the form E_{θ^{(m)}}[X^{(i)}_{n,t} | Z_t = k, Y_{1:T} = y_{1:T}] and E_{θ^{(m)}}[B_{n,t} | S^κ_t = l, Y_{1:T} = y_{1:T}]. The tasks of filtering and smoothing turn out to be rather involved due to the presence of the two unobserved Markov chains and the additional latent variables Z_t and B_{n,t}. Therefore, we rely on an equivalent model representation of the DyMiSk that makes filtering and smoothing of the latent chains straightforward via the forward filtering backward smoothing (FFBS) algorithm. This representation is obtained by combining S^ω_t, Z_t, and S^κ_t into a single first-order Markov chain, S^{ω,Z,κ}_t, with state space {1, ..., JKL} and restrictions implicitly induced by the structure of the DyMiSk. In the M-step of the algorithm, the function Q is maximized with respect to the model parameters θ. Solving the Lagrangian associated with this (constrained) optimization leads to closed-form expressions for δ^ω, δ^κ, Γ^ω, Γ^κ, Ω, κ_{n;l}, λ^{(1)}_{n;k}, and λ^{(2)}_{n;k}.
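For concreteness, a minimal log-space forward-backward pass on the combined chain might look as follows; the interface (a T × M matrix of log state densities, an M × M transition matrix, an initial distribution) is an assumption of this sketch.

```python
import numpy as np

def forward_backward(log_dens, Gamma, delta):
    """FFBS on the combined chain: returns smoothed state probabilities
    P(S_t = j | y_{1:T}) and the log-likelihood."""
    T, M = log_dens.shape
    log_alpha = np.empty((T, M))
    log_alpha[0] = np.log(delta) + log_dens[0]
    for t in range(1, T):
        c = log_alpha[t - 1].max()                   # stabilized forward recursion
        log_alpha[t] = np.log(np.exp(log_alpha[t - 1] - c) @ Gamma) + c + log_dens[t]
    log_beta = np.zeros((T, M))
    for t in range(T - 2, -1, -1):                   # backward recursion
        w = log_beta[t + 1] + log_dens[t + 1]
        c = w.max()
        log_beta[t] = np.log(Gamma @ np.exp(w - c)) + c
    loglik = np.logaddexp.reduce(log_alpha[-1])
    post = np.exp(log_alpha + log_beta - loglik)     # smoothed probabilities
    return post, loglik
```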
Given an initial guess θ^{(0)}, the algorithm iterates between the E- and M-steps until the relative increment of the log-likelihood function falls below a given threshold (e.g., 10^{−7}). Dempster, Laird, and Rubin (1977) prove that the EM algorithm provides a nondecreasing sequence of log-likelihood values; thus, the EM algorithm converges to a maximum of the log-likelihood function. We denote the vector of ML coefficients as θ̂. As for standard HMMs, the log-likelihood function of the DyMiSk can present several local optima, and there is no guarantee that convergence to the global optimum is achieved; running the algorithm several times with different starting values is a standard procedure to better explore the log-likelihood surface. Through a simulation analysis, we assess the execution time of the EM algorithm for different combinations of T and N. In particular, we consider N ∈ {1, 2, ..., 10} and T ∈ {1000, 2000, ..., 10,000}, and we set J = 4, K = 4, and L = 3 (48 hidden states). Table 1 reports the execution time of the EM algorithm. The computation time is below 1 hour even for the largest sample size (T = 10,000 and N = 10).

Data Description and Summary Statistics
We consider the high-frequency stock price moves of four companies listed in the Dow Jones index (DJIA) in different time periods. The stocks under investigation are the same as in Koopman, Lit, and Lucas (2017): Caterpillar (CAT), Coca-Cola (KO), JP Morgan (JPM), and Walmart (WMT). We consider two sampling periods: a low volatility one from November 6, 2013, to November 19, 2013, and a turbulent one (labeled "Lehman"), from September 11, 2008, to September 25, 2008, which includes the bankruptcy of Lehman Brothers. Prices are collected from the Trades and Quotes (TAQ) database, and a preliminary cleaning procedure is performed according to Brownlees and Gallo (2006) and Barndorff-Nielsen et al. (2009). Although the DyMiSk can be employed with price changes observed at any sampling frequency, we focus on stock prices sampled at 15 seconds by means of the previous-tick method. Both sample periods consist of 62,400 observations (1560 intradaily observations for 10 days and 4 assets). This sample size is approximately three times larger than the one adopted in Koopman, Lit, and Lucas (2017), who considered a sample of one year of stock prices sampled at the 1-second frequency but obtained the parameter estimates on a day-by-day basis, that is, with T = 23,400 observations. A comparison of the results obtained with other sampling frequencies would be extremely time consuming in our setting and would add great length to the article; it is thus left for future research.

Table 2 reports the main summary statistics of the price changes for the four stocks considered. Notably, the median and the mode of Y_{n,t} are zero in both periods. This provides first evidence of the large share of zeros characterizing stock prices sampled at high frequencies. For instance, during the low volatility period, the percentage of zeros is between 34% for CAT and 50% for KO (the least liquid asset). The percentage of zeros drastically reduces during the Lehman period, as a consequence of the large amount of news arriving to the market and the increased uncertainty about fundamentals across investors. The sample average of the price changes is also very close to zero and, especially in the low volatility period, the level of skewness is almost null, signaling a rather symmetric distribution of price variations. On the contrary, all series are negatively skewed during the Lehman period: the arrival of several pieces of bad news about the overall stability of the financial sector generated large negative price moves, resulting in a skewed distribution. Furthermore, both variance and kurtosis are very large, and the magnitude of the price variations is rather extreme, as evidenced by maximum and minimum variations in the order of hundreds of cents. Notably, the largest price variations in both periods take place at the opening of the trading day. Indeed, a well-known stylized fact of high-frequency prices is that the variability of their changes exhibits a pervasive intradaily seasonal pattern, see among others Andersen and Bollerslev (1997) and the recent contribution of Andersen, Thyrsgaard, and Todorov (2019).
For instance, at the opening of the market, volatility is generally at its peak as a consequence of the rebalancing activity of market participants processing the overnight information. On the contrary, volatility is typically very low during lunch time. Figure 2 shows that the probability of zeros is also subject to nonnegligible variability at the intradaily level, with a reverse U-shape reflecting the different amounts of trading activity within the day. This evidence is consistent across the four assets under investigation, with KO being the least active stock: more than half of its trades are associated with zeros during the central business hours of the low volatility period.
To account for this intradaily seasonality, the Skellam intensities are scaled by deterministic seasonal factors: the pair λ^{(1)}_{n;k}, λ^{(2)}_{n;k} is slightly modified to β_{n,t} λ^{(1)}_{n;k} and β_{n,t} λ^{(2)}_{n;k}, with β_{n,t} = Σ_{d=1}^{D_2} g_{t,d} β_{n,d}, where the β_{n,d} are seasonal loadings estimated alongside the other parameters. We also let the Bernoulli probabilities, κ_{n;l}, depend on the same set of deterministic seasonal components, g_{t,d}, where g_{t,d} = 1 if time t coincides with season d, for d = 1, ..., D_2. Specifically, we modify the Bernoulli probabilities as κ_{n,t;l} = Σ_{d=1}^{D_2} g_{t,d} κ_{n,d;l}, where the κ_{n,d;l} are season-dependent Bernoulli probabilities that need to be estimated alongside the other parameters. The E- and M-steps for all other parameters remain unchanged, while κ_{n,t;l} replaces κ_{n;l}, and the M-step for the Bernoulli probabilities is modified accordingly. (Note that, since λ^{(m+1)} is computed using the previous iteration's estimate of β_{n,t}, the resulting algorithm is effectively an Expectation Conditional Maximization, ECM; see Meng and Rubin 1993.) To capture the intense trading activity at the opening of the market, the first period coincides with the first 5 min of the trading day, from 9:30 to 9:35; the second runs from 9:35 to 10:00; and the remaining periods run 30 min each until market closing time.
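As an example, a small helper can map an intradaily timestamp to the season index d implied by this scheme (D_2 = 14 bins for a 9:30-16:00 session); the function is a hypothetical convenience of this sketch, not part of the article.

```python
from datetime import time

def season_index(ts):
    """Season d (0-based): 9:30-9:35, 9:35-10:00, then twelve 30-minute bins."""
    minutes = ts.hour * 60 + ts.minute - (9 * 60 + 30)   # minutes since the open
    if minutes < 5:
        return 0
    if minutes < 30:
        return 1
    return 2 + min((minutes - 30) // 30, 11)

# e.g., season_index(time(9, 32)) -> 0 and season_index(time(12, 1)) -> 6
```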

Model Selection and Goodness of Fit
The DyMiSk is estimated on both the low volatility and the Lehman periods for all combinations of L ∈ {1, ..., 6}, K ∈ {1, ..., 15}, and J ∈ {1, ..., 6}, with K ≥ J for identification. Table 3 reports the computational time (in seconds) for the EM algorithm to converge for selected combinations of K, J, and L in the low volatility period. The computational time is a few minutes for specifications with K, J < 4 and L < 3, while it is in the order of several hours for richer specifications. The selection of the best model is performed via the Bayesian information criterion (BIC). The BIC selects J = 5, K = 5, and L = 1 for the low volatility period, and J = 5, K = 12, and L = 2 for the Lehman period. Interestingly, the variability and erratic nature of the price moves during the Lehman episode requires not only many mixture components (K = 12), but also two states for S^κ_t. On the contrary, a more parsimonious model is selected for the low volatility period. The estimated parameters are reported in Section 3 of the supplementary document. In the Lehman period, the states of S^ω_t and S^κ_t are very persistent, since the estimated transition matrices Γ̂^ω and Γ̂^κ are close to identity matrices. As for the matrices λ̂^{(1)} and λ̂^{(2)}, they display heterogeneous patterns over the K = 12 mixture components, with values in the range between 1.4 and 115. This makes the DyMiSk able to adapt to low, medium, and high volatility states. Coupled with the different mixing probabilities in Ω̂, this heterogeneity determines a very flexible correlation structure across assets. We also notice a high degree of heterogeneity in the elements of the vectors κ̂_1 and κ̂_2, which in some cases take values close to one (thus accounting for episodes of prolonged staleness). As expected, the magnitude of λ̂^{(1)} and λ̂^{(2)} is much smaller in the low volatility period than during the financial crisis. Analogously, the magnitude of the Bernoulli probabilities in the vector κ̂_1 is reduced compared with the Lehman period. Indeed, the Skellam distribution with small values of λ^{(1)} and λ^{(2)} generates zeros with a larger probability than in the high volatility scenario, so that the role of illiquidity is reduced. Overall, the parameter estimates signal the ability of the DyMiSk to adapt well to changing market conditions. In the following paragraphs, we check the goodness of fit at both the univariate and multivariate levels.

Univariate Analysis
The goodness of fit of the univariate marginal distributions can be visually assessed in Figure 3. The fit to the empirical frequencies achieved by the DyMiSk is remarkable, and it signals the ability of the dynamic mixture to adapt to different market conditions and intensities of the trading process. Indeed, the fit proves remarkable in all the intradaily business periods defined according to the seasonal dummies. As expected, the empirical distribution is more dispersed at the opening, that is, from 9:30 until 9:35, thus justifying the use of a specific seasonal term for this period (d = 1). Furthermore, during the Lehman episode, the probability mass is more dispersed than in the low volatility period, even during the central hours of the day. We also investigate the quality of the fit of the univariate distributions by means of the test of Berkowitz (2001), which is a tool for assessing the quality of density forecasts in financial risk management applications. The test is based on the probability integral transforms (PITs) of the data with respect to their conditional distribution, which for the DyMiSk is easily computed from the predictive distribution by marginalization. Berkowitz's test relies on the classical results of Fisher (1932) and Pearson (1938) stating that, under correct model specification and when the support of the observables is continuous, the PITs should be i.i.d. uniformly distributed over the (0, 1) interval, and their transformation through the Gaussian quantile function should be i.i.d. Gaussian distributed. For discrete random variables the PITs cannot be uniformly distributed, and modifications must be made to the testing procedure.
To tackle this issue, we compute the randomized, yet uniform, PITs for integer-valued variables, derived by continuization of the discrete conditional pmf, see Smith (1985), Brockwell (2007), and Liesenfeld, Nolte, and Pohlmeier (2008). Figure 4 displays the histogram of the PITs divided into 10 bins for all series. We report results for both the in-sample and the out-of-sample periods around the Lehman episode. The out-of-sample period covers ten trading days after the in-sample period. Through the out-of-sample analysis, we assess the ability of the DyMiSk to adapt to changing market conditions and to capture the relevant features of the high-frequency price changes outside the estimation period. The plots highlight the ability of the DyMiSk to provide an overall good fit. Indeed, the PITs are approximately uniformly distributed in all cases, since the relative frequencies (blue columns) fall within (or are very close to) the 95% confidence bands (red lines), which are very narrow due to the extremely large sample size. Table 4 reports the results of the Berkowitz testing procedure. The results are mixed and can be summarized as follows: (i) the conditional distribution is generally correctly specified for WMT and CAT in both in-sample periods, and only for WMT and JPM in the low volatility out-of-sample period; (ii) during the Lehman out-of-sample period, we always reject the null hypothesis; and (iii) the null hypothesis of independence and correct coverage of the transformed PITs is always rejected. The rejection of the null hypothesis is somewhat expected due to the very large sample size and the parameter instability following the Lehman episode.

Figure 4 caption: Randomized PITs as in Brockwell (2007), computed according to the one-step-ahead univariate conditional distribution of each asset for the Lehman period. PITs are divided into 10 bins such that, under the null hypothesis of correct model specification, the area of each bin should be 10%. Confidence intervals based on the methodology of Diebold, Gunther, and Tay (1998) are computed at the 5% level.

Table 4. Tests of Berkowitz (2001), reported separately for the Lehman period and the low volatility period. Notes: The tests are computed using the randomized PITs as in Brockwell (2007). We consider the coverage of the left tail below the τ% quantile level. Columns labeled "All" correspond to unconditional coverage of the whole distribution (τ = 100%). Columns labeled "Joint" report the statistics associated with the joint test for the null of correct unconditional coverage and independence of the PITs. Gray cells indicate a p-value above 5%, based on the asymptotic distribution of the test.
We conclude that, although the tests reported in Table 4 often reject the null hypothesis, the histograms displayed in Figure 4 are encouraging and suggest that the fit of the univariate distributions achieved by the DyMiSk is reasonable in both the in-sample and the out-of-sample periods.
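Once the one-step-ahead predictive cdf of the model is available, the randomized PITs of Smith (1985) and Brockwell (2007) take one line of algebra; a sketch, where cdf(t, y) is a user-supplied function returning the predictive cdf F_{t|t−1}(y):

```python
import numpy as np

def randomized_pits(y, cdf, rng=None):
    """u_t = F(y_t - 1) + v_t [F(y_t) - F(y_t - 1)], v_t ~ U(0, 1):
    i.i.d. uniform under correct specification of the discrete predictive cdf."""
    rng = rng or np.random.default_rng()
    F_hi = np.array([cdf(t, yt) for t, yt in enumerate(y)])
    F_lo = np.array([cdf(t, yt - 1) for t, yt in enumerate(y)])
    return F_lo + rng.uniform(size=len(y)) * (F_hi - F_lo)
```

The Berkowitz (2001) test then transforms these uniforms through the Gaussian quantile function and tests the resulting series for zero mean, unit variance, and no first-order autocorrelation.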
As a final assessment of the quality of the univariate fit, we focus on the relevance of incorporating the frictions component, B_t, into the DyMiSk. We therefore consider a restricted DyMiSk specification in which the component B_t is shut down. Figure 5 illustrates the dramatic worsening in the quality of the fit in this case. Indeed, the PITs arising from the restricted model are far from uniformly distributed, even if the orders J and K of the chain (Z_t, S^ω_t) are optimally selected by BIC. This finding highlights the crucial role of B_t in improving the overall fit of the DyMiSk by contributing to explain the observed staleness in prices.

Predicting Staleness
As shown in Table 2, the unconditional probability of observing zero variations in the dataset of prices observed at the 15-second frequency is very high and generally well above 30%. This phenomenon is well known in the high-frequency literature, see the recent contributions of Bandi, Pirino, and Renò (2017) and Bandi et al. (2020a). We therefore look at the ability of the DyMiSk to predict the occurrence of zeros. Table 5 reports the parameter estimates of a predictive logistic regression over the out-of-sample period. The logistic function is specified as

P(1{Y_{n,t} = 0} = 1 | x_t) = exp(x_t'β) / (1 + exp(x_t'β)),   (10)

where x_t = (1, W_{n,t}', P_{t|t−1}(Y_{n,t} = 0))', W_{n,t} is a vector of control variables, and P_{t|t−1}(Y_{n,t} = 0) denotes the model-based predictive probability of zeros at time t conditional on the information set at time t − 1. All figures in Table 5 signal the positive and highly significant dependence between the ex-ante (model-based) probability of zeros and the ex-post realization of price staleness, even when adding control variables to the model. These are intradaily seasonal dummies, lags of the dependent variable, and a proxy of market liquidity as measured by the bid-ask spread (BA). The point estimates of the parameter on P_{t|t−1}(Y_{n,t} = 0) are such that the associated average partial effect is between 0.9% and 1.3%. This means that a one percent increase in the conditional probability of staleness increases the odds of observing a zero relative to a nonzero variation by roughly 1%. The pseudo R² of McFadden (1974) indicates a relatively low predictive ability of the logistic regression model, as it lies between 2% and 4% even when covariates/control variables are included. On the one hand, this signals the low predictability of staleness as a consequence of the erratic nature of high-frequency price changes. On the other hand, this finding suggests that the (model-based) probability of zeros incorporates all the relevant information available at time t − 1 needed to predict price staleness at time t.

Notes to Table 5: We consider regression (10) with no control variables (a), with seasonal dummies (b), with seasonal dummies and 5 lags of the dependent variable (c), and with seasonal dummies, lags of the dependent variable, and the bid-ask spread (d). The probability P_{t|t−1}(Y_{n,t} = 0) is computed from the one-step-ahead predictive distribution of the DyMiSk. The superscripts ***, **, and * indicate statistical significance at the 1%, 5%, and 10% significance levels, respectively. The standard errors are computed according to the Newey-West formula (HAC). The pseudo R² is the goodness-of-fit index of McFadden (1974).
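A sketch of the predictive logistic regression in Equation (10) with statsmodels, using HAC (Newey-West) standard errors as in the article; the inputs zero_t (the zero indicator), p_hat (the model-based probability), and W (the controls) are placeholder names of this sketch.

```python
import numpy as np
import statsmodels.api as sm

def staleness_logit(zero_t, p_hat, W=None, hac_lags=5):
    """Logit of 1{y_t = 0} on P_{t|t-1}(Y_t = 0) and optional controls."""
    X = p_hat.reshape(-1, 1) if W is None else np.column_stack([W, p_hat])
    X = sm.add_constant(X)
    res = sm.Logit(zero_t, X).fit(disp=0, cov_type="HAC",
                                  cov_kwds={"maxlags": hac_lags})
    return res   # res.prsquared is McFadden's pseudo R^2
```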

Bivariate Analysis
The goodness of fit of the bivariate distributions of CAT-WMT and CAT-JPM for different intradaily periods (opening, lunch, closing) is reported in Figure 6. The fit to the empirical frequencies (red area) by the DyMiSk (blue line) is again remarkable for both the low volatility and the Lehman periods. Concerning the low volatility period, Panel (a) highlights that the bivariate distribution of the price variations is rather sparse at the opening, while around lunch and at closing most of the probability mass is in the range between −1 and +1 cents, with a relatively high percentage of joint zero variations. The picture changes drastically in the Lehman period: the bivariate empirical probability is dispersed in all intradaily periods (including the lunch and closing hours). The double hidden Markov structure generated by S^κ_t and (S^ω_t, Z_t) allows for a very flexible characterization of the multivariate distribution of the high-frequency price moves. For instance, the probability mass on zero is high at the opening for CAT-JPM (while not for CAT-WMT). This evidence is associated with an episode of trading halts affecting several stocks traded on the NYSE. Indeed, at the opening of Monday, September 15, 2008, trading in CAT, KO, and JPM stopped, resulting in a frozen market and a prolonged period of no price variations. This might be considered a systematic event of staleness (or co-staleness, see Bandi, Pirino, and Renò 2020b). In particular, Panel (j) of Figure 6 displays the effect of the market freeze on the joint probability of zeros for CAT and JPM. The fit of the DyMiSk to the bivariate empirical distribution is remarkable even in these extreme situations.

Filtered Estimates and Variance Prediction
The FFBS algorithm can be exploited to extrapolate the conditional intradaily volatilities of each individual stock under consideration. Figure 7 displays the absolute value of the price changes together with the extrapolated volatilities, denoted as σ_{n,t|t−1}. The latter are computed as the square roots of the diagonal elements of the predicted covariance matrix, Σ_{t|t−1}, as reported in Equation (8). The intradaily patterns in the magnitude of the price variations are clearly reflected in the extrapolated volatilities, which are, by construction, smoother than the ex-post realizations. While Figure 7 provides a visual illustration of the dynamics of the filtered volatilities, we also statistically assess the performance of the DyMiSk in providing precise forecasts of the variance of the price changes. As documented by Patton (2011), comparing the square root of a variance forecast with absolute returns generally leads to biased conclusions; specifically, the comparison tends to favor models with downward-biased variance forecasts. Therefore, we follow Patton (2011) and use the mean squared error (MSE) and quasi-likelihood (QLIKE) losses to compare variance predictions, as both provide a correct ranking of competing volatility forecasts. In the analysis, we use the same strategy as Creal, Koopman, and Lucas (2011) and construct six portfolios, p_{j,t} = g_j' y_t, for given N × 1 vectors g_j and for j = 1, ..., 6. Stocks are ordered as WMT, KO, JPM, and CAT, and the portfolio weights are set to g_1 = (0.25, 0.25, 0.25, 0.25)', g_2 = (0.4, 0.2, 0.3, 0.1)', g_3 = (0.5, 0.5, 0.5, −0.5)', g_4 = (0.5, −0.5, 0.5, 0.5)', g_5 = (0.5, 0.5, −0.5, −0.5)', and g_6 = (0.5, −0.5, 0.5, −0.5)', as in Creal, Koopman, and Lucas (2011). Since the variance is not observed, we proxy it by the squared price variation of the jth portfolio, that is, σ²_{t+1;j} = p²_{j,t+1}. The QLIKE and MSE for the jth portfolio are defined as QLIKE_{t+1|t;j} = log(σ̂²_{t+1|t;j}) + σ²_{t+1;j}/σ̂²_{t+1|t;j} and MSE_{t+1|t;j} = (σ̂²_{t+1|t;j} − σ²_{t+1;j})², where σ̂²_{t+1|t;j} = g_j' Σ_{t+1|t} g_j is the prediction of the portfolio variance obtained with the DyMiSk.
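Both losses are immediate to code; as in Patton (2011), they are evaluated against the squared portfolio return used as a (noisy but unbiased) variance proxy.

```python
import numpy as np

def qlike(sig2_fcst, sig2_proxy):
    """QLIKE loss of Patton (2011); robust to noise in the volatility proxy."""
    return np.log(sig2_fcst) + sig2_proxy / sig2_fcst

def mse(sig2_fcst, sig2_proxy):
    """Squared-error loss on variances."""
    return (sig2_fcst - sig2_proxy) ** 2

# For portfolio weights g and predicted covariance Sigma:
#   sig2_fcst = g @ Sigma @ g, with proxy sig2_proxy = (g @ y_next) ** 2.
```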
As a benchmark, we consider the DCC model with periodic GARCH components (DCC-PGARCH). Periodicity is imposed on the univariate GARCH terms via a seasonal intercept, as in Rossi and Fantazzini (2015). Specifically, the benchmark model assumes Y_t | Y_{1:t−1} ~ N(0, H_t), with H_t = D_t R_t D_t, where D_t is the diagonal matrix of periodic GARCH volatilities and the conditional correlation matrix, R_t, follows a DCC specification as in Engle (2002). The DCC-PGARCH is estimated on the intradaily price variations via two-step quasi-maximum likelihood (QML), see Engle (2002). Given the parameter estimates, the one-step-ahead prediction H_{t+1|t} is available in closed form, and the predicted portfolio variance is given by σ̂²_{t+1|t;j} = g_j' H_{t+1|t} g_j. Thanks to its flexible structure, the DCC-PGARCH represents a challenging benchmark for the assessment of the quality of volatility predictions. The comparison involves the one-step-ahead volatility predictions for both the low volatility and the Lehman periods, and the predictive accuracy of the DyMiSk and the DCC-PGARCH is compared through the Diebold and Mariano (1995) test. The forecasting window includes the 10 days after the in-sample interval for both periods. Table 6 presents a summary of the forecasting accuracy of the DyMiSk relative to that of the DCC-PGARCH. In many cases (51 out of 96), the DyMiSk provides out-of-sample variance predictions that are statistically superior to those of the DCC-PGARCH at the 5% significance level, while in 26 out of 96 cases the DCC-PGARCH proves superior to the DyMiSk. At the opening of the trading day, MSE and QLIKE disagree in the low volatility period, while they signal underperformance of the DyMiSk during the Lehman period. Overall, the gains with respect to the DCC-PGARCH are more sizable during the low volatility period and during lunch and closing times, while during the Lehman period the performance of the DyMiSk slightly deteriorates (especially at the opening), perhaps as a consequence of the occurrence of extreme realizations. Overall, Table 6 testifies to the ability of the DyMiSk to provide very flexible predictions of volatilities and correlations that adapt well to changing market conditions.

Notes to Table 6: Values smaller than one indicate outperformance of the DyMiSk with respect to the DCC-PGARCH, and vice versa. Green (red) cells indicate rejection of the bilateral null hypothesis of equal predictive ability of Diebold and Mariano (1995) at the 5% level and outperformance (underperformance) of the DyMiSk with respect to the DCC-PGARCH.
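The pairwise comparisons can be reproduced with a standard Diebold-Mariano statistic on the loss differentials; the sketch below uses a Bartlett-kernel (Newey-West) long-run variance, one common implementation choice.

```python
import numpy as np
from scipy import stats

def diebold_mariano(loss_a, loss_b, max_lag=5):
    """DM test of equal predictive ability; returns the statistic and p-value."""
    d = np.asarray(loss_a) - np.asarray(loss_b)     # loss differential
    T = len(d)
    dc = d - d.mean()
    lrv = dc @ dc / T                               # long-run variance (Bartlett)
    for lag in range(1, max_lag + 1):
        w = 1 - lag / (max_lag + 1)
        lrv += 2 * w * (dc[lag:] @ dc[:-lag]) / T
    dm = d.mean() / np.sqrt(lrv / T)
    return dm, 2 * (1 - stats.norm.cdf(abs(dm)))
```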

Heterogeneous Tick Size
In this section, we further explore the features of the DyMiSk in an empirical context characterized by a relatively large number of assets (N = 10) with different tick sizes relative to the trading price. In particular, we consider 10 assets belonging to the DJI30 index: General Electric (GE), American Express (AXP), Pfizer (PFE), Boeing (BA), The Travelers Companies (TRV), United Technologies Corporation (UTX), The Coca-Cola Company (KO), Goldman Sachs (GS), IBM, and Apple (AAPL). We select them by sorting all DJI30 assets according to their average price on April 3, 2009. The first three assets (GE, AXP, and PFE) are those with the lowest trading price (≈ 10$), that is, with the highest tick size relative to the price. The next four stocks (BA, TRV, UTX, and KO) are those with intermediate tick sizes, that is, with trading prices around 30$. Finally, the last three stocks (GS, IBM, and AAPL) are those with the largest average prices, around 90$; they display the lowest tick size relative to the price. We consider a sample period ranging from March 2, 2009, to March 11, 2009, for a total of 12,472 observations. Figure 8 provides an illustration of the heterogeneous behavior of the price changes of a high-tick (PFE) and a low-tick (IBM) stock on a given trading day (March 9, 2009). In high-tick stocks like PFE, most of the price moves are in the range between −2 and +2 cents, which roughly corresponds to a 0.2% variation in the stock price. On the contrary, for IBM the price moves cover a wider range of values, between −10 and +10 cents.
We estimate the DyMiSk model for all triplets (J, K, L) with J ∈ {1, ..., 12}, K ∈ {1, ..., 15}, and L ∈ {1, ..., 6}, with the same seasonal specification described in Section 4.2. The BIC selects the DyMiSk model with J = 4, K = 10, and L = 1. As an illustration of the goodness of fit, Figure 9 reports the PITs of AXP and IBM. The fit of the univariate empirical distributions is remarkable irrespective of the relative tick size of the stocks under investigation.
The outstanding quality of the fit carries over to the bivariate empirical distribution of low-tick (IBM) and high-tick (PFE) stocks, as reported in Figure 10. We also formally assess the goodness of fit of the univariate distributions by the LR test statistics of Berkowitz (2001), as reported in Table 7. In almost all cases, the LR test statistics are low and we cannot reject the null hypothesis of good coverage of the empirical probabilities. To conclude, the empirical evidence reported in this section further highlights the flexibility of the DyMiSk and its ability to adapt to the distributional features of the stock prices at hand, irrespective of their relative tick size.

Notes to Table 7: The tests are computed using the randomized PITs as in Brockwell (2007). We consider the coverage of the left tail below the τ% quantile level. Columns labeled "All" correspond to unconditional coverage of the whole distribution (τ = 100%). Columns labeled "Joint" report the statistics associated with the joint test for the null of correct unconditional coverage and independence of the PITs. Gray cells indicate a p-value above 5%, based on the asymptotic distribution of the test.

Disentangling Staleness
Price staleness and irregular trading have been studied in several articles, see the early works of Atchison, Butler, and Simonds (1987) and Lo and MacKinlay (1990), and the more recent contributions of Bandi, Pirino, and Renò (2017, 2020b). A common trait of most studies on high-frequency market imperfections is the assumption of a continuous underlying price process, with microstructural features modeled as an additional source of randomness (like censoring or a barrier) preventing the efficient price from being observed, see also the recent studies of Kolokolov, Livieri, and Pirino (2020) and Bandi et al. (2020a).
In this section, we look at the ability of the DyMiSk to shed some light on the different sources of price staleness. Intuitively, the large fraction of zeros in high-frequency prices could be due to several factors. First, stale prices might be the consequence of frictions in the form of the bid-ask spread, which are partly responsible for the observed sluggishness of high-frequency prices. Second, the absence of price variations might be the consequence of the absence of news: without news, traders do not revise their reservation prices and do not generate any trade or price movement. Third, even in the presence of news, if the aggregated traders' reactions to the news are of opposite sign but equal magnitude, then the observed transaction price remains constant; in this case, we say that the market is in a dyadic state. Hence, by conditioning on different realizations of the latent variables of the DyMiSk, we can separately identify the three sources of zero variation in the observed high-frequency transaction price. In particular, the DyMiSk allows us to disentangle the probability of zeros as:

• No news: P(Y_{n,t} = 0 | B_{n,t} = 0, X^{(1)}_{n,t} = 0, X^{(2)}_{n,t} = 0; Y_{1:t−1});
• Dyadic market: P(Y_{n,t} = 0 | B_{n,t} = 0, X^{(1)}_{n,t} = X^{(2)}_{n,t} > 0; Y_{1:t−1});
• Frictions: P(B_{n,t} = 1 | Y_{1:t−1}).

The probability of price staleness generated by frictions is completely determined by the success probability of the Bernoulli random variable B_{n,t}, that is, P(B_{n,t} = 1 | Y_{1:t−1}). This provides a structural interpretation of the excess probability of zeros as a component not related to the processing of new information on the market, but rather to the presence of transaction costs and frictions. The last columns of Table 8 report the proportion of zeros attributed to the three sources, obtained by averaging at the daily horizon. The share of zeros attributed to frictions is larger during the Lehman period than in the low volatility period, where the proportion of zeros associated with no news is larger on average across stocks. A nonnegligible share of zeros (around 30% on average) is attributed by the model to the dyadic state. Peculiar heterogeneous patterns at the intradaily level emerge from the other columns of Table 8. For instance, frictions account for a large proportion of zeros at the opening and closing of the trading day during the Lehman period, that is, when the bulk of information to be processed is large and the distribution of price moves is more dispersed. Instead, the dyadic state is the relevant source of zeros at the opening during the low volatility period (above 66%). Finally, the no-news component is relevant at lunch and at closing (above 55%) in the low volatility period.

Notes to Table 8: The table reports the share of zeros attributed by the DyMiSk to no news, dyadic market, and frictions for all series during the Lehman and low volatility periods. Results are reported in percentage (relative to the fraction of zeros in each period, that is, the sum by row is 100%) and are computed at the opening, lunch, and closing times. The decomposition at the daily horizon is also reported in the "Daily" panel.
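Conditional on the latent states, the three components have simple closed forms: a zero occurs with probability κ + (1 − κ)SK(0; λ^{(1)}, λ^{(2)}), the no-news event corresponds to X^{(1)} = X^{(2)} = 0, and the dyadic event to X^{(1)} = X^{(2)} > 0. The sketch below computes the three shares for fixed state-specific parameters; the article averages them over the predictive distribution of the latent states.

```python
import numpy as np
from scipy.stats import poisson, skellam

def zero_decomposition(kappa, lam1, lam2):
    """Shares of P(Y = 0) due to frictions, no news, and the dyadic state."""
    p_frictions = kappa                                                  # B = 1
    p_no_news = (1 - kappa) * poisson.pmf(0, lam1) * poisson.pmf(0, lam2)
    p_dyadic = (1 - kappa) * skellam.pmf(0, lam1, lam2) - p_no_news      # X1 = X2 > 0
    shares = np.array([p_frictions, p_no_news, p_dyadic])
    return shares / shares.sum()
```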

Staleness and Trading Activity
In the following, we look at the relation between trading activity and the different sources of price staleness, and we build upon the mixture of distribution hypothesis (MDH) of Clark (1973) and Tauchen and Pitts (1983) to provide an ideal and simple setup
to interpret the empirical findings. We relate the absence of price movements to the volume of trades by assuming that the market consists of a finite number of active traders, who take long or short positions on a given asset. The evolution of the equilibrium price is driven by the arrival of new information to the market. As new information arrives, the traders adjust their reservation prices, resulting in a change in the market price given by the average of the increments of the reservation prices. The reservation price of each trader might reflect individual preferences, asymmetries in information sets, and/or different expectations about fundamental values. In the absence of news, individual traders do not update their reservation prices and no trading volume is generated. Moreover, due to the presence of microstructural frictions, such as transaction costs in the form of the bid-ask spread, a trader does not trade if the difference between her reservation price and the equilibrium price is too small in absolute value; hence, we record neither a price variation nor an exchange of stocks in this case. As noted by Bandi et al. (2020a), the presence of transaction costs might reduce the amount of traded securities when execution costs are excessively large. Finally, if the aggregated revisions of the reservation prices are of the same magnitude but opposite sign, then trades take place (i.e., stocks are exchanged) but the equilibrium price does not move. In this last case (namely, in a dyadic market), we observe price staleness with nonzero trading volume. Summarizing, we expect the absence of trading volume to be associated with price staleness when the latter is generated by the absence of news and by frictions. On the other hand, trading volume can be generated without price moves when the market is in a dyadic state.
In Table 9 we test the outlined empirical prediction by means of a logistic regression. We specify the logistic function as

P(1{V_{n,t} = 0} = 1 | x_t) = exp(x_t'β) / (1 + exp(x_t'β)),   (11)

where V_{n,t} is the trading volume at time t on asset n, x_t = (1, W_{n,t}', P̃_{t|t−1}(Y_{n,t} = 0))', W_{n,t} is a vector of control variables such as intradaily seasonal dummies and autoregressive terms, and P̃_{t|t−1}(Y_{n,t} = 0) is the (predicted) probability of staleness due to frictions and to the absence of news, that is,

P̃_{t|t−1}(Y_{n,t} = 0) = P(B_{n,t} = 1 | Y_{1:t−1}) + P(Y_{n,t} = 0, B_{n,t} = 0, X^{(1)}_{n,t} = 0, X^{(2)}_{n,t} = 0 | Y_{1:t−1}),

where the first term is the frictions component and the second is the no-news component. For both the low volatility and the Lehman periods, the predicted probability of the absence of news and frictions is associated with a significant increase in the probability of observing zero trading volume. Indeed, the parameter loading on P̃_{t|t−1}(Y_{n,t} = 0) is significant and positively impacts the probability of the absence of trading activity. This finding holds even when controlling for autocorrelation, intradaily seasonality, and the bid-ask spread. Indeed, repeated trades on the ask or on the bid side would result in a sequence of zeros associated with nonzero transaction volume; hence, it is crucial to control for liquidity proxies, as they might negatively impact the number of traded securities. Concluding, the estimates reported in Table 9 support the claim that the DyMiSk can be used to disentangle the price staleness of financial prices observed at high frequencies and to predict periods of reduced trading activity, as measured by the absence of trading volume.

Notes to Table 9: The table reports the results for each asset over the low volatility and the Lehman periods. We consider regression (11) with no control variables (a), with seasonal dummies (b), with seasonal dummies and 5 lags of the dependent variable (c), and with seasonal dummies, lags of the dependent variable, and the bid-ask spread (d). The superscripts ***, **, and * indicate statistical significance at the 1%, 5%, and 10% significance levels, respectively. The standard errors are computed according to the Newey-West formula (HAC). The pseudo R² is the goodness-of-fit index of McFadden (1974).

Conclusions
Building upon the framework of hidden/latent Markov chains, we provide a multivariate hierarchical HMM for discrete data based on the Skellam distribution. We apply it to the prices of stocks traded on the NYSE and observed at high frequency (15 seconds). Our model captures most of the features of the price variations observed at high frequencies, both in-sample and out-of-sample. Furthermore, it allows us to disclose new characteristics of the market microstructure. For instance, the model is able to account for the large proportion of zeros, which often occur contemporaneously on several assets (co-staleness, see Bandi, Pirino, and Renò 2020b). These events might be associated with frozen market conditions and illiquidity episodes preventing the efficient transmission of news to financial prices. Furthermore, we study the relationship between the model-implied probability of zeros and the absence of trading volume, and we find it to be in line with the findings of Bandi et al. (2020a).
To conclude, we believe that the DyMiSk can be beneficial for several financial applications beyond the one presented in this article, for example, when the goal is to investigate illiquidity spillover effects on a large scale. Furthermore, the DyMiSk might represent a suitable modeling framework also in nonfinancial applications involving signal extraction in the presence of rounding errors, for instance when measuring air pollutants to assess their effect on air quality, or when predicting the risk of a given disease based on censored scores.