Variable selection for quantile autoregressive model: Bayesian methods versus classical methods

In this article, we introduce three Bayesian variable selection methods for the quantile autoregressive model with explanatory variables. A Gibbs sampling algorithm is developed for each method by setting different priors. Numerical simulations suggest that the Gibbs sampling algorithms converge fast and that the Bayesian variable selection methods are reliable. A real example is given to analyze the relationship between the count of total rental bikes and five explanatory variables. Both the simulations and the data example indicate that the proposed methods are feasible, reliable, and appropriate for analyzing the Bike Sharing data set.


Introduction
Since the pioneering work by Koenker and Bassett [9], theoretical studies and applications of the quantile regression (QR) method have become increasingly popular, especially in the fields of economics [6], the social sciences [11], agriculture [12], financial portfolios [18], ecology [19], survival analysis [26], etc. QR belongs to a class of robust models that can evaluate covariate effects at different quantiles of the response, so it provides a more comprehensive analysis of practical problems [8].
Quantile regression for time series has recently become increasingly popular in statistical modeling. Koenker and Xiao [7,10] pioneered the quantile autoregressive (QAR) models. Cai et al. [3] considered estimation and prediction for the QAR model via a Bayesian approach. Cai et al. [4] developed a novel quantile double autoregressive model for modeling financial time series. Li et al. [14] proposed two important measures, quantile correlation and quantile partial correlation, and applied them to QAR models. Zheng et al. [35] studied hybrid quantile regression estimation for time series models with conditional heteroscedasticity. More recently, Li et al. [16] developed conditional quantile estimation for hysteretic autoregressive models. Zhu et al. [36] incorporated additional covariates into the conditional variance equation and studied the linear quantile regression model with GARCH-X errors.
To capture the dynamic relationships usually exhibited in economics, Yang et al. [31] proposed the following quantile autoregressive model with explanatory variables:

Q_{y_t}(τ | F_{t−1}, X_t) = α_0 + Σ_{i=1}^p α_i y_{t−i} + Σ_{j=1}^q β_j x_{j,t},    (1)

where Q_{y_t}(τ | F_{t−1}, X_t) denotes the τth conditional quantile of y_t, F_{t−1} denotes the σ-field generated by {y_{t−1}, y_{t−2}, ...}, X_t = (x_{1,t}, x_{2,t}, ..., x_{q,t})^T denotes the vector of explanatory variables, {x_{j,t}} (j = 1, ..., q) is assumed to be independent of y_{t−l} (l ≥ 1), α_0 is the intercept, α_i and β_j (j = 1, ..., q) are the parameters of interest, and τ ∈ (0, 1) denotes the quantile level. This model is named the QAR-X model by Yang et al. [31]. It plays a significant role in economic modeling: it not only describes the influences of internal and external factors on economic behavior, but can also model different quantiles. Therefore, it comes closer to reality than linear regression models or autoregressive models. The estimation problem of the QAR-X model has been well addressed by Yang et al. [31], who developed a highly efficient Gibbs sampling algorithm for estimating the model parameters. However, how to determine the order of the autoregressive part and how to select the appropriate explanatory variables remain open problems. Especially when the model has a sparse representation, variable selection is particularly important: identifying the important explanatory variables can improve the fitting performance of the model. The main focus of this paper is to solve this problem, which will greatly improve Yang et al.
[31]'s result. Currently, variable selection methods can be grouped into two categories: one based on the penalized likelihood and the other based on Bayesian methods. From the penalized likelihood point of view, Tibshirani [25] pioneered the least absolute shrinkage and selection operator (LASSO). Fan and Li [5] proposed the smoothly clipped absolute deviation (SCAD) penalty. Koenker [7] developed the l_1-regularization quantile regression method to shrink individual effects towards a common value. The adaptive LASSO proposed by Zou [37] extends the LASSO by allowing different penalty parameters for different regression coefficients. Combining the ideas of least absolute deviation and LASSO for robust parameter estimation and selection, Wang et al. [27] proposed the LAD-LASSO method. Wu and Liu [28] discussed penalized quantile regression with the SCAD and adaptive LASSO penalties.
From a Bayesian point of view, Park and Casella [22] introduced the Laplace prior in the Bayesian LASSO. Sun et al. [24] proposed two variable selection methods, the Bayesian adaptive LASSO and the iterative adaptive LASSO, which extend the adaptive LASSO to some degree. Alhamzawi et al. [2] developed Bayesian adaptive LASSO quantile regression by allowing different penalization parameters for different regression coefficients. Different from traditional Bayesian variable selection methods, the spike-and-slab prior for Bayesian variable selection was first proposed by Mitchell and Beauchamp [20]. In this framework, the prior is composed of two components: the spike component, a point mass at zero, and the slab component, a uniform distribution on a finite interval. Narisetty and He [21] established the strong selection consistency of the Bayesian model, which is the strongest selection consistency result in the Bayesian variable selection literature. Xi et al. [29] put the spike-and-slab prior on the coefficients and proposed a hierarchical Bayesian model. Recently, Yang et al. [30] studied Bayesian empirical likelihood estimation and order shrinkage for autoregressive models via the spike-and-slab prior.
In this paper, we develop three different Bayesian variable selection methods for the QAR-X model by introducing different priors, with a tractable and efficient Gibbs sampling algorithm for each method. For a given quantile level, the proposed Bayesian variable selection methods achieve order shrinkage and variable selection simultaneously. Moreover, the deviance information criterion (DIC) proposed by Spiegelhalter et al. [23] is adopted to select the best-fitting model across different quantiles in the real data analysis. The proposed models are applied to the count of total rental bikes (CNT) data to explore the relationship between CNT and multiple exogenous variables. By fitting this data set with the QAR-X model at five different quantiles, we find different significant explanatory variables at different quantile levels. For example, when the weather is cold, temperature is an important factor restricting users from choosing shared bicycles for travel. Such new findings are not captured by an ordinary mean regression model.
The outline of the paper is as follows. In Section 2, we use the location-scale mixture representation of an asymmetric Laplace distributed random variable to write the likelihood for the QAR-X model. In Section 3, we introduce three Bayesian variable selection methods for the QAR-X model. In Section 4, we give simulation results that demonstrate the effectiveness of the Gibbs sampling algorithms. In Section 5, we use the proposed methods to fit a real data set. Some concluding remarks are provided in Section 6. Details of the full conditional distribution derivations are postponed to the Appendix.

Likelihood for the QAR-X model
Throughout the paper, we denote Z_t = (Y_t^T, X_t^T)^T, where Y_t = (y_{t−1}, y_{t−2}, ..., y_{t−p})^T stands for the vector of lags of y_t. Moreover, denote ϕ = (α^T, β^T)^T, where α = (α_1, α_2, ..., α_p)^T and β = (β_1, β_2, ..., β_q)^T are the autoregressive and regression coefficients of model (1). Then the QAR-X model can be rewritten as

y_t = α_0 + Z_t^T ϕ + ε_t,    (2)

where {ε_t} is a sequence of independent and identically distributed (i.i.d.) random variables, independent of y_{t−s} for all s ≥ 1. Moreover, its distribution is restricted to have τth quantile equal to zero, i.e. P(ε_t ≤ 0) = τ.
Let {y_1, ..., y_n} be a time series generated from model (2) with a known starting value Y_1.
According to Koenker [8], the QR estimator of the parameter ϕ is defined as the minimizer

(α̂_0, ϕ̂) = argmin Σ_{t=1}^n ρ_τ(y_t − α_0 − Z_t^T ϕ),    (3)

where ρ_τ(u) = u(τ − I(u < 0)), τ ∈ (0, 1), and I(·) is the indicator function. The criterion function ρ_τ(·) is called the 'check function' in Koenker and Bassett [9]. Based on Yu and Moyeed [33], we can assume ε_t in model (2) to follow an asymmetric Laplace distribution (ALD) with density

f(ε_t | σ) = (τ(1 − τ)/σ) exp{−ρ_τ(ε_t)/σ}.    (4)

Thus, the minimization in (3) is equivalent to maximizing the following 'working' likelihood

L(ϕ, σ | y, Z) = (τ(1 − τ)/σ)^n exp{−(1/σ) Σ_{t=1}^n ρ_τ(y_t − α_0 − Z_t^T ϕ)},    (5)

where y = (y_1, ..., y_n) and Z = (Z_1^T, ..., Z_n^T). According to Kozumi and Kobayashi [13], the ALD can be expressed as a scale mixture of a normal distribution and an exponential distribution. Specifically, ε_t can be equivalently expressed in (6) as a linear combination of a latent exponential variable v_t and an independent standard normal variable, where v_t | σ ∼ E(τ(1 − τ)/σ) and the notation E(ψ) denotes an exponential distribution with mean ψ. Based on (6), model (2) can be equivalently rewritten with normal errors conditional on the latent v_t. Thus, the distribution of y_t conditional on Z_t and v_t is normal, and, treating v = (v_1, ..., v_n)^T as n latent variables, the conditional joint density of (y_1, ..., y_n) can be written as a product of normal densities. Since v is an n-dimensional vector, the total number of parameters is always greater than the sample size n.
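The equivalence between the check function and the ALD can be illustrated with a short sketch (not the authors' code). An ALD(0, σ, τ) error with τth quantile zero can be simulated as a scaled difference of two independent Exp(1) variables, ε = σ(E1/τ − E2/(1 − τ)); for u > 0 the resulting density is τ(1 − τ)σ^{-1} e^{−τu/σ}, matching (4). The empirical τ-quantile of the draws is then close to zero:

```python
import numpy as np

rng = np.random.default_rng(0)
tau, sigma, n = 0.1, 1.0, 200_000

# ALD(0, sigma, tau) noise as a scaled difference of two Exp(1) variables
e1 = rng.exponential(1.0, n)
e2 = rng.exponential(1.0, n)
eps = sigma * (e1 / tau - e2 / (1.0 - tau))

# By construction P(eps <= 0) = tau, so the empirical tau-quantile is near 0
q = float(np.quantile(eps, tau))
print(abs(q) < 0.05)
```

This is one convenient generator for ALD noise; the paper's own location-scale mixture with an exponential latent variable produces the same distribution.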

Bayesian QAR-X with LASSO penalty
LASSO quantile regression [17] is a regularization technique for simultaneous estimation and variable selection. The LASSO quantile regression estimate for model (2) is defined as

ϕ̂ = argmin Σ_{t=1}^n ρ_τ(y_t − α_0 − Z_t^T ϕ) + λ‖ϕ‖_1,    (9)

where λ is a nonnegative regularization parameter and ‖·‖_1 denotes the L_1-norm. The second term of (9) is the so-called l_1 penalty, which is crucial for the success of the LASSO method.
As pointed out by Park and Casella [22], the LASSO estimator given in (9) can be interpreted as a Bayesian posterior mode estimate when a Laplace prior is specified for each component ϕ_j, j = 1, ..., k (k = p + q). Thereby, we develop Bayesian LASSO estimation for the QAR-X model by using a Laplace prior distribution for ϕ. In addition, according to Alhamzawi [1], the Laplace prior distribution can be represented as a scale mixture of a normal distribution with an exponential mixing density. We further assume priors for α_0, σ and λ^2 as in (12), where a and b are the hyperparameters. Thereby, we obtain the joint posterior distribution (13) of all parameters, where s = (s_1, ..., s_k)^T, π(ϕ, s | λ^2, σ) denotes the joint prior density of ϕ and s, and π(v | σ) denotes the joint density of v_1, ..., v_n conditional on fixed σ. In principle, we could integrate (13) over s and λ to obtain the marginal posterior of ϕ and estimate ϕ by maximizing this marginal posterior. However, maximizing the posterior distribution is computationally very expensive. We instead use a tractable and efficient Gibbs sampler for model (13). To implement the Gibbs sampling algorithm, we need the so-called full conditional distributions of all parameters. By processing the joint posterior distribution, we can easily obtain the full conditional distribution of α_0, given in Equation (14). Secondly, we obtain the full conditional distribution of ϕ, a normal distribution whose mean and covariance are given in the Appendix. Thirdly, the full conditional distribution of v_t^{−1} is IGaussian(δ̂_t, γ̂_t), the inverse Gaussian distribution with parameters δ̂_t = |y_t − α_0 − Z_t^T ϕ|^{−1} and γ̂_t = (2σ)^{−1}. Fourthly, the full conditional distribution of s_j^{−1} is inverse Gaussian with parameters δ̃_j = σλ^2/ϕ_j^2 and γ̃_j = λ^2. Fifthly, the full conditional distribution of σ is an inverse gamma distribution whose parameters are given in the Appendix. Finally, the full conditional distribution of λ^2 is a gamma distribution with parameters â_2 = a + k and b̂_2 = Σ_{j=1}^k s_j/2 + b. For details, see the Appendix.
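A hedged sketch of one latent-variable Gibbs step may make the update concrete. It follows the stated full conditional v_t^{−1} | · ∼ IGaussian(δ̂_t, γ̂_t) with δ̂_t = |y_t − α_0 − Z_t^T ϕ|^{−1} and γ̂_t = (2σ)^{−1}, assuming numpy's Wald sampler matches this inverse-Gaussian parametrization (mean, scale); all variable names and values are illustrative, not from the authors' code:

```python
import numpy as np

rng = np.random.default_rng(1)

def draw_v(y_t, alpha0, z_t, phi, sigma, rng):
    # residual of observation t under the current parameter draw
    resid = y_t - alpha0 - z_t @ phi
    delta = 1.0 / max(abs(resid), 1e-10)   # guard against a zero residual
    gamma = 1.0 / (2.0 * sigma)
    inv_v = rng.wald(delta, gamma)          # inverse-Gaussian draw of v_t^{-1}
    return 1.0 / inv_v                      # invert to get v_t

v = draw_v(y_t=0.8, alpha0=0.1, z_t=np.array([0.5, -0.2]),
           phi=np.array([0.3, 0.4]), sigma=0.5, rng=rng)
print(v > 0)  # the latent scale is always positive
```

In the full sampler this draw is repeated for t = 1, ..., n within every Gibbs iteration, interleaved with the updates of α_0, ϕ, s, σ and λ^2.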

Bayesian QAR-X with adaptive LASSO penalty
Zou [37] extended the LASSO method by allowing different regularization parameters for different regression coefficients. Motivated by this, Alhamzawi et al. [2] proposed Bayesian adaptive LASSO quantile regression, putting different penalization parameters on different regression coefficients. Following Alhamzawi et al. [2], we define the adaptive LASSO quantile regression estimate by replacing the penalty λ‖ϕ‖_1 in (9) with Σ_{j=1}^k λ_j |ϕ_j|, where λ_j is a nonnegative regularization parameter for ϕ_j. Similarly, based on Alhamzawi [1], we refer to (10) and (11) to specify the Laplace prior for each ϕ_j. We choose the same priors for α_0 and σ as in (12). For the penalty parameter λ_j^2, we assume a gamma prior, where a and b are the hyperparameters. Thereby, we obtain the posterior distribution of all parameters, where λ^2 = (λ_1^2, λ_2^2, ..., λ_k^2)^T. Now, we derive the full conditional distributions of α_0, ϕ, v, s, λ^2 and σ. The full conditional distribution of α_0 is the same as Equation (14), so we directly give the full conditionals of the other parameters. Firstly, the full conditional distribution of ϕ is a normal distribution whose mean and covariance are given in the Appendix. Secondly, the full conditional distribution of v_t^{−1} is inverse Gaussian, as in Subsection 3.1. Thirdly, the full conditional distribution of s_j^{−1} is inverse Gaussian with parameters δ̃_j = σλ_j^2/ϕ_j^2 and γ̃_j = λ_j^2. Fourthly, the full conditional distribution of σ is an inverse gamma distribution whose parameters are given in the Appendix. Finally, the full conditional distribution of λ_j^2 is a gamma distribution with parameters ã_j = a + 1 and b̃_j = s_j/2 + b. For details, see the Appendix.
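The coefficient-specific penalty update can be sketched as follows, assuming the stated Gamma(ã_j, b̃_j) conditional uses a shape-rate convention (numpy's gamma sampler takes shape and scale, so the rate is inverted); the value of s_j is illustrative:

```python
import numpy as np

rng = np.random.default_rng(2)
a, b = 0.1, 0.1          # hyperparameters, as in the paper's simulations
s_j = 0.7                # illustrative current value of the latent scale s_j

# lambda_j^2 | . ~ Gamma(shape = a + 1, rate = s_j/2 + b)
lam2_j = rng.gamma(shape=a + 1.0, scale=1.0 / (s_j / 2.0 + b))
print(lam2_j > 0)
```

Because each λ_j^2 has its own conditional, coefficients with small latent scales s_j tend to receive larger penalties, which is what drives the adaptive shrinkage.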

Bayesian QAR-X with a spike-and-slab prior
For Bayesian variable selection, the use of the spike-and-slab prior [29] in the QAR-X model is interpretable. Different from the Bayesian LASSO and Bayesian adaptive LASSO methods, the spike-and-slab prior assigns a positive posterior probability to the event ϕ_j = 0, and models can be directly compared based on their posterior probabilities. Based on this, we put the spike-and-slab prior π(ϕ_j | δ_j, ω^2) on the parameters ϕ_j, where δ_j and ω^2 are parameters in the prior. Denoting η = ω^{−2}, the hyperpriors π(δ_j) and π(η) = π(ω^{−2}) for δ_j and η are chosen as a uniform distribution on (0, 1) and a gamma distribution, respectively. In addition, we choose the same priors for α_0 and σ as in (12). Thus, we obtain the posterior, where δ = (δ_1, δ_2, ..., δ_k)^T, π(δ_j) := I_{(0,1)}(δ_j) is the prior of δ_j, taking value one when δ_j ∈ (0, 1) and zero otherwise, and π(η; a, b) denotes the gamma density with parameters (a, b) evaluated at η. Now, we derive the full conditional distributions of α_0, ϕ, v, δ, η and σ. Similarly, the full conditional distribution of α_0 is the same as Equation (14), so the full conditional distributions of the remaining parameters can be obtained directly. Firstly, the full conditional distribution of v_t^{−1} is inverse Gaussian, as in Subsection 3.1. Secondly, the full conditional distribution of σ is an inverse gamma distribution, where V is the same as in (23). Thirdly, the full conditional distribution of η = ω^{−2} is a gamma distribution with parameters ā_2 = a + h/2 and b̄_2 = b + Σ_{j∈H} ϕ_j^2/2, where H = {j : ϕ_j ≠ 0} and h = #H. Denote by δ_{−j} and ϕ_{−j} the sub-vectors of δ and ϕ obtained by removing their jth elements (j = 1, 2, . . 
., k), respectively. We can obtain the full conditional distribution of δ_j as a beta distribution with parameters ā_3 = 1 + I(ϕ_j ≠ 0) and b̄_3 = 1 + I(ϕ_j = 0). Now we focus on the last full conditional distribution, that of ϕ_j. It can be shown that the full conditional distribution of ϕ_j is a mixture of a point mass at 0 and a normal distribution with mean P_j μ_j/(P_j + η) and variance B = 1/(P_j + η). For details, see the Appendix. Remark 3.1: In Subsections 3.1-3.3, we develop three Bayesian variable selection methods. A flow chart can help readers execute the proposed methods in practice, so we draw one in Figure 1 to show how the above methods are applied to a real data set. As seen in Figure 1, it includes data preprocessing, order shrinkage and variable selection, model selection, convergence diagnostics and other steps, which brings convenience for real data analysis.
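The mixture update of ϕ_j above can be sketched as follows (a minimal illustration, not the authors' code): with probability w_zero the draw is the spike (an exact zero), otherwise it is the slab normal with the stated mean P_j μ_j/(P_j + η) and variance 1/(P_j + η). The mixing weight w_zero and the values of P_j, μ_j and η are assumed inputs that the Appendix derivation would supply:

```python
import numpy as np

rng = np.random.default_rng(3)

def draw_phi_j(w_zero, P_j, mu_j, eta, rng):
    # spike component: exact zero with probability w_zero
    if rng.uniform() < w_zero:
        return 0.0
    # slab component: normal with mean P_j*mu_j/(P_j + eta), variance 1/(P_j + eta)
    B = 1.0 / (P_j + eta)
    return rng.normal(P_j * mu_j * B, np.sqrt(B))

draws = np.array([draw_phi_j(0.8, 4.0, 1.0, 1.0, rng) for _ in range(10_000)])
share_zero = float(np.mean(draws == 0.0))
print(abs(share_zero - 0.8) < 0.02)  # close to the spike weight by construction
```

This exact-zero behavior is what allows the method to shrink the autoregressive order and drop explanatory variables within a single Gibbs run.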

Simulation studies

Regarding the choice of hyperparameters, we follow the settings of Alhamzawi et al. [2] and Xi et al. [29] in the simulation studies and real data analysis. In the Bayesian LASSO and Bayesian adaptive LASSO methods, the hyperparameters a and b are both set to 0.1 following Alhamzawi et al.'s [2] advice. The hyperparameters of the prior for η in Subsection 3.3 are chosen as a = 0.1, b = 0.0005, which gives roughly equal probabilities to η < 1 and η > 1, as pointed out by Xi et al. [29]. All simulations are performed in R. For each τ ∈ {0.1, 0.5, 0.9}, we run 500 simulations. We run the Gibbs sampler for M + N = 20,000 iterations; the first M = 15,000 draws are discarded as a burn-in period, and the next N = 5000 iterations form a posterior sample, taken to be an approximate sample from the full conditional distributions of all unknown parameters.
In the following, we use two criteria to evaluate the performance of each method. The first criterion is the mean absolute deviation (MAD) between the estimated and true parameter vectors, averaged over the replications. The second criterion is the average number of zero coefficients correctly selected, labeled 'TP', and of non-zero coefficients erroneously set to 0, labeled 'FP'.
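The two criteria can be sketched for a single replication as follows (the exact averaging over the 500 replications and any tolerance for declaring a coefficient zero are assumptions; the parameter values below are illustrative):

```python
import numpy as np

def mad(phi_hat, phi_true):
    # mean absolute deviation between estimated and true coefficients
    return float(np.mean(np.abs(phi_hat - phi_true)))

def tp_fp(phi_hat, phi_true, tol=1e-8):
    zero_hat = np.abs(phi_hat) < tol
    tp = int(np.sum(zero_hat & (phi_true == 0)))   # zeros correctly found
    fp = int(np.sum(zero_hat & (phi_true != 0)))   # non-zeros wrongly zeroed
    return tp, fp

phi_true = np.array([0.5, 0.0, 0.0, 1.0])
phi_hat = np.array([0.4, 0.0, 0.1, 0.0])
print(tp_fp(phi_hat, phi_true))  # (1, 1): one correct zero, one wrong zero
```

In the simulation study these quantities would be averaged over the 500 Monte Carlo runs to give the reported MAD and TP/FP values.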
In this section, we investigate the accuracy of the three Gibbs sampling methods under Scenario I. The sample path and autocorrelation function (ACF) plots of Scenario I with n = 500, τ = 0.1 and ε_t ∼ SN(0.5, 1, 2) are given in Figure 2. As seen in Figure 2, the observed series is stationary and the ACF decays as expected. The initial value of the parameter β is chosen as β^(0) = (1, 1)^T. The simulation results are summarized in Table 1.
Table 1 shows the MAD and TP/FP results of each method with different errors. As seen in Table 1, the TP of the SS-QAR method is the highest under all sample sizes, showing a satisfactory performance among all comparative methods, especially in terms of the correct recognition rate of zero-valued parameters. For estimating non-zero parameter values, the SS-QAR, BALQR and BLQR methods outperform the other methods. Whether estimating non-zero parameter values at extreme quantiles or elsewhere, the MADs of the Bayesian methods are smaller than those of the non-Bayesian methods. To sum up, the SS-QAR method outperforms the other methods.
To demonstrate the performance of the Gibbs sampling algorithm of the SS-QAR method, we draw the trace plots, ergodic means and ACFs of the posterior samples in Figures 3-5 under Scenario I with ε_t ∼ SN(0.5, 1, 2), τ = 0.1 and n = 500.
From Figure 3, we can see that the posterior samples of α and β are both stationary sequences. Figure 4 further illustrates the stationarity of the posterior samples after 5000 iterations. The ACFs given in Figure 5 exhibit an exponential decay, indicating that the Gibbs sampling algorithm converges fast. Figures 3-5 indicate that the estimation results are reliable.
In addition, we also plot the histograms of the posterior samples in Figure 6. We can see from Figure 6 that (i) for a non-zero parameter, the histogram resembles a normal distribution, and (ii) for a zero-valued parameter, it shows a mixture of a normal distribution with a very small probability and a column stacked at zero with a very large probability. This again reflects the superiority of the SS-QAR method in variable selection and order shrinkage. Next, we study the accuracy of the three Gibbs sampling algorithms under Scenario II. In this scenario, the initial values of β are chosen as β^(0) = (1, 1, 1, 1, 1, 1)^T. The simulation results are summarized in Table 2. As seen in Table 2, the SS-QAR method has the highest 'TP' when the sample size n is large. Although this advantage is not obvious when the sample size is small, it becomes more pronounced as the sample size increases; in particular, the SS-QAR method has a very desirable property among all the comparison methods in terms of correctly recognizing the zero-valued parameters. For the estimation of non-zero parameters, the SS-QAR, BALQR and BLQR methods are superior to the other methods in all cases. From Tables 1 and 2, we can draw the following conclusion: compared with the classical variable selection methods, the SS-QAR method can not only accurately determine the zero-valued autoregressive parameters in the model, but also correctly identify the zero-valued regression parameters.
To show the convergence of the MCMC algorithm under Scenario II, we also plot the traces, ergodic mean plots, ACFs and histograms under the SS-QAR method with ε_t ∼ 0.98ALD + 0.02N, τ = 0.9 and n = 500 in Figures B1-B4 in Appendix B. These figures show that the MCMC algorithm performs well. Moreover, the estimation performance of v_t under Scenario II is presented in Figure 8, which again shows the good performance of the MCMC algorithm.

Real data analysis
In this section, we analyze the Bike Sharing data set to demonstrate the performance of the proposed methods. We obtain the Bike Sharing data set from the web site https://archive.ics.uci.edu/ml/datasets. The analyzed period starts January 1, 2011, and ends July 19, 2011, yielding 200 observations. To make the analyzed series stationary, we consider the following log-returns of CNT, i.e.
y_t = log(CNT_t / CNT_{t−1}), where CNT_t denotes the CNT on day t. The original series and the log-returns y_t are shown in Figure 9. As seen in Figure 9, the log-return series y_t is stationary. Figure 10 shows the ACF and partial autocorrelation function (PACF) plots for y_t. According to Figure 10, the ACF and PACF plots both show sparse features. Therefore, it is reasonable to fit this data set with a sparse AR(p) model.
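The log-return transformation can be sketched in one line; the numbers below are illustrative stand-ins, not the actual CNT values:

```python
import numpy as np

cnt = np.array([985.0, 801.0, 1349.0, 1562.0])  # illustrative daily counts
y = np.log(cnt[1:] / cnt[:-1])                  # y_t = log(CNT_t / CNT_{t-1})
print(y.shape)  # one fewer observation than the original series
```

Differencing the log series in this way removes the trend visible in the raw counts, which is what makes the AR(p) modeling in the next step sensible.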
We consider the following five explanatory variables: working day (WD), the normalized temperature in Celsius (Temp), the normalized feeling temperature in Celsius (Atemp), the normalized humidity (Hum), and the normalized wind speed (Win). WD is a dummy variable that takes value one if the day is neither a weekend nor a holiday and zero otherwise. The data for these explanatory variables are obtained from the Bike Sharing data set. We denote WD, Temp, Atemp, Hum and Win by x_{1,t}, x_{2,t}, x_{3,t}, x_{4,t} and x_{5,t}, respectively.
We use the three Bayesian variable selection methods discussed in Section 3 to analyze this data set, setting the initial autoregressive order to 15. In order to investigate the main factors influencing CNT as the number of bike-sharing users grows from small to large, we select five quantile levels, τ = 0.1, 0.3, 0.5, 0.7 and 0.9. For a fixed quantile level, variable selection and order shrinkage are carried out simultaneously via the three proposed Bayesian methods. Across quantile levels, the DIC criterion [23] is adopted to select the best-fitting model, because it is widely used in Bayesian model selection problems [15,32]. All MCMC chains are run for 20,000 iterations, with the first 15,000 iterations discarded as burn-in. In addition, for comparison, the fitting results based on the traditional mean regression model are also presented. All fitting results are summarized in Table 3. We first compare the DIC values of the QAR-X model and the mean regression model. It can be seen in Table 3 that the DIC values under the mean regression model are larger than those under the QAR-X models, indicating that mean regression is not suitable for this data set. This result also confirms that quantile regression can accurately characterize the features of the data at different quantiles. Next, we compare the DIC values of the QAR-X model across quantiles. The QAR-X model based on the SS-QAR method with τ = 0.1 has the smallest DIC value, implying that the corresponding model is the most suitable for the Bike Sharing data set. According to the fitting result of the SS-QAR method, a final order of p = 14 is selected. It picks up 10 significant lags (0, 1, 2, 4, 5, 6, 7, 12, 13 and 14) as well as three important explanatory variables: x_{2,t} (Temp), x_{4,t} (Hum) and x_{5,t} (Win). Humidity (x_{4,t}) and wind speed (x_{5,t}) are significant variables for all fitted models, which is basically consistent with common knowledge. However, it is very interesting that the SS-QAR method also picks up temperature (x_{2,t}) as a significant variable when τ = 0.1. Notice that τ = 0.1 corresponds to a very small number of bike-sharing users. This leads to an important conclusion: temperature is a crucial factor behind the small number of bike-sharing users. In other words, the temperature directly influences people's willingness to use bike sharing in a cold winter. This finding is highly consistent with daily life: when the temperature is low, the number of people who prefer bike sharing declines sharply. Taking the city of Changchun in China as an example, almost no one uses bike sharing in winter. Therefore, we can advise investors or operators to deploy fewer or no shared bicycles in cold winters to save operating costs. Such an important result was not found by the mean regression models. Now, let us focus on the fitting results for τ = 0.5 and 0.9, which correspond to a large number of users of shared bicycles. In these situations, humidity (x_{4,t}) and wind speed (x_{5,t}) play a major role. All the fitted regression coefficients of these two covariates are negative, meaning that the two covariates have different degrees of negative impact on the use of shared bicycles. Based on the above analysis, we conclude that for the Bike Sharing data set, the QAR-X model fitted with the SS-QAR method is a competitive choice, outperforming the classical mean regression models.
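The DIC used for the model comparison above can be sketched from MCMC output (a minimal illustration with made-up deviance draws, not the actual values from Table 3): D̄ is the posterior mean deviance, D̂ is the deviance at the posterior mean, p_D = D̄ − D̂ is the effective number of parameters, and DIC = D̄ + p_D [23]:

```python
import numpy as np

def dic(dev_draws, dev_at_post_mean):
    d_bar = float(np.mean(dev_draws))        # posterior mean deviance
    p_d = d_bar - dev_at_post_mean           # effective number of parameters
    return d_bar + p_d                       # equals 2*d_bar - dev_at_post_mean

# toy deviance draws from three retained MCMC iterations
print(dic(np.array([102.0, 98.0, 100.0]), 95.0))  # 2*100 - 95 = 105.0
```

Smaller DIC indicates a better trade-off between fit and complexity, which is why the SS-QAR model at τ = 0.1 is preferred in Table 3.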
Figures 11 and 12 show the trace and ergodic mean plots for the SS-QAR method when τ = 0.1, which support the stationarity of the posterior samples after 15,000 iterations. Figure 13 reports the fitting results of v_t (t = 1, 2, ..., 200) for the Bike Sharing data set with τ = 0.1. We can see that the histogram is consistent with the density curve of the exponential distribution with mean τ(1 − τ)/σ̂ = 0.6334, showing a reasonable estimation result for the latent variables v_t. This again illustrates the good fitting performance of the SS-QAR method.

Conclusions
In this paper, we consider three Bayesian variable selection methods for the QAR-X model. In general, the Bayesian methods perform better than the classical methods. Our findings show that the SS-QAR method performs better than the other methods in both identifying the zero-valued parameters and estimating the non-zero parameters of the model. It can also accurately capture specific features of the data at different quantiles and select the significant explanatory variables for a specific quantile. Our simulation results support the validity of these Bayesian variable selection methods. Finally, a real data analysis shows that a sparse AR(14) model with three explanatory variables based on the SS-QAR method is appropriate for the Bike Sharing data set. Potential directions for future research include extending the proposed methods to high-dimensional time series models or hysteretic autoregression models.

Funding: [grant number 20210101149JC], Scientific Research Project of Jilin Provincial Department of Education [grant numbers JJKH20220671KJ, JJKH20230665KJ].

Figure 1 .
Figure 1. The flow chart shows the implementation of the three proposed methods.

Figure 4 .
Figure 4. Ergodic mean plots of α_0, α and β of the SS-QAR method under Scenario I. (a) is the ergodic mean plot of α_0. (b)-(h) are the ergodic mean plots of α. (i)-(j) are the ergodic mean plots of β.

Figure 7
Figure 7 reports the posterior means of v_t (t = 1, 2, ..., n) under Scenario I, with τ = 0.1 and sample size n = 500. As seen in Figure 7, the histogram fits the density curve (the red solid line) of E(0.7509) well, showing the good performance of the Gibbs sampling algorithm of the SS-QAR method.

Figure 7 .
Figure 7. Histogram of estimated v_t (n = 500) versus the density of E(0.7509) under Scenario I.

Figure 8 .
Figure 8. Histogram of estimated v_t (n = 500) versus the density of E(0.4298) under Scenario II.

Figure 9 .
Figure 9. Time series plots of CNT_t and its log-return y_t.

Figure 10 .
Figure 10. ACF and PACF plots of the log-return y_t.

Figure 13 .
Figure 13. Histogram of fitted v_t under the Bike Sharing data set versus the density of E(0.6334).

Table 1 .
Simulation results under Scenario I.

Table 2 .
Simulation results under Scenario II.