Modeling Functional Time Series and Mixed-Type Predictors With Partially Functional Autoregressions

Abstract In many business and economics studies, researchers have sought to measure the dynamic dependence of curves with high-dimensional mixed-type predictors. We propose a partially functional autoregressive model (pFAR) in which the serial dependence of curves is controlled by coefficient operators defined on a two-dimensional surface, and the individual and group effects of mixed-type predictors are estimated with a two-layer regularization. We develop an efficient estimation procedure with proven asymptotic properties of consistency and sparsity, and show how to choose the sieve and tuning parameters in the regularization based on a forward-looking criterion. Beyond the asymptotic properties, numerical validation suggests that the dependence structure is accurately detected. The implementation of the pFAR in a real-world analysis of dependence in German daily natural gas flow curves, with seven lagged curves and 85 scalar predictors, produces superior forecast accuracy and an insightful understanding of the dynamics of natural gas supply and demand for the municipal, industry, and border nodes, respectively.


Introduction
In many business and economics studies, researchers seek to measure the dynamic dependence of functional data, which are represented as curves. Examples include the measurement of yield curves and market volatility (Chen and Niu 2014; Xu, Li, and Chen 2017; Grith et al. 2018), financial contagions and systemic risks (Zhu et al. 2019), electricity price curves and energy supply and demand (Liebl 2013; Chen, Marron, and Zhang 2019), temperature and precipitation curves (Ramsay and Dalzell 1991; Ramsay and Silverman 2005; James, Wang, and Zhu 2009), and medical and genetic data (Yao, Müller, and Wang 2005a; Tang and Müller 2008; Li, Huang, and Härdle 2019). The curves are seldom collected in isolation; they are associated with many exogenous variables in different forms, including scalars and curves. We consider the problem of modeling such dependence in settings where the set of candidate predictors is high-dimensional and mixed-type, containing both scalars and curves. In such settings, standard functional regression approaches that use partial information (i.e., only curve predictors) suffer from modeling bias, while those that use complete information with high-dimensional mixed-type predictors suffer from estimation bias. We propose methods based on recent advances in functional time series analysis and statistical learning that address these challenges in a way that is theoretically consistent and computationally tractable with large-scale data.
To model dependence in curves, functional regressions have been well developed (Ramsay and Silverman 2005; Wang, Chiou, and Müller 2016), with the response being either a scalar or a functional variable and the predictors usually being curves. Examples include functional linear models (Ramsay and Dalzell 1991; Yao, Müller, and Wang 2005b; Matsui and Konishi 2011; Kong, Staicu, and Maity 2016), generalized functional linear models (Müller and Stadtmüller 2005; Gertheiss, Maity, and Staicu 2013), functional additive models (Müller and Yao 2008; Müller, Wu, and Yao 2013; Fan, James, and Radchenko 2015), historical functional linear models (Malfait and Ramsay 2003; Harezlak et al. 2007; Brockhaus et al. 2017), and functional factor models (Härdle and Majer 2016; Zhang et al. 2017). For mixed-type predictors, the functional partially quantile regression model (Lu, Du, and Sun 2014), the generalized functional linear model with response in the exponential family (Crainiceanu, Staicu, and Di 2009; Goldsmith et al. 2011), and the partially functional linear model (Kong et al. 2016) have been proposed. However, the response is assumed to be independent in these works, which ignores the serial dependence that often exists in business and economics data.
Functional autoregressive (FAR) models incorporate the serial dependence among curves over time (Bosq 1991; Besse, Cardot, and Stephenson 2000; Bosq 2000; Mourid and Bensmain 2006; Didericksen and Zhang 2012; Liu, Xiao, and Chen 2016; Berhoune and Bensmain 2018), where the lagged curves are used as predictors to measure the lead-lag dependence among curves; see also the FARX with exogenous covariates (Damon and Guillas 2002; Chen, Chua, and Koch 2018), the vector FAR for multiple series of curves (Chen, Chua, and Härdle 2019; Guo and Qiao 2020), and nonstationary models such as the adaptive FAR, the varying coefficient FAR, and the vector error correction FAR (Liang and Schienle 2019). In these models, the predictors are usually one or a small number of lagged curves. Mixed-type predictors and high-dimensional settings are neglected. With partial information, the models fail to describe the fundamental dynamics, which also results in poor interpretation.
Our main question of interest is to what extent we can benefit from using both the serial dependence and the high-dimensional mixed-type predictors in measuring the dependence of curves. One can naturally consider incorporating lagged curves as predictors in partial functional regression as one solution. However, the estimation accuracy in the high-dimensional setting suggests that we might encounter a finite sample bias. First, the parameter space is infinite-dimensional because curves are defined on a continuous domain, and serial dependence is present in every part of the curves. Second, the number of possible predictors, both scalars and curves, is large compared to the number of truly active predictors.
We propose a partially functional autoregressive model (pFAR) to measure the dependence of curves with high-dimensional mixed-type predictors. This involves two types of dependence. The serial dependence on multiple lagged curves is controlled by coefficient operators defined on a two-dimensional surface (τ, s), with τ, s ∈ C, the continuous domain of the curves. Compared to standard convolution kernel operators defined on τ − s, the spatial/location difference in the curves only, the surface allows us to distinguish the individual effects of various parts of the curves. In addition, the impacts of many exogenous scalar predictors are described by coefficient functions defined on the continuous domain.
To compute an accurate estimate of dependence, we must grapple with two methodological challenges. The first is the finite sample bias mentioned above, which increases exponentially with the number of predictors; we show that naive estimators are infeasible in large-scale settings. Second, although our model takes a convenient linear form, the considerable number of possible predictors and parameters makes standard estimation approaches computationally infeasible and interpretation difficult. We use two approaches to address these challenges. The first is the sieve estimator (Grenander 1981; Chen 2007), which projects an infinite-dimensional curve onto a finite-dimensional parameter space with controlled information loss. The second is a two-layer sparse group LASSO penalty (Simon et al. 2013) used to identify the key model parameters and groups, which is known to be effective in controlling bias. The asymptotic theory shows that our estimator allows for oracle dependence-structure detection and efficient estimation under sparsity. Numerical validation also suggests that the dependence structure can be accurately detected.
We build on methods developed in the previous literature on partial functional regression and functional autoregression. Many aspects of the current study, including our proposed two-layer sparse estimator, our approach for choosing the threshold in the sieve estimator, and the predictor specification of our model, are novel with respect to prior work. Most importantly, Kong et al. (2016) developed a partial functional linear model but did not attempt to impose a group structure on the mixed-type predictors, and the response variable was scalar and IID. Damon and Guillas (2002) proposed functional autoregressive models with multiple functional predictors, but the number of predictors was small and the serial dependence was restricted to the spatial difference τ − s only. Basu and Michailidis (2015) and Barber et al. (2017) studied sparse high-dimensional regressions and function-on-scalar regressions, respectively, where they developed sparse estimation with the LASSO penalty in the high-dimensional/functional regression framework, but paid little attention to the choice of the sieve-type parameter when the functional response was projected onto a discrete high-dimensional finite space. Our study also relates to other works on measuring dependence in functional time series, including Liu, Xiao, and Chen (2016) and Guo and Qiao (2020), and on high-time-resolution energy forecasting, including Cho et al. (2013) and Chen, Xu, and Koch (2020), among others.
This study contributes to the use of sieve estimation and regularization in functional time series analysis with high-dimensional mixed-type predictors. The proposed pFAR model characterizes rich information, including both the serial dependence and the effects of exogenous variables, allowing economic interpretation in a unified modeling framework. The model setting provides a flexible temporal dependence structure with the coefficient operator being a surface, and elaborates on the individual and group effects of mixed-type predictors with two-layer regularization. In this study, we develop an efficient estimation with proven asymptotic properties of consistency and sparsity. We show how to choose the sieve parameter and the tuning parameters in the regularization based on a forward-looking criterion. We show the implementation of the pFAR with a real-world analysis of measuring dependence in the German natural gas transmission network, with seven lagged curves and 85 scalar predictors. We find that the pFAR model detects economically meaningful driving factors for the natural gas demand and supply curves and provides an insightful understanding of the dependence among daily natural gas flow curves. In the out-of-sample validation, it produces superior forecast accuracy with up to 42%, 129%, and 28% improvements for the municipal, industry, and border nodes, respectively.
The remainder of this article is organized as follows. Section 2 presents the pFAR model and the estimation procedure. Section 3 reports the asymptotic properties of the estimator. Section 4 presents the finite-sample performance using Monte Carlo experiments. Section 5 applies the pFAR model to real-world data. Section 6 concludes this article.

Model
We now present the pFAR model and detail the estimation procedure. In the pFAR modeling framework, the response variable is a serially dependent curve, and the predictors are mixed-type, with multiple lagged curves and high-dimensional scalar variables.
Notation. For a matrix A ∈ R^{q1×q2}, we let |A_{ij}| be the absolute value of the (i,j)th element, |A|_1 be the L1 norm, ‖A‖_2 be the Euclidean norm, and ‖A‖_F = (Σ_{i,j} A_{ij}^2)^{1/2} be the Frobenius norm. We use (A)_{k.} and (A)_{.k} to denote the kth row and column of A, respectively. Let L2(C) denote a Hilbert space H of square-integrable functions defined on a compact set C, equipped with the inner product ⟨f, g⟩ = ∫_C f(τ)g(τ) dτ for f, g ∈ L2(C), which induces the norm ‖f‖_H = ⟨f, f⟩^{1/2}. The tensor product space is further denoted by A = L2(C) ⊗ L2(C). Any K ∈ A can be considered as the kernel function of a linear operator acting on L2(C), given by (Kf)(τ) = ∫_C K(τ, s)f(s) ds. For notational simplicity, we use K to denote both the kernel function and the operator, and denote the Hilbert-Schmidt norm by ‖K‖_A = (∫_C ∫_C K^2(τ, s) dτ ds)^{1/2}.

The pFAR Model
Let {Y_t(τ)}_{t=1}^n denote a series of n random curves that take values in L2(C) over a continuous domain τ ∈ C at time t. For example, {Y_t(τ)} represents the series of daily natural gas flow curves on day t with the continuous domain τ ∈ [0h, 24h]. Associated with each curve Y_t(τ), there are mixed-type predictors, including p lagged curves denoted as Y_{t−1}(τ), ..., Y_{t−p}(τ), and d-dimensional scalar predictors x_{t,ℓ}, ℓ = 1, ..., d. In our settings, p and d are both large, but not every predictor is active when measuring the dependence of the curves.
Without loss of generality, we assume that the functional variables Y_t(τ) and the scalar variables x_{t,ℓ} are normalized with a mean of zero. The partially functional autoregressive (pFAR) model is defined as follows:

Y_t(τ) = Σ_{j=1}^{p} ∫_C β_j(τ, s) Y_{t−j}(s) ds + Σ_{ℓ=1}^{d} γ_ℓ(τ) x_{t−1,ℓ} + ε_t(τ),  (2.1)

where β_j(τ, s) ∈ A is a coefficient operator at lag j for j = 1, ..., p with τ, s ∈ C. The transition function controls the serial dependence of the functional response on its p lagged curves. The coefficient function γ_ℓ(τ) is defined in L2(C) and measures the impact of the ℓth scalar predictor x_{t−1,ℓ}. The innovation ε_t(τ) ∈ L2(C) is a white-noise curve with zero mean and finite second moment E‖ε_t‖_H^2 < ∞. Given the high-dimensional scenario with mixed-type predictors, we assume that in (2.1) there is sparsity in the serial dependence measured by {β_1(•,•), ..., β_p(•,•)} and in the cross-dependence measured by {γ_1(•), ..., γ_d(•)}. In other words, the number of possible parameters corresponding to the (p + d) mixed-type predictors is much larger than the number of actually active parameters. If a certain predictor is not active for the response, then the functional parameter of that component is zero; that is, the coefficient functions contain a certain number of zeros. These mixed-type predictors can further be split into groups based on data features or economic meaning. For example, in the later real-data analysis, each lagged curve is considered one group; the price-related variables form one group; the renewable-energy-related variables make another group, etc. Although the number of groups is predetermined, the sparsity structure, including which groups of predictors are inactive and which particular predictors are inactive within an active group, is unknown in terms of both number and location.
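On a discrete grid, the serial term ∫_C β_j(τ, s) Y_{t−j}(s) ds reduces to a Riemann sum. The following minimal sketch illustrates this; the grid, the toy lagged curve, and the function names are assumptions for illustration, and the example surface matches the β_14 form used later in the simulations.

```python
import numpy as np

# Illustrative grid on C = [0, 1]; L and the curves are assumptions, not from the paper.
L = 50
tau = np.linspace(0.0, 1.0, L)
ds = tau[1] - tau[0]                      # Riemann-sum step in s

def beta_surface(t, s):
    """Example coefficient surface, constant in s (as for beta_14 in the simulations)."""
    return -0.2 * (1.0 + np.cos(0.5 * np.pi * t) + np.sin(0.5 * np.pi * t)) + 0.0 * s

B1 = beta_surface(tau[:, None], tau[None, :])   # (L, L) values of beta(tau, s)
y_lag = tau.copy()                              # a toy lagged curve Y_{t-1}(s) = s

# (beta Y_{t-1})(tau) = \int_C beta(tau, s) Y_{t-1}(s) ds, via a left Riemann sum:
serial_term = (B1 @ y_lag) * ds                 # values on the tau grid
```

Since the toy lagged curve integrates to roughly 1/2 over [0, 1], serial_term(τ) is approximately β(τ, ·)/2 here.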
The functional observation Y_t(τ) can be projected onto a sequence of (orthogonal) basis functions,

Y_t(τ) = Σ_{k=1}^{∞} y_{t,k} φ_k(τ),

where the basis functions φ_k(τ) can be the eigenbasis (Cho et al. 2013; Kong et al. 2016; Xu et al. 2020), B-splines (Liu, Xiao, and Chen 2016), the Fourier basis (Chen and Li 2017), or wavelets (Morris et al. 2003; Luo et al. 2016). The number of basis functions k goes to infinity, so there is no information loss in the expansion. Given the fixed form of the basis functions, the functional variables can be equally represented by the coefficients y_{t,k}.
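As a concrete sketch, discretized curves can be projected onto a Fourier basis by numerical quadrature. The function names and the trapezoid rule below are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def fourier_basis(tau, K):
    """First K orthonormal Fourier basis functions on [0, 1]."""
    Phi = np.ones((len(tau), K))
    for k in range(1, K):
        m = (k + 1) // 2
        trig = np.sin if k % 2 == 1 else np.cos
        Phi[:, k] = np.sqrt(2.0) * trig(2.0 * np.pi * m * tau)
    return Phi

def project(curves, tau, K):
    """Basis coefficients y_{t,k} = <Y_t, phi_k> via trapezoid quadrature."""
    h = tau[1] - tau[0]
    w = np.full(len(tau), h)          # trapezoid quadrature weights
    w[0] = w[-1] = h / 2.0
    return (curves * w) @ fourier_basis(tau, K)   # (n, K)

tau = np.linspace(0.0, 1.0, 201)
Y = np.vstack([np.sin(2 * np.pi * tau), np.cos(2 * np.pi * tau)])
coef = project(Y, tau, K=5)           # rows of basis coefficients, one per curve
```

For these two test curves, the projection recovers a single dominant coefficient of size 1/√2 (the inner product of sin(2πτ) with the normalized basis function √2 sin(2πτ)), with all others near zero.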
Estimating an infinite number of coefficients using a finite sample is computationally infeasible. The sieves method (Grenander 1981; Chen 2007) projects the infinite-dimensional process onto a finite parameter space with controlled information loss. Specifically, we construct sieves, a sequence of compact, nondecreasing subspaces {Θ_s} of the original infinite-dimensional parameter space Θ, with Θ_s ⊆ Θ_{s+1} ⊆ ... ⊆ Θ, whose union is dense in Θ. For a given dataset with n observations, the idea is to choose K_n, the degree of the sieve, so that the loss function is well defined on the finite-dimensional linear space spanned by the first K_n basis functions, where {θ_k} denotes the coefficients of the expansion for the functional terms and c is a positive constant controlling the growth rate of K_n. We consider K_n a hyperparameter in the sieve approach and discuss its selection in a later section. Under the sieve with K_n, the (approximated) projection is performed in a finite parameter space with k = 1, ..., K_n. Analogously, we project the coefficient operators and the innovations with the same basis and under the same sieve space. We obtain the relationship between the coefficients, based on which the pFAR model (2.1) can be represented on the sieve space as

y_t = Σ_{j=1}^{p} B^{(j)⊤} y_{t−j} + Γ^⊤ x_{t−1} + e_t,  (2.4)

where y_t = (y_{t,1}, ..., y_{t,K_n})^⊤ are the basis coefficients of the curve Y_t(τ), y_{t−j} are those of the lagged curves, and x_t = (x_{t,1}, ..., x_{t,d})^⊤ are the observed scalar predictors. Here, B^{(j)} ∈ R^{K_n×K_n} is the unknown parameter matrix whose (i,k)th element is B^{(j)}_{i,k}; B^{(j)} contains the basis coefficients of the coefficient operator β_j, and Γ ∈ R^{d×K_n} contains those of γ_1, ..., γ_d.
Assume the coefficients of the innovations e_{t,k} ∼ IID(0, σ_k^2) for k = 1, ..., K_n. The loss can then be measured by the least-squares criterion ‖Y − ZB‖_F^2, where Y ∈ R^{(n−p)×K_n} has row vectors y_t, and Z ∈ R^{(n−p)×(pK_n+d)} has rows z_t = (y_{t−1}^⊤, ..., y_{t−p}^⊤, x_{t−1}^⊤). Note that Z consists of the basis coefficients of the lagged curves and the exogenous scalars. In addition, B = (B^{(1)⊤}, ..., B^{(p)⊤}, Γ^⊤)^⊤ ∈ R^{(pK_n+d)×K_n}, where Γ ∈ R^{d×K_n} collects the basis coefficients of the scalar-coefficient functions. A direct estimation is often inefficient in large-dimensional settings because the number of coefficients increases quickly with the model order p, the number of scalar predictors d, and the sieve degree K_n. Next, we present a regularized estimation with a two-layer penalty enforcing group and element sparsity in the dependence.
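The stacking of Y and Z from the basis coefficients and the scalars can be sketched as follows; the function and variable names are illustrative assumptions, with scalars entering at one lag as in the model.

```python
import numpy as np

def build_design(y_coef, x, p):
    """Stack the regression system Y = Z B + E.

    y_coef: (n, K) basis coefficients of the curves; x: (n, d) scalar
    predictors; p: number of lags. Returns the response matrix Y of shape
    (n - p, K) and the design matrix Z of shape (n - p, p*K + d)."""
    n, K = y_coef.shape
    rows = []
    for t in range(p, n):
        lags = [y_coef[t - j] for j in range(1, p + 1)]   # y_{t-1}, ..., y_{t-p}
        rows.append(np.concatenate(lags + [x[t - 1]]))    # scalars enter at t-1
    return y_coef[p:], np.vstack(rows)

rng = np.random.default_rng(0)
y_coef = rng.normal(size=(10, 3))     # toy basis coefficients (n = 10, K = 3)
x = rng.normal(size=(10, 4))          # toy scalar predictors (d = 4)
Y, Z = build_design(y_coef, x, p=2)
```

Each row of Z holds the lagged coefficient vectors followed by the lagged scalars, matching the block layout of B.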

Estimation Via Regularization
We introduce the group structure into the estimation of the large-dimensional model. We assume a group and element sparsity pattern in the matrices {B^{(1)}, ..., B^{(p)}} and Γ to recover the functional sparsity structure in {β_1(τ, s), ..., β_p(τ, s)} and {γ_1(τ), ..., γ_d(τ)}. Specifically, we manually split the mixed-type predictors into p + m groups, where each lagged curve Y_{t−j}(τ) for j = 1, ..., p is considered one group, and the d scalar variables are separated into m groups according to economic meaning or domain knowledge. Both group and element sparsity are considered. If all elements in a group are zero, then the group itself has no effect on the dependence; we then say there is group sparsity. Moreover, for an active group, that is, when some elements are nonzero, the focus is on element sparsity. This means that within the active group, some variables have no impact, and thus the corresponding coefficients should be zero. If all the groups and elements are nonzero, there is null sparsity, indicating that all mixed-type predictors are influential.
Let G = {1, 2, ..., p + m} be the group index set and let it encode the group structure of the functional and scalar predictors in B; that is, B_g denotes the coefficient subset of group g ∈ G. The regularized estimation minimizes the penalized least-squares function with the sparse group LASSO penalty (Friedman, Hastie, and Tibshirani 2010; Li, Nan, and Zhu 2015):

min_B ‖Y − ZB‖_F^2 + λ Σ_{g∈G} η_g ‖B_g‖_2 + α |B|_1,  (2.5)

where the L2-penalty term shrinks the coefficients of inactive groups to zero, and the L1-penalty forces the inactive entries within an active group to zero. Each group is assigned a weight η_g to balance the size differences among the groups. The default choice of η_g is the square root of the group size; see Huang, Breheny, and Ma (2012) and Simon et al. (2013). There are two tuning parameters, λ ≥ 0 for the groups and α ≥ 0 for individual variables. Note that the penalty reduces to the LASSO when λ = 0 and becomes the group LASSO when α = 0.
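For a fixed (λ, α), the two-layer penalized objective can be evaluated as below. This is a minimal sketch under assumptions: the group layout is passed as row-index sets, the scale of the least-squares term is illustrative, and the default weights η_g are taken as the square root of the group size.

```python
import numpy as np

def sgl_objective(B, Y, Z, groups, lam, alpha):
    """Two-layer sparse group LASSO objective for a coefficient matrix B.

    B: (p*K + d, K) coefficients; groups: list of row-index arrays, one
    per group of mixed-type predictors (illustrative encoding)."""
    rss = 0.5 * np.sum((Y - Z @ B) ** 2)                    # least-squares loss
    group_pen = sum(np.sqrt(float(B[g].size)) * np.linalg.norm(B[g])
                    for g in groups)                        # L2 group penalty
    return rss + lam * group_pen + alpha * np.abs(B).sum()  # plus L1 penalty

# Toy check: two groups of rows, all-zero vs all-one coefficients.
groups = [np.arange(0, 3), np.arange(3, 5)]
Y0, Z0 = np.ones((4, 2)), np.ones((4, 5))
obj_zero = sgl_objective(np.zeros((5, 2)), Y0, Z0, groups, 1.0, 1.0)
obj_one = sgl_objective(np.ones((5, 2)), Y0, Z0, groups, 1.0, 1.0)
```

With B = 0 only the residual term survives; with B = 1 both penalty layers contribute, which makes the group/element trade-off easy to inspect numerically.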
As an illustration, we show the estimation form for the coefficient b_{ik} in B when the other elements of B are fixed. Here, b_{ik} denotes the (i,k)th element of B with i = 1, ..., pK_n + d and k = 1, ..., K_n. Let B̂ denote the local minimizer of (2.5) and b̂_{ik} be its (i,k)th element. Suppose that b_{ik} belongs to group g ∈ G with b_{ik} ∈ B_g. Then, all elements in B̂_g will be zero if a group-level soft-thresholding condition holds, where (·)_{.k} and B̂_{(−i.)} denote the estimated matrix B̂ with the elements in the ith row replaced by zeros, z_{.i} denotes the ith column of the matrix Z, and [x]_c = sign(x) max{0, |x| − c} denotes the soft-thresholding operator for data x and threshold c. Note that the algorithm identifies the sparsity structure and simultaneously estimates the active elements and groups. Given the regularized estimates B̂^{(j)} and Γ̂, the functional estimates of the elements in {β_j(τ, s)} and {γ_ℓ(τ)} can be obtained via

β̂_j(τ, s) = φ(s)^⊤ B̂^{(j)} φ(τ),  γ̂_ℓ(τ) = Σ_{k=1}^{K_n} Γ̂_{ℓ,k} φ_k(τ),  (2.6)

where φ(τ) = (φ_1(τ), ..., φ_{K_n}(τ))^⊤ is the vector of basis functions.
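The soft-thresholding operator in the blockwise updates is a direct transcription of sign(x) max{0, |x| − c}; a minimal sketch:

```python
import numpy as np

def soft_threshold(x, c):
    """Elementwise soft thresholding: sign(x) * max(0, |x| - c)."""
    return np.sign(x) * np.maximum(0.0, np.abs(x) - c)

# Entries within c of zero are set exactly to zero; larger ones shrink toward zero.
shrunk = soft_threshold(np.array([2.0, -2.0, 0.3]), 0.5)   # -> [1.5, -1.5, 0.0]
```

This is the operator that produces exact zeros in the element layer of the penalty, while the group-level condition zeros out whole blocks at once.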

Hyperparameters
There are several hyperparameters in the estimation. The tuning parameters α and λ control the sparsity of the mixed-type predictors in the regularization, and the sieve parameter K_n determines the dimension of the reduced parameter space.
In the previous literature, tuning parameters are often selected using cross-validation (see, e.g., Rice and Silverman 1991; Yao, Müller, and Wang 2005b), BIC (Wang, Li, and Tsai 2007), or combined BIC and AIC (Kong et al. 2016). In our study, we use a forward-looking criterion that selects the tuning parameters by minimizing the out-of-sample forecast error. We divide the whole sample into training, validation, and test sets. The tuning parameters (α*, λ*) are chosen to minimize the root mean squared forecast error over the validation period,

(α*, λ*) = argmin_{α,λ} [ (1/(T_v L)) Σ_{t∈T_v} Σ_{s=1}^{L} (Y_t(τ_s) − Ŷ_t(τ_s))^2 ]^{1/2},  (2.7)

where T_v is the length of the validation set, τ_s denotes the discrete grid points of the continuous interval τ ∈ C, L is the number of grid points, and Ŷ_t(τ_s) is the one-step-ahead forecast. More importantly, the sieve parameter K_n not only needs to balance the bias-variance tradeoff in the sieve approximation, but also has an impact on the computational cost.
In general, a large value of K_n offers good estimation accuracy, but the computation time increases exponentially with the number of unknown coefficients. While the standard approach suggests choosing K_n via cross-validation, we propose an alternative approach; the later simulation shows that once the "right" value of K_n is reached, there is little room for further improving the estimation and prediction accuracy with even larger values of K_n, although larger values do not hurt the estimation or prediction. The choice of the basis function φ(τ) has a minor impact on the estimation, although it indirectly influences the choice of K_n. We provide a numerical illustration below.
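The forward-looking selection of (α*, λ*) amounts to a grid search minimizing the validation RMSFE. A sketch follows, where fit and forecast are hypothetical placeholders for model fitting and validation-period forecasting, not the paper's code:

```python
import numpy as np

def select_tuning(fit, forecast, Y_val, alpha_grid, lambda_grid):
    """Grid search over (alpha, lambda), minimizing validation RMSFE.

    fit(alpha, lam) returns a fitted model; forecast(model) returns a
    (T_v, L) array of forecasts on the validation grid."""
    best, best_rmsfe = None, np.inf
    for alpha in alpha_grid:
        for lam in lambda_grid:
            model = fit(alpha, lam)
            err = forecast(model) - Y_val          # (T_v, L) forecast errors
            rmsfe = np.sqrt(np.mean(err ** 2))
            if rmsfe < best_rmsfe:
                best, best_rmsfe = (alpha, lam), rmsfe
    return best, best_rmsfe

# Toy illustration: the "model" forecasts the constant alpha + lambda everywhere.
Y_val = np.zeros((2, 3))
best, best_rmsfe = select_tuning(lambda a, l: a + l,
                                 lambda m: m * np.ones((2, 3)),
                                 Y_val, [0.0, 1.0], [0.5, 2.0])
```

In practice the same outer loop can wrap the K_n search, refitting on the training window for each candidate value.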

Asymptotic Properties
We demonstrate that the pFAR estimator enjoys oracle and oracle inequality properties.
Let B_0^∞ denote the true basis coefficient matrices of the parameter functions β_j(τ, s) and γ_ℓ(τ); let B_{0,K_n} denote the projection of B_0^∞ on the sieve Θ_{K_n}, and let B̂_{K_n} ∈ R^{P_n×K_n} be the pFAR estimator, where P_n = pK_n + d. We present the asymptotic properties in the (P_n × K_n)-dimensional subspace. For notational simplicity, we drop the subscript for the sieve parameter K_n, that is, we replace B_{0,K_n} and B̂_{K_n} with B_0 and B̂, respectively. Furthermore, b_{ik} represents the (i,k)th element of B for i = 1, ..., P_n and k = 1, ..., K_n. Let Λ_1 = {j ∈ {1, ..., p} : ‖β_j‖_A ≠ 0} and Λ_2 = {ℓ ∈ {1, ..., d} : ‖γ_ℓ‖_H ≠ 0} be the sets of active functional and scalar covariates, respectively. Let M_1(B) and M_2(B) denote the number of active (nonzero) elements and groups of B, respectively. Thus, M_1(B) and M_2(B) jointly characterize the sparsity, with smaller values indicating a sparser model. For a matrix Δ ∈ R^{P_n×K_n} and an element index (coordinate) subset W_1, we define Δ_{W_1} as the projection of Δ on the index set W_1, which has the same elements as Δ on the coordinates in W_1 and zero elements on the complementary coordinates W_1^c. For a group index set W_2 ⊆ G, we denote Δ_{W_2} as the combination of the projections of Δ on each group in W_2, that is, Δ_{W_2} = {Δ_g : g ∈ W_2}. The asymptotic properties are derived under four conditions. C1. The sieve dimensionality K_n goes to infinity at a rate satisfying log(K_n)/n → 0. C2. Pr(|z_{ti}| > δ) ≤ 2 exp(−κ_1 δ^2/2) for any δ ≥ 0 and some κ_1 > 0, where the bound is independent of i for i = 1, ..., P_n; that is, the vectors {z_t ∈ R^{P_n}}, which consist of the basis coefficients of the lagged curves and the scalar predictors, are sub-Gaussian random variables. Remark. Condition C1 controls the growth rate of the sieve dimensionality; C2 specifies the moments of the covariates after expansion; C3 concerns the coefficients of the innovations; and C4 determines the decay rates of the upper bounds for the coefficients {B^{(j)}_{i,k}} and {Γ_{ℓ,k}}, preventing the coefficients from decaying too slowly, which is needed only to control the tail behavior for large i, k. Hall and Horowitz (2007) and Kong et al. (2016) stated similar decay conditions for functional linear models.
Assumption 1. We denote the estimation error by Δ = B̂ − B_0 and require restricted-eigenvalue-type conditions on Δ to hold. Remark. In Assumption 1, the integers π_1 and π_2 play the role of upper bounds on the sparsity measures M_1(B_0) and M_2(B_0), respectively, of the coefficient matrix B_0. Assumption 1 imposes restrictions on the error Δ = B̂ − B_0 and is called the restricted eigenvalue (RE) assumption (Bickel, Ritov, and Tsybakov 2009). It suffices for the main argument for the LASSO estimator in a multiple linear regression framework. Li, Nan, and Zhu (2015) used a similar assumption for the sparse group LASSO estimator in multivariate regressions.
We show that the prediction loss between the sieve estimator B̂ over the subspace Θ_{K_n} and the theoretical estimator is bounded, derive the asymptotic consistency of the sieve estimator, and investigate the theoretical bound on the order of sparsity.
Theorem 3.1. Suppose Conditions C1-C4 and Assumption 1 are satisfied. Let U denote the maximum diagonal element of the matrix Ψ = Z^⊤Z/(n − p), and let φ_max denote its largest eigenvalue. We denote ς_g = λη_g/α for g ∈ G. With probability at least 1 − (P_n K_n)^{1−r^2/2}, the oracle bounds on the prediction loss and on the estimation error hold, and the order of sparsity (i.e., the number of selected elements) satisfies a corresponding bound. Remark. Theorem 3.1 establishes the asymptotic properties of the estimates under the sieve when the true model is given. The mean squared prediction error of the basis coefficients of the response curves Y_t(τ) is bounded by a factor of order log(P_n K_n)/(n − p). Furthermore, the L1-norm of the estimation error of the parameter functions β_j(τ, s) and γ_ℓ(τ) is bounded by a factor of the same order. Moreover, the order of sparsity is bounded by a constant related to Assumption 1. Thus, Theorem 3.1 presents the oracle inequality properties of the sieve estimator B̂ expressed in terms of the basis coefficients.
Recall that the true parameters under the sieve are given by B_0. Let β̂_j(τ, s) and γ̂_ℓ(τ) be the pFAR estimates obtained via (2.6). We establish the asymptotic properties of the pFAR estimator over the space L2(C).
Theorem 3.2. Suppose Conditions C1-C4 and Assumption 1 are satisfied, and let n, K_n → ∞. Remark. Theorem 3.2 establishes the oracle inequality properties over L2(C), showing that the mean squared prediction loss of the pFAR estimator and the L1-norm of the estimation error of the parameter functions β_j(τ, s) and γ_ℓ(τ) converge to zero in probability. The proofs of the above theorems are provided in the supplementary materials.

Simulation Studies
In this section, we study the finite-sample performance of the proposed estimation in the pFAR modeling framework. We consider settings with moderate- and high-dimensional mixed-type predictors under different dependence structures. Our interest is three-fold: the estimation accuracy of the coefficients, the predictive accuracy of the proposed model measured in-sample and out-of-sample, and the sparsity detection evaluated using false zeros (FZ) and false nonzeros (FN) for both elements and groups. In addition, we study the effects of the choice of functional basis and of the sieve dimensionality K_n.

Set-up
We consider the three experiments described below. In each experiment, data are generated from the pFAR process, where the serially dependent functional variables are generated by Y_t(τ) = φ(τ)^⊤ y_t for τ ∈ [0, 1]. The term φ(τ) is a K_0-dimensional functional basis, and y_t ∈ R^{K_0} are coefficients that follow the coefficient relationship derived from the pFAR process. We generate the innovations as ε_t(τ) = φ(τ)^⊤ E_t, where E_t is a K_0-dimensional innovation vector. The scalar predictors x_t ∈ R^d are separated into m groups and are generated from a multivariate normal distribution N_d(0, Σ). The covariance matrix Σ = diag(Σ_{g_1}, ..., Σ_{g_m}) is block diagonal, with each block corresponding to one group of x_t and Σ_{g_i}(j, k) = 0.5^{|j−k|} for any (j, k) pair in each group i.
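The block-diagonal covariance of the scalar predictors can be constructed directly; the sketch below uses illustrative group sizes (m = 5 groups of size 4, matching the moderate configuration).

```python
import numpy as np

def group_covariance(group_sizes, rho=0.5):
    """Block-diagonal covariance with Sigma_g(j, k) = rho**|j-k| within each group."""
    d = sum(group_sizes)
    Sigma = np.zeros((d, d))
    start = 0
    for g in group_sizes:
        idx = np.arange(g)
        Sigma[start:start + g, start:start + g] = (
            rho ** np.abs(idx[:, None] - idx[None, :]))
        start += g
    return Sigma

Sigma = group_covariance([4, 4, 4, 4, 4])       # m = 5 groups, d = 20
rng = np.random.default_rng(2)
x = rng.multivariate_normal(np.zeros(20), Sigma, size=800)   # scalar predictors
```

Predictors are correlated within a group but independent across groups, which is what makes the group layer of the penalty meaningful in the simulations.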
To mimic a real setting, we generate data with measurement errors. Specifically, we obtain the observed response values V_{tj} = Y_t(τ_j) + u_{tj} at L = 50 equally spaced time points 0 = τ_1, ..., τ_L = 1, with the errors u_{tj} randomly sampled from N(0, 0.5^2). We set the sample size to n = 800, which is split into T_1 = [1, 600] as the training set for the in-sample estimation, T_2 = [601, 700] as the validation set for choosing the tuning parameters (α*, λ*) via Equation (2.7), and T_3 = [701, 800] as the test set for the out-of-sample prediction. The generation is repeated 100 times in each experiment.
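A sketch of the observation scheme: curves evaluated on L = 50 grid points with N(0, 0.5^2) noise, then split 600/100/100. The basis and the coefficient draws are illustrative stand-ins for the pFAR recursion, not the paper's generator.

```python
import numpy as np

rng = np.random.default_rng(1)
L, n, K0 = 50, 800, 5
tau = np.linspace(0.0, 1.0, L)

# K0-dimensional Fourier-type basis evaluated on the grid (illustrative).
Phi = np.column_stack([np.ones(L)] + [f(2.0 * np.pi * m * tau)
                                      for m in (1, 2) for f in (np.sin, np.cos)])

y = rng.normal(size=(n, K0))               # stand-in for pFAR basis coefficients
curves = y @ Phi.T                         # Y_t(tau_j) = phi(tau_j)' y_t
V = curves + rng.normal(scale=0.5, size=(n, L))   # V_tj = Y_t(tau_j) + u_tj

# Training / validation / test split as in the simulation design.
V_train, V_val, V_test = V[:600], V[600:700], V[700:800]
```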
Experiment 1 (General case). This experiment investigates the estimation, forecasting, and sparsity detection under different dimensionalities of p and d with the general form of the pFAR model. We consider two scenarios: a moderate configuration with p = 7 and d = 20, and a high-dimensional configuration with p = 14 and d = 200. The number of available mixed-type predictors is much larger than the number of active predictors, and we set K_0 = 5 for the data generation. Both scenarios feature group sparsity and element sparsity in the functional and scalar predictors simultaneously. Specifically, the functional predictors are active at lags 1 and 7 in the moderate configuration; in the high-dimensional configuration, lag 14 is also active with β_14(τ, s) = −0.2[1 + cos(0.5πτ) + sin(0.5πτ)], while β_j(τ, s) = 0 for all the remaining lags. Note that the surface β_j(τ, s) represents the cross-sectional dependence in every part of the curves, which is not necessarily symmetric; a simplified version, for example, β_j(τ − s), can capture partial dependence only. Figure 1 displays the 3-D plots of β_1(τ, s), β_7(τ, s), and β_14(τ, s) for the high-dimensional configuration. For s < 0.5 (or > 0.5), the dependence of β_1(τ, s) is stronger when τ is close to 0.3, and becomes weaker when τ nears the boundaries. For β_7(τ, s), the dependence is stronger when both τ and s are close to (τ, s) = (0.5, 0.5), and weaker in the border region. For β_14(τ, s), the dependence is negative and, for fixed τ, independent of the domain s; the dependence is stronger when τ is close to 0.5, and weaker when τ nears the boundaries. Given the generated data, we treat the sieve degree K_n as a hyperparameter and consider K_n ∈ {1, 3, ..., 11, 15, 21}. In this experiment, we adopt the Fourier basis for the estimation.
Experiment 2 (Special case). This experiment has settings similar to those in Experiment 1, except that the serial dependence is controlled by β_j(τ, s) = Σ_{k=1}^{K_0} B_k^{(j)} φ_k(τ)φ_k(s), a special case of β_j(τ, s). Here, φ_k(•) is the Fourier basis, and K_0 = 5. Such a serial operator makes the dependence of Y_t(τ) on Y_{t−j}(s) stronger for τ close to s than for τ and s far apart. We follow Liu, Xiao, and Chen (2016) to generate the pFAR process under this dependence structure. Again, lags 1 and 7 are active in the moderate configuration, and lags 1, 7, and 14 are active in the high-dimensional configuration. The active coefficients B_k^{(j)}, k = 1, ..., 5, are generated from uniform distributions. We use the pFAR estimation to fit the special-case data, again adopting the Fourier basis, and report the model performance for K_n ∈ {1, 3, ..., 11, 15, 21}.
Experiment 3 (Basis functions). This experiment studies the effect of the functional basis in settings similar to those of Experiment 1. We consider only the moderate configuration with p = 7 and d = 20, for computational efficiency. Again, the lag-1 and lag-7 curves and the first 10 scalar predictors are active, while the other mixed-type predictors are inactive. We also set m = 5, with an equal group size of 4. We use two types of functional basis φ(τ) in the data generation, namely the Fourier basis and the B-spline basis with K_0 = 7, and then use the Fourier, B-spline, and wavelet basis functions, respectively, in the estimation. For the wavelet basis, we apply the discrete wavelet transform (DWT) technique (Misiti et al. 2004), which converts the discrete observations of the response curve to an L/2-dimensional wavelet coefficient vector without specifying the value of K_n. Specifically, three prevalent orthogonal wavelets, Daubechies (db4), Symlet (sym4), and Coiflet (coif2), are adopted.

Evaluation
We consider three types of metrics to evaluate performance. The estimation accuracy of the coefficients is measured by the mean squared error (MSE) of all the estimates {β̂_j(τ, s), j = 1, . . ., p} and {γ̂_ℓ(τ), ℓ = 1, . . ., d}, where β̂_j(τ, s) and γ̂_ℓ(τ) are estimated using (2.6). We compute the prediction accuracy of the functional response, including the in-sample MSE (inMSE) over T_1 and the one-step-ahead out-of-sample mean squared forecast error (outMSFE) over T_3, where T_1 and T_3 refer to the interval lengths of the training and test sets, respectively. Note that the evaluation is performed over discrete grids of the continuous curve, denoted as Y_t(τ_i), taken at an equally spaced sequence of time points {τ_i ∈ [0, 1], i = 1, . . ., L} with L = 50. The prediction Ŷ_{t,K_n} depends on the tuning parameters α* and λ*, as well as the value of the sieve parameter K_n.
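The evaluation metrics can be approximated on the discrete grid as below; this is a sketch in which the integrals of the paper's formulas are replaced by plain grid averages, and the exact normalizations are our assumptions.

```python
import numpy as np

def mse_surfaces(beta_hat_list, beta_true_list):
    """Grid approximation of MSE_beta: average squared error over all p
    estimated surfaces, each evaluated on an L x L grid of (tau, s)."""
    return float(np.mean([(bh - bt) ** 2
                          for bh, bt in zip(beta_hat_list, beta_true_list)]))

def mse_curves(gam_hat_list, gam_true_list):
    """Grid approximation of MSE_gamma over the d coefficient curves."""
    return float(np.mean([(gh - gt) ** 2
                          for gh, gt in zip(gam_hat_list, gam_true_list)]))

def out_msfe(y_hat, y_true):
    """One-step-ahead out-of-sample MSFE over the grid {tau_i} and the
    test period: y_hat and y_true have shape (T3, L)."""
    return float(np.mean((np.asarray(y_hat) - np.asarray(y_true)) ** 2))
```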
To assess sparsity detection, we use the rates of false zeros (FZ) and false nonzeros (FN). An FZ occurs when the procedure fails to detect an active predictor; in other words, the active variable is incorrectly estimated to be zero. In contrast, an FN occurs when the procedure gives spurious importance to an irrelevant predictor, wrongly identifying it as nonzero. We measure the sparsity detection separately for the functional predictors and the scalar predictors, and denote the rates with subscripts ·_f and ·_s. Moreover, we compute the analogous rates for the group structure, labeled gFZ and gFN, using the indicator function I(·). All rates take values between 0 and 1. Lower rates indicate better detection of the active elements/groups and the sparsity structure.
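A minimal sketch of the element-wise rates, assuming FZ is normalized by the number of truly active predictors and FN by the number of truly inactive ones (our reading of the definitions):

```python
import numpy as np

def false_rates(active_true, active_hat):
    """FZ: share of truly active predictors estimated as zero (missed).
    FN: share of truly inactive predictors estimated as nonzero
    (spuriously selected). Inputs are boolean indicator arrays."""
    t = np.asarray(active_true, bool)
    h = np.asarray(active_hat, bool)
    fz = np.sum(t & ~h) / max(int(t.sum()), 1)
    fn = np.sum(~t & h) / max(int((~t).sum()), 1)
    return float(fz), float(fn)
```

The same function applies at the group level by passing group-level indicators.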

Results
Tables 1 and 2 present the numerical performance (sparsity detection, coefficient estimation accuracy, and model prediction accuracy) of the two experiments, Exp•_M and Exp•_H, where M stands for the moderate configuration and H for the high-dimensional one. We report the performance with respect to the choice of K_n ∈ {1, 3, . . ., 11, 15, 21}. For each fixed value of K_n, the tuning parameters (α*, λ*) are selected via Equation (2.7). The results show successful identification of all the key mixed-type predictors and groups, independent of the choice of K_n (> 1 for Experiment 1). The false rates are quite close to zero for both the functional predictors and the groups. For example, FZ_f, FZ_s, and gFZ are always zero when K_n > 1. FN_f is remarkably close or equal to zero when K_n > 1 for Experiment 1, and is always equal to zero for Experiment 2. Even in the high-dimensional case with 14 lagged functional covariates, the largest error rate FN_f is only 0.01 for Experiment 1 and zero for Experiment 2, indicating an accurate detection of functional covariates. Analogously, gFN is equal to zero for all K_n (> 1). FN_s is also remarkably close to or equal to zero for different K_n, with the largest error rate only 0.02 in Experiment 2. However, the model tends to slightly over-select individual scalar predictors in Exp1_M, with FN_s small but fluctuating around 2%-10%. This improves in the high-dimensional configuration, which holds many inactive scalar predictors, with FN_s dropping to 0%-2% when K_n > 1.
The estimation reached reasonable accuracy in all settings. The small values of MSE_β and MSE_γ indicate an accurate estimation of both the serial dependence measured by β_j(τ, s) and the impact of exogenous covariates measured by γ_ℓ(τ) once the hyperparameter K_n reaches the true value of 5. Note that in Experiment 2, the full two-dimensional functional parameters β_j(τ, s) are used to fit data generated from the reduced one-dimensional form; see the discussion accompanying Figure 2. The relation between the estimation accuracy and the value of K_n presents a direct method for parameter choice. In both experiments, there was a sharp decrease in estimation errors, including MSE_β, MSE_γ, inMSE, and outMSFE, as well as in the corresponding standard errors of the accuracy measures, when the true value K_0 = 5 is reached. Accuracy remains stable afterwards for even larger values of K_n. Figure 3 displays the parameter estimation accuracy and prediction accuracy versus various K_n values for Experiments 1 and 2. Clearly, all the error criteria exhibit a sharp decrease when K_n arrives at the true value of 5 and remain relatively stable afterwards in both experiments. This suggests that the forward-looking criterion allows a proper choice of tuning parameters, given a sufficiently large value of K_n. Compared with alternative approaches (e.g., cross-validation), comparing a small set of K_n values leads to an efficient choice. Moreover, a cautious choice of a large value for K_n does not significantly worsen the overall performance. Table 3 presents the results on the impact of basis functions for Experiment 3, labeled as Exp3_Fourier and Exp3_Bspline, which refer to the true basis used in the generation processes.
It reports the average values of the sparsity detection rates and the in-sample and out-of-sample prediction accuracies, with the corresponding standard errors. Comparing the estimation results based on the different functional bases (Fourier, B-spline, and wavelet) with different choices of K_n or wavelets, there are only minor differences in sparsity detection and prediction accuracy at a proper value of K_n. However, the basis function does influence the choice of K_n. For example, in Exp3_Fourier the optimal choice is K_n = 7 using the Fourier basis, but this increases to 15 using the B-spline basis. All three types of functional basis can detect element and group sparsity, with a slight over-selection of individual scalar covariates using B-spline and wavelet, independent of the choice of K_n. Exp3_Bspline exhibits a similar pattern, while the Fourier and B-spline bases tend to over-select one functional covariate, with FN_f around 20%. A further inspection shows that the over-selection is caused by choosing the lagged functional predictor Y_{t−2}(τ) with small coefficients, possibly due to the propagation of the serial dependence. This over-selection further affects the group detection accuracy, because each functional predictor is considered a group, leading to a group false nonzero rate (gFN) of around 12%-31%. In general, the sparsity detection and prediction accuracy are insensitive to the choice of basis function when K_n is carefully selected.
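The "sharp decrease, then stable" pattern used to pick K_n can be mimicked by a simple rule that returns the smallest K_n whose validation error is within a relative tolerance of the best error over the grid; the tolerance is our assumption, not part of the paper's criterion.

```python
import numpy as np

def choose_kn(kn_grid, errors, tol=0.05):
    """Return the smallest K_n whose validation error is within a
    relative tolerance of the minimum over the grid (tol stands in
    for the 'stable afterwards' pattern described in the text)."""
    errors = np.asarray(errors, float)
    ok = errors <= errors.min() * (1.0 + tol)
    return int(np.asarray(kn_grid)[int(np.argmax(ok))])
```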

Real Data Analysis
We apply the proposed pFAR model to predict natural gas flows. As a key energy source for Germany, natural gas contributes more than 20% of the country's entire energy supply. Efficient operation of the gas network requires a high-precision predictive model, which needs to be robust, to deliver accurate forecasts for diverse types of nodes, and to provide insightful interpretation of the essential driving factors. Our interest is in understanding the essential dependence structure of the large-scale framework and in conducting a forecast experiment to demonstrate the out-of-sample accuracy of the pFAR model.
There are three main types of nodes (locations) in the German natural gas pipeline network, each of which takes on a distinct function. Municipal energy supplier nodes (labeled M) serve residential and local commercial constituents, power plant/industry distribution nodes (P) are used for electricity generation and factory production, and border point nodes (B) handle the large natural gas flows imported and exported via Germany. The dataset contains a collection of daily supply and demand gas flow curves in the German high-pressure natural gas transmission network. Each curve is associated with high-dimensional mixed-type predictors, such as temperature, market price, renewable energy, and lagged gas flow curves.
The Fourier basis is used in the real data study, given its orthogonality, which eases calculations such as the integral of the product of basis functions, and given the periodicity observed in the daily gas flow curves. Because the form of the Fourier expansion is fixed, a new curve can be decomposed directly, without recalibrating the basis functions. We also conduct the analysis using B-spline and wavelet bases, which deliver similar performance and are presented in the supplementary materials. We report the numerical results using only the Fourier basis in the following subsections.
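Because the Fourier basis is fixed and orthonormal on [0, 1], decomposing a new curve reduces to inner products with the basis functions, with no recalibration. A minimal sketch (the grid and the test curve are our choices):

```python
import numpy as np

def fourier_basis(tau, K):
    """First K functions of the orthonormal Fourier basis on [0, 1]:
    1, sqrt(2)cos(2*pi*k*tau), sqrt(2)sin(2*pi*k*tau), k = 1, 2, ..."""
    cols = [np.ones_like(tau)]
    k = 1
    while len(cols) < K:
        cols.append(np.sqrt(2) * np.cos(2 * np.pi * k * tau))
        if len(cols) < K:
            cols.append(np.sqrt(2) * np.sin(2 * np.pi * k * tau))
        k += 1
    return np.column_stack(cols)

L = 200
tau = np.linspace(0, 1, L, endpoint=False)
Phi = fourier_basis(tau, K=5)

# A new curve with known scores (1 on the constant, 1 on the first sine)
# is decomposed by plain Riemann-sum inner products -- no refitting.
y = 1.0 + np.sqrt(2) * np.sin(2 * np.pi * tau)
scores = Phi.T @ y / L
```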

Data
We consider six nodes, two for each type in the illustration (see Figure 4), for the daily flow curves over two years, from 1 October Y1 to 30 September Y3. The gas flows are normalized such that each node has zero mean and unit variance, in accordance with an NDA. There are different dynamics among the daily flow curves. M nodes present weekly seasonality driven by household work routines. Spikes are seen in P nodes, often associated with emergent demand for electricity generation using gas. B nodes are stable and less sensitive to seasonal shifts, given long-term contracts for stable gas supply between countries. Despite the individual features of trends and seasonality, there is always a significant serial dependence across all nodes, as shown by the sample cross-correlations included in the supplementary materials. Seasonality motivates the incorporation of the past 7 flow curves (i.e., p = 7) as functional predictors. Each functional predictor is considered a single group. The functional time series are accompanied by many environmental and economic variables. The total number of exogenous scalar variables is d = 85, covering three categories. The first category contains four market prices in different network areas: NCG, GASPOOL, TTF, and Zeebruegge. The second has three renewable energy variables: solar, wind-onshore, and wind-offshore. The third comprises daily temperatures at 78 locations, which are further divided into four geographic zones; each zone is considered a group in the estimation. The scalar variables vary greatly in magnitude; the maximum value of the price variable TTF, for example, is 2.90, while the maximum value of wind-onshore is 30,158. To alleviate the impact of magnitude on the tuning parameter choice and to provide an interpretable comparison among diverse types of nodes, the exogenous variables are normalized. See Figure 5 for the normalized observations of the 85 scalar predictors displayed in the three categories. Note that, for the practical implementation of forecasting, we use lagged values of the exogenous variables.
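The normalization of the exogenous predictors can be sketched as a z-score using training-sample moments only, so that validation and forecast data are scaled without look-ahead; the use of training-only moments is our assumption, as the paper only states that the variables are normalized.

```python
import numpy as np

def normalize_train(X_train, X_new):
    """Z-score each scalar predictor (column) with moments estimated
    on the training sample, then apply the same scaling to new data."""
    mu = X_train.mean(axis=0)
    sd = X_train.std(axis=0)
    sd[sd == 0] = 1.0  # guard against constant columns
    return (X_train - mu) / sd, (X_new - mu) / sd
```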
The two-year sample period (T = 730 days) is divided into three phases: the training phase (T_1), including the 365 days from 1 October Y1 to 30 September Y2; the validation phase (T_2), containing 183 days (approximately 6 months) from 1 October Y2 to 31 March Y3; and the forecasting phase (T_3), covering 182 days from 1 April Y3 to 30 September Y3. The tuning parameters (α*, λ*) are selected over T_2, as described previously. We then fix the sparsity structure of the pFAR model and conduct forecasts in T_3, where the sparsity structure and the degree of sieves remain the same, but the model coefficients are updated with all available historical data.
We tried different choices of K_n and then selected K_n = 5 for all the nodes except P-2 (K_n = 1), in the same spirit as in the simulation section. In the supplementary materials, we report the forecasting performance of the pFAR model using multiple values of K_n, which justifies our choice and is consistent with our observations in the simulation study. Moreover, we noticed that with a large number of Fourier basis functions (e.g., 11), only the first five are active, whereas the remainder have zero or near-zero Fourier coefficients, which also supports our choice of K_n. For node P-2, we found that the intraday pattern of the gas flow curves remains mostly flat throughout the day (see Figure 4), which is consistent with the choice of the small value K_n = 1 for this node.

Essential Factors
Among the seven functional predictors and the 85 scalar predictors, only a few are selected by the pFAR estimation. For node M-1, they include the lag-1 gas flow curve, all four prices, the renewable energy variables, and a subset of the temperature variables. Figure 6 displays the estimated functional coefficients of the seven lagged curves for node M-1 as an illustration; the results for the other nodes are included in the supplementary materials. It shows that lag 1 dominates the serial dependence with positive values, while the remaining lags are insignificant. Moreover, the fitted β̂_1(τ, s) is relatively small when τ is between 8:00 and 17:00 and large during the rest of the day. This is consistent with household routines: the serial dependence at M-1, a municipal node, is prevalent during the hours when households consume natural gas at home. Note that the estimated functional coefficient is invariant with respect to s, which implies that particular time points of the earlier curve have little individual influence.
Figure 7 displays the estimated functional coefficients for the price and renewable energy variables, again at node M-1; the results for the other five nodes are included in the supplementary materials. The estimated price coefficients are positive, which makes sense, as larger demand shifts prices higher. The effects of prices are, however, unrelated to the time domain τ, given that the coefficient curves are generally flat.
Solar energy is negatively correlated with gas flows for all node types, showing that solar energy is a substitute for natural gas. The substitution effect is strongest at noon, when solar power is most abundant. Meanwhile, the small magnitude indicates that the substitution effect of solar power is still limited, consistent with the fact that natural gas is one of Germany's major energy sources. The onshore and offshore wind energy effects are positive for M-type nodes but inactive for P and B nodes. Due to technical restrictions such as extreme weather, wind energy is less influential on natural gas flows, though its effect is statistically significant.
For the temperatures, we display only the first Fourier coefficients of the parameter functions for visualization, given that there are many temperature predictors and the first Fourier coefficients dominate the effect. Figure 8 presents the effect of the lag-1 temperature associated with the four geographical zones. The columns of the heatmap represent the gas nodes, and the rows represent the temperature locations. Except for the negative values in zone 1 for the M (municipal) nodes, the lag-1 temperature effect is null for the P (power plant/industry) and B (border) nodes. This finding is economically meaningful. Households usually consume more natural gas when the temperature drops. In contrast, production and border flows are determined by production schedules and long-term contractual agreements, on which temperature has an insignificant impact.

Forecast
We now conduct a set of multi-period forecasting experiments. We consider the conventional one-day forecast as well as two- and seven-day (h = 2 and h = 7) forecasts. In the evaluation step, we focus on the economically relevant discrete time points τ_s ∈ [0, 1], s = 1, . . ., 24, of the trajectories of the forecasted curves. Specifically, we compute the h-day forecasts for each hour τ_s using an iterative procedure,
where J ⊂ {1, . . ., 7} and D ⊂ {1, . . ., 85} represent the sets of active functional and scalar predictors in the fitted model, respectively, and β̂_j and γ̂_ℓ are the estimates at each time point t.
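The iterative h-day procedure can be sketched on the discrete hourly grid as below. Here the coefficient operators are represented by 24 × 24 matrices, the exogenous values are held fixed over the horizon, and one-day-ahead forecasts are fed back as pseudo-observations; all of these discretization choices are our assumptions.

```python
import numpy as np

def iterative_forecast(history, exog, beta_hats, gamma_hats, h):
    """Iterative h-day-ahead forecast on a 24-point hourly grid.
    history: list of past daily curves (newest last), each length 24;
    beta_hats: dict {lag j in J: (24, 24) matrix approximating the
    integral operator beta_j}; gamma_hats: dict {predictor l in D:
    length-24 coefficient curve}; exog: dict {l: scalar value}.
    Each one-day-ahead forecast is appended and reused as input."""
    curves = list(history)
    for _ in range(h):
        y_next = np.zeros(24)
        for j, B in beta_hats.items():
            y_next += B @ curves[-j] / 24.0   # Riemann sum over s
        for l, g in gamma_hats.items():
            y_next += g * exog[l]
        curves.append(y_next)
    return curves[-1]
```

With an identity-like operator on lag 1 and no scalar effects, the forecast simply propagates the last observed curve, which is a useful sanity check.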
Figure 9 displays the one-day forecasts of the hourly gas flows over the forecasting period T_3. Despite the different node types and dependence features, the proposed pFAR model successfully captures the evolution of gas flows. There are some spikes in the P and B nodes that the model misses. Overall, however, the pFAR model produces a powerful and stable performance.
We measure forecast accuracy in terms of magnitude, direction, and out-of-sample goodness of fit, calculated on the original gas flow data rather than the normalized values. We compute the daily mean absolute percentage error (MAPE), the mean sign correctness proportion (MSCP), and the out-of-sample R² (OoR²) for the h-day forecasts of Y_{t+h}(τ_s), where t_0 = 548 and T = 730 indicate the starting and ending points of the forecast period, respectively, and Y_t(τ_0) = Y_{t−1}(τ_24). Several alternative predictive models are considered for comparison. To make the tests comparable, we chose lag 7 consistently for all alternative models. The AR-type models include AR(1), AR(7), and ARX(7) (ARX hereafter), which estimate and forecast the gas flows at each hour independently. The vector AR (VAR)-type models include VAR(1), VAR(7), and VARX(7) (VARX hereafter), which consider the cross-dependence of the 24 hourly natural gas processes. In addition to the lagged flows, the ARX and VARX models incorporate all 85 exogenous variables, with LASSO implemented for variable selection. The functional AR (FAR)-type models, that is, FAR(1) and FAR(7), consider the serial and cross-dependence of the gas flow curves without the exogenous predictors. Table 4 lists the specific formulas of the alternative models.
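The three accuracy measures can be sketched on matched arrays of hourly forecasts and observations as below; the exact summation ranges and normalizations of the paper's formulas are not reproduced, so this is only indicative.

```python
import numpy as np

def forecast_metrics(y_true, y_hat, y_prev):
    """Sketch of MAPE, MSCP, and out-of-sample R^2 on flattened arrays
    of hourly values; y_prev holds the previous-hour observations used
    to determine the sign of the change."""
    y_true, y_hat, y_prev = map(np.asarray, (y_true, y_hat, y_prev))
    mape = np.mean(np.abs((y_hat - y_true) / y_true))
    mscp = np.mean(np.sign(y_hat - y_prev) == np.sign(y_true - y_prev))
    oor2 = 1 - np.sum((y_true - y_hat) ** 2) / np.sum((y_true - y_true.mean()) ** 2)
    return float(mape), float(mscp), float(oor2)
```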
For ease of comparison, the pFAR model is used as the benchmark. We compute the relative MAPE of an alternative model i, denoted rMAPE_h^i, which gives the percentage change of model i against the benchmark. A positive (negative) rMAPE_h^i represents a better (worse) performance of the pFAR against alternative model i. The significance of the difference between models is measured using the Diebold and Mariano (DM) test (Diebold and Mariano 2002),
where σ̂_0 is the sample standard deviation and σ̂_k is the lag-k autocovariance of the loss-differential series d_t^h, with k ≥ 1. The null hypothesis H_0: DM_h = 0 corresponds to no significant difference between the two models. Table 5 reports the numerical performance of the pFAR forecasts for h = 1, 2, and 7 days ahead, as well as their relative forecast accuracy compared with the alternative models.
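A minimal sketch of the DM statistic under squared-error loss, with a truncated HAC estimate of the long-run variance (the loss function and truncation lag are our assumptions):

```python
import numpy as np
from math import erf, sqrt

def dm_test(e1, e2, k_max=1):
    """Diebold-Mariano statistic for equal predictive accuracy:
    d_t = e1_t^2 - e2_t^2 and DM = mean(d) / sqrt(Var_hat(mean(d))),
    with the variance built from the lag-0..k_max autocovariances."""
    d = np.asarray(e1, float) ** 2 - np.asarray(e2, float) ** 2
    n = len(d)
    dbar = d.mean()
    dc = d - dbar
    var = np.dot(dc, dc) / n                      # lag-0 autocovariance
    for k in range(1, k_max + 1):
        var += 2.0 * np.dot(dc[k:], dc[:-k]) / n  # lag-k contributions
    dm = dbar / sqrt(var / n)
    # Two-sided p-value from the standard normal approximation.
    pval = 2.0 * (1.0 - 0.5 * (1.0 + erf(abs(dm) / sqrt(2.0))))
    return dm, pval
```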
The pFAR model delivers good out-of-sample forecast accuracy. The MAPE values are small, although they increase slightly with the forecast horizon. Specifically, the MAPEs are in the range [4.59%, 25.61%] for the one-day forecast, [5.51%, 19.45%] for the two-day forecast, and [6.70%, 23.37%] for the seven-day forecast. The OoR² values are convincing, mostly larger than 50%, except for P-1 and B-2 (h = 2, 7). The direction prediction is reasonable and consistent across forecast horizons in terms of the MSCP. In general, the forecasts for the B and P nodes are less impressive than those for the M nodes, likely because the power plant/industry and border nodes are influenced by maintenance plans and schedule changes that are not captured in our model. Overall, the pFAR model produces good and stable out-of-sample forecast accuracy across diverse types of nodes with various features and forecast horizons. Compared with the AR- and VAR-type models, the pFAR benefits from considering the cross-dependence, namely the lead-lag dependence among multiple stochastic processes; compared with the FAR-type models, it has the additional advantage of incorporating high-dimensional exogenous predictors.

Conclusion
We propose a pFAR model to investigate the dynamics of a functional response with serial dependence on multiple lagged functional predictors and its relation to large-dimensional exogenous scalar predictors. Under a two-layer sparsity structure imposed on groups and individual covariates, a regularized estimation is performed to detect the essential mixed-type predictors, achieve variable selection, and reduce estimation bias. A forward-looking criterion is proposed to select the tuning parameters. Furthermore, we show the relationship between the sieve parameter choice and the estimation accuracy. We establish asymptotic oracle inequality properties for the pFAR estimator and investigate its finite-sample performance under both moderate and high-dimensional scenarios. The pFAR model accurately selects the active covariates and groups, and provides good estimation and prediction performance.
We apply the pFAR model to daily natural gas flows in a high-pressure gas pipeline network in Germany. We find that the lag-1 daily gas flow curve dominates the serial dependence. The municipal energy supplier nodes are more sensitive to weekly seasonality, price, renewable energy, and temperature, while the power plant/industry nodes and border nodes are mainly driven by serial dependence. By considering the sparse dependence driven by the large-dimensional mixed-type predictors, we show that the pFAR provides good and stable out-of-sample forecast accuracy and is superior to several alternative models. Some extensions will be considered in future research. First, as a referee pointed out, conditional on a given set of nonzero groups and elements and on the regularization assumptions, it is quite challenging to investigate group selection consistency and to further study the central limit theorem of the pFAR estimator. Kong et al. (2016) proved selection consistency and established the asymptotic normality of scalar parameters in an IID framework with a scalar response. The corresponding theoretical derivation is an exciting direction for functional time series. Second, the pFAR model could be extended to the multivariate functional time series case, as in Guo and Qiao (2020). It is also of interest to apply nonparametric or semiparametric functions to the effects of the scalar predictors, which would enhance modeling flexibility, especially in the case of nonlinear dependence. Nonparametric (vector) autoregression has been investigated and reviewed by Härdle and Tsybakov (1997), Franke et al. (2011), Vogt (2012), and Burdejová and Härdle (2019), who derived the estimation and inference of the nonparametric effect of lagged terms. Aneiros-Perez and Vieu (2008) and Kurisu (2021) proposed nonparametric regressions for a functional time series, but considered only the nonparametric effect of a single lagged curve, or a single predictor under an IID framework. The extension of the pFAR model with a nonparametric structure for large-scale scalar predictors, however, introduces challenges for model regularization and estimation. A less ambitious plan is to focus on the nonlinear dependence between a functional response and relatively low-dimensional mixed-type predictors.

Figure 1 .
Figure 1. Top panel: the coefficient operators β_1(τ, s), β_7(τ, s), and β_14(τ, s); bottom panel: the pFAR estimates with the sieve parameter K_n = 5 for the high-dimensional configuration in Experiment 1.
For the scalar variables, when d = 20 we fix m = 5 with an equal group size of 4, assume that the first 10 scalar predictors are active, and set γ_ℓ(τ) = 0 for ℓ > 10. For the high-dimensional scenario with d = 200, we set m = 20 with an equal group size of 10, and the first 25 predictors are active. The nonzero coefficient curves γ_ℓ(τ) = Σ_{k=1}^{5} γ_{ℓ,k} φ_k(τ) are generated with coefficients γ_{ℓ,k} uniformly distributed in [−√2, √2] for k = 1, . . ., 5. Finally, for the innovations, we set the covariance Σ_ε as a diagonal matrix with diagonal elements drawn uniformly from [0, √3/3] to control the signal-to-noise ratio.
NOTE: The average values of the sparsity detection rate, estimation accuracy, and performance accuracy are reported for different choices of K_n. The standard errors of the parameter estimation and of the in-sample and out-of-sample accuracies are reported in parentheses.

Figure 2 .
Figure 2. Estimated functional coefficients γ̂_1(τ) and γ̂_2(τ) for the first two scalar covariates under K_n = 5 for the moderate (top) and high-dimensional (bottom) configurations of Experiment 1.
In Experiment 2, the full two-dimensional functional parameters β_j(τ, s) are used to estimate the special case, whose true serial coefficients have a reduced, one-dimensional form. Although misspecified, the pFAR delivers stable accuracy for the special case once K_n ≥ 5. As an illustration, the bottom panel of Figure 1 displays the estimated functional operators β̂_1(τ, s), β̂_7(τ, s), and β̂_14(τ, s) in Exp1_H, and Figure 2 displays the estimated functional coefficients γ̂_1(τ) and γ̂_2(τ) for the first two scalar covariates under K_n = 5 in Experiment 1. Both suggest patterns similar to the true coefficients. The estimated functional coefficients of the nonzero β_j(τ, s) for Exp1_M and Exp2_H are included in the supplementary materials and also present patterns similar to the true coefficients.

Figure 3 .
Figure 3. Plots of the parameter estimation and prediction accuracy for various K_n in Experiments 1 and 2.

Figure 4 .
Figure 4. Daily gas flow curves at six representative nodes from 1 October Y1 to 30 September Y3.The displayed data were normalized.

Figure 7 .
Figure 7. Estimated functional coefficients for the four price and three renewable energy variables at node M-1.

Figure 8 .
Figure 8. Heatmap of estimated effects of temperature (the estimated first Fourier coefficients of parameter functions corresponding to temperature variables) on gas flows, where the rows are gas nodes and columns are zones, with blue (red) indicating negative (positive) effects.

Figure 9 .
Figure 9. The hourly gas flow observations (dashed line) and the one-day forecasts (solid line) at the six nodes, from hour 1:00 on 1 April Y3 to hour 24:00 on 30 September Y3. The data are displayed as normalized values.

Table 5 .
Forecast accuracy: MAPE, out-of-sample R², and MSCP of the pFAR model, and the relative MAPE of the alternative models for 1-, 2-, and 7-day forecasts of gas flows. The best-performing model (measured by the smallest MAPE) is indicated in bold. One and two asterisks represent a significant forecast accuracy difference of the pFAR model compared to the alternatives according to the DM test at the 10% and 5% significance levels, respectively. A positive (negative) value of rMAPE represents better (worse) performance of the pFAR; this amount corresponds to the percentage change of the alternative model relative to the pFAR.