LIC criterion for optimal subset selection in distributed interval estimation

Distributed interval estimation in linear regression may be computationally infeasible in the presence of big data that are typically stored on different computer servers or in the cloud. A further challenge is that results from distributed estimation may still contain redundant information about the population characteristics of the data. To tackle this challenge, we develop an optimization procedure that selects the best subset from the collection of data subsets, based on which we perform interval estimation in the context of linear regression. The procedure is derived by minimizing the length of the final interval estimator and maximizing the information retained in the selected data subset, and is hence named the LIC criterion. The theoretical performance of the LIC criterion is studied in this paper, together with a simulation study and real data analysis.


Development of distributed estimation
The broad use of social networks, e-commerce and the mobile internet in daily business and life has generated massive amounts of data that contain rich information for informing better business decisions and social development. This also brings challenges in storing, processing and analyzing the big data. Distributed data storage and parallel data analysis provide a natural approach to tackling these challenges. Under this approach, the big data are distributed over a set of connected computer servers or on the cloud, and statistical analysis is performed in parallel on the data subsets stored on those servers. The results from each subset analysis are then aggregated to provide the final result. An example of this distributed-parallel approach, referred to as the distributed estimation process, is illustrated in Figure 1.
Regarding the study of distributed data, Zhang et al. [25] considered the divide-and-conquer algorithm and studied two communication-efficient algorithms for distributed statistical optimization in big data. Kannan et al. [12] conducted an in-depth study of principal component analysis and higher-order correlations of distributed data, and proposed two methods to tackle these problems using observed massive data. Chen and Peng [5] developed a class of symmetric statistics to summarize the distributed data subsets, with which a distributed statistical method is provided and implemented in a parallel computing setting.
For the study of distributed estimation, we refer to Lee et al. [13], Battey et al. [1], Minsker [17], Jordan et al. [11], Huang et al. [9] and Wang et al. [22], among others. For example, Rosenblatt and Nadler [21] analyzed the error of the average estimator and gave an asymptotically exact expression for it. Huang et al. [9] proposed a one-step average estimation method; compared with the simple average, it has the advantage that only one round of communication is required. Battey et al. [1] studied hypothesis testing and parameter estimation under the divide-and-conquer method, where they solved the problem of how to choose the number of subsets $K_n$ as the number of data points $n$ increases in low-dimensional and high-dimensional settings, and provided a theoretical upper bound on $K_n$.
In summary, distributed estimation has proven effective for processing massive data. As shown in Figure 1, with this method we divide all the data into several smaller blocks and store them on multiple platforms in different locations. Next, statistical inference is performed on each data subset, and a local estimator is obtained. The final step is to aggregate all local estimators into an aggregated estimator. However, the results from distributed estimation may still contain redundant information about some population characteristics of the data under analysis. Distributed inference may be used to improve the efficiency of processing massive data by reducing this redundant information to a minimum.
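As a concrete illustration of this divide-estimate-aggregate pipeline, the following R sketch fits least squares on each of $K_n$ subsets and averages the local coefficient estimates. This is a minimal sketch, not the paper's implementation; all names are illustrative, and the simulated data stand in for blocks held on separate servers.

```r
## A minimal sketch of the divide-estimate-aggregate process of Figure 1;
## simulated data stand in for blocks stored on separate machines.
set.seed(1)
n <- 10000; p <- 5; Kn <- 10
X <- matrix(rnorm(n * p), n, p)
beta <- runif(p)
Y <- drop(X %*% beta + rnorm(n))

## Step 1: split the row indices into Kn disjoint subsets.
idx <- split(sample(n), rep(1:Kn, length.out = n))

## Step 2: fit the linear model on each subset (in parallel in practice).
fits <- lapply(idx, function(I) lm.fit(X[I, , drop = FALSE], Y[I]))

## Step 3: aggregate the local estimators, here by simple averaging.
beta_avg <- rowMeans(sapply(fits, coef))
```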

Optimal subset selection problem in distributed estimation
A particular distributed inference method to reduce the aforementioned redundant information is optimal data subset selection; that is, the final statistical inference result is obtained by performing statistical estimation on an optimally selected data subset only. Qian et al. [20] proposed the POSS method, which uses evolutionary Pareto optimization to find small data subsets with good performance. Ma et al. [14] proposed a leveraging method that selects a small data subset and applies statistical estimation to the selected subset in place of estimation on the whole data set. Mirzasoleiman et al. [18] considered the problem of distributed submodular function maximization and developed a simple two-stage strategy that can be easily implemented in a parallel setting. For general distributed inference, we refer to Boyd et al. [2], Deisenroth and Ng [6], Ma et al. [15], Qian et al. [19], Wang et al. [23], and Ma et al. [16], among others.
When the underlying statistical estimation problem is confidence interval estimation in linear regression with big data, minimizing the length of the interval can be used as a criterion for optimal data subset selection. By using an adaptive confidence interval method, Hoff [8] obtained shorter average confidence intervals for regression coefficients than those from the usual confidence interval procedure. Cai and Guo [3] established the convergence rate of the minimum expected length of the confidence interval and studied its adaptivity to sparsity. In this paper, we develop a new optimal data subset selection criterion based on the principles of not only minimizing the expected interval length but also maximizing the information retained in the selected data subset. In addition to assessing the performance of the new criterion by a simulation study, we also apply it to three real data sets:

• Gas turbine NOx emission: the response variable is NOx emission, and the predictors include nine variables. In this application, the whole data consist of parts from many distributed sensors.

• Airfoil self-noise: the response variable is the scaled sound pressure level, and the predictors include five variables. The whole data consist of parts from many different anechoic wind tunnels.

• Real estate valuation: the response variable is the price per unit area, and the predictors also include five variables. The whole data comprise parts from many dispersed valuation companies.

Outline of our work
We focus on effective and efficient confidence interval estimation in linear regression analysis when the available data are massive and stored in a distributed manner. First, suppose the whole data matrix $Q = (X, Y)$, where $Y$ is an $n \times 1$ response vector and $X$ is an $n \times p$ covariate matrix, consists of $K_n$ row-wise subsets stored on $K_n$ machines separately. The $K_n$ subsets are denoted as $Q_{I_k} = (X_{I_k}, Y_{I_k})$, $k = 1, \ldots, K_n$, where $I_k$ is an index vector indicating which rows of $Q$ comprise $Q_{I_k}$. We can rewrite $X$ and $Y$ as $X = (X_{I_1}^\top, \ldots, X_{I_{K_n}}^\top)^\top$ and $Y = (Y_{I_1}^\top, \ldots, Y_{I_{K_n}}^\top)^\top$. Next, a linear regression model is fitted to each data subset $Q_{I_k}$ in parallel, with each fit's interval estimation result fed into the central server for post-regression inference. Figure 2 illustrates this distributed data analysis process. Since the information contained in the $Q_{I_k}$'s may be repetitive or redundant, the results from all data subsets need to be aggregated so as to filter out the repetitive or redundant information as much as possible. The linear regression model for the whole data in (1) is of the form $Y = X\beta + \varepsilon$, where $\beta$ is the unknown regression coefficient vector to be estimated from the data. We aim to compute an efficient confidence interval for a certain population characteristic of the data that is related to $\beta$. For simplicity of presentation, we focus on computing an efficient confidence interval for the generic quantity $x^\top \beta$, which is the mean of $Y$ at covariate $x$.
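The partition notation above can be mirrored directly in code. A small sketch, continuing the simulated objects `X`, `Y` and `Kn` from the previous snippet (all names illustrative):

```r
## Row-wise partition of Q = (X, Y) into Kn blocks Q_{I_k} = (X_{I_k}, Y_{I_k}).
Q <- cbind(X, Y)
I <- split(seq_len(nrow(Q)), rep(1:Kn, length.out = nrow(Q)))
Q_blocks <- lapply(I, function(Ik) Q[Ik, , drop = FALSE])
## Stacking the blocks row-wise recovers a row-permuted copy of Q.
Q_stacked <- do.call(rbind, Q_blocks)
```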
The main contributions are as follows. First, we develop a new approach to the redundant information problem, called the LIC criterion, to find the best data subset for computing an efficient confidence interval for $x^\top \beta$ that is not only short but also informative in a certain optimal sense. The LIC criterion for optimal subset selection differs from the IBOSS criterion proposed by Wang et al. [22], which cannot guarantee the minimum length of the interval estimator. Second, we present two optimal bound properties on the mean length of the interval estimator from the selected optimal data subset. Third, we examine the stability and sensitivity of the LIC criterion in distributed interval estimation across different setups, showing that the LIC criterion is effective and efficient in both a simulation study and real big data analysis.
The rest of this work is organized as follows. Section 2 presents the framework of distributed interval estimation in distributed linear regression and the computational complexity involved. Section 3 presents the LIC criterion for optimal subset selection in distributed interval estimation and its properties. In Section 4, we validate the proposed criterion with numerical experiments. In Section 5, we discuss subset and interval length issues and the relevant challenges. Section A presents the proofs of the related theorems. The R code scripts implementing the LIC criterion are given in the supplementary materials.

Distributed interval estimation
In this section, the general distributed analysis framework is introduced, including distributed linear models and distributed interval estimation, together with the computational complexity involved.

Distributed interval estimation in distributed linear models
Here we shall focus on the distributed linear regression model of the form
$$Y_{I_k} = X_{I_k}\beta + \varepsilon_{I_k}, \quad \varepsilon_{I_k} \sim N\!\left(0, \sigma^2 I_{n_{I_k} \times n_{I_k}}\right), \quad k = 1, \ldots, K_n, \qquad (2)$$
where $X_{I_k}$ is an $n_{I_k} \times p$ sub-matrix of $X$ with $p < n_{I_k}$, $\varepsilon_{I_k}$ is an error sub-vector, and $I_{n_{I_k} \times n_{I_k}}$ is an $n_{I_k} \times n_{I_k}$ identity matrix. Let $B \subseteq \mathbb{R}^{p \times 1}$ be the space of values that the estimator $\hat\beta = (\hat\beta_1, \ldots, \hat\beta_p)^\top$ can take, where $\hat\beta$ is obtained on the whole data $(X, Y)$, and $\sigma^2$ is an unknown variance. Note that $Y = (Y_{I_1}^\top, Y_{I_2}^\top, \ldots, Y_{I_{K_n}}^\top)^\top$ and $X = (X_{I_1}^\top, X_{I_2}^\top, \ldots, X_{I_{K_n}}^\top)^\top$. Models of the form (2) can then be combined as
$$Y = X\beta + \varepsilon, \quad \varepsilon \sim N\!\left(0, \sigma^2 I_{n \times n}\right), \qquad (3)$$
where $Y$ is the random response vector of observed values. Let $\hat\mu_{I_k}(x) = x^\top \hat\beta_{I_k}$ denote the least squares estimator of $\mu(x) = x^\top \beta$ based on $Q_{I_k}$. Averaging all these least squares estimators gives the simple average estimator
$$\hat\mu^{(a)}(x) = \frac{1}{K_n}\sum_{k=1}^{K_n} \hat\mu_{I_k}(x). \qquad (5)$$
The variance of $\hat\mu^{(a)}(x)$ can be found to be $\operatorname{var}(\hat\mu^{(a)}(x)) = K_n^{-2} \sum_{k=1}^{K_n} \operatorname{var}(\hat\mu_{I_k}(x))$. One can find a weighted average estimator for $\mu(x)$ that has a smaller variance than $\operatorname{var}(\hat\mu^{(a)}(x))$. Define
$$\hat\mu_w(x) = \sum_{k=1}^{K_n} w_k\, \hat\mu_{I_k}(x), \quad w_k = \frac{1/\operatorname{var}(\hat\mu_{I_k}(x))}{\sum_{j=1}^{K_n} 1/\operatorname{var}(\hat\mu_{I_j}(x))}.$$
It can be shown that $\operatorname{var}(\hat\mu_w(x)) \le \operatorname{var}(\hat\mu^{(a)}(x))$, knowing that $\operatorname{var}(\hat\mu_w(x))$ is the harmonic mean of the $\operatorname{var}(\hat\mu_{I_k}(x))$'s divided by $K_n$, while $\operatorname{var}(\hat\mu^{(a)}(x))$ is the arithmetic mean of the $\operatorname{var}(\hat\mu_{I_k}(x))$'s divided by $K_n$. In a closely related work, Minsker [17] proposed a median processing method for the central processor, given by
$$\hat\mu^{(m)}(x) = \operatorname{median}\{\hat\mu_{I_1}(x), \ldots, \hat\mu_{I_{K_n}}(x)\}, \qquad (7)$$
which is referred to as the median-of-least-squares estimator.
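The three aggregated estimators can be sketched in R as follows. This continues `idx`, `X`, `Y` and `p` from the earlier snippets; the evaluation point `x0` is arbitrary, and the local variance estimate follows $\operatorname{var}(\hat\mu_{I_k}(x)) = \hat\sigma_{I_k}^2\, x^\top (X_{I_k}^\top X_{I_k})^{-1} x$.

```r
## Local estimator of mu(x) = x' beta on subset I, with its variance estimate.
mu_local <- function(I, x) {
  Xk <- X[I, , drop = FALSE]
  fit <- lm.fit(Xk, Y[I])
  sigma2 <- sum(fit$residuals^2) / (length(I) - ncol(Xk))
  XtX_inv <- chol2inv(chol(crossprod(Xk)))
  c(est = sum(x * fit$coefficients),
    var = sigma2 * drop(t(x) %*% XtX_inv %*% x))
}

x0  <- rep(1, p) / p                      # an arbitrary covariate point
loc <- sapply(idx, mu_local, x = x0)
mu_a <- mean(loc["est", ])                # one-shot mean, as in (5)
mu_w <- sum(loc["est", ] / loc["var", ]) / sum(1 / loc["var", ])  # weighted
mu_m <- median(loc["est", ])              # one-shot median, as in (7)
```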
Our goal is to find the best data subset(s) from the $Q_{I_k}$'s and use it to obtain a confidence interval $C(Y_{I_k}|X_{I_k}, x)$ (written as $C(Y_{I_k})$ for simplicity of presentation) for the mean $\mu(x)$ at a given $1 - \alpha$ confidence level, based on $(Y_{I_k}, X_{I_k})$. That is,
$$P_\beta\!\left(\mu(x) \in C(Y_{I_k})\right) \ge 1 - \alpha.$$
Similar to Yu and Hoff [24], we define $w(\cdot)$ as a to-be-determined w-function (or scale process) taking values in $(0, 1)$. Write $A_w(\mu(x)|Q_{I_k})$ for the acceptance region used to construct a confidence interval for $\mu(x)$ based on the data subset $Q_{I_k}$. It is a form of Scheffé interval, see also Casella and Berger [4], satisfying
$$A_w(\mu(x) \mid Q_{I_k}) = \left\{ Y_{I_k} :\ -t_{n_{I_k}-p,\,1-\alpha w} \le \frac{\hat\mu_{I_k}(x) - \mu(x)}{\hat\sigma_{I_k}\, C_{I_k,x}} \le t_{n_{I_k}-p,\,1-\alpha(1-w)} \right\},$$
where $\hat\sigma^2_{I_k}$ is an unbiased estimator of $\sigma^2$ based on $Q_{I_k}$, $t_{n_{I_k}-p,\,\gamma}$ is the $\gamma$-quantile of the $t$-distribution with $n_{I_k} - p$ degrees of freedom, and
$$C_{I_k,x} = \left(x^\top H_{I_k}\, x\right)^{1/2}.$$
More specifically, we obtain a confidence interval by inverting the level-$\alpha$ acceptance region. When the function $w = 1/2$, the confidence interval for the mean $\mu(x)$ based on $Q_{I_k}$ can be found to be
$$C(Y_{I_k}) = \left[\hat\mu_{I_k}(x) - \hat\sigma_{I_k}\, C_{I_k,x}\, t_{n_{I_k}-p,\,1-\frac{\alpha}{2}},\ \ \hat\mu_{I_k}(x) + \hat\sigma_{I_k}\, C_{I_k,x}\, t_{n_{I_k}-p,\,1-\frac{\alpha}{2}}\right]. \qquad (12)$$
For a full-rank sub-matrix $X_{I_k}^\top X_{I_k}$, we have $H_{I_k} = (X_{I_k}^\top X_{I_k})^{-1}$. In special cases where $X_{I_k}^\top X_{I_k}$ is not invertible, we may use the ridge regression idea and set $H_{I_k} = (X_{I_k}^\top X_{I_k} + \lambda I_{p \times p})^{-1}$, where $\lambda$ is a disturbance term.
The shortest level $1 - \alpha$ confidence interval for $\mu(x)$ based on $Q_{I_k}$ has length
$$L(C(Y_{I_k})) = 2\, \hat\sigma_{I_k}\, C_{I_k,x}\, t_{n_{I_k}-p,\,1-\frac{\alpha}{2}}.$$
Furthermore, when $n_{I_k}$ is large enough, we may use the $N(0, 1)$ distribution to approximate the $t$-distribution, whereby the length of the level $1 - \alpha$ confidence interval for $\mu(x)$ based on $Q_{I_k}$ is approximated by
$$L(C(Y_{I_k})) \approx 2\, \hat\sigma_{I_k}\, C_{I_k,x}\, z_{1-\frac{\alpha}{2}},$$
where $z_{1-\alpha/2}$ is the $(1-\alpha/2)$-quantile of $N(0, 1)$; cf. Javanmard and Montanari [10] and Zhang and Zhang [26].
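A sketch of the subset interval (12) and its length in R follows; the function assumes `X`, `Y`, `idx` and `x0` as in the earlier snippets, and the optional ridge term `lambda` implements the perturbation mentioned above for a singular $X_{I_k}^\top X_{I_k}$.

```r
## Level 1 - alpha interval for mu(x) on subset I and its length
## 2 * sigma_hat * C_{I_k,x} * t_{n_k - p, 1 - alpha/2}.
subset_ci <- function(I, x, alpha = 0.05, lambda = 0) {
  Xk <- X[I, , drop = FALSE]; Yk <- Y[I]
  nk <- length(I); pk <- ncol(Xk)
  H <- solve(crossprod(Xk) + lambda * diag(pk))  # H_{I_k}, ridge if lambda > 0
  beta_k <- drop(H %*% crossprod(Xk, Yk))
  sigma2 <- sum((Yk - Xk %*% beta_k)^2) / (nk - pk)  # unbiased when lambda = 0
  C_kx <- sqrt(drop(t(x) %*% H %*% x))
  half <- sqrt(sigma2) * C_kx * qt(1 - alpha / 2, df = nk - pk)
  c(lower = sum(x * beta_k) - half,
    upper = sum(x * beta_k) + half,
    length = 2 * half)
}
subset_ci(idx[[1]], x0)   # interval and length on the first subset
```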

Computational complexity
In the process of distributed estimation, one of the important practical problems we must consider is the choice of $K_n$. For example, Chen and Peng [5] discussed a symmetric statistic $T_{n,K_n}$ obtained by aggregating estimation results from all $K_n$ data subsets and computed its variance. Their results show that the larger $K_n$, the larger the variance of $T_{n,K_n}$. In addition, the length of each data subset also affects the computational cost. Chen and Peng [5] further studied two bootstrap algorithms, the distributed bootstrap and the pseudo-distributed bootstrap, and concluded that increasing $K_n$ reduces the computational cost but results in a loss of statistical efficiency. Therefore, in practice, the choice of $K_n$ should be considered from the perspectives of statistical accuracy, computational cost and feasibility. A factor that must be considered when performing distributed estimation is computational complexity, generally expressed in big-$O$ notation. In practice, we generally consider only the worst-case complexity, that is, the maximum running time as a function of $n$. We use the function $\varphi(\cdot)$ to denote the exact computing complexity. Therefore, in distributed estimation, an overall computational complexity of $O(\varphi(n))$ corresponds to a per-machine parallel complexity of $O(\varphi(n/K_n))$; cf. Battey et al. [1]. Early research on distributed estimation, including Battey et al. [1], Minsker [17], and Lee et al. [13], suggests that $K_n = O(\sqrt{n})$ achieves the optimal computational complexity.
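For instance, the resulting rule of thumb can be sketched as follows (a standalone illustration with hypothetical numbers):

```r
## With Kn of order sqrt(n), each machine handles O(n / Kn) = O(sqrt(n)) rows,
## balancing per-machine cost against the number of results to aggregate.
n  <- 1e6
Kn <- floor(sqrt(n))                  # Kn = 1000 subsets
rows_per_machine <- ceiling(n / Kn)   # about 1000 rows per machine
```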
A good distributed estimation method should also consider the impact of communication costs. Zhang et al. [25] and Lee et al. [13] studied a one-shot parallel method in which only one round of communication is used. In the setting we study, there are $K_n$ nodes that transmit information to the central server; therefore, the communication cost of each machine is proportional to $\psi(K_n)$ for a given $\psi(\cdot)$.

LIC criterion for optimal subset selection
In this section, we derive the length and information optimization criterion (LIC) for optimal subset selection.

LIC criterion in distributed interval estimation
For a fixed confidence level $1 - \alpha$, a confidence interval estimator is more efficient when it is shorter. Thus minimizing the confidence interval length can be used as a criterion for optimal data subset selection. With this observation, a data subset $Q_{I_k}$, which may be represented by its index $I_k$, would be an optimal data subset if it minimizes $\hat\sigma_{I_k}\, C_{I_k,x}\, t_{n_{I_k}-p,\,1-\alpha/2}$. But the minimizer here may vary with $x$. Hence we replace $C_{I_k,x}$ in $\hat\sigma_{I_k}\, C_{I_k,x}\, t_{n_{I_k}-p,\,1-\alpha/2}$ with a form of its expectation over all rows $x$ of $X_{I_k}$, i.e. $\bar C_{I_k} = n_{I_k}^{-1}\sum_{x \in X_{I_k}} C_{I_k,x}$, and then define the optimal data subset $I_{opt}^1$ as
$$I_{opt}^1 = \arg\min_{1 \le k \le K_n}\ \hat\sigma_{I_k}\, \bar C_{I_k}\, t_{n_{I_k}-p,\,1-\frac{\alpha}{2}}, \qquad (13)$$
where $\hat\sigma_{I_k}$, $C_{I_k,x}$ and $t_{n_{I_k}-p,\,1-\alpha/2}$ are as in (12), and $H_{I_k}$ is given before (12).
Note that the least squares estimator of $\beta$ based on $Q_{I_k}$ has variance $\sigma^2 (X_{I_k}^\top X_{I_k})^{-1}$. This suggests that we can maximize the determinant of the information matrix $X_{I_k}^\top X_{I_k}$ (based on the data sub-matrix $Q_{I_k}$) to find another optimal data subset $I_{opt}^2$ such that
$$I_{opt}^2 = \arg\max_{1 \le k \le K_n}\ \det\!\left(X_{I_k}^\top X_{I_k}\right). \qquad (14)$$
This is similar to the D-optimality criterion given in Wang et al. [22], where $i$ data points are selected from $(X_{I_k}, Y_{I_k})$ to form a data subset so as to maximize $\det\!\left(\sum_{j \in I_k} \delta_j\, x_j x_j^\top\right)$, where $\delta_j = 1$ if the $j$-th data point is included in the data subset and $\delta_j = 0$ otherwise. Since the confidence interval of $\mu(x)$ obtained from a combination of two data subsets will be shorter and contain more information than that obtained from each data subset separately, we propose the optimal data subset $I_{opt}$ to be the intersection of $I_{opt}^1$ and $I_{opt}^2$ obtained from (13) and (14), i.e.
$$I_{opt} = I_{opt}^1 \cap I_{opt}^2. \qquad (15)$$
By using (15), the optimal data subset $Q_{I_{opt}} = (X_{I_{opt}}, Y_{I_{opt}})$ can be readily chosen from $(X, Y)$. This approach of finding the optimal data subset by minimizing the relevant confidence interval length and maximizing the relevant Fisher information is named the LIC criterion. Under the LIC criterion, the confidence interval of $\mu(x)$ based on $I_{opt}$ is given by
$$C(Y_{I_{opt}}) = \left[\hat\mu_{I_{opt}}(x) - \hat\sigma_{I_{opt}}\, C_{I_{opt},x}\, t_{n_{I_{opt}}-p,\,1-\frac{\alpha}{2}},\ \ \hat\mu_{I_{opt}}(x) + \hat\sigma_{I_{opt}}\, C_{I_{opt},x}\, t_{n_{I_{opt}}-p,\,1-\frac{\alpha}{2}}\right]. \qquad (16)$$
The implementation process of the LIC criterion for optimal subset selection is illustrated in Figure 3.
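The selection rules (13)-(15) can be sketched in R as below. This is an illustrative implementation, not the LIC package's own code; it assumes overlapping (e.g. resampled) subsets so that the intersection in (15) can be non-empty, whereas for a disjoint partition (15) is non-empty only when both criteria pick the same subset.

```r
## Sketch of LIC selection: minimize the interval-length criterion (13),
## maximize the information determinant (14), and intersect the winners (15).
lic_select <- function(idx, X, Y, alpha = 0.05) {
  p <- ncol(X)
  crit <- sapply(idx, function(I) {
    Xk <- X[I, , drop = FALSE]
    fit <- lm.fit(Xk, Y[I])
    sigma_k <- sqrt(sum(fit$residuals^2) / (length(I) - p))
    XtX_inv <- chol2inv(chol(crossprod(Xk)))
    ## Average of C_{I_k,x} over the rows x of X_{I_k}, as used in (13).
    C_bar <- mean(sqrt(rowSums((Xk %*% XtX_inv) * Xk)))
    c(len = sigma_k * C_bar * qt(1 - alpha / 2, df = length(I) - p),
      det = as.numeric(determinant(crossprod(Xk))$modulus))  # log-determinant
  })
  k1 <- which.min(crit["len", ])   # I_opt^1 in (13)
  k2 <- which.max(crit["det", ])   # I_opt^2 in (14)
  intersect(idx[[k1]], idx[[k2]])  # I_opt in (15)
}
```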

Properties of the LIC criterion
We present two bound properties of $L(C(Y_{I_{opt}}))$ in (16) after introducing some related notation. For $\alpha \in (0, 1)$ and $B \subseteq \mathbb{R}^{p \times 1}$, write $B = \bigcup_{k=1}^{K_n} B_{I_k} = B_{opt} \cup B_{-opt}$, where $B_{I_k}$ is the value space for $\hat\beta$ estimated from $Q_{I_k}$, and $B_{opt}$ and $B_{-opt}$ are the value spaces corresponding to the optimal subset $I_{opt}$ and its complement $I_{-opt}$, respectively. The expected length with the prior $\pi_{B_{opt}}$ on $B_{opt}$ is defined by
$$E_{\pi_{B_{opt}}} L(C(Y_{I_{opt}})) = \int_{B_{opt}} E_\beta L(C(Y_{I_{opt}}))\, \pi_{B_{opt}}(\beta)\, d\beta.$$
Denote the observed values corresponding to the random quantities $Q$ and $Q_{I_k}$ by $q$ and $q_{I_k}$, respectively. Also denote by $f_{\pi_{B_{I_k}}}$, $f_{\pi_{B_{opt}}}$, and $f_{\pi_{B_{-opt}}}$ the density functions of the corresponding marginal distributions, and by $f_\beta(\cdot)$ the density function of $Y$ given the parameter vector $\beta \in B$. More specifically, $f_{\pi_{B_{opt}}}(q) = \int_{B_{opt}} f_\beta(q)\, \pi_{B_{opt}}(\beta)\, d\beta$, and similarly for the others. The total variation distance between the density functions $f_{\pi_{B_{-opt}}}$ and $f_{\pi_{B_{opt}}}$ is given by
$$\operatorname{TV}\!\left(f_{\pi_{B_{-opt}}}, f_{\pi_{B_{opt}}}\right) = \frac{1}{2}\int \left| f_{\pi_{B_{-opt}}}(q) - f_{\pi_{B_{opt}}}(q) \right| dq.$$

Theorem 3.1: Assume that $\mu_{I_{opt}} \in C(Y_{I_{opt}})$ and $\mu_{I_{-opt}} \in C(Y_{I_{-opt}})$. Then, for any $k$, the expected lengths of $C(Y_{I_k})$ and $C(Y_{I_{-opt}})$ are bounded below in terms of $E_{\pi_{B_{opt}}} L(C(Y_{I_{opt}}))$ and the total variation distance $\operatorname{TV}(f_{\pi_{B_{-opt}}}, f_{\pi_{B_{opt}}})$.

The theorem follows from the probability inequalities for $\mu_{I_{-opt}}$ and $\mu_{I_k}$ and their expected lengths. The bound on $\log L(C(Y_{I_k}))$ is also an important factor. Under several given conditions, we have the following theorem on $\log L(C(Y_{I_k}))$.

Theorem 3.2: Assume that the interval lengths $\{L(C(Y_{I_k}))\}_{k=1}^{K_n}$ obey lognormal distributions instead of chi-square distributions, and let $L_M(C(Y_{I_k}))$ denote the maximum among $\{L(C(Y_{I_k}))\}_{k=1}^{K_n}$. Suppose there exists a constant $c\ (>0)$ such that
$$E(\log L(C(Y_{I_k}))) \le (1 + c)\, \log L(C(Y_{I_{opt}})). \qquad (18)$$
Then there exist a constant $C\ (>0)$ and a positive integer $N$ such that, for $K_n > N$,
$$\log L_M(C(Y_{I_k})) \le C\, \log L(C(Y_{I_{opt}})). \qquad (19)$$
The theorem follows from a related variance lemma, Chebyshev's inequality, and the bound properties of $E(\log L(C(Y_{I_k})))$ and $L_M(C(Y_{I_k}))$.

Numerical analysis
In this section, we examine the performance of the proposed LIC criterion both in simulation and in real data analysis. We have implemented the LIC criterion in an R package called LIC [7], which is available at https://CRAN.R-project.org/package=LIC.
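The package can be installed and loaded in the usual way; we refer to the package manual for its exact function signatures rather than reproducing them here.

```r
## Install the LIC package from CRAN and load it.
install.packages("LIC")
library(LIC)
```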

Preparation
Using the three index subsets $I_{opt}^1$, $I_{opt}^2$, and $I_{opt}$, we obtain the estimators $\hat\mu_I = X_I \hat\beta_I$ for $I = I_{opt}^1$, $I_{opt}^2$, or $I_{opt}$.
We choose the mean squared error (MSE) and the mean absolute error (MAE) as two performance indicators. These indicators measure the deviation between predicted and true values; in general, the larger the value, the worse the prediction.
For the sub-estimator $\hat\mu_{I_k}$ of $\mu_{I_k}$, the MSE associated with the prediction error is defined as
$$\operatorname{MSE}(\hat\mu_{I_k}) = \frac{1}{n_{I_k}}\left\| \hat\mu_{I_k} - \mu_{I_k} \right\|^2.$$
From this, the one-shot mean MSE of $\hat\mu^{(a)}$ and the one-shot median MSE of $\hat\mu^{(m)}$ are obtained by replacing $\hat\mu_{I_k}$ with $\hat\mu^{(a)}$ and $\hat\mu^{(m)}$, respectively. The MAEs of the five estimators are defined analogously, with the squared error replaced by the absolute error.
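In code, the two indicators reduce to two one-line helpers (a sketch, assuming the predictions and true values are numeric vectors of equal length):

```r
## Mean squared error and mean absolute error of a prediction vector.
mse <- function(pred, truth) mean((pred - truth)^2)
mae <- function(pred, truth) mean(abs(pred - truth))
```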

Stability experiments
We first consider the stability of the LIC criterion when the intersection of $Q_{I_{opt}^1}$ and $Q_{I_{opt}^2}$ is taken. We resample the data set with $(n, p) = (10, 7)$ to generate a new data set of length 75 and divide it evenly into 5 subsets. The optimal subsets under the two criteria (13) and (14) are then selected. The detailed results are shown in Table 1. In the table, the data points marked in bold are those belonging to both selected subsets; there are exactly 7 of them. This shows that taking the intersection to form the optimal subset is meaningful.
We now study the optimal subset when the intersection of $Q_{I_{opt}^1}$ and $Q_{I_{opt}^2}$ is not taken. We choose the length $n = 1000$ and, for convenience, randomly divide the total data set into $K_n = 5$ subsets of equal length, with $p = 10$ and $1 - \alpha = 0.95$. Table 2 shows the comparison results for choosing the optimal subset in the simulation. The second and third rows of the table are the results of the optimal subsets based on Equations (13) and (14), respectively. It can be seen from the table that only in the first and third simulations are the sequence numbers obtained by the two methods the same, with the subset $Q_{I_1}$ being the optimal subset. There are two reasons for the obvious difference between the two methods.

Figure 4. MSE and MAE values of five mean estimators over $K_n$ in simulation.
One is that the data set length $n$ is too large; the other is that the intersection of $Q_{I_{opt}^1}$ and $Q_{I_{opt}^2}$ is not taken.

The data set $(X, Y)$ is generated from model (3), where $X$ is an $n \times p$ matrix and $\beta$ is a $p$-dimensional vector. Setting $(n, p, \sigma^2) = (1200, 5, 1)$, we randomly divide $(X, Y)$ into the subset sequence $\{(X_{I_k}, Y_{I_k})\}_{k=1}^{K_n}$ for $K_n = \{10, 15, 20, 25\}$. The optimal subset method of distributed estimation is applied to the data set and compared with the other four methods. Figure 4 shows the comparison results for $\hat\mu^{(a)}$ (one-shot mean) in (5), $\hat\mu^{(m)}$ (one-shot median) in (7), and the three LIC-based estimators $\hat\mu_{I_{opt}^1}$, $\hat\mu_{I_{opt}^2}$ and $\hat\mu_{I_{opt}}$.

Next, we turn our attention to the length of the sub-mean $\mu_{I_k}$. We observe that, for the same $K_n$, the values of $L(C(Y_{I_{opt}}))$ become smaller and smaller, and thus converge easily. This is consistent with Theorem 3.1. Regarding Theorem 3.2, when $K_n = 30$ and $40$, the constants $c$ in (18) and $C$ in (19) exist; moreover, when $K_n = O(\sqrt{n})$, there is a constant $C \in (1.652, 1.898)$ that makes the bound hold, as shown in Table 3. Table 4 presents the sequence numbers of the optimal subsets over four different values of $K_n$ in the simulation. For $K_n = 10$ and $20$, the lengths of the optimal subset are both 14; for $K_n = 15$ and $25$, the lengths are both 6.

Sensitivity experiments

Based on data generated from model (3), we now consider the sensitivity of the LIC criterion by discussing the following three scenarios.

Scenario 1: Effect of $K_n$ and $n$. We vary $K_n = 5, 10, 15, 20, 25$ and $n = 1500, 3000, 4500, 6000, 7500$ for fixed $p = 4$. The comparison results for the LIC criterion are presented in Panel (a) of Figure 5; the reported results are averages over ten replications. As shown in Panel (a), when $p$ is fixed, most of the lines reach their lowest points at $n = 4500$ ($K_n = 5, 20, 25$), and the smallest error and best estimation performance occur at $K_n = 25$. As shown in Panel (c), the MSE values gradually decrease as $K_n$ increases with fixed $n$, and the MSE values are more stable for relatively large $p$.

To summarize, the simulation results show that the LIC criterion is effective, and the optimal subset obtained by taking the intersection is meaningful and feasible.

Real data analysis
In this section, we use two indicators (MSE and MAE) to measure the performance of the LIC criterion on three real data sets (the gas turbine NOx emission data set, the airfoil self-noise data set, and the real estate valuation data set).

Gas turbine NOx emission data set
With steadily increasing environmental protection pressure, emission standards for thermal power plants have become more and more stringent, and research on the pollutant emissions of gas turbines has become more important. To predict NOx emissions, we use the gas turbine NOx emission data set in the UCI database, which contains 36,733 instances of 11 sensor measurements. Here $Y_i$ is the output variable, NOx; that is, the response variable. The pollutant emission factors of gas turbines include 9 predictor variables. We select 7,200 data points from 2015. The model then takes the form of model (3) with $p = 9$. To perform distributed estimation, we divide these data points into $K_n$ subsets for $K_n = \{10, 15, 20, 25, 30\}$. The above five methods are used to estimate the NOx data, and their MSE and MAE values are obtained. Figure 6 shows the results for the different mean estimators over different $K_n$; as shown there, the values for $\hat\mu_{I_{opt}}$ and $\hat\mu^{(m)}$ reach the minimum. Table 5 presents the optimal subsets of the gas turbine NOx emission data set over different $K_n$. In Table 5, the length of the optimal subset for $K_n = 40$ is 9, the smallest, while the length for $K_n = 10$ is 13, the longest. Furthermore, the length for $K_n = 50$ is 10, that for $K_n = 30$ is 11, and that for $K_n = 20$ is 12.
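For reference, the data preparation for this example can be sketched in R as follows. The file name `gt_2015.csv` is a hypothetical local name, and the column names follow the UCI description of the yearly files; both should be checked against the actual download.

```r
## Load the 2015 gas turbine file (hypothetical file name) and set up the
## response (NOX) and the 9 predictors (assuming CO is excluded along with
## the response, leaving the 9 ambient/process variables).
gt <- read.csv("gt_2015.csv")
Y  <- gt$NOX
X  <- as.matrix(gt[, !(names(gt) %in% c("CO", "NOX"))])
idx <- split(sample(nrow(gt)), rep(1:10, length.out = nrow(gt)))  # Kn = 10
```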

Airfoil self-noise data set
The second real data set is the airfoil self-noise data set from NASA in the UCI database. The data set contains 1503 data points with 6 variables; among them, the scaled sound pressure level is the dependent variable and the other five are independent variables, so $p = 5$. Our linear model again takes the form of model (3). For convenience of computation, we randomly select $n = 1500$ points and divide them into $K_n$ subsets, $K_n \in \{5, 10, 20, 30, 50, 100, 150\}$. We apply the optimal subset method and the other four methods of distributed estimation to the $K_n$ subsets, and compare the MSE and MAE values of $\hat\mu^{(a)}$, $\hat\mu^{(m)}$, $\hat\mu_{I_{opt}^1}$, $\hat\mu_{I_{opt}^2}$, and $\hat\mu_{I_{opt}}$. Figure 7 shows the MSE and MAE values of these five estimators over different $K_n$. In Panel (a), the MSE values of $\hat\mu_{I_{opt}}$ are much smaller than those of the other four estimators and tend to decrease as $K_n$ increases; $\hat\mu_{I_{opt}^1}$ and $\hat\mu^{(a)}$ follow, while $\hat\mu_{I_{opt}^2}$ and $\hat\mu^{(m)}$ are larger. In Panel (b), the MAE values of $\hat\mu_{I_{opt}}$ are generally the smallest. Therefore, the optimal subset estimation performs better on this data set.
Next, we compute $L(C(Y_{I_{opt}}))$ under the LIC criterion and find that its values become smaller and smaller, and thus converge easily. This is consistent with Theorem 3.1. From Theorem 3.2, we conclude that when $K_n = \{30, 50\}$, the constants $c$ in (18) and $C$ in (19) exist, as shown in Table 6. Table 7 presents the optimal subsets of the airfoil self-noise data set over different $K_n$. In Table 7, for $K_n = 10$ and $150$, the lengths of the optimal subset are both 10; for $K_n = 30$, the length is 7, the smallest; and for $K_n = 100$, the length is 14.

Real estate valuation data set
The real estate valuation model is a regression model; the data set is from Xindian District, New Taipei City, Taiwan. The real estate valuation data set contains 414 real estate prices together with 5 independent variables, with the price per unit area as the dependent variable. The model is the same as for the airfoil self-noise data set. To facilitate distributed computing, we randomly select $n = 400$ points and divide them into $K_n = \{4, 5, 8, 10\}$ subsets to examine the estimation performance of $\mu_{I_k}$ under the five methods; see Figure 8 and Table 8.
From Figure 8, we can clearly see that the MSE values of $\hat\mu_{I_{opt}}$ are the smallest, followed by those of $\hat\mu_{I_{opt}^1}$, $\hat\mu^{(a)}$, and $\hat\mu^{(m)}$, while those of $\hat\mu_{I_{opt}^2}$ fluctuate greatly as $K_n$ increases. Table 8 shows the MAE values of the five estimators over different $K_n$: the MAE values of $\hat\mu_{I_{opt}}$ are the smallest, with $\hat\mu_{I_{opt}^1}$ and $\hat\mu^{(m)}$ following. Therefore, on this real data set, $\hat\mu_{I_{opt}}$ has relatively good performance.
Next, we computed $L(C(Y_{I_{opt}}))$ under the LIC criterion for this data set. As expected, its values become smaller and smaller, and thus converge easily. This is consistent with Theorem 3.1. For Theorem 3.2, when $K_n = O(\sqrt{n})$, there are constants $c \ge 0.737$ and $C \ge 0.115$. Table 9 presents the optimal subsets of the real estate valuation data set over different $K_n$. In Table 9, as $K_n$ increases, the length of the optimal subset first decreases and then increases; in particular, for $K_n = 8$, the length is 6, the smallest.

Conclusion
For distributed processing of big data, it is important to note that the subset length need not be fixed, although in the distributed setting choosing a fixed subset length is more common. For example, in the hypothesis testing and parameter estimation problems studied by Lee et al. [13], the fixed-subset method is adopted to randomly and uniformly divide the total data set into $K_n$ disjoint subsets. In addition, random subset methods are becoming more common.
In this work, we study the optimal subset selection problem for distributed interval estimation based on the fixed-subset processing method. Many challenging directions remain for future study, for example, the optimal subset selection problem under random block methods. In practice, moreover, different models need to be selected for further analysis based on different problems and data lengths. In addition, whether the length of the data subset affects the results needs further verification.
In the distributed form of big data processing, many nodes each undertake different parts of the same task in parallel in order to complete the task and improve efficiency. The simulation results show that the proposed criterion for optimal subset selection performs well. Developing further theory for finding the optimal subset is a direction for our future research.

Supplementary materials
The related R code of the LIC criterion for the simulation and real data analysis is provided.
Suppose that the data set $Q$ has distribution $P_\beta$, where the parameter $\beta$ belongs to the parameter space $B$. Here $B = \bigcup_{k=1}^{K_n} B_{I_k} = B_{opt} \cup B_{-opt}$. Let $C(Y_{I_k})$ be the confidence interval with guaranteed coverage $1 - \alpha$ over the parameter space $B$ for $k = 1, \ldots, K_n$. The parameter spaces corresponding to $C(Y_{I_{opt}})$, $C(Y_{I_{-opt}})$, and $C(Y_{I_k})$ are $B_{opt}$, $B_{-opt}$, and $B_{I_k}$, respectively. Let $\pi_{B_{I_k}}$, $\pi_{B_{opt}}$ and $\pi_{B_{-opt}}$ denote the prior distributions supported on these spaces, and $P_{\pi_{B_{I_k}}}$, $P_{\pi_{B_{opt}}}$ and $P_{\pi_{B_{-opt}}}$ the corresponding marginal distributions. Note that the maximum expected length over $B_{opt}$ bounds the expected length under the prior $\pi_{B_{opt}}$ from above:
$$\max_{\beta \in B_{opt}} E_\beta L(C(Y_{I_{opt}})) \ge \int_{B_{opt}} E_\beta L(C(Y_{I_{opt}}))\, \pi_{B_{opt}}(\beta)\, d\beta = E_{\pi_{B_{opt}}} L(C(Y_{I_{opt}})).$$
It can be shown that $\log L_M(C(Y_{I_k}))$ is bounded: for any $\varepsilon > 0$, there exists $N_1$ such that, for $K_n > N_1$,
$$\log L_M(C(Y_{I_k})) \le \varepsilon + E(\log L(C(Y_{I_k}))).$$
Note also that this expectation satisfies
$$E(\log L(C(Y_{I_k}))) \le (1 + c)\, \log L(C(Y_{I_{opt}})).$$