Optimal Experimental Designs in the Flow Rate of Particles

This article analyzes the process of jam formation during the gravity-driven discharge of granular material stored in a two-dimensional silo. The aim of the article is twofold. First, optimal experimental designs are computed under four approaches: D-optimality, a combination of D-optimality and a cost/gain function, Bayesian D-optimality, and sequential design. These results reveal that the efficiency of the design used by the experimenters can be improved dramatically. A sensitivity analysis with respect to the most important parameter is also performed. Second, the unknown parameters are estimated by least squares, that is, assuming normality, and also by maximum likelihood assuming the exponential distribution. Simulations for the designs considered in this article show that the variance, the mean squared error, and the bias of the maximum likelihood estimators are in most cases lower than those of the least-squares estimators. Supplementary materials for this article are available online.


INTRODUCTION
Granular material is a collection of similar macroscopic solid particles that interact with each other through contact forces. The size of the particles ranges from millimeters or centimeters (e.g., grains of sand or seeds) to meters (e.g., rocks). Granular material is usually stored in silos (see Figure 1 for an illustrative two-dimensional model). A real-world (three-dimensional) example is the transport of granular material in the mining industry through a vertical tube. During the gravity-driven discharge, the flow of granular material can be interrupted by the formation of an arch at the outlet (Figure 1). An arch is a structure in which the particles are mutually stable, and its formation causes a jam. If one of the particles forming an arch is removed, the arch collapses under gravity. After the formation of an arch, an input of energy (blowing, shaking, or tapping) is necessary to break the blocking arch and restart the flow. The grains then fall again until a new arch is formed. Breaking the arches may be dangerous, expensive, or simply difficult, as happens in the mining industry. An avalanche can be defined as the amount of granular material that falls between two successive jamming events, and the waiting time between two successive jamming events can be used to measure its extent.
In this work, we focus on analyzing the process of jam formation and the creation of avalanches during the discharge by gravity of granular material stored in a two-dimensional silo. When the silo is filled with granular material, grains pour freely from the outlet due to gravity (see Figure 1). In this article, the size of an avalanche, S, is measured as the waiting time between two successive jamming events. Following Janda et al. (2008), we shall assume that S has an exponential distribution with mean and variance

E(S) = A (φ_C − φ)^{−γ},   var(S) = A² (φ_C − φ)^{−2γ},   (1)

where φ is the diameter of the outlet and φ_C is the theoretical critical diameter above which jamming is not possible. The parameter A corresponds to the mean of S when φ = φ_C − 1, and γ governs the rate of decline of the mean avalanche size as the outlet narrows. The vector of unknown parameters, θ = (γ, A, φ_C), has to be estimated from the data collected on S.
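The model can be simulated in a few lines. The sketch below assumes the power-law mean E(S) = A(φ_C − φ)^{−γ} reconstructed from the text; the nominal values and the outlet diameter are illustrative, not the article's estimates.

```python
import numpy as np

# Illustrative nominal values for theta = (gamma, A, phi_C); these are only
# of the order of magnitude of the estimates quoted later in the article.
gamma, A, phi_C = 13.7, 1e10, 8.5

def mean_avalanche(phi):
    # Assumed mean function E(S) = A * (phi_C - phi)^(-gamma), phi < phi_C
    return A * (phi_C - phi) ** (-gamma)

rng = np.random.default_rng(0)
phi = 5.0                                   # a hypothetical outlet diameter
S = rng.exponential(scale=mean_avalanche(phi), size=100_000)

# For the exponential distribution the standard deviation equals the mean,
# so both ratios below should be close to 1 for a large sample.
print(S.mean() / mean_avalanche(phi), S.std() / mean_avalanche(phi))
```

Note that the mean avalanche size diverges as φ approaches φ_C, which is the practical reason a critical diameter can be estimated at all.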
In the literature, least-squares estimators (LSE) have been computed for this problem as if the data were Gaussian, neglecting the exponential distribution of the data. In this article, maximum likelihood estimators (MLE) for the exponential distribution are computed and compared to the LSE (the MLE under a Gaussian assumption). Data points (φ_i, S_i), i = 1, . . . , n, are needed to perform the estimation. To generate such data, the input points φ must be chosen within an interval [φ_L, φ_U]. Previously, this has been done in a rudimentary way (e.g., Janda et al. 2008). In this article, we propose a methodology for the optimal selection of these points with the objective of achieving the maximum possible efficiency in parameter estimation.
This article has been organized as follows. In Section 2, an introduction to relevant aspects of Optimal Experimental Design theory is provided. In Section 3, the best sample points to run experiments for estimating the vector of unknown parameters of the model given by (1) are computed using the D-optimality criterion. In Section 4, a compound criterion is considered to include the cost of the experimentation in the problem. In Section 5, a sensitivity analysis is performed in two ways. On the one hand, the relative efficiency of the design obtained is evaluated for possible changes in the most important parameter of the model. On the other hand, a comparative study of the LSEs and MLEs is made. Finally, in Section 6, Bayesian and sequential approaches are considered to compute D-optimal designs.

OPTIMAL EXPERIMENTAL DESIGN THEORY
An exact experimental design of size n is a collection of values of the explanatory variable, φ 1 , . . . , φ n , in a defined design space where some values may be repeated. In the example considered in this article, φ L and φ U are the lower and upper extremes of the possible outlet diameters of the silo, satisfying ρ < φ L < φ U < φ C where ρ is the diameter of the granular material. Assuming there are just k different points in the design and they are, without loss of generality, φ 1 , . . . , φ k , then a probability measure can be defined with 0 ≤ p i ≤ 1 being the proportion of times φ i appears in the design (i = 1, . . . , k).
Thus, the design can be denoted as

ξ = { φ_1, . . . , φ_k ; p_1, . . . , p_k },   with Σ_{i=1}^{k} p_i = 1.

This suggests a generalization of the concept of experimental design to any discrete probability measure ξ with finite support on the design space. This is called an approximate design.
A major advantage of the optimal experimental design (OED) approach is the cost reduction of experimentation by allowing statistical models to be estimated with fewer experimental runs. For computing optimal designs, the Fisher information matrix (FIM), or simply information matrix, is used because its inverse is asymptotically proportional to the covariance matrix of the estimators of the parameters under some regularity conditions.
The information matrix for a design ξ is defined as

M(ξ, θ) = Σ_{i=1}^{k} p_i I(φ_i, θ),

where I(φ_i, θ) = E[−∂²L(θ)/∂θ ∂θ^T] is the FIM at the point φ_i, and L(θ) is the log-likelihood function.
By Carathéodory's theorem (see Fedorov 1972, p. 66, for details), it is known that for any design there is always another design with at most m(m + 1)/2 + 1 different points in its support such that their information matrices are the same, where m is the number of unknown parameters to be estimated. This helps the search for an optimal design and makes it realizable in practice.
The FIM for a nonlinear model, as in this case, depends on the value of the vector of unknown parameters θ . Some of the solutions proposed in the literature for this inconvenience include the use of locally optimal designs for some nominal values of the parameters, Bayesian designs for some prior distribution of the parameters, or sequential designs improving the estimators of the parameters during the process. These three methods are considered in this article.
Let Φ be a real-valued function, bounded from below, defined on M = {M(ξ, θ) : ξ ∈ Ξ}, where Ξ is the set of all possible approximate designs, that is, the set of finite probability mass functions on the design space. The optimal design problem consists of finding ξ* such that Φ(M(ξ*, θ)) = min_{ξ∈Ξ} Φ(M(ξ, θ)); such a ξ* is called the Φ-optimal design. For simplicity, we write Φ(ξ) instead of Φ(M(ξ, θ)).
In this work, D-optimal designs are computed. D-optimality is the most popular criterion; it minimizes the volume of an approximate confidence ellipsoid for the vector of unknown parameters θ. This volume is inversely proportional to the square root of the determinant of the information matrix. The criterion is therefore defined by

Φ_D(ξ) = −log |M(ξ, θ)|,

where |·| stands for the determinant.
When the experimenter must pay attention to two goals represented by two convex criterion functions, say Φ_1 and Φ_2, a compromise can be expressed as a single compound criterion based on a weighted average of the two,

Φ_λ(ξ) = (1 − λ) Φ_1(ξ) + λ Φ_2(ξ),   0 ≤ λ ≤ 1.

An important result, the General Equivalence Theorem (GET; see, e.g., Atkinson, Donev, and Tobias 2007, sec. 9.2, for details), gives a tool for checking whether a design is optimal under such optimality criteria, and it also provides a powerful tool for algorithmic computation. Let ψ(φ, ξ, θ) be the Fréchet directional derivative of the criterion at ξ in the direction of the one-point design at φ. This function is frequently called the sensitivity function. The GET states that, under some conditions on the criterion function, ψ(φ, ξ, θ) is nonnegative everywhere and achieves its minimum value, zero, at the support points of the optimal design. In particular, the sensitivity function for D-optimality is

ψ(φ, ξ, θ) = m − tr[ M^{−1}(ξ, θ) I(φ, θ) ].

The Φ-efficiency measures how good an arbitrary design ξ is with respect to the optimal design,

eff_Φ(ξ) = Φ(ξ*_Φ) / Φ(ξ).

If Φ is positively homogeneous, the efficiency has a practical interpretation: if the efficiency of a particular design is 60%, then the optimal design yields the same statistical inferences with 40% fewer experiments. As mentioned above, the FIM depends on the nominal value of the vector of unknown parameters θ, and so does the criterion function. Prior knowledge of the possible values of the unknown parameters of model (1) can be used to deal with this problem. Locally optimal designs are computed by fixing specific nominal values of the parameters and optimizing the criterion of choice.
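The GET check is easy to automate. The sketch below evaluates the standard D-optimality sensitivity function ψ(φ, ξ, θ) = m − tr[M⁻¹(ξ, θ) I(φ, θ)] for a hypothetical equally weighted three-point design; the design space [3.0, 5.63], the nominal γ, and the rank-one form of I(φ, θ) (derived in Section 3 for the exponential model) are assumptions for illustration. The parameter A only rescales |M| and cancels in ψ, so it is set to 1.

```python
import numpy as np

gamma, phi_C = 13.7, 8.5   # illustrative nominal values

def u_vec(phi):
    # u = grad(eta)/eta for the exponential model with assumed mean
    # eta(phi) = A * (phi_C - phi)^(-gamma); the A-component is taken at A = 1
    x = phi_C - phi
    return np.array([-np.log(x), 1.0, -gamma / x])

def info_matrix(design):
    # M(xi) = sum_i p_i u(phi_i) u(phi_i)^T for an approximate design
    return sum(p * np.outer(u_vec(f), u_vec(f)) for f, p in design)

# hypothetical equally weighted three-point design on a design space [3.0, 5.63]
xi = [(3.0, 1 / 3), (4.4, 1 / 3), (5.63, 1 / 3)]
Minv = np.linalg.inv(info_matrix(xi))

def psi(phi):
    # psi(phi, xi) = m - tr(M^{-1} I(phi)); nonnegative everywhere iff xi is D-optimal
    return 3.0 - u_vec(phi) @ Minv @ u_vec(phi)

# psi vanishes at the support points of any nonsingular 3-point, equal-weight design
print([round(psi(f), 8) for f, _ in xi])
grid = np.linspace(3.0, 5.63, 200)
print(min(psi(f) for f in grid))   # a negative minimum flags a non-optimal design
```

Plotting ψ over the design space reproduces the kind of diagnostic shown in the article's Figure 2.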
For a Bayesian approach, the prior information on the vector of unknown parameters, θ, is represented by a prior distribution, π(θ). Chaloner and Verdinelli (1995) provided a general overview of the Bayesian optimality criteria used in the literature, giving a general structure of the problem. Using the approach of Lindley and Smith (1972, pp. 19-20), the problem is reduced to finding the design that maximizes the expected utility of the experiment outcome. The optimal design problem is then determined by the expected utility of the best decision,

U*(ξ) = ∫_S max_d ∫_Θ U(d, θ, ξ, s) π(θ | s) dθ f(s | ξ) ds,

where Θ is the parameter space, U(d, θ, ξ, s) is the utility function chosen to satisfy a certain goal, s is the observed data from a sample space S, f(s | ξ) is a suitable marginal density for the response, and d is a decision on the model, for example, the estimator of the parameters. Thus, the decision consists of two parts: first, the selection of ξ, and then the choice of an optimal estimator of the parameters, d. Chaloner and Verdinelli (1995, sec. 4.2) stated that in the case of nonlinear models the expected utility needs an approximation. Using Shannon information as the utility function, a normal approximation for the estimator, and taking the predictive distribution of the estimator to be the prior distribution lead to the following criterion described by Chaloner and Verdinelli (1995): maximize

∫_Θ log |M(ξ, θ)| π(θ) dθ.

For a sequential design approach, it is necessary to choose an initial design (for example, a locally optimal design for some nominal values of the parameters), perform the experiments, and estimate the vector of unknown parameters by maximum likelihood. A new optimal point is then chosen using these estimates as nominal values and adding it to the previous design.
Chaudhuri and Mykland (1993) provided conditions guaranteeing, first, that the inverse of the information matrix converges to the covariance matrix of the estimators and, second, that the distribution of the estimators converges to the Normal distribution.

D-OPTIMAL DESIGN
In this section, D-optimal designs are computed for model (1). The D-optimal design, ξ*_D, minimizes the criterion function Φ_D(ξ). For the exponential model, the information matrix at a point φ is the rank-one matrix I(φ, θ) = u(φ) u(φ)^T, where u(φ) = ∇η(φ)/η(φ) and η(φ) = A(φ_C − φ)^{−γ} is the mean of S. For simplicity, the transformation x = φ_C − φ is used, and then the information matrix at the point x is built from

u(x) = ( −log x, 1/A, −γ/x )^T.

It is easy to check that if the number of support points of the D-optimal design equals the number of parameters of the model, which is frequently the case, then the determinant of the information matrix is the product of the weights of the design times a function depending solely on the support points (see Atkinson et al. 2007, No. 8 of sec. 11.1; Pázman 1986, proof of Proposition VI.4). The product of the weights is then maximized by equal weights, no matter the form of the model considered. The problem reduces to identification of the support points.
Thus, we first consider three-point equally weighted candidate designs, for which the determinant is

|M(ξ, θ)| = (γ² / (27 A²)) [det V(x_1, x_2, x_3)]²,

where V(x_1, x_2, x_3) is the matrix with rows (log x_1, log x_2, log x_3), (1, 1, 1), and (1/x_1, 1/x_2, 1/x_3). It is fortunate that although (1) is a nonlinear model with respect to γ, this parameter enters the determinant only through the multiplicative factor, so the D-optimal design does not depend on it. From here, the following result is derived.
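This factorization is easy to verify numerically. The check below builds |M| directly from the rank-one FIM of the exponential model under the power-law mean assumed earlier and compares it with the factorized form; the support points are hypothetical.

```python
import numpy as np

# Check: |M| = (gamma^2 / (27 A^2)) * det([log x_i; 1; 1/x_i])^2 for a
# three-point equally weighted design, so gamma (and A) only rescale |M|.
def det_M(x, gamma, A):
    U = np.array([[-np.log(xi) for xi in x],
                  [1.0 / A] * 3,
                  [-gamma / xi for xi in x]])   # columns are u(x_i)
    M = (U @ U.T) / 3.0                         # equal weights 1/3
    return np.linalg.det(M)

def det_factor(x):
    V = np.array([[np.log(xi) for xi in x], [1.0] * 3, [1.0 / xi for xi in x]])
    return np.linalg.det(V) ** 2 / 27.0

x = [5.5, 4.1, 2.87]      # hypothetical x_i = phi_C - phi_i
g, A = 13.7, 2.0          # A kept small here purely for numerical comfort
print(det_M(x, g, A), (g ** 2 / A ** 2) * det_factor(x))   # should agree
```

Doubling γ multiplies both expressions by exactly 4, which is the numerical face of the γ-independence of the optimal support points.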
Theorem 3.1. Given the model defined by (1) and the design space X_φ = [φ_L, φ_U], let ξ*_D be the three-point equally weighted design maximizing the determinant. If ψ(φ, ξ*_D, θ) ≥ 0 for all φ, then ξ*_D is actually a D-optimal design.
Proof. See Appendix A in the online supplementary materials.
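A brute-force numerical counterpart of Theorem 3.1 is straightforward: search all three-point equally weighted designs on a grid of the design space. The limits [3.0, 5.63] and the nominal φ_C = 8.5 below are illustrative assumptions; γ and A are dropped because they only rescale the determinant.

```python
import numpy as np
from itertools import combinations

phi_C = 8.5   # nominal value

def crit(phis):
    # |M| up to the constant factor gamma^2 / (27 A^2)
    x = phi_C - np.asarray(phis)
    V = np.vstack([np.log(x), np.ones(len(x)), 1.0 / x])
    return np.linalg.det(V) ** 2

grid = np.linspace(3.0, 5.63, 60)
best = max(combinations(grid, 3), key=crit)
print([round(f, 3) for f in best])   # numerically D-optimal three-point support
```

Such a grid search is a useful sanity check on the closed-form support points of the theorem before running a continuous optimizer.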
Example. For the experimental setting of Janda et al. (2008), Figure 2 shows that the sensitivity function ψ(φ, ξ*_D, θ) is nonnegative, proving that ξ*_D is actually the D-optimal design for this example. Janda et al. (2008) considered a design with eight nonreplicated points; we treat it as the approximate design ξ_8 that puts weight 1/8 on each of those points. The D-efficiency of ξ_8 is 33.57%. Therefore, with the D-optimal design, the number of experiments can be reduced by 66.43% while achieving the same precision as the design ξ_8.
Remark 3.1. According to the Elfving graphical method proposed by López-Fidalgo and Rodríguez Díaz (2004), a D_s-optimal design for estimating φ_C alone can be computed for the example given by Janda et al. (2008). The D_s-efficiency of (6) is 73.6%, which is reasonably high taking into account the relevant role of φ_C. The parameter φ_C is important because of its physical meaning, but it is just as important to have a good fit of the whole model in order to make predictions.

COMPOUND CRITERION
The purpose of this article is to compute the best values of φ at which to run the experiments for estimating the vector of unknown parameters, θ, of model (1). In practice, however, an experiment is usually conditioned by economic and time constraints or by a lack of resources. In the example considered here, the waiting time between two successive jamming events cannot be too long. It is therefore natural to define a cost function increasing with this waiting time, that is, with the diameter of the outlet. A criterion combining D-optimality and a gain function is defined as the convex combination, with weight 0 ≤ λ ≤ 1, of the D-criterion and a term based on G(φ), the gain (opposite of cost) function associated with φ. The transformation x = φ_C − φ is used again, and the resulting compound criterion, which provides a compromise scale between D-optimality and the cost of experimentation, is maximized. Table 1 shows the optimal designs for three different values of λ, considering again the example from Janda et al. (2008). For λ = 0, we get the D-optimal design computed in Section 3. For λ = 1, the optimal design for the cost function is supported at the minimum possible value of φ, where the expected waiting time between consecutive jams is minimum.
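A small sketch makes the trade-off concrete. The gain function used below, G(φ) = 1/E(S) (higher gain for shorter expected waiting time), is a HYPOTHETICAL stand-in for the article's gain function, and the design space and nominal values are illustrative; only the qualitative behavior of the compound criterion is reproduced.

```python
import numpy as np

gamma, phi_C = 13.7, 8.5   # A only shifts both terms by constants and is dropped

def u_vec(phi):
    x = phi_C - phi
    return np.array([-np.log(x), 1.0, -gamma / x])

def log_det_M(design):
    M = sum(p * np.outer(u_vec(f), u_vec(f)) for f, p in design)
    return np.linalg.slogdet(M)[1]

def log_gain(phi):
    # hypothetical gain: log G = -log E(S) = gamma * log(phi_C - phi) (A dropped)
    return gamma * np.log(phi_C - phi)

def compound(design, lam):
    # (1 - lam) * log|M| + lam * (design-averaged log-gain), to be maximized
    val = lam * sum(p * log_gain(f) for f, p in design)
    if lam < 1:
        val += (1 - lam) * log_det_M(design)
    return val

# vary only the middle point of an equally weighted three-point design
mid_grid = np.linspace(3.1, 5.5, 100)
for lam in (0.0, 0.5, 0.745, 1.0):
    best_mid = max(mid_grid, key=lambda m: compound(
        [(3.0, 1 / 3), (m, 1 / 3), (5.63, 1 / 3)], lam))
    print(lam, round(best_mid, 3))
```

As λ grows, the criterion trades estimation precision for cheaper (smaller-outlet) experiments, mirroring the behavior reported in Table 1 and Figure 3.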
The sensitivity function for the compound criterion is the corresponding convex combination of the two sensitivity functions, and we have checked numerically that the GET is satisfied for all 0 ≤ λ ≤ 1. The solid and dashed lines of Figure 3 show the efficiencies of the compound designs, as a function of λ, with respect to the D-optimal design and to the design minimizing the cost function, respectively. Equal efficiencies are reached at λ = 0.745. The efficiency with respect to the cost function is greater than 90% for every value of λ, but the D-efficiency decreases dramatically for λ > 0.745.

SENSITIVITY ANALYSIS AND ESTIMATION STRATEGIES
In this section, we focus on two issues. First, we check how sensitive the optimal design is to the choice of the nominal value of φ_C. For that, the efficiency of the optimal design computed with the nominal value φ_C^(0) = 8.5 is evaluated for different values, say φ*_C, around it. Second, a comparative study of the least-squares and maximum likelihood estimation of the vector of unknown parameters θ is performed.

Sensitivity Analysis for φ C
One of the main goals of this problem is to estimate a critical diameter of the outlet above which jamming is, in practical terms, not possible. As explained in Section 3, the computation of the D-optimal design depends on the nominal value of φ_C. The experimenters estimated the value of the unknown parameter as φ_C = 8.5 using least squares, and this value has been used as the nominal value for computing the optimal designs throughout the article.
The sensitivity analysis is performed with respect to the D-optimal design, (6). We consider different values φ*_C for the critical diameter in a neighborhood of the nominal value; note that φ_C does not belong to the design space X_φ. The relative D-efficiency of the D-optimal design obtained with the nominal value of the parameter is defined as

eff(φ*_C) = ( |M(ξ*(φ_C^(0)), φ*_C)| / |M(ξ*(φ*_C), φ*_C)| )^{1/3},

where M(ξ, φ*_C) is the FIM assuming the value φ*_C for φ_C, and ξ*(φ_C) denotes the D-optimal design computed for that value. A sensitivity analysis may be performed in a similar way for the compound criterion defined in Section 4; in that case, it is performed with respect to the optimal design computed for λ = 0.745. Figure 4 shows the behavior of the efficiency with respect to the value of φ*_C in both cases.

The range of possible true values of the parameter was selected as follows. The estimator of φ_C asymptotically follows a Normal distribution with mean 8.5 and standard deviation 4.29 (estimated from the FIM). In general, if computing efficiencies over a whole range is computationally too demanding, possible values of φ*_C may simply be generated from this distribution. In this case, however, we were able to compute enough efficiencies to complete the plots of Figure 4. We used the distribution to select possible true values not farther than three standard deviations from the mean, that is, below 8.5 + 3 × 4.29 = 21.37. On the other hand, φ*_C must be above the upper limit of the design space, that is, 5.63. This gives reasonable limits, (5.63, 21.37), for checking the sensitivity. Figure 4 (left) shows that for possible true values below 8.5 the D-efficiency of the D-optimal design decays quite fast, while for possible true values greater than 8.5 it remains over 94%. The efficiency of the optimal design for the compound criterion (Figure 4, right) decays slightly for possible true values below 8.5, but faster than for values greater than 8.5.
Therefore, it is better to underestimate the nominal value of the parameter φ_C, leaving fewer chances for the possible true values to fall below it. As shown in Figure 4, the behavior of the sensitivity analysis is very similar for different values of φ_C^(0).
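The sensitivity curve can be approximated by brute force: recompute the optimal design for each candidate truth φ*_C and take the determinant ratio. The design space [3.0, 5.63] is hypothetical; γ and A cancel in the efficiency and are omitted.

```python
import numpy as np
from itertools import combinations

def det3(phis, phiC):
    # |M| for an equally weighted three-point design, up to gamma^2/(27 A^2)
    x = phiC - np.asarray(phis)
    V = np.vstack([np.log(x), np.ones(len(x)), 1.0 / x])
    return np.linalg.det(V) ** 2

grid = np.linspace(3.0, 5.63, 30)

def best_design(phiC):
    return max(combinations(grid, 3), key=lambda t: det3(t, phiC))

xi_nom = best_design(8.5)    # design computed under the nominal value
for phiC_star in (6.5, 7.5, 8.5, 10.0, 15.0):
    eff = (det3(xi_nom, phiC_star) / det3(best_design(phiC_star), phiC_star)) ** (1 / 3)
    print(phiC_star, round(eff, 3))
```

The efficiency equals 1 at the nominal value by construction and drops as the assumed truth moves away, which is the shape of the curves in Figure 4.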

Estimation Strategies
In the previous sections, optimal designs have been computed from different perspectives, always based on maximum likelihood estimation under the exponential distribution of the observations. In practice, the experimenters have used least squares to estimate the unknown parameters (e.g., Janda et al. 2008). Both estimators are computed and compared through simulations for the experimental designs considered in the previous sections, and the results are shown in Appendix B of the online supplementary materials. Table 2 shows that, as expected, the mean squared errors of the estimators of the unknown parameters for the D-optimal design are, with both estimation methods, much lower than for the design ξ_8. It is remarkable that the LSE is better than the MLE for estimating the parameter A with both designs; the large value of this parameter may cause this counterintuitive situation. Nevertheless, A is the least important parameter from a practical point of view. Table 3 shows that, for the D-optimal design, the bias of the estimators of γ and φ_C is lower using maximum likelihood than least squares. In any case, maximum likelihood estimation is more appropriate because the data come from an exponential distribution, while least-squares estimation is suited to Gaussian data.
The variances of the MLEs of γ and φ_C are lower than the variances of the LSEs for both designs, and, for the D-optimal design, they are always lower than those for the design ξ_8. However, as happens with the mean squared error, the variance of the LSE of A is lower than the variance of the MLE.

Table 3. Bias of the estimators of the unknown parameters using LSE and MLE for the D-optimal design and the design ξ_8
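The qualitative finding is easy to reproduce in a reduced setting. The Monte Carlo sketch below compares least-squares and exponential maximum-likelihood estimation of γ, with A and φ_C treated as known to keep the problem one-dimensional; the design, sample sizes, and parameter values are illustrative, NOT the article's simulation settings.

```python
import numpy as np

rng = np.random.default_rng(7)
g0, A, phi_C = 6.0, 5e4, 8.5
phis = np.repeat([3.0, 4.4, 5.63], 30)          # hypothetical design, 90 runs
x = phi_C - phis

g_grid = np.linspace(1.0, 11.0, 2001)           # profile both criteria on a grid
m_grid = A * x[None, :] ** (-g_grid[:, None])   # candidate means, (grid, obs)
log_m = np.log(m_grid)

def fit(S):
    nll = np.sum(log_m + S[None, :] / m_grid, axis=1)   # exponential -log lik
    sse = np.sum((S[None, :] - m_grid) ** 2, axis=1)    # least squares
    return g_grid[np.argmin(nll)], g_grid[np.argmin(sse)]

ests = np.array([fit(rng.exponential(scale=A * x ** (-g0))) for _ in range(200)])
mle, lse = ests[:, 0], ests[:, 1]
print("MLE : mean %.3f, sd %.3f" % (mle.mean(), mle.std()))
print("LSE : mean %.3f, sd %.3f" % (lse.mean(), lse.std()))
```

Both estimators center near the true γ, but the likelihood-based fit is visibly less variable, in line with the article's Tables 2 and 3.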

ALTERNATIVE APPROACHES
So far, locally optimal designs have been computed for the nominal value of the vector of unknown parameters. In this section, we compute Bayesian D-optimal designs following the methodology developed by Chaloner and Verdinelli (1995), considering prior information about the vector of unknown parameters θ represented by a prior distribution, π(θ). In addition, we compute sequential D-optimal designs using the methodology developed by Chaudhuri and Mykland (1993).

Bayesian Designs
Using the knowledge of the researchers in the experimental field, it is assumed that φ_C is always a real number greater than the upper extreme of the design space, X_φ. In particular, two continuous prior distributions are considered for φ_C: a Log-normal distribution shifted to the right so that it starts at the upper limit of the design space, and a Normal distribution truncated to the left at the upper limit of the design space.
Consider a Log-normal distribution with parameters μ and σ. The distribution shifted to the right by a length φ_U has mean and variance

E(φ_C) = φ_U + e^{μ + σ²/2},   var(φ_C) = (e^{σ²} − 1) e^{2μ + σ²}.

Let μ_LN = 8.5 and let σ_LN vary on (0, 5]. According to the Bayesian methodology for nonlinear models (Chaloner and Verdinelli 1995), a design is the Bayesian D-optimal design in the sense of (3) if it maximizes

∫ log |M(ξ, φ_C)| π(φ_C) dφ_C,

where π(φ_C) is the probability density function of the shifted Log-normal distribution, and the sensitivity function for the GET is the expectation of the local sensitivity function under this prior.

Now consider a truncated Normal distribution with parameters μ and σ and support [φ_U, ∞) for the value of φ_C. The mean and variance of this distribution are

E(φ_C) = μ + σ² π(φ_U) / (1 − Φ(φ_U)),
var(φ_C) = σ² [ 1 + (φ_U − μ) π(φ_U)/(1 − Φ(φ_U)) − ( σ π(φ_U)/(1 − Φ(φ_U)) )² ],

where Φ(·) is the cumulative distribution function and π(·) is the probability density function of the Normal distribution N(μ, σ). The mean is again fixed, μ_T = 8.5, and the standard deviation σ_T is allowed to vary on (0, 5]. In this case, the design (10) is the Bayesian D-optimal design if it maximizes the analogous criterion with π(φ_C) the truncated Normal density, and the sensitivity function for the GET is again the prior expectation of the local one.

The GET is satisfied for all σ_LN, σ_T ∈ (0, 5]. Figures 5 and 6 show the middle optimal point, φ*_B, and the efficiency with respect to the locally D-optimal design (6). As σ_LN and σ_T increase, the D-efficiency decreases, although it always remains over 96%, and the value of φ*_B increases. The D-efficiencies of design (7) with respect to the Bayesian D-optimal designs are always lower than 0.342 for all σ_LN ∈ (0, 5] and lower than 0.353 for all σ_T ∈ (0, 5].
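In practice the Bayesian criterion is often evaluated by Monte Carlo. The sketch below maximizes the prior expectation of log|M(ξ, φ_C)| over the middle point of an equally weighted three-point design, with φ_C drawn from a shifted Log-normal prior; the design space and prior parameters are illustrative, not the article's settings.

```python
import numpy as np

rng = np.random.default_rng(3)
phi_U = 5.63
# phi_C ~ phi_U + Lognormal(mu, sigma): always beyond the design space
phiC_draws = phi_U + rng.lognormal(mean=1.0, sigma=0.3, size=300)

def bayes_crit(mid):
    # Monte Carlo estimate of E_prior[ log |M(xi, phi_C)| ] (up to a constant)
    t = np.array([3.0, mid, phi_U])
    x = phiC_draws[:, None] - t[None, :]                   # (draws, 3), all > 0
    V = np.stack([np.log(x), np.ones_like(x), 1.0 / x], axis=1)
    return np.mean(np.log(np.linalg.det(V) ** 2 / 27.0))   # batched 3x3 dets

mid_grid = np.linspace(3.1, 5.5, 40)
phi_B = max(mid_grid, key=bayes_crit)
print(round(phi_B, 3))   # middle point of the approximate Bayesian D-optimal design
```

Repeating this for a range of prior standard deviations traces out how φ*_B moves with the prior spread, as in Figures 5 and 6.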

Sequential Designs
For the computation of a sequential design, it is necessary to start by choosing an initial design, performing the experiments, and estimating the vector of unknown parameters by maximum likelihood. Considering the initial design (6) with the data from Janda et al. (2008), the estimate of θ is (13.7, 1 × 10^10, 9.5). The new point, φ_4, to be added to the design maximizes the sensitivity function evaluated at these estimates and corresponds in this case to the upper extreme of X_φ. The D-efficiency of the augmented design with respect to the D-optimal design, (6), is 94.5%. Consider now the initial design ξ_8; then the estimate of θ is (11.79, 1 × 10^10, 8.1). The new point, φ_9, again maximizes the sensitivity function and corresponds to the lower extreme of X_φ. The D-efficiency of the augmented design with respect to the D-optimal design, (6), is 34.3%. The conditions stated by Chaudhuri and Mykland (1993) for convergence are satisfied here.
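One step of the sequential scheme can be sketched as follows: given the current design and the current estimates, the next point maximizes the variance function d(φ) = tr[M⁻¹ I(φ)] over the design space. The design space [3.0, 5.63] and the current three-point design are hypothetical; γ and φ_C follow the order of the estimates quoted above, and A cancels in d(φ).

```python
import numpy as np

gamma_hat, phiC_hat = 13.7, 9.5

def u_vec(phi):
    x = phiC_hat - phi
    return np.array([-np.log(x), 1.0, -gamma_hat / x])

current = [3.0, 4.4, 5.63]   # hypothetical current design, equal weights
M = sum(np.outer(u_vec(f), u_vec(f)) for f in current) / len(current)
Minv = np.linalg.inv(M)

grid = np.linspace(3.0, 5.63, 300)
d = np.array([u_vec(f) @ Minv @ u_vec(f) for f in grid])   # variance function
phi_next = grid[np.argmax(d)]
print(round(phi_next, 3))   # next outlet diameter at which to experiment
```

After running the new experiment, θ is re-estimated with the augmented data and the step is repeated, which is exactly the loop whose convergence is covered by the Chaudhuri-Mykland conditions.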

DISCUSSION
In this work, a practical physical problem has been considered. The experimenters are interested in the law that governs the process of jam formation during the discharge by gravity of granular material stored in a two-dimensional silo. The experimenters assume a critical outlet diameter such that jamming is known to be impossible above that critical point.
Prior to our work, the process of choosing diameters to run the experiments has been carried out without any rigorous rule. In this article, we propose optimal experimental designs using the D-optimality criterion and a compound criterion with a cost function. Those designs depend on the values of the parameters. Locally optimal designs have been computed for nominal values of the parameters. In addition, prior knowledge of the possible values of the parameter φ C has been considered to search for optimal experimental designs under a Bayesian approach. A sequential design approach to deal with the problem of dependence on parameters has also been proposed and studied. Zhu, Dasgupta, and Huang (2014) proposed a Bayesian sequential algorithm for obtaining D-optimal designs and a geometric approach to compute locally D-optimal designs. Parameter φ C is of special interest for further application since the practitioners are concerned with choosing the right diameter so as to guarantee there will not be a jam within a certain period of time. This is especially important when there is a high cost in breaking a jam, as is the case for the mining industry. After a sensitivity analysis, we can conclude that from the point of view of the experimental design it is better to underestimate the value of the true parameter φ C .
Another important issue is the correct estimation of the unknown parameters of the model. The experimenters have previously used least-squares estimation although it is known that the data come from an exponential distribution. In this work, we propose the use of maximum likelihood estimation for estimating the unknown parameters. The optimal designs are given for this setup. Simulations show that the variance of the MLE is lower than the variance of the LSE.
Another interesting point is that the response may be measured in two different ways. On the one hand, one can record the time elapsed between two successive jamming events; this fits the exponential distribution assumed in this article. Alternatively, one can count the number of beads that fall between two successive jamming events, which can be approximated by the total weight of material discharged between them; in practice, this weight itself may be taken as the measurement of the avalanche size. In that case, a Poisson distribution may be used instead. A detailed study of D-optimal experimental design for a Poisson model, describing the elimination of a radioactive substance from the human body, can be found in López-Fidalgo and Villarroel (2007).
The process of jam formation in a two-dimensional silo has been studied widely and several models have been proposed in practice. A comparative study between these models and our model will be considered in future research.
Censoring the time between two successive jamming events may be considered if a particular diameter takes too long to produce a jam. Some limiting time can be fixed to shorten waiting times, and so costs, in the experimentation. This should be done both for estimating the parameters and for finding optimal designs.

SUPPLEMENTARY MATERIALS
Proof: A detailed proof of Theorem 3.1. (theorem1proof, pdf file)
Simulations: Results of the simulations of Section 5.2. (simulation, pdf file)