Effective Design-Based Model Selection for Definitive Screening Designs

ABSTRACT Since their introduction by Jones and Nachtsheim in 2011, definitive screening designs (DSDs) have seen application in fields as diverse as bio-manufacturing, green energy production, and laser etching. One barrier to their routine adoption for screening is the difficulty practitioners experience with model selection when both main effects and second-order effects are active. Jones and Nachtsheim showed that for six or more factors, DSDs project to designs in any three factors that can fit a full quadratic model. In addition, they showed that DSDs have high power for detecting all the main effects as well as one two-factor interaction or one quadratic effect, as long as the true effects are much larger than the error standard deviation. However, simulation studies of model selection strategies applied to DSDs can disappoint by failing to identify the correct set of active second-order effects when there are more than a few such effects. Standard model selection strategies such as stepwise regression, all-subsets regression, and the Dantzig selector are general tools that do not make use of any structural information about the design. It seems reasonable that a modeling approach that makes use of the known structure of a designed experiment could perform better than more general-purpose strategies. This article shows how to take advantage of the special structure of the DSD to obtain the most clear-cut analytical results possible.


Introduction
Definitive screening designs (DSDs) have experienced widespread application since their introduction by Jones and Nachtsheim (2011). Areas of application include medical devices, laser etching, touch screen production, biomanufacturing, and green energy production. Jones and Nachtsheim (2011, 2015) advocated the use of stepwise model selection techniques to identify active main effects, two-factor interactions (2FIs), and quadratic effects. However, recent work (Errore et al. in press a,b) has shown that these methods can break down when there are more than just a few active second-order terms. Our purpose here is to introduce a new model selection technique that is tailored to the structure of a DSD. We will show that leveraging the orthogonality of main effects and the orthogonality between main effects and second-order effects can lead to a much more powerful and effective model-selection technique.
As introduced by Jones and Nachtsheim (2011), a minimum-run m-factor DSD has 2m + 1 runs. These designs were shown to be orthogonal for four, six, eight, and ten factors. Xiao, Lin, and Bai (2012) showed that, for many even values of m, an m-factor minimum-run DSD, D_m, can be constructed as

D_m = [ C_m ; −C_m ; 0 ],

where C_m is an m × m conference matrix and 0 is a row of zeros. An m × m conference matrix is a matrix with orthogonal columns satisfying C_m′C_m = (m − 1)I_m. Using conference matrices, Jones and Nachtsheim (2013) showed how two-level categorical factors can be incorporated into a DSD. When the number of factors is odd, practitioners can use the method of Xiao et al. by adding one fake factor, creating the DSD with two extra runs, and then dropping a column. Some investigators also add extra center runs to obtain a pure-error estimate of the error variance. Even when the number of factors is even, we will show that it is advantageous to add m_f = 2 fake factors, at a cost of 2m_f = 4 additional runs. For much of what follows, we will assume that the DSD employed has more than the minimum number of runs, either from the use of fake factors or from center-run replication.
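The fold-over construction above is easy to reproduce numerically. The following is a minimal NumPy sketch, assuming the order-6 Paley conference matrix (built from quadratic residues mod 5); the function names are ours, not part of the original article:

```python
import numpy as np

def conference_matrix_6():
    """Order-6 symmetric conference matrix via the Paley construction
    (quadratic residues mod 5); satisfies C.T @ C = 5 * I."""
    chi = {0: 0, 1: 1, 2: -1, 3: -1, 4: 1}          # Legendre symbol mod 5
    S = np.array([[chi[(j - i) % 5] for j in range(5)] for i in range(5)])
    C = np.zeros((6, 6), dtype=int)
    C[0, 1:] = 1
    C[1:, 0] = 1
    C[1:, 1:] = S
    return C

def dsd_from_conference(C):
    """Stack C, its fold-over -C, and a center run: the (2m + 1)-run DSD
    D_m = [C; -C; 0] of Xiao, Lin, and Bai (2012)."""
    m = C.shape[0]
    return np.vstack([C, -C, np.zeros((1, m), dtype=int)])

C6 = conference_matrix_6()
D6 = dsd_from_conference(C6)        # 13 x 6 design matrix
```

The fold-over structure is what makes every main-effect column orthogonal to every other main-effect column and, as discussed below, to every second-order column.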
A DSD is a supersaturated design with respect to the full quadratic model. There is a substantial literature dealing with the problem of model selection for supersaturated designs. Marley and Woods (2010) performed a simulation study investigating the performance of forward selection, the Gauss-Dantzig selector, and model averaging for identifying the correct set of effects under many scenarios. They found that the Gauss-Dantzig selector performed the best of the three, and they recommend that the number of effects under consideration be no more than twice the number of runs. The six-factor DSD has 13 runs and 27 effects of potential interest in the full quadratic model, so the DSD falls a little short of this requirement. Marley and Woods also recommend that the number of runs be three times the number of active effects, which suggests that the six-factor DSD might reliably identify only four or five effects. Jones (2016) performed simulation studies evaluating Lenth's (1989) method, forward selection, and the lasso for model selection applied to unreplicated orthogonal designs. He found that when the number of active effects exceeds half the number of runs, these methods can fail to identify many active effects. Since main effects in a DSD are orthogonal and there are fewer than half as many main effects as runs, we can expect that a DSD will reliably identify all the active main effects as long as no second-order effects are active. This is borne out in the power studies in Jones and Nachtsheim (2011). However, any unidentified active second-order effect will result in a positively biased estimate of the error variance, which can result in a failure to identify active main effects.
Jones and Nachtsheim (2011) also provided power studies indicating that DSDs can reliably identify one active second-order effect in addition to all the main effects, as long as this effect is at least twice as large as the error standard deviation for 2FIs, or three times as large for pure quadratic effects. As the number of active second-order effects grows, the ability of the DSD to reliably detect these effects must drop, regardless of the model selection technique employed.
The model selection technique we will introduce is motivated by the work of Miller and Sitter (2005). They advocated using foldover designs in screening because these designs allow for separating the analysis of main effects (odd functions) from the analysis of 2FIs (even functions). A function g is an odd function if g(−x) = −g(x) for all x, and an even function if g(−x) = g(x) for all x. (To see that two-factor cross-product terms are even functions, take g(x) = x_i x_j, so that g(−x) = (−x_i)(−x_j) = x_i x_j = g(x); for quadratic terms, take x_i = x_j in the preceding example.) They point out that the sample space for the response can be separated into the sample space spanned by the main effects and its orthogonal complement. They refer to the space spanned by the main effects as the "odd space" because it contains all the information about the main effects, three-factor interactions, and higher-order odd functions. The orthogonal complement space contains all the information about the intercept, the second-order effects, and higher-order even functions. DSDs provide advantages over the foldover designs of Miller and Sitter for two reasons. First, the foldover designs of Miller and Sitter are generally not orthogonal for the main effects, which makes it more difficult to identify the active main effects; DSDs are orthogonal for the main effects. Second, Miller and Sitter only considered screening designs with 2m runs; by adding fake factors, DSDs can be augmented beyond the minimum number of runs (2m + 1) originally proposed by Jones and Nachtsheim (2011).
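The odd/even dichotomy is easy to verify numerically. A small, self-contained sketch (pure Python; function names are ours) that checks the parity of linear, cross-product, and quadratic terms at random points:

```python
import random

def is_even(g, dim, trials=100):
    """Empirically check g(-x) == g(x) at random points in [-1, 1]^dim."""
    for _ in range(trials):
        x = [random.uniform(-1, 1) for _ in range(dim)]
        if g([-v for v in x]) != g(x):
            return False
    return True

def is_odd(g, dim, trials=100):
    """Empirically check g(-x) == -g(x) at random points in [-1, 1]^dim."""
    for _ in range(trials):
        x = [random.uniform(-1, 1) for _ in range(dim)]
        if g([-v for v in x]) != -g(x):
            return False
    return True

main_effect = lambda x: x[0]           # odd:  -(x0) = -x0
interaction = lambda x: x[0] * x[1]    # even: (-x0)(-x1) = x0 x1
quadratic   = lambda x: x[0] ** 2      # even: the x_i = x_j special case
```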
Assume we have m factors and that m_f fake factors have been used in the construction of the DSD. In this case, the number of rows and columns of the conference matrix used to construct the DSD is c = m + m_f. As usual, we assume that third- and higher-order effects are zero, which leads to the full second-order model

y_i = β_0 + Σ_{j=1..m} β_j x_ij + Σ_{j≤k} β_jk x_ij x_ik + ε_i,   i = 1, ..., n,

where the parameters β_0, ..., β_mm are unknown constants (of which many are zero by the sparsity-of-effects assumption), and the {ε_i} are iid N(0, σ²). The number of second-order effects (2FIs and pure quadratic effects) is m_2 = m(m + 1)/2. Let D denote the first m columns of the DSD, let F denote the m_f fake-factor columns, and let X_2 denote the centered n × m_2 matrix of columns corresponding to second-order terms. The model may be written in matrix form as

Y = μ1 + Dβ_D + Fβ_F + X_2β_2 + ε,

where Y is the n × 1 response vector, μ is a constant, 1 is an n × 1 vector of ones, and the residual vector ε is n × 1. We assume that the residual vector is normally distributed with mean vector 0 and variance-covariance matrix σ²I_{n×n}. We write the model matrix as M = [1, D, F, X_2], and finally, we take β_F = 0, since the columns of F correspond to fake factors. For an n × p matrix Z, let P_Z = Z(Z′Z)⁻Z′ denote the orthogonal projection operator for the column space of Z. With this setup, it is straightforward to show that MSE_F = Y′P_F Y/m_f is an unbiased estimator of σ², again under the assumption that third- and higher-order effects are negligible. This follows because we can partition the total sum of squares as

Y′Y = Y′P_1Y + Y′P_DY + Y′P_FY + Y′P_{X_2}Y,

where the projection operators P_1, P_D, P_F, and P_{X_2} sum to I_{n×n} and are mutually orthogonal. Notice also that the ranks of the projection operators are 1, m, m_f, and c, respectively. Thus, the rank of M is n = 2c + 1. Since β_F = 0, it follows from Cochran's theorem (Christensen 2011) that Y′P_FY/σ² has a chi-squared distribution with m_f degrees of freedom. Moreover, s_F² = Y′P_FY/m_f is an unbiased estimator of σ² based on m_f degrees of freedom. Let n_c denote the number of center points in the DSD.
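The sum-of-squares partition can be verified directly. The sketch below (NumPy; for illustration we use a 6-column conference-matrix DSD with m = 4 real factors and m_f = 2 fake factors, so c = 6 and n = 13) constructs the four projection operators and confirms that they are mutually orthogonal, sum to the identity, and have ranks 1, m, m_f, and c:

```python
import numpy as np
from itertools import combinations

def proj(Z):
    """Orthogonal projection operator for the column space of Z."""
    return Z @ np.linalg.pinv(Z)

# A 6-column conference-matrix DSD (n = 13): first four columns are the
# real factors (m = 4), last two are fake factors (m_f = 2), c = 6.
chi = {0: 0, 1: 1, 2: -1, 3: -1, 4: 1}
S = np.array([[chi[(j - i) % 5] for j in range(5)] for i in range(5)], float)
C = np.zeros((6, 6)); C[0, 1:] = 1; C[1:, 0] = 1; C[1:, 1:] = S
X = np.vstack([C, -C, np.zeros((1, 6))])
D, F = X[:, :4], X[:, 4:]

# Centered second-order columns of the real factors:
# m_2 = m(m + 1)/2 = 10 (4 quadratics + 6 two-factor interactions).
cols = [X[:, i] * X[:, i] for i in range(4)]
cols += [X[:, i] * X[:, j] for i, j in combinations(range(4), 2)]
X2 = np.column_stack(cols)
X2 = X2 - X2.mean(axis=0)

n = X.shape[0]
P1, PD, PF, PX2 = proj(np.ones((n, 1))), proj(D), proj(F), proj(X2)
```

With these operators in hand, Y′P_F Y/m_f is just `y @ PF @ y / 2` for any simulated response `y`.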
If n_c > 1, a pure-error estimate of the variance, s_c², based on n_c − 1 degrees of freedom, can be pooled with s_F² to obtain a combined estimate

s_p² = [m_f s_F² + (n_c − 1)s_c²] / (m_f + n_c − 1).

If n_c ≤ 1, take s_e² = s_F²; otherwise take s_e² = s_p². Let ν_e denote the degrees of freedom associated with s_e². The strategy for model selection described next will leverage the presence of this unbiased estimate of the error variance. We find the simulation results using this variance estimate impressive enough to suggest the routine use of DSDs having fake factors over the minimum-run DSDs originally proposed.

An outline of the article is as follows. In Section 2, we examine the structure of DSDs in an effort to characterize the limitations of any model-selection procedure when applied to the results of a DSD experiment, as a function of the number of active second-order terms. In Section 3, we describe our procedure for DSD model selection. In Section 4, we conduct a simulation study evaluating the efficacy of the new method for 6, 8, and 10 factors, where the number of runs is either 2m + 5 or 2m + 1, and we compare the performance of the proposed method to forward stepwise selection based on the AICc criterion, to the elastic net (Zou and Hastie 2005), and to hierNet (Bien, Taylor, and Tibshirani 2013), a generalization of the lasso that can enforce the assumption of heredity. Section 5 provides an example of an actual 3³ full factorial experiment that we ran to model the computational burden of our suggested analytical method as the number of active factors and second-order effects increases. We then chose a subset of the original results that matched the first three columns of a six-factor 17-run DSD, and demonstrated the effectiveness of our suggested analytical approach by correctly identifying all of the active effects in the full quadratic model found in the analysis of the original full factorial design. In Section 6, we summarize our results and make recommendations.
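The pooling rule at the start of this section can be sketched as follows; the degrees-of-freedom-weighted average is our reading of the combined estimate (function and argument names are ours):

```python
def pooled_error_estimate(sF2, m_f, sc2=None, n_c=0):
    """Combine the fake-factor estimate s_F^2 (m_f df) with a pure-error
    estimate s_c^2 from n_c center runs (n_c - 1 df).  Returns the pair
    (s_e^2, nu_e).  When n_c <= 1 there is no pure-error estimate and
    s_e^2 = s_F^2; otherwise the two estimates are df-weighted."""
    if n_c <= 1 or sc2 is None:
        return sF2, m_f
    nu_e = m_f + n_c - 1
    sp2 = (m_f * sF2 + (n_c - 1) * sc2) / nu_e
    return sp2, nu_e
```

For example, pooling s_F² = 2.0 on 2 df with s_c² = 4.0 from three center runs gives s_e² = 3.0 on 4 df.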

Model Selection With DSDs: What Are the Limitations?
As noted above, in any conference-matrix-based DSD, all main effects are orthogonal to one another and to the quadratic effects, 2FIs are orthogonal to main effects, and 2FIs are never fully confounded with each other. This structure provides some basis for optimism that a true model consisting of main effects, a few 2FIs, and a few quadratic effects can be identified using a modern model-selection technique such as stepwise regression or the lasso. Clearly, however, as the number of active second-order terms increases (so that the level of sparsity decreases), the performance of any model-selection method will degrade. Of course, if the number of terms in the model exceeds n, the design matrix will be singular and β will not be estimable. Let n_so denote the number of second-order terms in the model. In this section, we approximately characterize the limitations of model-selection procedures in identifying true second-order models as a function of n_so.
We consider two characteristics of the design: (1) the percentage of models that are not estimable, p_sing, and (2) the percentage of model pairs that are confounded, p_conf, as defined below. The former characteristic is a measure of model robustness, since it is the same as 1 − EC, where EC stands for estimation capacity (Sun 1993). Now consider model discrimination. For a given number of factors and a given DSD for that number of factors, suppose that the true model, evaluated at factor settings given by x, is f_T(x). Consider any other second-order model, f_F(x) ≠ f_T(x), and let H_T and H_F denote the hat matrices for the models f_T and f_F, respectively. Now consider any response vector y. The difference in the predictions given by the two models is ŷ_T − ŷ_F = (H_T − H_F)y. Following Jones et al. (2009), if H_T − H_F = 0, the two models cannot be discriminated for any response y and are said to be confounded. Jones et al. (2009) proposed the expected prediction difference, EPD_{i,j}, as a measure of the capability of a design to discriminate between two models, f_i and f_j; it is a function of the matrix H_i − H_j. Clearly, if EPD_{i,j} = 0, then the two models i and j are confounded.
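Confounding of a model pair can be checked mechanically by comparing hat matrices. The sketch below (NumPy; function names are ours) builds the minimum-run six-factor DSD and exhibits a confounded pair: two saturated even-space models (intercept plus six second-order terms) whose hat matrices coincide:

```python
import numpy as np

def hat(Z):
    """Hat (projection) matrix for model matrix Z."""
    return Z @ np.linalg.pinv(Z)

def confounded(Za, Zb, tol=1e-10):
    """Per Jones et al. (2009): if H_a - H_b = 0, no response vector can
    discriminate between the two candidate models."""
    return bool(np.max(np.abs(hat(Za) - hat(Zb))) < tol)

# Minimum-run 6-factor DSD (13 runs) from a Paley conference matrix.
chi = {0: 0, 1: 1, 2: -1, 3: -1, 4: 1}
S = np.array([[chi[(j - i) % 5] for j in range(5)] for i in range(5)], float)
C = np.zeros((6, 6)); C[0, 1:] = 1; C[1:, 0] = 1; C[1:, 1:] = S
X = np.vstack([C, -C, np.zeros((1, 6))])

one = np.ones((13, 1))
quads = [(X[:, i] ** 2).reshape(-1, 1) for i in range(6)]
d12 = (X[:, 0] * X[:, 1]).reshape(-1, 1)

# With n_so = 6 the even space (rank 7 with the intercept) is exhausted,
# so these two distinct models span the same space and are confounded.
model_a = np.hstack([one] + quads)              # intercept + all 6 quadratics
model_b = np.hstack([one] + quads[:5] + [d12])  # swap one term
```

In contrast, small models such as [1, x_1²] and [1, x_1x_2] have different hat matrices and are readily discriminated.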
[Table 1. Summary of model robustness and discrimination properties for DSDs having six or eight factors and zero or two fake factors. m_f denotes the number of fake factors; n_so denotes the number of second-order terms; n_models denotes the number of models having n_so second-order terms; p_sing denotes the percentage of models that are singular; n_model pairs denotes the number of model pairs; and p_conf denotes the percentage of confounded model pairs. Whenever the confidence interval lower and upper bounds are equal, the percentages are exact; when they are not equal, the percentages were estimated using random sampling of models or model pairs.]

In Table 1, we show the percentages of singular models, p_sing, and the percentages of confounded model pairs, p_conf, for six- and eight-factor DSDs having zero or two fake factors, m_f, as a function of the number of second-order terms, n_so. Consider first the six-factor case when there are no fake factors. We see from Table 1, row 1, that even with just three second-order terms, there is a need for caution. In this case, five (0.3759%) of the 1330 models are not estimable. Clearly, as the number of second-order terms increases, so does p_sing. For n_so = 5, p_sing = 4.67%, and for n_so = 6, p_sing = 24.01%. The latter case underscores the difference between model robustness and model discrimination.
Even though about 76% of the models are estimable, p_conf = 100%, which means that none of the model pairs can be discriminated. (This must be the case, since the rank of the even space here is seven, and the number of even-space model terms, including the intercept, is also seven.) Clearly, attempting to estimate more than two or three second-order terms with this design can be problematic.
In contrast, when we add four runs to the six-factor DSD using two fake factors, there is less need for caution. When n so = 3 (not shown) or n so = 4 all of the models are estimable, and essentially all of the model pairs can be discriminated. Only when n so = 6 or 7 do we begin to be concerned about the number of unestimable models and/or confounded model pairs. For n so = 7, about 3% of the models cannot be estimated, and about 0.5% of model pairs are confounded.
For eight-factor DSDs, the results are analogous. For the minimum-run-size DSD, only in the case of n_so = 4 are all models estimable. When the number of second-order effects is six or seven, the percentage of unestimable models becomes a concern. Model confounding does not seem to be a serious issue until we attempt to fit seven second-order terms, in which case p_conf is about 1.8%.
We conclude, from an examination of the structure of DSDs for six and eight factors, that attempting to identify models having about c/2 or more second-order terms may be difficult. This observation is confirmed, to some degree, by the simulation study we carry out in Section 4.

Model Selection Procedures
In this section, we describe our model selection procedures for minimum-run-size and nonminimum-run-size DSDs, for models following strong heredity and for unrestricted models.

Case 1: n > 2m + 1
In this section, we describe our model selection procedure in the presence of m_f > 0 fake factors and, optionally, n_c > 1 center runs. To begin the procedure, assume that we have computed s_e² as described above, and let y denote the centered response vector.

1. Regress y on the matrix X_DF = [D, F] and compute the predicted values. These become the response data for the space of the main effects, denoted y_ME.
2. The residuals of the above regression become the response data for the second-order effects, denoted y_2nd = y − y_ME.
3. Let β̂_D denote the first m estimated main-effects coefficients from the regression in Step 1.
4. Calculate the m t-statistics and p-values for hypothesis tests of all the main effects (i.e., each element of β̂_D), using s_e² as the estimate of σ². Using α = 0.05, record the indices of the active main effects, A_ME, and let m_ME denote the number of active main effects. For example, if the 1st, 3rd, and 4th main effects are significant, then A_ME = {1, 3, 4} and m_ME = 3. Pool any insignificant main-effect columns with F and obtain a new estimate of σ² using Equation (2) after increasing the degrees of freedom (m_f) by m − m_ME.
5. In this step, we identify the active second-order effects. There are two cases to consider, depending on whether or not the investigator is willing to assume that the active model terms exhibit strong heredity.
(a) Assume that the model follows strong heredity. Create the matrix, X_q, containing all the 2FIs and quadratic effects involving the main effects found in Step 4. To continue the example from Step 4, where three active main effects were found, the resulting matrix will have six columns. Let D_ij denote the elementwise product of columns i and j. Then X_q = [D_11, D_33, D_44, D_13, D_14, D_34]. In general, if A_ME has k elements, then X_q will have k(k + 1)/2 columns. We use the guided subset selection procedure, described next, to identify active second-order terms.

Guided subset selection: Start by computing the total sum of squares, TSS = y_2nd′y_2nd, and construct the statistic F = (TSS/c)/s_e². If F does not exceed the critical value of the F distribution having c and ν_e degrees of freedom for α = 0.2, then we infer that there are no active second-order effects and stop. Otherwise, we fit all the one-term models for each second-order effect, choose the effect having the smallest residual sum of squares, R_1, and compute F_1 = (R_1/(c − 1))/s_e².
If F_1 does not exceed the critical value for α = 0.2, we infer that there is one active second-order effect. Otherwise, we fit all the two-term models containing pairs of second-order effects, choose the pair having the smallest residual sum of squares, R_2, and compute F_2 = (R_2/(c − 2))/s_e². We again perform a hypothesis test at the α = 0.2 level of significance, and stop if we fail to reject. Otherwise, we continue this procedure up to models having c/2 second-order terms, with one exception: if the number of active main effects is three or fewer, then we continue up to the total number of second-order effects involving the active main effects.

(b) Do not assume that the model follows strong heredity. The matrix, X_q, containing all of the predictor columns for the 2FIs and pure quadratic effects of the m factors, has m(m + 1)/2 columns. Follow the guided subsets procedure in part (a), up to a maximum of c/2 terms.
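The guided subset procedure can be sketched as follows. This is our illustration, not the authors' code; the α = 0.2 F critical values are supplied by the caller (in practice, e.g., `scipy.stats.f.ppf(0.8, df1, df2)`), and the demonstration at the bottom uses a stand-in critical value on toy data:

```python
import numpy as np
from itertools import combinations

def guided_subset_selection(y2nd, X2, se2, nu_e, c, f_crit, max_terms=None):
    """Sketch of guided subset selection (names ours).  y2nd: second-order
    (even-space) response; X2: candidate second-order columns; se2, nu_e:
    unbiased error-variance estimate and its df; c: rank of the
    second-order space; f_crit(df1, df2): alpha = 0.2 F critical value.
    Returns the indices of the selected second-order effects."""
    if max_terms is None:
        max_terms = c // 2
    max_terms = min(max_terms, X2.shape[1])
    tss = float(y2nd @ y2nd)
    if (tss / c) / se2 <= f_crit(c, nu_e):
        return []                                # no active second-order effects
    best = []
    for k in range(1, max_terms + 1):
        best_rss, best_set = np.inf, None
        for idx in combinations(range(X2.shape[1]), k):
            Z = X2[:, list(idx)]                 # fit a k-term model
            resid = y2nd - Z @ np.linalg.lstsq(Z, y2nd, rcond=None)[0]
            rss = float(resid @ resid)
            if rss < best_rss:
                best_rss, best_set = rss, list(idx)
        best = best_set
        # stop when the residual variation is consistent with noise
        if (best_rss / (c - k)) / se2 <= f_crit(c - k, nu_e):
            break
    return best

# Toy demonstration: one truly active column; the constant critical
# value is a stand-in for the alpha = 0.2 F quantile.
X2_demo = np.array([
    [1, 1, 1, -1, -1, -1, 0, 0],
    [1, -1, 1, -1, 1, -1, 1, -1],
    [0, 1, -1, 0, 1, -1, 0, 0],
], dtype=float).T
y_demo = 5.0 * X2_demo[:, 1]
crit = lambda d1, d2: 4.0
selected = guided_subset_selection(y_demo, X2_demo, 1.0, 4, 6, crit)  # [1]
```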

Case 2: n = 2m + 1
The main difference from the previous case is that, because we do not have an unbiased estimate of the variance, we use a sequential testing procedure to identify active main effects. We then use the m − m a inactive effects to create an estimate of σ 2 . This process replaces the first four steps above. Steps five and six then follow without modification. For this procedure to be effective, we must assume that the number of active main effects is less than m.
Our sequential testing procedure proceeds as follows.

1. Regress y on D and obtain the m estimated main effects.
2. Order the absolute values of the estimated effects from largest to smallest. For 1 ≤ i < m, let R_i denote the residual sum of squares for the model containing the ith-largest absolute-magnitude main effect and all main effects larger in magnitude, and let R_0 denote the adjusted total sum of squares. For each i, compute

F_i = [(R_0 − R_i)/i] / [R_i/(m − i)]

and the associated p-value obtained from the F distribution having i and m − i degrees of freedom. Select the model having the smallest p-value.
3. Implement Steps 5 and 6 from Section 2.1 without modification.

The procedure just described cannot identify the correct model if all the main effects are active, because when all the main effects are in the model the residuals are all zero. The procedure in Step 2 above relies on the assumption that at least one of the main effects is inactive. This sparsity assumption about the main effects is perhaps too strong: an investigator may want to detect all the active main effects even when all the main effects are active. Without fake factors this is impossible using the y_ME vector alone. However, we can resolve this difficulty by assuming effect sparsity in the model for y_2nd. We assume that there are no more than c/2 active second-order effects in order to get an estimate of σ². Using forward stepwise regression, we find the c/2 largest second-order effects. We then use the residuals from the model containing these effects for our estimate of σ², having c/2 degrees of freedom (s_2nd²). We use this estimate to test whether the main-effect estimate that is smallest in magnitude, denoted β̂_min, is, in fact, active. Compute the model sum of squares for β̂_min, mss_β̂min, and compute the ratio F = mss_β̂min/s_2nd². Under the assumption that the model found via forward stepwise is correct, this statistic is distributed as F(1, c/2). If the p-value for this test is less than 0.05, conclude that all the main effects are significant. Otherwise, proceed with Step 2 above.
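The Case-2 sequential search can be sketched as below. This is our code, not the authors'; it assumes SciPy for the F-distribution tail probabilities, and the demonstration uses a noise-free response built from illustrative effect estimates (two large effects, four small ones standing in for noise):

```python
import numpy as np
from scipy.stats import f as f_dist

def sequential_main_effects(y_me, D, f_pvalue=f_dist.sf):
    """Sketch of the Case-2 search for active main effects (names ours).
    y_me: projection of the centered response onto the main-effect space;
    D: orthogonal main-effect columns of a minimum-run DSD.
    Assumes at least one main effect is inactive."""
    m = D.shape[1]
    col_ss = (D ** 2).sum(axis=0)              # ||d_j||^2 (orthogonal columns)
    beta = (D.T @ y_me) / col_ss               # main-effect estimates
    ss = beta ** 2 * col_ss                    # effect sums of squares
    order = np.argsort(-np.abs(beta))          # largest |estimate| first
    r0 = float(y_me @ y_me)                    # adjusted total sum of squares
    r, best_p, best_i = r0, np.inf, 1
    for i in range(1, m):                      # model with the i largest effects
        r -= ss[order[i - 1]]                  # R_i
        F = ((r0 - r) / i) / (r / (m - i))
        p = float(f_pvalue(F, i, m - i))
        if p < best_p:
            best_p, best_i = p, i
    return sorted(order[:best_i].tolist())

# Demonstration on the 13-run, 6-factor DSD with illustrative estimates.
chi = {0: 0, 1: 1, 2: -1, 3: -1, 4: 1}
S = np.array([[chi[(j - i) % 5] for j in range(5)] for i in range(5)], float)
C = np.zeros((6, 6)); C[0, 1:] = 1; C[1:, 0] = 1; C[1:, 1:] = S
D = np.vstack([C, -C, np.zeros((1, 6))])
b = np.array([4.1, -3.2, 0.21, -0.15, 0.12, 0.08])
selected = sequential_main_effects(D @ b, D)   # picks the two large effects
```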
If there are more than c/2 active second-order effects, our estimate, s_2nd², will be biased high, with the likely result that we will not detect the smallest active main effect. This provides further motivation for the routine use of fake factors in DSDs. Using DSDs with the minimum number of runs is justifiable only when the cost of each experimental run is too large to allow for four extra runs.

Simulation Study: Model Selection for 6-, 8-, and 10-Factor DSDs
In Sections 4.1 and 4.2, we describe the design and analysis of a simulation study to characterize the performance of our model-selection procedure for varying numbers of factors, active main effects, active 2FIs, and active pure quadratic effects. Section 4.3 provides a comparison of the results of our procedure with the standard forward stepwise procedure based on AICc, as advocated by Jones and Nachtsheim (2011, 2015), and with the elastic net (Zou and Hastie 2005). Section 4.4 investigates the relative performances of our new method and hierNet (Bien, Taylor, and Tibshirani 2013) under the assumption that the true model exhibits strong heredity.

Design of the Study
We constructed the simulation study as a designed experiment with all possible combinations of the factors below:

1. The number of factors, m, takes on three values: 6, 8, and 10.
2. The type of design. For each factor size, we consider a nonminimum-run-size DSD based on m_f = 2 fake factors, so that the design sizes for 6, 8, and 10 factors are 17, 21, and 25, respectively. For minimum-run-size DSDs, the design sizes are 13, 17, and 21.
3. The number of active main effects, m_a, varies from three to six.
4. The number of active 2FIs, m_2FI, varies from zero to three.
5. The number of active pure quadratic effects, m_q, varies from zero to three.
6. The (minimum) signal-to-noise ratio, SN, of the active effects varies from two to three. (As described below, we generate the coefficient for each active effect by adding SN to an exponentially distributed random number.)
7. Model type. The model type is either unrestricted or follows strong heredity.
8. Replicates. Each design setting is replicated 100 times.

The study is thus run as a 4³ × 3 × 2³ factorial design with 1536 treatment combinations, each replicated 100 times. Given m_a, m_2FI, and m_q, true models are generated as follows. For unrestricted models, m_a main effects, m_2FI 2FIs, and m_q quadratic effects are selected at random from the m potential main effects, m(m − 1)/2 potential 2FIs, and m potential quadratic effects, respectively. For models that follow strong heredity, we first select m_a main effects from the m possible main effects. We then form the m_a(m_a − 1)/2 2FIs and m_a quadratic effects that are possible given the set of active main effects. We then choose the m_2FI 2FIs at random from the possible set of 2FIs, and similarly for quadratic effects. For each factor setting, we simulated 100 response vectors randomly. If the signal-to-noise ratio of active effects is SN, we generate the coefficient for each active effect by adding SN to an exponentially distributed random number.
We chose the sign of each coefficient at random. We then generate an n × 1 standard normal vector and add it to E(y) to obtain the response vector. For each setting, we record the power and Type I error rates for detection of main effects, 2FIs, and pure quadratic effects.
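Coefficient generation for the study can be sketched as follows; the unit exponential rate is our assumption, since the text specifies only "exponentially distributed" (names are ours):

```python
import random

def simulate_coefficients(n_active, sn, seed=None):
    """Generate active-effect coefficients as in the simulation study:
    magnitude SN plus an exponential draw, with a random sign, so every
    active coefficient has magnitude at least SN."""
    rng = random.Random(seed)
    return [rng.choice([-1, 1]) * (sn + rng.expovariate(1.0))
            for _ in range(n_active)]
```

The response vector is then E(y) = Mβ for the chosen true model, plus an n × 1 standard normal error vector.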

Simulation Results
Since the study is a full factorial design with 100 replicates, we initially performed an analysis of variance (ANOVA) on the results, based on the full factorial model. We found that many high-order interaction effects were statistically significant, which precluded a very simple interpretation of the results in terms of (say) main effects and/or two-factor interaction plots. For this reason, we employ conditional effects plots of the predicted values for the full ANOVA model for interpretation (via profile plots created with the JMP statistical software system) in what follows. In general, the directions of the results are as expected. Power increases with the assumption of strong heredity, with the introduction of fake factors, and with increasing signal-to-noise ratio. Power for main effects is unaffected by the presence of second-order effects, as expected given the design structure and the resulting orthogonality of y_ME and y_2nd. Power for second-order effects decreases as the numbers of active main effects, 2FIs, and quadratic effects increase. The main objective of the study is to identify the numbers of main and second-order effects that our analysis procedure can reliably identify in the presence of sparsity.
To describe the results, we consider each number of factors in turn, starting with six factors. Figure 1(a) contains a JMP profile plot of the power for detecting main effects in the presence of various numbers of second-order effects. In the figure, the factors are set to the values that lead to minimum power (i.e., six active main effects, SN = 2, three 2FIs, and three quadratic effects), and that power is approximately 1.0. The horizontal line in the left-most panel shows that the power for identifying main effects does not depend on the assumption of strong heredity. Clearly, the method works well for identifying main effects. Figure 1(b) similarly summarizes the power for detecting 2FIs. The factor settings are the values that lead to the minimum power, subject to the constraint that power is greater than about 0.8. From the plot, it is clear that the method can reliably identify two 2FIs, even if five main effects and two quadratic effects are also active. The results for identifying quadratic effects, as shown in Figure 1(c), are quite similar to those for 2FIs in Figure 1(b), although there is a slight overall decrease in power. As shown, the power for detecting two quadratic effects having SN = 3, in the presence of two interactions and five main effects, is 0.74. There is also a trade-off that occurs among the three types of effects. For example, if five main effects are active but only one quadratic effect is present (rather than two), the number of detectable 2FIs increases from two to three.
The results for the minimum-run-size DSD are shown in Figure 2. For main effects, the difference in power from Figure 1 is most pronounced when five main effects are active and there are three active 2FIs and three active quadratic effects. Here the power drops from near 1.0 to about 0.85. As shown in the figure, with fewer main effects, the power is again near 1.0. The power profile for identifying 2FIs is shown in Figure 2(b). It is clear that the ability to identify 2FIs has dropped, in comparison with the results for the 17-run design. Keeping power above about 0.8, we can reliably identify two 2FIs when four main effects and one quadratic effect are active. This compares with two 2FIs with five main effects and two quadratic effects for the 17-run design, as shown in Figure 1(b). The drop in our ability to detect quadratic effects is similar to the drop for 2FIs. With two fake factors, we could reliably identify two quadratic effects in the presence of five active main effects and two active 2FIs. For the minimum-run-size design, we can still identify two quadratic effects (with power near 0.8), but only in the presence of four active main effects and just one active two-factor interaction.
Graphical summaries of the simulation results for eight factors and two fake factors are provided in Figure S1 of the supplementary materials. We found that power is still near 1.0 for identification of up to six main effects, and our ability to identify active second-order effects improved compared to the six-factor case. Keeping power above about 0.8, we can identify three 2FIs in the presence of six active main effects and two quadratic effects. Similarly, three quadratic effects can be reliably identified in the presence of five active main effects and two active 2FIs.
Finally, graphical summaries of the simulation results for 10 factors and two fake factors are shown in Figure S2 of the supplementary materials. The power for identifying up to six active main effects is still essentially 1.0 in the presence of three active 2FIs and three active quadratic effects. We note that seven through nine main effects can still be identified with very high power; however, the simulation times required for those cases precluded their inclusion. We found that the power is 0.9 for identifying three 2FIs in the presence of six active main effects and three active quadratic effects; it is roughly 0.8 for identifying three quadratic effects in the presence of six active main effects and three active 2FIs.
Another advantage of employing fake factors is that the Type I error rate for main effects in these cases is essentially the nominal level (0.05). A 95% confidence interval on the average Type I error rate was (0.048, 0.052) across all three designs having fake factors and all combinations of numbers of active main effects and active second-order effects. We excluded the cases with six active factors for the design with six factors, because a Type I error rate cannot be determined when all the main effects are active. For designs without fake factors, the sequential procedure for choosing the best set of main effects resulted in an average Type I error rate of 0.023, with a 95% confidence interval of (0.0225, 0.0247). For the second-order effects, our procedure yielded an average Type I error rate of 0.022 for 2FIs and an average Type I error rate of 0.033 for pure quadratic effects. Type I error rates for 2FIs increased to a worst-case average of 0.048 for six-factor designs when model heredity was not assumed; Type I error rates for pure quadratic effects increased to a worst-case average of 0.055 under the same conditions.

Comparisons With Stepwise/AICc and the Elastic Net
We noted above that Jones and Nachtsheim (2011, 2015) advocated the use of stepwise methods based on the AICc criterion (Stepwise/AICc) for model selection for DSDs. This is a standard approach that is widely available in software packages. In this subsection, we compare the tailored DSD analysis advocated here to Stepwise/AICc (SWA) and to the Elastic Net (ENet) as implemented in the JMP statistical software system.
Our comparison is based on a simulation study, identical to that described in the previous subsection, with the exception that we limit the study to six factors, four augmented runs, SN = 3, and, for each design setting, we conduct the analysis using our new analysis method (New), SWA, and ENet. When heredity is assumed, we conduct SWA and ENet in two stages. In the first stage, we use the methods to select the active main effects. In the second stage, we identify active second-order effects among the set of all 2FIs and quadratic effects that can be constructed from the set of active main effects, while forcing the active main effects to remain in the model. When heredity is not required, we use a standard one-stage application of SWA or ENet. We employed 1000 replications of each design setting.
Figure 3. Boxplots of the power for identifying main effects versus the analysis method. "New," "ENet," and "SWA" denote the new method, the Elastic Net, and Stepwise/AICc.
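The heredity-constrained candidate set used in the second stage can be sketched in a few lines of Python. The function name and term labels below are our own illustration, not code from any of the software packages mentioned:

```python
from itertools import combinations

def heredity_candidates(active_mains):
    """Given the main effects declared active in stage 1, build the
    second-order candidate set allowed under strong heredity:
    all 2FIs between pairs of active main effects, plus the
    quadratic term of each active main effect."""
    twofis = [a + "*" + b for a, b in combinations(sorted(active_mains), 2)]
    quads = [a + "^2" for a in sorted(active_mains)]
    return twofis + quads

# Example: stage 1 declared A, B, and C active.
print(heredity_candidates(["A", "B", "C"]))
# → ['A*B', 'A*C', 'B*C', 'A^2', 'B^2', 'C^2']
```

Stage 2 would then search these candidates while forcing A, B, and C to remain in the model; any second-order term not built from an active main effect is never considered.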
The ANOVA results indicated that fifth- and all lower-order interactions were significant. For this reason, a simple interpretation of results in terms of main effects and/or 2FIs is not possible. Instead, we use conditional effects plots via the JMP prediction profiler to summarize key outcomes.
Before summarizing the full analysis, we point to a major advantage of the new method for identifying active main effects. Recall that the new method takes advantage of the design structure by separating the response vector into two orthogonal vectors, one for main effects and one for second-order terms and the intercept. When the new method searches for main effects, the error variance is not inflated by any active second-order effects. Therefore, we would expect our method to outperform any other method that does not leverage the design structure for the identification of main effects. This expectation is confirmed in fairly dramatic fashion by the boxplots in Figure 3, which provide a simple overall summary of the power for identifying main effects for the three methods.
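The orthogonal split described above can be illustrated with a small numerical sketch (Python/NumPy). The two-column design below is a generic stand-in for the main-effect columns of a foldover design, not an actual DSD; the point is only that projecting the response onto the main-effect column space strips out the second-order signal entirely:

```python
import numpy as np

# Two +/-1 main-effect columns plus a center run; the quadratic column
# x1^2 is orthogonal to both main-effect columns, as in a DSD.
X_me = np.array([[1, 1], [1, -1], [-1, 1], [-1, -1], [0, 0]], dtype=float)
quad = X_me[:, 0] ** 2
y = 3 * X_me[:, 0] + 2 * quad   # true model: one main effect, one quadratic effect

# Least-squares projection of y onto the main-effect column space.
beta, *_ = np.linalg.lstsq(X_me, y, rcond=None)
y_me = X_me @ beta              # carries only the main-effect signal (3*x1)
y_2nd = y - y_me                # carries the second-order signal (2*x1^2)
```

Because `y_me` contains none of the quadratic contribution, the search for active main effects proceeds with an error variance uninflated by second-order activity.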
Results for the tailored DSD analysis with SN = 3, two fake factors, and models following strong heredity are shown in Figure 4, where the first row corresponds to the power for identifying active main effects, the second to the power for 2FIs, and the third to the power for identifying quadratic effects. Notice that, for the profiler settings specified (Active MEs = 4, Active 2FIs = 2, and Active Quadratic = 2), the predicted powers for main effects, 2FIs, and quadratic effects are 0.9995, 0.9140, and 0.8135, respectively. The negative slopes in the right-most "Method" panels suggest that power will drop precipitously if the method is changed to either Stepwise/AICc or the Elastic Net, and this is indeed the case. If we change to the Elastic Net model-selection method while keeping all of the other design settings the same, the powers for main effects, 2FIs, and quadratic effects drop to 0.5613, 0.4100, and 0.3550, respectively. For Stepwise/AICc these numbers are 0.5788, 0.3350, and 0.3050. In general, the power values for Stepwise/AICc and the Elastic Net are substantially reduced, especially when heredity is assumed. The three methods are only somewhat comparable for unrestricted models when there are two or fewer interactions present and fake factors have been included. We conclude that the proposed method is superior to standard model-selection procedures, especially for models following strong heredity.

Comparisons to Model Selection Methods That Impose Strong Heredity
In the introduction, we noted that there are alternative variable selection methods that can impose the strong heredity constraint. Chipman (2006) developed a Bayesian variable selection method that can impose strong heredity. SHIM (Choi, Li, and Zhu 2010; Li and Zhu 2014) and the hierNet method (Bien, Taylor, and Tibshirani 2013) are generalizations of the lasso and the elastic net, respectively, that allow the identification of active 2FIs in models that follow strong heredity. The hierNet method is of particular interest here because it also permits the identification of active pure quadratic effects. We also note that Errore et al. (in press a,b) employed a modification of SHIM to include pure quadratics and found that the method had good power but high Type I error rates when used to analyze DSDs. For this reason, we limit comparisons to hierNet and the model selection approach advocated here. To compare hierNet and the new method, we repeated a subset of the simulation study of Section 4.1. We limited the study to eight-factor DSDs including two fake factors, so that there were 21 runs, while setting the SNR to 3. We allowed the number of active main effects to vary between 3 and 6, and varied the number of active 2FIs and active pure quadratic effects between 1 and 3. The R package hierNet (Bien and Tibshirani 2015) was used to carry out the hierNet computations. We note that the hierNet method requires that the user specify one tuning parameter, λ. In our initial simulations, we employed the value (λ = 50) used in the example in the hierNet package. For this value of the tuning parameter, the power for detecting second-order effects was negligible. For this reason, we experimented with alternative values and found that setting the tuning parameter to λ ≈ 10 resulted in reasonable power for detecting 2FIs.
Smaller values of the tuning parameter led to greater power for identifying second-order effects (especially pure quadratic effects), but the associated Type I error rates skyrocketed. For example, for the value λ = 1, the Type I error rate for main effects was 0.96. Obviously, at this Type I error rate, very little factor screening actually happens. For the chosen value of the tuning parameter, the methods performed comparably in terms of the power for identifying active main effects. However, when it came to the identification of active second-order effects, we observed substantial differences. Consider first the identification of active 2FIs. Figure 5(a) shows boxplots of the observed power for identifying one, two, and three 2FIs; Figure 5(b) shows the associated Type I error rates. We observe that the power for identifying one 2FI is comparable for the two methods; however, the Type I error rate for hierNet is more than double that for our method. Also, as the number of 2FIs increases, the power for hierNet degrades substantially, while that for the new method stays roughly the same. Moreover, as the number of active 2FIs increases, the Type I error rate of the hierNet method continues to rise, while that for the new method decreases. We observed much larger differences in power for pure quadratic effects. For the hierNet method, the observed power never exceeded 0.11, whereas for our method, observed power was never less than 0.79.

Example: Timing Study
In this section, we illustrate the use of our method, where the response of interest is the execution time that our method expends in analyzing the various models in our simulation study.
One purpose of this timing study is to model the computational burden of the new method for analyzing data from a DSD having six factors and 17 runs as a function of the number of active main effects and active second-order effects. Second, and more importantly, we use this study to demonstrate the utility of our new analytical approach in a real experiment where the correct set of active effects in a full quadratic model is known.
We instrumented our simulation code to record the number of seconds required to generate 1000 simulated sets of responses and analyze them using our new method. Here, we present the results of this timing study as a 3³ full factorial design. The three factors are as follows:
1. Number of active main effects with levels 4, 5, and 6, labeled A and coded −1, 0, and 1, respectively;
2. Number of active 2FIs with levels 1, 2, and 3, labeled B and coded −1, 0, and 1, respectively;
3. Number of active quadratic effects with levels 1, 2, and 3, labeled C and coded −1, 0, and 1, respectively.
Table S1 of the supplementary materials shows the design and the associated time for each of the 27 simulation runs. The length of the simulation increases with increasing values of each of the three factors. We fit a full quadratic model to the full factorial design data and removed the two insignificant terms B² and C². The equation for the fitted model was t̂ = 11.9 + 14.2A + 5.7B + 4.3C + 6.2AB + 4.6AC − 2.5BC + 6.4A², where t̂ is the predicted value of the simulation time. The model predicts 2.4 sec for the low settings of each factor and 50.8 sec for the high settings of each factor. The guided subset selection procedure takes most of the simulation time, but when the number of active second-order effects is small, the procedure ends early.
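As a quick check on the arithmetic, evaluating the fitted equation at the corner settings reproduces the predictions quoted above. This short Python sketch simply encodes the reported coefficients:

```python
def predicted_time(A, B, C):
    """Fitted timing model from the 3^3 factorial (coded units)."""
    return (11.9 + 14.2 * A + 5.7 * B + 4.3 * C
            + 6.2 * A * B + 4.6 * A * C - 2.5 * B * C + 6.4 * A ** 2)

print(round(predicted_time(-1, -1, -1), 1))  # low settings → 2.4
print(round(predicted_time(1, 1, 1), 1))     # high settings → 50.8
```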
A second motivation for the timing study was to illustrate our analytical procedure with a real example. Table 2 is a DSD constructed using eight factors but treating two of these factors as fake factors. The values in columns A, B, and C are a 17-run subset of the 27-run full factorial design in Table S1 of the supplementary materials. The columns D, E, and F have no true effect.
Table 2. Subset of timing study data corresponding to the six-factor, 17-run DSD. Data from columns A, B, C, and Time match that from Table S1 of the supplementary materials. Factors D, E, and F are spurious, but assumed to be real factors for purposes of analysis; two fake factors were used to construct the design.

Applying our analytical procedure under the model heredity assumption to the data in Table 2, we obtain the following active effects: A, B, C, E, AB, AC, BC, and A². Effect E is a Type I error, but all the other effects match the effects found from the analysis of the full factorial design. Figure S3 of the supplementary materials shows the results of each of the phases of the model selection procedure. The resulting prediction equation is t̂ = 13.1 + 14.8A + 6.6B + 3.2C + 2.4E + 6.2AB + 3.2AC − 2.8BC + 5.3A².
The coefficient estimates match the estimates obtained from the full factorial design within experimental error. This example demonstrates the promise of DSDs combined with our analytical approach to screen effectively while simultaneously identifying several active second-order effects.
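To make the "within experimental error" comparison concrete, the coefficients of the two fitted equations can be tabulated side by side. This sketch uses only the numbers reported above; note that the spurious E term (estimated at 2.4) has no counterpart in the factorial fit:

```python
# Coefficients from the full factorial fit vs. the DSD analysis (reported above).
full_fact = {"Int": 11.9, "A": 14.2, "B": 5.7, "C": 4.3,
             "AB": 6.2, "AC": 4.6, "BC": -2.5, "A^2": 6.4}
dsd_fit = {"Int": 13.1, "A": 14.8, "B": 6.6, "C": 3.2,
           "AB": 6.2, "AC": 3.2, "BC": -2.8, "A^2": 5.3}

# Absolute difference for each shared term.
diffs = {k: round(abs(full_fact[k] - dsd_fit[k]), 1) for k in full_fact}
print(diffs)
print(max(diffs.values()))  # → 1.4 (the AC term)
```

The largest discrepancy, 1.4 on the AC coefficient, is small relative to the leading effects, which is consistent with the estimates agreeing within experimental error.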

Discussion
In this article, we have provided an approach for the analysis of DSDs that makes effective use of their special structure. The power of this approach to detect active main effects as well as active second-order effects far exceeds the capabilities of generic model selection tools, which do not take the structure of the data into account.
For a DSD constructed using conference matrices having c rows and columns, we find that investigators willing to assume model heredity can reliably identify up to about c/2 active second-order effects when model heredity actually holds. If there are three or fewer active main effects, then our method can reliably identify all of the active second-order effects.
Employing fake factors in the design of a DSD dramatically improves its ability to identify active second-order effects, beyond the effect of merely adding four runs. As a result, we strongly recommend the routine use of DSDs employing fake factors over fielding minimal-run DSDs. Relatedly, Wu, Boos, and Stefanski (2007) advocated adding random predictor columns to a regression to help determine whether a model selection procedure is over-fitting when the number of predictors exceeds the number of observations. Of course, even a random factor could, by chance, have its estimated coefficient aliased with other variables with which it is correlated. Our fake factors are uncorrelated with any of the effects of interest and so avoid this problem. We note that one could alternatively obtain two degrees of freedom for pure error without employing fake factors by adding three center runs. If a choice can be made, we strongly prefer the addition of four runs via two fake factors. This not only provides two degrees of freedom for pure error, but also measurably improves our ability to identify active second-order effects.
If more than three main effects are active and more than about c/2 second-order effects are active, then accurate model selection deteriorates regardless of the approach. In such cases, practitioners may need to augment the initial design with enough runs to fit a full quadratic model in the factors having active main effects. Another valuable benefit of employing fake factors is that the identification of active main effects is reliable even if there are too many active second-order effects to correctly identify. Research on the best way to augment DSDs as a function of the observed number of active main effects is currently underway.
The timing study showed that as the number of active main effects grows, the time required to do the guided subsets analysis grows rapidly. In larger screening experiments employing 12 or more factors, the analytical cost of performing the subsets analysis on the second-order effects may be prohibitively expensive. An alternative would be to use a forward stepwise procedure for selecting these effects. Preliminary simulation studies have shown that using forward stepwise does not significantly reduce the average power for detection of active main effects, 2FIs, or quadratic effects.
The assumption that models exhibit strong heredity is a bit controversial. Most practitioners have encountered examples in which a two-factor interaction is active while neither of its parent main effects is. The simulation studies in this work assumed either models that exhibit heredity or unrestricted models. When heredity was assumed, the simulated model exhibited heredity. When the unrestricted model was assumed, the simulated model was not constrained in any way. There are two other possibilities that involve errors in the assumptions. First, heredity might be assumed while the model generating the actual data is not constrained. Second, the unconstrained model might be assumed while the simulated model exhibits heredity. In the first case, which erroneously assumes model heredity, active 2FIs and pure quadratic effects that are not associated with an active main effect will become Type II errors. This is because the analysis that assumes heredity precludes consideration of such terms. The second case wastefully takes into consideration all the second-order effects when the system actually exhibits model heredity. In this case, Type I errors can occur by chance when a second-order effect enters the model without its accompanying main effects. This can happen because, in DSDs, second-order effects can have nonnegligible correlations. Such Type I errors often occur in tandem with a Type II error: the active effect that is correlated with the falsely selected term no longer explains enough response variability to merit inclusion in the model. Despite this, we advocate analyzing the data under the strong heredity assumption because of its great utility in cases where only a few main effects prove to be active.
If a practitioner is concerned about missing an effect that does not exhibit model heredity, we offer two alternatives. First, model selection could be carried out both under the assumption of model heredity and for the unrestricted case. If the models are in agreement, the assumption of heredity is clearly supported. On the other hand, if the unrestricted analysis finds a model that does not follow strong heredity, there is cause for concern. If the models are roughly comparable in terms of various model selection criteria, a decision may have to be made on the basis of the domain knowledge of the experimenter. If the models found have numbers of terms that approach the boundaries outlined above for reliable identification of second-order effects and the models are markedly different, this may indicate that the experiment requires augmentation. In an alternative approach, the second stage of the new analysis method could be repeated starting with the currently chosen second-order effects "forced" into the model, but adding all the remaining second-order effects for consideration. These approaches are under investigation.
We note that our analytical approach need not apply uniquely to DSDs. With minor modifications to deal with main effects having small pairwise correlations, it is possible to apply this approach to any foldover design. This also suggests that foldover designs having more than 2m runs are desirable because of their implicit use of "fake factors." Errore et al. (in press a,b) explored this and related ideas.