Scalable level-wise screening experiments using locating arrays

Abstract Alternative design and analysis methods for screening experiments based on locating arrays are presented. The number of runs in a locating array grows logarithmically based on the number of factors, providing efficient methods for screening complex engineered systems, especially those with large numbers of categorical factors having different numbers of levels. Our analysis method focuses on levels of factors in the identification of important main effects and two-way interactions. We demonstrate the validity of our design and analysis methods on both well-studied and synthetic data sets and investigate both statistical and combinatorial properties of locating arrays that appear to be related to their screening capability.


Introduction
When a system is complex, it is challenging to characterize and predict its behavior and performance.The power grid, the Internet, and transportation networks are examples of complex engineered networks that play an increasingly critical role in society.Yet our understanding of them remains limited (Chiang and Rao 2012).
Screening is often the first step in experimentation with a system, and its objective is to identify important factors that significantly impact the response variables.Screening experiments aim to eliminate factors that are not involved in any active effect, reducing the number of factors included in further experimentation; screening need not ensure that all remaining factors and their effects are important.
The size of the design spaces of complex systems dictates the need for scalable screening designs.Supersaturated designs are among the smallest screening designs, with the expected number of runs growing linearly based on the number of factors (Gilmour 2006;Li and Lin 2003;Lin 1993;Montgomery 2017;Nguyen 1996).Some screening designs aggregate the factors into groups, such as sequential bifurcation (Kleijnen, Bettonvil, and Persson 2006), to reduce the size of the design.Grouping requires care to ensure that factor effects do not cancel each other out.In studies of the Internet, specific levels of factors are known to give rise to active effects, and combinations of levels of factors may be exploited to improve network performance (see, e.g., Djama et al. 2008;Huang et al. 2020;Melodia and Akyildiz 2010;Sagduyu and Ephremides 2007;Shin, Park, and Kwon 2014;Song and Hatzinakos 2007;Vadde and Syrotiuk 2004;Verikoukis, Alonso, and Giamalis 2005).As systems become larger and more complex, domain expertise alone appears to be insufficient to make decisions concerning factor restriction or grouping, and techniques are needed to identify interactions and cope with categorical factors.
This paper is the first to demonstrate the feasibility of a locating array, a type of combinatorial design, as a screening design to address all these issues.

Scalability
Locating arrays have an expected number of runs that grows logarithmically based on the number of factors (Colbourn and McClary 2008), making it practical for experiments to consider more factors-an order of magnitude more-than allowed by traditional screening methods.This can eliminate the need for grouping.

Level-wise screening
Locating arrays provide t-way coverage, ensuring that all t-way level combinations are present in the design; usually, the interest is in t 2 because higher-order interactions tend to have less effect (Li, Sudarsanam, and Frey 2006;Montgomery 2017), but the definition is general.Locating arrays also satisfy a locating property that, together with coverage, naturally accommodates categorical factors.The proposed analysis algorithm using a locating array combined with a compressive sensing matrix validates the results of both widely studied experimental data and synthetic data sets.Information about specific levels of factors may be informative for experimenters and users of a system.
Section 2 provides a precise definition of locating arrays and a discussion of some of their properties.
As Section 3 details, a compressive sensing matrix (CSM) is used as the model matrix in the analysis method because it is well suited for identifying levelwise effects.Although over-parameterized, it is often effective in identifying a sparse explanation for variability in a response.
To illustrate these aspects of a CSM, we consider a contrived example.Table 1a shows a 3 2 full-factorial design with two categorical factors A and B, each with three levels and two responses, y 1 and y 2 : The mean value for y 1 is y 1 ¼ 1 9 P 9 i¼1 y 1i ¼ 5: The mean response of A at level 0, as well as at levels 1 and 2, is 2þ7þ6 3 ¼ 5; hence, A does not impact y 1 : Similarly, B does not impact y 1 : Yet each of the two-way interaction effects of A and B deviates from 5 and impacts y 1 : The CSM for this example has 16 columns: three for each of two factors, one for each of nine two-way interactions, and one for the intercept; see Table 1b.When the CSM is used with the proposed screening method, nine terms with each two-way interaction of A and B, except A ¼ 1, B ¼ 1, and the intercept, are identified for y 1 : If dummy coding with A ¼ 1 and B ¼ 1 as the reference level is used with the proposed screening method, it also identifies nine terms: and the four two-way interactions between these levels for A and B. Of course, these results are not wrong, but they may be difficult to interpret.
The mean value of y 2 is y 2 ¼ 1 9 P 9 i¼1 y 2i ¼ 1: The mean value for y 2 is 0 at A ¼ 0, A ¼ 2, B ¼ 0, and B ¼ 2, and it is 3 at A ¼ 1 and B ¼ 1: Of the interaction terms, only A ¼ 1 and B ¼ 1 has an impact on y 2 : When the proposed screening method is used with a CSM, a single interaction term A ¼ 1 and B ¼ 1 is identified for y 2 : If dummy coding with A ¼ 1 and B ¼ 1 is used as the reference level, it identifies all nine terms because it lacks A ¼ 1 and B ¼ 1 as a choice.Again, the terms identified are not wrong; this illustrates the issue with sparsity.
In Section 3, the proposed analysis method is presented in two algorithms.The first algorithm conducts a search in which many possible models with levelwise effects are developed to account for variability in the response.Effects are scored based on their perceived explanatory value.The second algorithm produces a ranking of the important terms through aggregation of the scores across the models developed.
Results using the proposed analysis method on widely studied data sets are presented in Section 4. The purpose of the study is to demonstrate that the proposed analysis method is effective, even for experiments with a small number of factors, and agrees with accepted results.To show that the method does not depend on the CSM, results using alternative model matrices are presented in Section 4.3.1.Guidance on using the tools developed is provided for practitioners in Section 4.1.
What, then, makes a locating array combined with the proposed method of analysis effective?Section 5 examines both statistical and combinatorial properties of designs that contribute to their ability to screen effectively.Also included is a comparison of the results of a locating array with a few supersaturated designs on synthetic data sets.Keeping in mind the coupling between design and analysis, a better understanding of such properties and other characteristics may yield requirements that can be integrated into the construction of screening designs based on locating arrays.Finally, conclusions and future research directions are discussed in Section 6.

Locating arrays: Definition and background
Consider a system with k factors, each having levels chosen from a set S i of cardinality s i , 1 i k: An N Â k array A with levels in the ith column chosen from a set S i has type ðs 1 , :::, s k Þ: A t-way level combination consists of a choice of a set I ¼ fi 1 , :::, i t g of t columns and the selection of levels r i 2 S i for i 2 I, and it is represented as For brevity, we use the term level-wise t-way interaction for both the t-way level combination and the corresponding model effect.
A covering array of strength t is an N Â k array of type ðs 1 , :::, s k Þ in which, for every N Â t subarray, each level-wise t-way interaction is covered (i.e., occurs) in at least one run (Hartman 2005).This generalizes the definition of an orthogonal array, which is a covering array of strength t in which every interaction is covered exactly the same number of times.Table 2a shows a covering array A 1 of strength two for k ¼ 4 factors of type ð2, 2, 3, 3Þ: Nine runs suffice to cover all 37 of the level-wise two-way interactions.For example, the level-wise two-way interaction ðA ¼ 0, C ¼ 2Þ is covered in run five, shaded in blue.Covering arrays are a commonly used combinatorial testing method for real-world software (Kuhn, Kacker, and Lei 2013).
While a covering array of strength t covers all level-wise t-way interactions, it does not ensure that it is possible to separate different interactions.For example, if the response measured for run five of A 1 deviates from that in the other runs, it is not possible to determine which of the three level-wise two-way interactions, ðA ¼ 0, B ¼ 1Þ, ðA ¼ 0, C ¼ 2Þ, or ðC ¼ 2, D ¼ 1Þ, is responsible because each one occurs only in run five.Locating arrays extend covering arrays to address this very issue.
A ðd, tÞ-locating array (Colbourn and McClary 2008) is a covering array of strength t of type ðs 1 , :::, s k Þ with an additional property: any set of d level-wise t-way interactions can be distinguished from any other such set by appearing in a distinct set of runs.If an array satisfies this condition, it has the ðd, tÞ-locating property.
More precisely, for array A and level-wise t-way interaction T, define q A ðTÞ to be the set of runs of A in which T is covered.For a set T of level-wise t-way interactions, q A ðT Þ ¼ [ T2T q A ðTÞ: Let I t be the set of all level-wise t-way interactions for an array, and let I t be the set of all level-wise interactions of size at most t.Consider a level-wise interaction T 2 I t of size less than t.Any level-wise t-way interaction T 0 of size t that contains T necessarily has q A ðT 0 Þ q A ðTÞ: Call a subset T 0 of level-wise t-way interactions in I t independent if T, T 0 2 T 0 does not exist with T T 0 : Definition 2.1 (ðd, tÞ-Locating Array (Colbourn and McClary 2008)).
An array A is ðd, tÞ-locating if, whenever T 1 , T 2 I t , jT 1 j ¼ d, and jT 2 j ¼ d, it holds that The definition is extended to permit level-wise interactions of at most t, writing t in place of t, by permitting that T 1 , T 2 I t and requiring that T 1 and T 2 be independent.For all instances of locating arrays, the relationships among them, and numerous examples, see Colbourn and McClary (2008); see also Section A of the Supplementary Material.For a more detailed discussion of the combinatorial requirements of locating arrays as generalizations of covering arrays, see Colbourn and Syrotiuk (2018).
The covering array A 1 in Table 2a does not have the ð1, 2Þ-locating property because the set of three level-wise two-way interactions However, the array A 2 in Table 2b is a ð1, 2Þ-locating array.For each level-wise two-way interaction in T there is a run that distinguishes it from the others: To quantify the degree to which level-wise t-way interactions can be distinguished in an array, the separation between sets of runs for different sets of level-wise t-way interactions is introduced (Seidel, Both arrays have type ð2, 2, 3, 3Þ: Sarkar , et al. 2018).More precisely, for a positive integer d, an array A is ðd, t, dÞ-locating if, whenever A ðd, t, dÞ-locating array guarantees that any two sets of d level-wise t-way interactions are separated by at least d runs.By definition, a locating array has a separation of at least one: for example, A 2 is ð1, 2, 1Þ-locating.A locating array with larger d is more robust to, for example, outliers or missing data; however, there is a tradeoff between large d and small array size.Colbourn and McClary (2008) use covering arrays of strength t þ 1 for constructing ð1, tÞ-locating arrays and show that their size grows logarithmically based on the number of factors.Their rate of growth is slower than that of orthogonal arrays, which, by the Rao bound, are known to grow polynomially based on the number of factors (Hedayat, Sloane, and Stufken 1999).A locating array is preferable to a covering array for screening because it can separate the effects of any two t-way level-wise interactions; it is preferable to an orthogonal array because it is more economical in terms of run size.
Mart ınez et al. ( 2010) establish feasibility conditions for a locating array to exist.Tang, Colbourn, and Yin (2012) provide a general construction method for locating arrays.Colbourn and Fan (2016) develop three recursive constructions for locating arrays when ðd, tÞ ¼ ð1, 2Þ: The size-optimality of a ð1, 1Þ-locating array is studied in Colbourn, Fan, and Horsley (2016).Seidel, Sarkar, et al. (2018) discuss two randomized algorithms for constructing locating arrays based on the Stein-Lov asz-Johnson paradigm and the Lov asz Local Lemma.The implementation of these algorithms is publicly available (Seidel 2019).

A level-wise screening method
The proposed level-wise screening method assumes a locating array as the screening design, with one or more response vectors.A compressive sensing matrix (CSM) is used to represent the model matrix for the level-wise effects.For each response, the analysis method identifies a user-specified number of levelwise models that provide the "best" explanations for the response.Aggregation over these models, usually restricted to level-wise one-and two-way effects, results in the identification of important factors for each response.Each of these steps is described next.

The screening design and model matrix
A ð1, 2Þ-locating array is proposed as the screening design for two reasons.The coverage and locating properties of such an array are essential for separating level-wise one-and two-way effects.And, as Section 3.2 describes, the proposed analysis method recovers the "strongest" level-wise main effect or two-way interaction, one iteration at a time.
For the model matrix, a 61 CSM is proposed in which the columns correspond to an intercept and all level-wise one-and two-way effects.A similar idea has been used for the recovery of sparsifiable signals in communications and storage systems (Baraniuk 2007).
A CSM for an N Â k ð1, 2Þ-locating array A of type ðs 1 , :::, s k Þ has as many rows as runs in A, and it has columns corresponding to the candidate level-wise terms.Specifically, the CSM M ¼ ðm ij Þ has N rows and , where m ij ¼ þ1 if level-wise effect j is covered in the ith run of A, and m ij ¼ À1 otherwise.A column of all þ1 is required for the intercept.Table 3 shows the CSM for the locating array A 2 in Table 2b.For compactness of representation, we write 6 instead of 61:

The screening method
To achieve a small run size, locating arrays may exhibit a highly unbalanced structure.This requires the development of a method for level-wise screening that can cope with imbalance.(Locating arrays for few factors are typically close to balanced; imbalance increases with the number of factors.) The proposed level-wise screening method has two steps.First, a breadth-first search (BFS) algorithm is developed to identify a user-specified number of levelwise models that are the "best" explanations of a response using orthogonal matching pursuit (OMP; Davis, Mallat, and Avellaneda 1997), which is widely used in signal processing (Tropp and Gilbert 2007) to recover sparse signals.A matching pursuit is a greedy algorithm that progressively refines an approximation of an optimization problem with an iterative procedure instead of solving it optimally.The vector selected at each iteration by the matching pursuit algorithm is generally not orthogonal to the previously selected vectors.In OMP, the approximations are refined by orthogonalizing the directions of projection.
Second, using the models produced in the BFS, the screening algorithm aggregates level-wise main effects and two-way interactions to identify the candidate important effects.The "many-model" method (Holcomb, Montgomery, and Carlyle 2003) also retains a fraction of best models based on the error sum of squares, but it does not appear to scale to large numbers of factors.

The breadth-first search (BFS) algorithm
The BFS algorithm is parameterized by three userspecified variables: n models giving the number of fitted models that the algorithm returns, n new giving the fanout (i.e., number of children) of each node in the BFS tree, and n terms giving the number of effects in each of the final fitted models.In the BFS tree, the nodes at level l correspond to fitted models with l effects.The algorithm generates a BFS tree of height n terms : The BFS algorithm is given in Algorithm 1; Figure 1 illustrates the BFS tree.The root of the tree is a single model consisting of the mean response and a score initialized to zero.A BFS expands each node at level l to n new nodes at level l þ 1 (line 8).For efficiency, the search tree is stored implicitly, with each model of length l stored in priority queue q l : Each child expands the fitted model of its parent by adding the ith most important effect for 1 i n new using OMP (line 9).Specifically, the ith most important effect corresponds to the ith effect in the ranking of the absolute values of the dot products or correlations of each column in the CSM with the current residual vector.The model is expanded to include the ith effect.Then the ordinary least squares (OLS) method (Searle 1987) is used to update coefficient estimates in the expanded model, after which the residuals are updated and the score of the added effect is computed (lines 10-14).(OMP for logistic regression (Lozano, Swirszcz, and Abe 2011) can be used for binary responses.) The increment in R 2 of the expanded model that results from adding the ith effect is used as the score of the effect.The model, its residuals, its R 2 and adjusted R 2 , and its scores are then inserted into the priority queue of length l þ 1 (line 15).The priority queue of length l þ 1 retains at most n models in decreasing order by R 2 rank.These steps are repeated until a stopping criterion is met.To simplify the algorithm, it stops when each model has n terms level-wise effects (line 5).The model matrix, stopping criterion, and scoring method of the BFS algorithm can each be chosen differently.Indeed, even the search tree itself can be explored using a different method, such as a branch-and-bound or Least-Absolute Shrinkage and Selection Operator (LASSO) (Tibshirani 1996) approach to solving the best-subset selection problem (Miller 1990).

8:
for end for 17: end while 18: end for 19: return list of n models fitted models from q n terms ranked by R 2 value It is possible for duplicate fitted models to arise in q l , such as when the same terms are selected but in a different order.While only unique fitted models are kept in the queue, duplicates are accounted for by adding the scores of each effect from the duplicate.Thus, duplication is not ignored; it allows more models to be explored.

The screening algorithm
In screening, the objective is to identify a few important main effects and two-way interactions.One approach is to examine the scores of the level-wise effects in the list of fitted models produced and select those with higher scores.Instead, the approach in Algorithm 2 aggregates the effects of all n models without explicitly considering levels.
Algorithm 2 Screening(effects, q n terms , n models , n terms ) Input: List of all candidate effects, list of fitted models q n terms and corresponding scores from Algorithm 1, number of fitted models n models in the list, number of effects n terms in each fitted model Output: A list of effects in nonincreasing order by score 1: Initialize the score of each effect to zero 2: for i 1, :::, n models do 3: ðmodel i , scores i Þ dequeueðqÞ 4: for j 1, :::, n terms do 5: k index of effects corresponding to term j in model i j 6: effect-score k ¼ effect-score k þ scores i j 7: end for 8: end for 9: return list of effects ranked by effect-score Effects are reported in nonincreasing order by their aggregate scores to support user interpretation of the results.Screening results are usually reported with a consideration of heredity; therefore, if an interaction effect X Â Y is reported as active, then X and Y are also considered active (Li, Sudarsanam, and Frey 2006).

Analyzing data from real experiments:
Guidance and validation

Guidance for practitioners
To construct a locating array with parameters ðd, t, dÞ for k factors of type ðs 1 , :::, s k Þ, see Section B of the Supplementary Material for available construction tools; it also describes tools to extract a locating array from an existing design and a tool to extend an array with additional runs until it satisfies the requirements of a locating array.Locating arrays appear to have no strong competitors as screening designs capable of isolating one-and two-way effects efficiently.
If the array used for a screening experiment is an N Â k ð1, 2Þ-locating array of type ðs 1 , :::, s k Þ, then the corresponding CSM has N rows and , that is, it runs in polynomial time.As the fan-out n new increases, more models are evaluated.As the number of models n models decreases, many models are likely to be discarded, as the priority queue retains only n models models at each level.As n terms increases, successive effects identified by OMP in each iteration are likely to have a lower score.
While the choice of n new , n models , and n terms is application dependent, we have found that n models ¼ 50 and n new ¼ 50 work well in practice.For the number of terms, we suggest starting with n terms ¼ 2 when using the analysis tool (see Section B.3 of the Supplementary Material).When n terms is incremented but the screening result (i.e., the list of top effects returned by Algorithm 2) remains the same, there is likely no need to increment n terms again and explore models with more terms.

Validation of widely studied screening experiments
To validate the results of a screening experiment, a locating array must be extracted as a subset of the runs of the original screening designs because it is not feasible to conduct new runs of the experiment; see Section B.1 of the Supplementary Material for details.If a locating array can be extracted, along with the data collected in each run, the analysis can be performed.Next, we validate the results of several widely studied real-world screening experiments using n models ¼ 50, n new ¼ 50, and starting at n terms ¼ 2: The purpose of this validation study is not to advocate the use of locating arrays but to demonstrate that the proposed analysis method is effective, even for experiments with a small number of factors, and agrees with accepted results.

Chemical reactor experiment
Box, Hunter, and Hunter (2005) describe a chemical reactor experiment with five binary factors A-E.The experiment is run as a 2 5 full-factorial, and the original analysis indicates that B, D, E, B Â D, and D Â E are active.Miller and Sitter (2001) extract runs from the full-factorial design that correspond to a foldedover 12-run Plackett-Burman design and identify the same set of effects.
The interaction B Â D has the highest score for both locating arrays.Because the score for D Â E is much lower, it may or may not be active.If the top two terms are considered and heredity is used, then B, D, E, B Â D, and D Â E are the selected effects.This suggests that the factors B, D, and E should be included in follow-up experimentation.
Algorithms 1 and 2 identify the influential factors using fewer runs.Moreover, our screening results are consistent across different locating arrays.This indicates that the proposed screening method is robust and does not rely on the choice of runs in a locating array.Similar conclusions were found for a contaminant and a cast fatigue experiment; see Sections C and D, respectively, of the Supplementary Material.

Other model parameterizations instead of the compressive sensing matrix
For illustration, we consider alternative parameterizations for the chemical reactor experiment in Section 4.3.To use dummy coding, for a factor with k levels, we add f0, 1g indicator columns for the first k À 1 levels of the factor and add two-way interaction-effect columns by multiplying pairs of main-effect columns for different factors.Then we replace the 0s in those columns by À1s.The resulting matrix is used instead of the CSM for screening the data with the two locating arrays in Table 4.When n terms ¼ 8, R 2 > 0:98; Table 6 lists the top five terms and their scores.With these parameters, the scores for some effects become very high because they occur in many of the n models ¼ 50 fitted models, some of which are duplicated more than 200 times.
Table 7 presents the top five terms and their scores using the more typical parameterization for two-level factors consisting of a matrix that contains a single column for each main effect and each two-way interaction instead of using the CSM for the two locating arrays in Table 4 (n terms ¼ 8, R 2 > 0:98).
The results in Tables 6 and 7 indicate that using the common main effects and two-factor interactions agree with the original analysis better than using dummycoded variables.They also suggest that the 11-run locating array performs better than the 9-run locating array.

Rubber experiment
Williams (1968) conducted an experiment to improve a rubber-making process.This data set has appeared in numerous studies focusing on supersaturated designs; see Section E of the Supplementary Material for a history of the analyses performed.Sundberg (2008) observed that the original data set has an outlier in run 14 and suggested that the data be analyzed on the log scale.We extracted two distinct ð1, 2, 1Þ-locating arrays A 1 and A 2 (in Tables E.1 and E.2 in Section E) with 20 runs from the 28-run data set in Sundberg (2008).A log transformation of the data was performed before analysis.
Table 8a reports the top five terms and their scores for A 1 when n terms ¼ 5; all fitted models have R 2 !0:96: Table 8b reports the same results for A 2 when n terms ¼ 7; all fitted models have R 2 !0:97: Screening using A 1 indicates that the main effect of N dominates all other terms, while screening using A 2 suggests that the interaction I Â N has a significant impact on the response.A possible reason for these scores could be the presence of the outlier in A 2 ; because A 2 has d ¼ 1, a single outlier reduces its locating ability.Nevertheless, by heredity, N is identified as one of the important factors.Therefore, we are able to recover the active effect by screening using 20run instead of 28-run designs.
As it turns out, the 28-run design used by Sundberg ( 2008) is a ð1, 2, 4Þ-locating array.Table 8c reports the top five effects and their scores when n terms ¼ 7: The proposed screening method identifies factor N as the dominant active effect, even though the outlier is included in this data set.This suggests that a locating array with higher separation (d ¼ 4) is able to reduce the impact of outliers.
Table 7. Top five effects and their scores for the chemical reactor experiment using common effects model: (a) the 9-run locating array; (b) the 11-run locating array in Table 8.Top five effects and their scores for the rubber experiment using 20-run locating arrays (a) A 1 ; (b) A 2 ; (c) the 28-run design from Sundberg (2008), which is a ð1, 2, 4Þ-locating array.
The original analysis indicated that the factors interference channel occupancy (intCOR), channel band (band), transmission power (txpower), and Wi-Fi bit rate (rate) have a significant impact on audio quality.Similarly, rate, txpower, audio codec bit rate (codecBitrate), frame length aggregation (frameLen), and band show a significant impact on RF exposure.
Using the proposed analysis method based on a 109run ð1, 2, 1Þ-locating array, Table 9a shows the top five effects and their scores for audio quality when n terms ¼ 9; in this case, the fitted models have 0:74 R 2 0:76: For n terms ¼ 21, all the fitted models have R 2 !0:96; the results are shown in Table 9b.The method identifies txpower and the interaction intCOR Â band as having a significant impact on audio quality.
Table 9c lists the top five effects and their scores for RF exposure when n terms ¼ 10; these fitted models have R 2 !0:96: The proposed screening method identifies txpower and band as the important factors for RF exposure.
We also analyze the data from the wireless network experiment using two additional methods: the Dantzig selector method (Phoa, Pan, and Xu 2009) and LASSO regression (Tibshirani 1996).For each method, we use two different coding schemes: dummy coding (as described in Section 4.3.1)and orthogonal polynomial coding.
For the Dantzig selector, we use the R package "flare" (Li 2013).We scale the columns of the model matrix and set the tuning parameter d Dantzig ¼ 0:1 and the number of d Dantzig to 11; for how to choose these parameters, see Phoa, Pan, and Xu (2009).We fix c (a threshold between signal and noise) at zero to keep all selected effects.In LASSO regression, we use the R package "glmnet" for analysis (Friedman, Hastie, and Tibshirani 2009).
Table 10 summarizes the terms found by each of the three methods.There is good agreement on the screening results for both responses when the polynomial model is used.However, with dummy coding, neither the Dantzig selector nor the LASSO regression method appears to be as accurate.

Combinatorial and statistical properties of designs
In an effort to understand what contributes to the ability of locating arrays to screen effectively, we discuss a few of their combinatorial and statistical properties.
To understand the significance of the locating property in our screening method, we study it in a more controlled environment.For the study, we use three supersaturated designs, D 0 , D 1 , and D 2 , each with 18 runs for 21 binary factors; see Tables F.1, F.2, and F.3, respectively, in Section F of the Supplementary Material.
All three designs are strength-two covering arrays.Only D 0 is a ð1, 2, 1Þ-locating array.While the minimum size of a strength-two covering array for 21 binary factors is eight runs, additional runs are required to satisfy the locating property.The lower bound on the number of runs for a ð1, 2Þ-locating array for 21 binary factors is 12; see Theorem 2.2 in Tang, Colbourn, and Yin (2012).
The Eðs 2 Þ-criterion for choosing binary supersaturated designs minimizes the sum of squares of the information matrix entries for a main-effects model over balanced designs.The UEðs 2 Þ-criterion is similar to the Eðs 2 Þ-criterion, except that the requirement of factor-level balance is dropped (Jones and Majumdar 2014;Cheng et al. 2018).Both D 0 and D 1 are unbalanced in terms of factor-level occurrences, but only D 1 is UEðs 2 Þ-optimal.The locating array D 0 does not satisfy either optimality criterion.Though balanced, D 2 is not Eðs 2 Þ-optimal.
The maxjsj-criterion for binary designs considers correlations between columns of the model matrix for the main-effects model and selects a design that minimizes the maximum absolute correlation.D 2 has a smaller maxjsj-criterion than D 0 and D 1 : The resolution rank (r-rank) for a binary design is the largest value r so that all main effects are estimable for any model with r main effects (Deng, Lin, and Wang 1999); a discussion of this property follows a generalization of r-rank later in this section.The properties of D 0 , D 1 , and D 2 are summarized in Table 11.
We obtain simulated data from D 0 , D 1 , and D 2 using the 98 linear models in Table G.1 in Section G of the Supplementary Material.The models studied explore the impact of indistinguishable pairs of levelwise interactions.The arrays D 1 and D 2 , which do not satisfy the ð1, 2Þ-locating property, have 21 and 22 pairs of level-wise interactions not separated by any runs, respectively.
Table G.1 lists the results of our proposed screening method with n terms ¼ 6 and the Dantzig selector method with standard orthogonal polynomial coding, minimum d Dantzig ¼ 0:1, 0:25, number of d Dantzig ¼ 11, and c ¼ 0: The results indicate that the proposed screening method using the locating array D 0 is able to recover the interactions.
Here we discuss the first two models in Table G.1, y 1 and y 2 , in some detail: where the indicator function IndðÁÞ ¼ 1 only in runs where the factors equal the levels specified.The error term e is randomly selected from Nð0, 1Þ: The two-way interactions ðD 12 shows that, for y 1 , the interaction D Â O has a relatively high score for D 0 and D 1 , whereas both D Â O and L Â S have high scores for D 2 : Table 13 shows the top five scores for y 2 : In this case, the two-way interactions ðB ¼ 1, P ¼ À1Þ and ðS ¼ 1, T ¼ À1Þ are indistinguishable in D 1 because q D 1 ððB ¼ 1, P ¼ À1ÞÞ ¼ q D 1 ððS ¼ 1, T ¼ À1ÞÞ ¼ f8, 10, 12g: The proposed method identifies the interactions B Â P when analyzing y 2 using D 2 and D 0 : However, two interactions, B Â P and S Â T, have the same score when analyzing y 2 using D 1 : Not We now look for some statistical properties that contribute to locating arrays' ability to screen effectively.The r-rank of a design has been used as a statistical indicator of screening effectiveness for main-effect models for binary factors.Since we are also interested in two-way interaction effects, we propose a generalization for binary designs that also considers interaction effects: Definition 5.1 (ðr, iÞ-Rank) Let X ¼ ðX 1 jX 2 Þ, where X 1 corresponds to all main-effects columns and X 2 to all two-factor interaction columns under the common parameterization for binary factors.For a given i !0, the ðr, iÞ-rank of X is defined as the largest value r so that all effects are estimable for any model with r À i columns from X 1 and i columns from X 2 : The ðr, 0Þ-rank is just the usual r-rank.The ðr, iÞ-ranks for the three designs used in the simulation study are given in Table 14.The ðr, 0Þ-and ðr, 2Þ-ranks of D 0 are greater than those of D 1 and D 2 : When we compare the screening performance of D 1 and D 2 in Table G.1, the design D 2 appears better able to recover the interactions.This suggests the possibility of sequentially maximizing the ðr, iÞ-rank for improved screening and estimation of interaction effects.
Returning to combinatorial properties, the separation between sets of runs for different sets of level-wise interactions is an indicator of screening effectiveness.It quantifies the degree to which levelwise interactions can be distinguished, and a high separation can be useful when dealing with outliers or missing data.Intuitively, the better the ability to distinguish between sets of interactions, the better the screening effectiveness.For example, both the 28-run design and the 20-run design A 2 in the rubber experiment in Section 4.4 contain an outlier.The 28-run design has a separation of four, while A 2 has a separation of one.The screening results in Table 8 indicate that the 28-run design performs better.
To compare locating arrays of the same dimensions with the same separation, we introduce a new quantifier: Definition 5.2 (ðd, t, dÞ-Separation Deficiency) For a given strength t locating array A and positive integers d and d, the ðd, t, dÞ-separation deficiency is the number of pairs of sets T 1 , T 2 I t of cardinality d for which jðq The ðd, t, dÞ-separation deficiency is defined in a similar fashion by considering level-wise interactions of strength at most t.An array with a ðd, t, dÞ-separation deficiency of zero is a ðd, t, dÞ-locating array.When d ¼ 1, such an array is simply a ðd, tÞ-locating array.A ðd, t, dÞ-locating array with lower ðd, t, d þ 1Þseparation deficiency is preferred over the other ðd, t, dÞ-locating arrays for the proposed screening method.
We believe that an array with lower deficiency is better able to distinguish between sets of level-wise interactions.For example, the design matrix in the rubber experiment with 24 binary factors in Section 4.4 has 1012 level-wise two-way interactions in I 2 , and no two of them are covered in the same set of runs in arrays A 1 and A 2 in Tables E.1 and E.2.For any T 1 , T 2 2 I 2 , jðq A 1 ðT 1 Þ [ q A 1 ðT 2 ÞÞ n ðq A 1 ðT 1 Þ \ q A 1 ðT 2 ÞÞj !1: Therefore, A 1 is a ð1, 2, 1Þ-locating array.However, A 1 is not ð1, 2, 2Þ-locating.There are 218 pairs of two-way interactions that can be distinguished using only one run in A 1 : Therefore, the ð1, 2, 2Þ-separation deficiency of A 1 is 218.
Similarly, A 2 is a ð1, 2, 1Þ-locating array, and there are 161 pairs of two-way interactions that can be distinguished using only one run in A 2 : Consequently, A 2 is not a ð1, 2, 2Þ-locating array.The ð1, 2, 2Þ-separation deficiency of A 2 is 161.The ð1, 2, dÞ-separation deficiencies for A 1 , A 2 , and the 28-run design in Sundberg (2008) used in the rubber experiment in Section 4.4 are given in Table 15.Keeping in mind the coupling between design and analysis, a more thorough understanding of these and other statistical and combinatorial properties may yield requirements that can be integrated into the construction of screening designs based on locating arrays.

Conclusions and future work
This paper proposed designs and methods of analysis for screening experiments based on locating arrays.Locating arrays grow logarithmically based on the number of factors and are therefore well suited for identifying the factors that significantly impact the response variables of complex systems.Because such systems may have a large number of categorical factors with many levels, the proposed analysis method focuses on level-wise effects.This suggests the use of a parameterization method that works well to identify these effects, such as the compressive sensing matrix.As demonstrated by results on many real data sets, the analysis method appears to screen correctly using fewer runs than the original designs.Locating arrays with high separation may also provide some resistance to outliers or missing responses.
While we have demonstrated that the proposed design and analysis methods validate results of small, well-studied data sets, we anticipate additional advantages for studies of more complex systems with many factors, some of which are categorical.A more diverse and in-depth study of the statistical and combinatorial properties of locating arrays may better inform design construction and analysis.

Figure 1 .
Figure1.The breadth-first search tree generated by the analysis method.It has a fan-out n new from which the best n models models ranked by a score of R 2 are retained at each level in the priority queue.n models each with at most n terms level-wise terms are ultimately returned.

Table 1 .
(a) Value of responses y 1 and y 2 for a 3 2 -design.(b)Thecompressive sensing matrix for the design.
Algorithm 1 BFSðterms, M, data, n models , n terms , n new ) Input: List of candidate effects, CSM M, response vector data, number of fitted models n models to return, number of effects in each final fitted model n terms , fan-out of the BFS tree n new Output: List of n models best fitted models ranked by R 2 ðmodel new , residuals new , R 2 , adjR 2 , scores new ÞÞ 5: for l 1, :::, n terms do 6: while q l is nonempty do 7:ðmodel, residuals,R 2 ,adjR 2 ,scoresÞ dequeueðq l Þ

Table 3 .
The compressive sensing matrix for the locating array A 2 in Table2b.
new append increment in R 2 attributed to effects k to scores {Retain at most n models in the priority queue ranked by R 2 value} scores

Table 6 .
Top five effects and their scores for the chemical reactor experiment using dummy coding: (a) the 9-run locating array; (b) the 11-run locating array in Table4.

Table 5 .
Top five effects and their scores for the chemical reactor experiment using: (a) the 9-run locating array and (b) the 11-run locating array in Table4.

Table 4 .
Seidel, Mehari, et al. (2018)studied a Wi-Fi conferencing scenario where a speaker broadcasts voice traffic over a Wi-Fi network testbed to listeners.The speaker can configure 24 different factors of type ð2,

Table 9 .
Top five effects and their scores for audio quality when (a) n terms ¼ 9 and (b) n terms ¼ 21 and for (c) RF exposure in the wireless network experiment.

Table 10 .
Screening results for wireless network experiment listed by the method used.Only significant factors are listed.

Table 11 .
A comparison of properties of the three designs D 0 , D 1 , and D 2 used in the study.

Table 12 .
The top five scores for response y 1 in Model 1 produced by the proposed screening method using each of the arrays D 0 , D 1 , and D 2 .

Table 13 .
The top five scores for response y 2 in Model 2 produced by our screening method using each of the arrays D 0 , D 1 , and D 2 .

Table 14 .
The (r, i)-ranks for the designs D 0 , D 1 , and D 2 .