A family of orthogonal main effects screening designs for mixed-level factors

Abstract There is limited literature on screening when some factors are at three levels and others are at two levels. This topic has seen renewed interest following the introduction of the definitive screening design structure by Jones and Nachtsheim (2011) and Xiao et al. (2012). Two well-known examples of mixed-level screening designs are Taguchi’s L18 and L36 designs. However, these designs are limited in two ways. First, they allow only 18 or 36 runs, which is restrictive. Second, they provide no protection against bias in the main effects due to active two-factor interactions. In this article, we introduce a family of orthogonal, mixed-level screening designs in multiples of eight runs. Our 16-run design can accommodate up to four continuous three-level factors and up to eight two-level factors. The three-level factors must be continuous, whereas the two-level factors can be either continuous or categorical. All of our designs provide substantial protection against bias in the main-effects estimates due to active two-factor interactions.


Introduction
In this paper, we introduce and explore a new family of mixed two- and three-level designs that are orthogonal for main effects and generally have low absolute correlations between main-effects columns and two-factor interaction columns. Our designs are obtained by concatenating two replicates of a (three-level) definitive screening design (DSD) with a folded-over (two-level) Hadamard matrix design (Hedayat and Wallis 1978). When the number of three-level factors is k, where k ≥ 4 is a multiple of four, the number of runs is n = 4k, and the designs can accommodate up to k three-level factors and up to 2k two-level factors, the latter of which can be either continuous or categorical. We show that when k or fewer two-level factors are to be employed and n is a multiple of 16, the two-level columns can be chosen in such a way that main effects are completely independent of two-factor interactions. We note that the three-level factors must be continuous; the new designs are not appropriate for three-level categorical factors.
An outline of our paper is as follows. In Section 2, we briefly review the related literature. In Section 3, we provide the construction method and show that the resulting designs are orthogonal main-effects plans. We illustrate use of the method by constructing designs for eight, 16, 24, and 32 runs and examine the correlation structure for the 8-run and 16-run designs using correlation cell plots. In Section 4, we give an alternative construction method and show that when the number of two-level factors is less than or equal to k, and k ≡ 0 (mod 4), the two-level factors can be assigned to columns in such a way as to guarantee orthogonality between main effects and two-factor interactions. A simulation study, which characterizes the ability of the designs to identify active main effects and two-factor interactions, is provided in Section 5, and we conclude with a brief discussion in Section 6. Proofs are contained in the Supplemental Materials.

Previous work
The construction of mixed-level orthogonal, or nearly orthogonal, designs has been a subject of increasing interest. Taguchi's orthogonal arrays are frequently used, but the ways in which two- and three-level factors can be employed are limited. In addition, confounding of main effects and two-factor interactions is often problematic. Wang and Wu (1992) developed nearly orthogonal arrays with mixed levels and run sizes of 12, 18, 20, and 24. Here again, the numbers of two- and three-level factors that can be accommodated are somewhat limited. Nguyen (1996) developed a method for constructing nearly orthogonal arrays for situations in which the sample size does not match those of Taguchi's orthogonal arrays. Xu (2002) gave an efficient algorithm for constructing mixed-level orthogonal or nearly orthogonal designs for small run sizes. Tang (2006) identified families of two-level fractional factorial designs and designs for which all factors but one have two levels. Starks (1964) developed a method for obtaining some small orthogonal mixed two- and three-level designs via clever manipulation of Hadamard matrices. Similar to Jones and Nachtsheim (2013), Yang, Lin, and Liu (2014) developed an approach to generating mixed two- and three-level screening designs based on row- and column-augmented, folded-over conference matrices. Recently, Ares, Schoen, and Goos (2023) used mathematical programming techniques to search for mixed-level orthogonal designs for smaller numbers of factors than considered here.
With the development of conference-matrix-based definitive screening designs (DSDs) (Jones and Nachtsheim 2011; Xiao et al. 2012; Nguyen, Pham, and Mai 2020), interest in mixed two- and three-level DSDs has grown. Jones and Nachtsheim (2013) introduced mixed two- and three-level definitive screening designs, while Nachtsheim, Shen, and Lin (2017) and Nguyen, Pham, and Mai (2020) suggested alternatives and extensions. In each of these alternatives, the sample sizes are on the order of 2k, and the orthogonality of main effects is a casualty of the construction methods.
The approach in this paper is quite distinct from any of the aforementioned work. First, all of our designs are orthogonal main-effects plans, and second, our designs focus on the use of two- and three-level factors exclusively. These new designs exist for run sizes that are a multiple of eight, and main effects are never directly confounded with two-factor interactions. As noted above, when the number of two-level factors is less than or equal to the number of three-level factors, the designs can be constructed in such a way that main effects are independent of two-factor interactions.

General design construction
In this section, we give the general design construction approach, and then use it to construct the 8-run and 16-run designs. Use of the approach to construct the 24-run and 32-run designs is detailed in the Supplemental Materials.

Method 1: General case
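For the general case, we create a design with 4k runs, where k is an even number such that a conference matrix of order k exists. There are k three-level factors and 2k two-level factors. Let $D_{2k} = [C_k^T, -C_k^T]^T$ denote a DSD with 2k rows, k columns, and no center run, where $C_k$ is a conference matrix of order k, and let $H_{2k}$ denote a Hadamard matrix of order 2k. Then let:

$$DD = \begin{bmatrix} D_{2k} \\ D_{2k} \end{bmatrix}, \qquad OA = \begin{bmatrix} H_{2k} \\ -H_{2k} \end{bmatrix}.$$

Let $OML_n^{a,b}$ denote the n-run OML design having a three-level factors and b two-level factors. Then

$$OML_{4k}^{k,2k} = [\, DD \;\; OA \,]. \qquad [1]$$

To show that the resulting design is orthogonal for main effects, we have, from [1],

$$\left(OML_{4k}^{k,2k}\right)^T OML_{4k}^{k,2k} = \begin{bmatrix} 4(k-1) I_k & 0 \\ 0 & 4k I_{2k} \end{bmatrix},$$

where $I_m$ denotes the identity matrix of order m.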
Let $G_1 = DD$ and $G_2 = OA$. The design has three further properties, each concerning the orthogonality of two-factor interaction columns to main effects. Orthogonality to main effects in $G_1$ derives from the properties of the DSD. Proofs of these three properties are included in the Supplemental Materials. We note that main effects in $G_1$ are uncorrelated with interaction effects in $G_1 \times G_2$, but not completely uncorrelated with interaction effects in $G_2 \times G_2$ (as demonstrated in the correlation cell plot of Figure 1(a) for n = 8).
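The construction and the main-effects orthogonality check can be tried numerically. The following is a minimal sketch, assuming that [1] places two stacked replicates of the DSD beside a Hadamard matrix and its foldover. The particular conference matrix `C4`, the Sylvester-type Hadamard matrix, and the helper names `sylvester_hadamard` and `oml_method1` are illustrative choices, not taken from the paper.

```python
import numpy as np

def sylvester_hadamard(order):
    """Hadamard matrix of the given order (a power of two) by repeated doubling."""
    h = np.array([[1]])
    while h.shape[0] < order:
        h = np.kron(np.array([[1, 1], [1, -1]]), h)
    if h.shape[0] != order:
        raise ValueError("this helper only handles powers of two")
    return h

# One conference matrix of order 4 (assumed choice): C4.T @ C4 = 3 * I.
C4 = np.array([[ 0,  1,  1,  1],
               [-1,  0,  1, -1],
               [-1, -1,  0,  1],
               [-1,  1, -1,  0]])

def oml_method1(C, H):
    """Assumed form of construction [1]: [DD, OA] with 4k runs."""
    k = C.shape[0]
    if H.shape != (2 * k, 2 * k):
        raise ValueError("H must be a Hadamard matrix of order 2k")
    D = np.vstack([C, -C])     # DSD with 2k runs and no center run
    DD = np.vstack([D, D])     # two replicates of the DSD
    OA = np.vstack([H, -H])    # Hadamard matrix and its foldover
    return np.hstack([DD, OA])

k = 4
X = oml_method1(C4, sylvester_hadamard(2 * k))   # 16 runs, 4 + 8 columns
gram = X.T @ X
expected = np.diag([4 * (k - 1)] * k + [4 * k] * (2 * k))
print(np.array_equal(gram, expected))
```

Under these assumptions the cross-product matrix is diagonal, with entries 4(k - 1) for the three-level columns and 4k for the two-level columns.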

The 8-run design
We begin by introducing the construction procedure for an 8-run OML design, the smallest such design possible. Since n = 8 = 4k, we have k = 2, and we require a conference matrix $C_2$ of order 2 and a Hadamard matrix $H_4$ of order 4. Applying [1] gives a design with two three-level columns and four two-level columns. To estimate the quadratic effects of both three-level factors, it is necessary to add a center run, assuming the two-level factors are continuous. If the two-level factors are categorical, adding two center runs in the three-level factors and an arbitrary foldover pair in the categorical factors has two effects. First, it allows for estimation of the quadratic effects of both three-level factors. Second, it maintains the orthogonality of main effects and two-factor interactions for the two-level factors, at the cost of small correlations among the main-effects estimates.
The correlation cell plot in Figure 1(a) shows that the main effects are uncorrelated with each other (white cells). However, each main effect is correlated with four two-factor interactions, with absolute correlations of $1/\sqrt{2}$. Practitioners using this design should be aware that non-negligible two-factor interactions will cause problems with model selection. Moreover, three pairs of two-factor interactions involving the two-level factors are confounded, so these two-factor interactions are not uniquely identifiable.
The correlation cell plot on the right in Figure 1 shows that if we employ only two of the four two-level factors, we can avoid any correlation between main effects and two-factor interactions. However, one pair of two-factor interactions is confounded with another.
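The correlation pattern described for the 8-run design can be checked with a short computation. This is a sketch under stated assumptions: the conference matrix `C2`, the Sylvester-type `H4`, and the stacking order are illustrative choices, so the specific columns involved may differ from those in the paper's figure.

```python
import itertools
import numpy as np

# Assumed inputs: a conference matrix of order 2 and a Sylvester Hadamard matrix of order 4.
C2 = np.array([[0, 1], [1, 0]])
H2 = np.array([[1, 1], [1, -1]])
H4 = np.kron(H2, H2)

D4 = np.vstack([C2, -C2])                                    # 4-run DSD, no center run
X8 = np.hstack([np.vstack([D4, D4]), np.vstack([H4, -H4])])  # 8 runs, 2 + 4 columns

def correlation(u, v):
    """Pearson correlation of two non-constant columns."""
    u = u - u.mean()
    v = v - v.mean()
    return float(u @ v / np.sqrt((u @ u) * (v @ v)))

# All two-factor interaction columns, skipping constant ones
# (with k = 2 the product of the two three-level columns is zero in every run).
interactions = {}
for i, j in itertools.combinations(range(X8.shape[1]), 2):
    col = X8[:, i] * X8[:, j]
    if np.any(col != col[0]):
        interactions[(i, j)] = col

# For each main effect, the interactions with which it has nonzero correlation.
correlated = {}
for m in range(X8.shape[1]):
    hits = {}
    for pair, col in interactions.items():
        r = correlation(X8[:, m], col)
        if abs(r) > 1e-12:
            hits[pair] = r
    correlated[m] = hits

# Pairs of two-level interaction columns that are identical up to sign.
two_level = [p for p in interactions if p[0] >= 2]
confounded = [(p, q) for p, q in itertools.combinations(two_level, 2)
              if abs(abs(correlation(interactions[p], interactions[q])) - 1.0) < 1e-12]

for m, hits in correlated.items():
    print(m, {pair: round(r, 4) for pair, r in hits.items()})
print(confounded)
```

With these inputs, every main effect has nonzero correlation with exactly four interaction columns, and three pairs of interactions among the two-level factors coincide.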

The 16-run design
Since n = 16 = 4k, we have k = 4, and we require a conference matrix $C_4$ of order 4 and a Hadamard matrix $H_8$ of order 8. Again following the operations in [1], we obtain a design with four three-level columns and eight two-level columns. If all the factors are continuous, then an added run at the center point will allow for fitting all four quadratic effects of the three-level factors. However, these quadratic effect estimates can be biased by some active two-factor interactions. There is substantial protection against bias of the main effects from active two-factor interactions. The three-level factors and two-level factors are each uncorrelated with their respective two-factor interactions.
If only columns 1-4, 6, 9, 11, and 12 are used, then no main effect is correlated with any two-factor interaction. The correlation cell plot for this case is shown in Figure 2, which shows that all the correlations between main effects and two-factor interactions are zero (white cells). There are correlations of various magnitudes among pairs of two-factor interactions (shades of gray). There are three pairs of two-factor interactions in the bottom right of the plot that are pure black, indicating that these pairs are confounded. We note that adding the four missing two-level factors back into the design will introduce some correlations, though no confounding, between main effects and two-factor interactions.
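The contrast between the full 16-run design and an eight-column subset can be illustrated numerically. The sketch below assumes a particular conference matrix `C4` and a Sylvester-type `H8`, so its column ordering need not match the paper's; in this ordering the subset consists of the four three-level columns and the first four two-level columns, which is an assumption of the sketch rather than the paper's column list.

```python
import itertools
import numpy as np

# Assumed inputs: one conference matrix of order 4 and a Sylvester Hadamard matrix of order 8.
C4 = np.array([[ 0,  1,  1,  1],
               [-1,  0,  1, -1],
               [-1, -1,  0,  1],
               [-1,  1, -1,  0]])
H2 = np.array([[1, 1], [1, -1]])
H8 = np.kron(H2, np.kron(H2, H2))

D8 = np.vstack([C4, -C4])
X16 = np.hstack([np.vstack([D8, D8]), np.vstack([H8, -H8])])   # 16 runs, 4 + 8 columns

def max_abs_correlation(X):
    """Largest |correlation| between any main-effect column and any 2FI column."""
    worst = 0.0
    for i, j in itertools.combinations(range(X.shape[1]), 2):
        f = X[:, i] * X[:, j]
        f = f - f.mean()
        if not f.any():          # skip constant interaction columns
            continue
        for m in range(X.shape[1]):
            x = X[:, m] - X[:, m].mean()
            r = abs(x @ f) / np.sqrt((x @ x) * (f @ f))
            worst = max(worst, float(r))
    return worst

full = max_abs_correlation(X16)            # all twelve columns in use
subset = max_abs_correlation(X16[:, :8])   # three-level columns plus first four two-level columns
print(full, subset)
```

In this sketch the full design shows nonzero correlations that stay strictly below one (no confounding), while the eight-column subset shows none.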

Method 2: Creating subgroups of columns for which main effects are orthogonal to two-factor interactions
A Hadamard matrix of order 2k, for k ≥ 4 and k a multiple of four, can be constructed from a Hadamard matrix of order k using the doubling matrix:

$$H_{2k} = \begin{bmatrix} 1 & 1 \\ 1 & -1 \end{bmatrix} \otimes H_k = \begin{bmatrix} H_k & H_k \\ H_k & -H_k \end{bmatrix}. \qquad [2]$$

Substituting the right side of the equality in [2] into [1] for $H_{2k}$, we obtain an alternate expression for our construction:

$$OML_{4k}^{k,2k} = \begin{bmatrix} C_k & H_k & H_k \\ -C_k & H_k & -H_k \\ C_k & -H_k & -H_k \\ -C_k & -H_k & H_k \end{bmatrix} = [\, G_1 \;\; G_2 \;\; G_3 \,], \qquad [3]$$

where $G_i$, for i = 1, 2, 3, denotes the ith group of columns, corresponding to the partitioning in [3]. (Note that this construction cannot be used unless k ≡ 0 (mod 4). For example, it is not applicable when n = 4k = 24, since then k ≡ 2 (mod 4), i.e., there is no Hadamard matrix of order k = 6.) With this setup, we have the following result.
Theorem 1. Designs for k three-level factors and k two-level factors can be obtained by concatenating $G_1$ and $G_2$ or $G_1$ and $G_3$. For either of these designs, main effects columns are pairwise orthogonal and orthogonal to all two-factor interaction columns.
The proof is given in the Supplemental Materials. The implications of Theorem 1 are clear. If the experimenter has k or fewer two-level factors, and the design is based on a concatenation of $G_1$ (or a subset of those columns) for three-level factors and up to k columns chosen from either $G_2$ or $G_3$, then estimates of main effects will not be biased by any active two-factor interactions. Also, if a few second-order terms are active, the experimenter may be able to identify them using a model selection procedure.
The second construction method is also useful for situations discussed in Lekivetz and Lin (2016), where multiple responses are believed to be driven by different sets of factors. That is, model selection can include all main effects for every response, but one response can be analyzed focusing on interactions in $G_1$ and $G_2$, while another response can be analyzed using the interactions from $G_1$ and $G_3$. Other properties of designs constructed via [3] include the following:
1. (Property 2.1) All main effects are orthogonal.
2. (Property 2.2) Interactions in $G_i \times G_i$ are orthogonal to all main effects.
3. (Property 2.3) For $i \neq j$, interactions in $G_i \times G_j$ are orthogonal to main effects and two-factor interactions in $G_i$.
See the Supplemental Materials for proofs of the above assertions.
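The statement of Theorem 1 can be checked numerically for k = 4. This is a sketch, assuming that the column groups arise from applying the doubling step to a folded-over Hadamard matrix placed beside two stacked DSD replicates; the block layout in `method2_groups`, the matrix `C4`, and the Sylvester-type `H4` are illustrative assumptions.

```python
import itertools
import numpy as np

# Assumed inputs: a conference matrix of order 4 and a Sylvester Hadamard matrix of order 4.
C4 = np.array([[ 0,  1,  1,  1],
               [-1,  0,  1, -1],
               [-1, -1,  0,  1],
               [-1,  1, -1,  0]])
H2 = np.array([[1, 1], [1, -1]])
H4 = np.kron(H2, H2)

def method2_groups(C, H):
    """Column groups G1 (three-level), G2 and G3 (two-level) of the 4k-run design."""
    G1 = np.vstack([C, -C, C, -C])
    G2 = np.vstack([H, H, -H, -H])
    G3 = np.vstack([H, -H, -H, H])
    return G1, G2, G3

def interaction_columns(X):
    """Matrix whose columns are all pairwise products of the columns of X."""
    pairs = itertools.combinations(range(X.shape[1]), 2)
    return np.column_stack([X[:, i] * X[:, j] for i, j in pairs])

G1, G2, G3 = method2_groups(C4, H4)
results = {}
for label, G in (("G1|G2", G2), ("G1|G3", G3)):
    X = np.hstack([G1, G])                       # 16 runs, 4 + 4 columns
    gram = X.T @ X
    mains_orthogonal = not (gram - np.diag(np.diag(gram))).any()
    mains_clear_of_2fi = not (X.T @ interaction_columns(X)).any()
    results[label] = (mains_orthogonal, mains_clear_of_2fi)
print(results)
```

Because every main-effect column sums to zero, a zero inner product with an interaction column is equivalent to zero correlation, so integer arithmetic suffices for the check.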

Notes on analysis and a simulation study
As noted by a referee, our designs are singular for the quadratic main-effects model that includes all 3k linear main effects and all k quadratic main effects. In this case, the intercept is equal to 1/(k − 1) times the sum of the quadratic columns, so the matrix is not of full rank when the design matrix includes the intercept column. One simple approach is to fit the response vector to the full model without the intercept and then delete the column in the design matrix that corresponds to the minimum absolute regression coefficient obtained. Adding the intercept column back in at this point results in a full rank model. This procedure makes the assumption that at least one term in the full model will not be important. Alternatively, one can simply use a forward selection procedure or the DSD model selection approach given in Jones and Nachtsheim (2017). In what follows, we report the results of a small simulation study to investigate the power of this family of designs to correctly identify active main effects and two-factor interactions.
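The linear dependence between the intercept and the quadratic columns can be seen directly in a small example. The sketch below uses an assumed 16-run design built from an illustrative conference matrix `C4` and a Sylvester-type `H8`; it only demonstrates the dependence and the resulting ranks, and does not implement the column-deletion procedure.

```python
import numpy as np

# Assumed inputs: one conference matrix of order 4 and a Sylvester Hadamard matrix of order 8.
C4 = np.array([[ 0,  1,  1,  1],
               [-1,  0,  1, -1],
               [-1, -1,  0,  1],
               [-1,  1, -1,  0]])
H2 = np.array([[1, 1], [1, -1]])
H8 = np.kron(H2, np.kron(H2, H2))

k = 4
D8 = np.vstack([C4, -C4])
X16 = np.hstack([np.vstack([D8, D8]), np.vstack([H8, -H8])])

linear = X16.astype(float)                   # 3k = 12 linear main-effect columns
quadratic = X16[:, :k].astype(float) ** 2    # k = 4 quadratic columns
intercept = np.ones((X16.shape[0], 1))

# Each run has exactly one three-level factor at zero, so the squares sum to k - 1.
dependence_holds = np.allclose(quadratic.sum(axis=1) / (k - 1), 1.0)

with_intercept = np.hstack([intercept, linear, quadratic])
without_intercept = np.hstack([linear, quadratic])
rank_with = np.linalg.matrix_rank(with_intercept)
rank_without = np.linalg.matrix_rank(without_intercept)
print(dependence_holds, with_intercept.shape[1], rank_with, rank_without)
```

In this example the model without the intercept has as many columns as runs and is of full column rank, whereas adding the intercept adds a column without raising the rank.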

Design of the study
In our simulation study, we limited consideration to the 16-run design with four factors at three levels and four factors at two levels. Our rationale for this choice was that designs in the family with more runs will exhibit higher power for identifying main effects and two-factor interactions. So, our choice of the 16-run design represents a worst case. We omit the 8-run design, which we do not recommend for screening experiments.
There were three factors in the simulation study: the signal-to-noise ratio for each active effect (1, 2, or 3), the number of active main effects (3 or 4), and the number of active two-factor interactions (0, 1, 2, or 3). The full factorial design thus has 3 × 2 × 4 = 24 runs.
For each run, we chose the specified number of main effects randomly from all eight two- and three-level factors. The active two-factor interactions were chosen randomly from the set of two-factor interactions in such a way that the full model exhibited strong heredity. True coefficient values of each active effect were generated by adding an exponentially distributed random variable with mean 1.0 to the specified signal-to-noise ratio. The sign of the result was then chosen randomly.
For each of the 24 runs of the simulation, we generated 400 response vectors from the linear model constructed from the active effects applied to each of the 16 runs of the mixed-level design. We then added a vector of 16 independent standard normal random numbers. We analyzed each of the 400 data sets, for each of the 24 simulation settings, using the two-step procedure of Jones and Nachtsheim (2017).
We considered four responses: the main effect power, the two-factor interaction power, and the associated Type I error rates. We calculated the main effect power as the total number of correctly identified main effects divided by the total number of active main effects across the 400 data sets. The two-factor interaction powers were obtained in similar fashion. The Type I error rates were calculated, separately for main effects and for two-factor interactions, as the total number of false positives divided by the total number of inactive factors.
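The data-generating step of the study can be sketched as follows. The design used here (four three-level and four two-level columns from an assumed conference matrix and Hadamard matrix), the function name `simulate_response`, and the seed are illustrative assumptions. The analysis step with the Jones and Nachtsheim procedure is not reproduced.

```python
import itertools
import numpy as np

# Assumed 16-run design: four three-level and four two-level columns.
C4 = np.array([[ 0,  1,  1,  1],
               [-1,  0,  1, -1],
               [-1, -1,  0,  1],
               [-1,  1, -1,  0]])
H2 = np.array([[1, 1], [1, -1]])
H4 = np.kron(H2, H2)
X = np.hstack([np.vstack([C4, -C4, C4, -C4]), np.vstack([H4, H4, -H4, -H4])])

def simulate_response(X, n_main, n_2fi, snr, rng):
    """One simulated response vector with strong heredity among active effects."""
    active_main = sorted(rng.choice(X.shape[1], size=n_main, replace=False).tolist())
    # Strong heredity: an interaction may be active only if both parents are active.
    candidates = list(itertools.combinations(active_main, 2))
    picked = rng.choice(len(candidates), size=n_2fi, replace=False)
    active_2fi = [candidates[i] for i in sorted(picked.tolist())]

    def coefficients(count):
        magnitude = snr + rng.exponential(1.0, size=count)   # SNR plus Exp(mean 1)
        return magnitude * rng.choice([-1.0, 1.0], size=count)

    beta_main, beta_2fi = coefficients(n_main), coefficients(n_2fi)
    y = X[:, active_main] @ beta_main
    for (i, j), b in zip(active_2fi, beta_2fi):
        y = y + b * X[:, i] * X[:, j]
    y = y + rng.standard_normal(X.shape[0])                  # unit-variance noise
    return y, active_main, active_2fi, beta_main, beta_2fi

# The 24 simulation settings: SNR x number of main effects x number of interactions.
settings = list(itertools.product((1, 2, 3), (3, 4), (0, 1, 2, 3)))
rng = np.random.default_rng(2024)
y, active_main, active_2fi, beta_main, beta_2fi = simulate_response(X, 3, 3, 2, rng)
```

Each call produces one data set; repeating it 400 times per setting would mirror the study's layout.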
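The pooled power and Type I error calculations described above can be written compactly. In this sketch the function name `power_and_type1` and the toy inputs are hypothetical; the sets stand for the active effects and the effects selected by whatever analysis procedure is used.

```python
def power_and_type1(active_sets, selected_sets, n_candidates):
    """Pooled power and Type I error rate across simulated data sets.

    active_sets[i]   : set of truly active effects in data set i
    selected_sets[i] : set of effects declared active in data set i
    n_candidates     : number of candidate effects per data set
    """
    true_pos = sum(len(a & s) for a, s in zip(active_sets, selected_sets))
    false_pos = sum(len(s - a) for a, s in zip(active_sets, selected_sets))
    n_active = sum(len(a) for a in active_sets)
    n_inactive = sum(n_candidates - len(a) for a in active_sets)
    power = true_pos / n_active if n_active else float("nan")
    type1 = false_pos / n_inactive if n_inactive else float("nan")
    return power, type1

# Toy example with two data sets and eight candidate main effects.
active = [{0, 1, 2}, {1, 2, 3}]
selected = [{0, 1, 2, 5}, {1, 2}]
power, type1 = power_and_type1(active, selected, n_candidates=8)
print(power, type1)
```

Here five of the six active effects are found and one of the ten inactive effects is falsely selected.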

Study results
We now summarize the results of the simulation study. For main effects power, we modeled the observed power values as a function of the number of main effects and the signal-to-noise ratio. (The selection method identifies main effects independent of the number of two-factor interactions; see Jones and Nachtsheim (2017) for details.) For the main effects power, the only significant effect was the signal-to-noise ratio. The main effects power increased with increasing signal-to-noise ratio: increasing the signal-to-noise ratio by one increased the main effects power by 0.0097. Main effects plots for the number of main effects and the signal-to-noise ratio are provided in Figure 3. The predicted power is essentially 100%, irrespective of the factor levels in the simulation study design. The median Type I error for main effects was 0.048.
For the two-factor interaction power, all three main effects of the simulation factors were significant, and there was a significant quadratic effect due to signal-to-noise ratio. Profile plots are shown in Figure 4. As the signal-to-noise ratio increased from one to two, the two-factor interaction power increased by 0.092, at which point it essentially leveled off. As the signal-to-noise ratio increased from two to three, the two-factor interaction power increased by only 0.012. Increasing either the number of active main effects or the number of active two-factor interactions decreased the two-factor interaction power. A one-unit change in either factor reduced the power by roughly 0.045. It is notable that with four active factors and signal-to-noise ratio equal to 2, the design can identify three two-factor interactions with 92.4% power. The median Type I error rate for two-factor interactions was 0.005.
Our 16-run design using eight factors (four at three levels and four at two levels) is comparable to the resolution IV regular fractional factorial design in eight two-level factors. The ability to identify main effects will be comparable (near 1.0) for both designs. However, we note that for the fractional factorial design, no two-factor interaction can be uniquely identified, because every two-factor interaction is confounded with three other two-factor interactions for this design. The fact that our designs can uniquely identify up to three two-factor interactions with high power is clearly advantageous. Finally, as pointed out in Table 7 of Jones and Nachtsheim (2011), DSDs can identify any quadratic effect of the three-level factors with a signal-to-noise ratio of 3 with a power greater than 0.95. The power for identifying quadratic effects with OML designs will be larger because the number of zeros in any three-level column is four, one more than in a DSD. The ability to detect quadratic effects does not exist for any design having factors at only two levels.
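The alias structure of the 16-run resolution IV fractional factorial can be verified by direct enumeration. The sketch below uses one common choice of generators (E = BCD, F = ACD, G = ABC, H = ABD), which is an assumption here since the text does not specify them.

```python
import itertools
import numpy as np

# Full 2^4 factorial in A, B, C, D, then four generated columns (assumed generators).
base = np.array(list(itertools.product([-1, 1], repeat=4)))
A, B, C, D = base.T
design = np.column_stack([A, B, C, D, B * C * D, A * C * D, A * B * C, A * B * D])

# Group two-factor interactions whose columns coincide up to sign.
alias_groups = {}
for i, j in itertools.combinations(range(8), 2):
    col = design[:, i] * design[:, j]
    col = col * col[0]          # normalize sign so a column and its negative match
    alias_groups.setdefault(tuple(col.tolist()), []).append((i, j))

group_sizes = sorted(len(group) for group in alias_groups.values())

# Resolution IV: every main effect is orthogonal to every two-factor interaction.
interaction_matrix = np.array(list(alias_groups.keys())).T
mains_clear = not (design.T @ interaction_matrix).any()

for group in alias_groups.values():
    print(group)
print(group_sizes, mains_clear)
```

With these generators, the 28 two-factor interactions fall into seven groups of four mutually confounded interactions, so each is confounded with three others.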

Summary and discussion
We have introduced a class of orthogonal mixed-level screening designs and provided two methods of direct construction. The first method works for any multiple of eight runs. The second method requires that the number of runs is a multiple of 16 so that the Hadamard matrix used in the construction has a number of runs that is a multiple of four. Both construction methods support a maximum number of factors for a given number of runs. For example, the 16-run design can support four three-level factors and up to eight two-level factors. However, it is not necessary to assign a factor to every column of the constructed matrix. In fact, using the second construction method, if only four of the eight two-level factors are needed, then using the four three-level factors and the first four two-level factors yields a design with no correlations between main effects and two-factor interactions. In general, when using the second construction method, the experimenter can use the k three-level factors and the first k two-level factors, thus avoiding any concern about aliasing of main effects by two-factor interactions.
It is now standard practice to build DSDs using a conference matrix and its foldover. Our designs replicate this part of a DSD and thus require roughly twice as many runs. However, the compensation for this expense is the ability to add many two-level factors so that the resulting design can accommodate up to 3n/4 factors, a very efficient use of resources. DSDs by comparison allow for roughly half as many factors as runs. If the experimenter needs to use all the columns in the OML design, then there will be some potential aliasing of main effects from active two-factor interactions that are not included in the model. This fact should be emphasized in training.
The conference matrices in our two constructions could be replaced by Hadamard matrices to create designs where all the factors are at two levels. Alternatively, the Hadamard matrices could be replaced by conference matrices to accommodate more three-level factors. Because of the Kronecker product construction, such designs would supply more protection against aliasing from active two-factor interactions than other nonregular orthogonal arrays. Another possibility is to replace either the conference matrices or the Hadamard matrices in our construction with weighing designs with an appropriate number of runs. These alternatives are under active investigation.
As a final note, it is our experience that engineers and scientists are often uncomfortable using designs for continuous factors that have only two levels for each factor.This discomfort is due to concern that in many physical processes a factor's effect on a response is not linear.We have seen engineers place the levels of continuous factors close together in an effort to avoid excessive bias from nonlinear factor effects.Our designs featuring continuous factors at three levels allow for fitting a quadratic effect if warranted, which should substantially reduce concern about missing the curvature in a factor's effect on a response.

Disclosure statement
No potential conflict of interest was reported by the authors.

Figure 3. Main effects power as a function of the number of active main effects (nActiveME) and the signal-to-noise ratio (SN).

Figure 4. Two-factor interaction effects power as a function of the number of main effects (nActiveME), number of two-factor interactions (nActive2FI) and signal-to-noise ratio (SN).

Ryan Lekivetz is an Advanced Analytics Manager at JMP Statistical Discovery LLC, where he manages the team that implements features in the design of experiments and reliability platforms for JMP software. His research interests include design of experiments, combinatorial testing, and the intersection of the two. Bradley Jones is a Distinguished Research Fellow at JMP Statistical Discovery LLC, where he does research in design of experiments and statistical methods. Christopher Nachtsheim is the Frank A. Donaldson Chair of operations management in the Carlson School of Management. He does research in design of experiments and related statistical methods.