Best practices for multi- and mixed-level supersaturated designs

Abstract Supersaturated designs offer a cost-effective way to identify a few significant factors among a vast array of potential factors, which makes them valuable for screening. The current literature studies several design selection criteria and analysis methods for such designs. For two-level designs, the screening performance of optimal designs constructed under different optimality criteria remains similar, especially when the effect directions are not known in advance, and the Gauss-Dantzig Selector (GDS) is the preferred analysis method. For multi- and mixed-level supersaturated designs, despite the existence of multiple design optimality criteria and design construction methods, the literature lacks guidance on both the design selection and the choice of analysis method. Through extensive simulation studies, we show that multi- and mixed-level designs constructed using different optimality criteria have equivalent screening performance when effect directions are unknown. For known effect directions, generalized minimum aberration-optimal designs have slightly better screening performance. On the analysis front, however, the story differs from two-level designs. While LASSO and GDS show superior performance among the analysis methods compared, they depend on the parameterization, or coding, of the factors. Since no single choice of parameterization is best across sparsity levels, scenarios, and designs, we propose using group LASSO, which is invariant to parameterizations. Finally, we characterize the settings, in terms of the number of runs, the number of factors, and the effect sparsity, that are too complex for group LASSO to yield meaningful results.


Introduction
Screening experiments are typically used in manufacturing industries for first-stage experimentation, with the goal of identifying a few important factors out of potentially many factors. Due to time- and money-related budgetary constraints, experimenters want to fulfill the screening objective with as few runs as possible. Supersaturated designs (SSDs), which have fewer runs than the number of parameters under investigation, are employed to fulfill this objective. Designs with the same number of levels for all factors are called multi-level designs, whereas a mixed-level design has at least one factor with a different number of levels than the other factors. An alternate term for multi-level designs could be v-level designs, since all factors have v levels. Specifically, two-level designs correspond to multi-level designs with v = 2. Nonetheless, we have opted to keep the term "multi-level designs" to maintain consistency with the previous literature on SSDs. We compare the screening performance of different multi- and mixed-level SSDs and investigate the best method to analyze the data arising from SSDs.
SSDs were first introduced by Satterthwaite (1959) and were discussed in Box (1959). They gained recognition in 1962 with the introduction of the popular optimality criterion E(s²)-optimality for two-level balanced designs (Booth and Cox 1962). Several authors then proposed optimality bounds and constructions of E(s²)-optimal designs (Bulutoglu and Cheng 2004; Butler et al. 2001; Cheng 1997; Lin 1993; Nguyen 1996; Nguyen and Cheng 2008; Tang and Wu 1997). Other design criteria for SSDs, including the extension to unbalanced designs, such as UE(s²)-optimality, Bayesian D-optimality, resolution rank, Var(s+), etc., have also been studied (Cheng et al. 2018; Deng, Lin, and Wang 1996, 1999; Jones et al. 2009; Jones, Lin, and Nachtsheim 2008, 2020; Jones and Majumdar 2014; Marley and Woods 2010; Weese et al. 2021). Most recently, Singh and Stufken (2023) introduced an optimality framework using large-sample properties of the Gauss-Dantzig Selector (GDS). Several analysis methods are compared on their screening performance in Marley and Woods (2010), Weese, Smucker, and Edwards (2015), Weese, Edwards, and Smucker (2017), and Weese et al. (2021). The significant conclusions for two-level designs are (a) GDS is the preferred analysis method, (b) designs constructed by Weese et al. (2021) and Singh and Stufken (2023) have better screening performance in terms of power and type 1 error when the effect directions are assumed to be known, (c) the performance of optimal designs does not differ significantly when effect directions are unknown, and (d) for effective screening performance of SSDs, the ratio of factors to runs should be less than two and the number of active factors should be at most a third of the number of runs (Marley and Woods 2010).
Engineering frequently encounters situations where some factors have more than two levels, requiring multi- and mixed-level optimal supersaturated designs. These factors could be categorical, ordinal, or quantitative, and different codings can be used to represent the different kinds of variables. For a categorical variable such as the color of an ink, dummy coding (described in the next section) is used, whereas polynomial coding is used more often for other factors. While the optimality problem is more challenging than for two-level designs, several criteria (Fang, Lin, and Ma 2000; Xu and Wu 2005; Yamada and Lin 1999) and corresponding optimal designs exist. Several construction methods have also been proposed (see, for example, Fang, Ge, and Liu 2002b; Fang et al. 2004a; Fang, Lin, and Liu 2003; Fang, Lin, and Ma 2000; S. Georgiou and Koukouvinos 2006; Lu, Hu, and Zheng 2003; Sun, Lin, and Liu 2011; Yamada et al. 1999). Despite their practical use, the literature lacks a comprehensive comparison of different analysis methods, and, in particular, whether GDS remains the preferred method of analysis remains to be seen. It is worth noting that the model matrix needs to be parameterized using some coding methodology (polynomial coding, etc.) before analyzing the data for multi- and mixed-level designs. Given the screening stage of the experimentation, it is ideal to consider the main effects for all identifiable contrasts of each factor. Natural questions then arise as to which coding technique provides better screening results and whether there is consistency in the preferred analysis method across different parameterizations. We compare the screening performance of LASSO (Tibshirani 1996), GDS (Candès and Tao 2005, 2007), SCAD (Fan and Li 2001), and group LASSO (Yuan and Lin 2006) on various such designs. We argue that group LASSO is the preferred method to analyze such data, since it is invariant to parameterizations and performs on par with the other three methods. We also provide guidance on the best designs and on the situations in which one can expect SSDs to deliver reasonable screening performance. In particular, we examine whether a specific design optimality criterion yields superior performance in comparison to the alternative criteria discussed, under what conditions multi- and mixed-level supersaturated designs should be used, and whether the conclusions drawn by Marley and Woods (2010) for two-level designs can be extended.
This article is organized as follows. Section 2 discusses different optimality criteria and analysis methods for multi- and mixed-level designs, which we then compare on their screening performance. Section 3 provides our concrete objectives, simulation settings, and results. Finally, Section 4 summarizes the findings from Section 3 and provides practical guidance for using multi- and mixed-level SSDs.

Background on designs and analysis methods
Let the number of factors be m, and let the jth factor be at v_j levels, j = 1, ..., m. For two-level designs, v_j = 2 for all j. The n × m matrix whose jth column takes values in {0, 1, ..., v_j − 1} is the design matrix. For multi- and mixed-level designs, the model matrix is a parameterized matrix X obtained from the design matrix d. The model under study is

y = b_0 1_n + X_1 b_1 + ... + X_m b_m + e = b_0 1_n + Xb + e,   [1]

where b is a p × 1 vector of parameters, b_0 is the overall mean, b_j is a (v_j − 1) × 1 vector of parameters for the jth factor, X_j is the n × (v_j − 1) parameterized matrix for the jth group, and X is the n × p matrix collecting all the X_j's. The model in Eq. [1] is a main-effects model, and p = Σ_{j=1}^{m} (v_j − 1). Additionally, y and e are n × 1 vectors of responses and errors, respectively. We assume that the y_i's are independent given the matrix X and that the error terms, e_i, are normally distributed with mean 0 and variance σ². An SSD has fewer runs than the number of parameters to be estimated; that is, for SSDs, n < p + 1. One could obtain X_j using different parameterizations, for example, dummy coding, polynomial coding, etc. For three-level designs and orthonormal polynomial coding, say, we can replace the levels 0, 1, and 2 in the design by the rows [−1/√2, 1/√6], [0, −2/√6], and [1/√2, 1/√6] in the corresponding X_j, respectively. For three-level designs and dummy coding, using the last level as the reference level, we can replace 0, 1, and 2 in the design by [1 0], [0 1], and [0 0] in the corresponding X_j, respectively. Similarly, for sum coding and a three-level factor, one replaces 0, 1, and 2 in the design by [1 0], [0 1], and [−1 −1] in the corresponding X_j, respectively. A different parameterization, or coding, can result in different magnitudes and signs of the estimates. Consider a design with one three-level factor observed at levels 0, 1, and 2 with corresponding response vector y = (10, 3, 2). Without adding error, the estimates of (b_0, b_1, b_2) using polynomial, dummy, and sum coding are (5, −5.66, 2.45), (2, 8, 1), and (5, 5, −2), respectively. Therefore, it is prudent to find a coding-invariant analysis method when analyzing multi- and mixed-level designs.
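The worked example above can be checked directly. The following sketch, which assumes the orthonormal polynomial contrasts given in the text, fits the saturated one-factor model under each of the three codings:

```python
import numpy as np

# Response for a single three-level factor observed at levels 0, 1, 2.
y = np.array([10.0, 3.0, 2.0])

# Rows give the coded values of levels 0, 1, 2 under each parameterization.
codings = {
    # Orthonormal polynomial contrasts (linear, quadratic).
    "polynomial": np.array([[-1 / np.sqrt(2), 1 / np.sqrt(6)],
                            [0.0, -2 / np.sqrt(6)],
                            [1 / np.sqrt(2), 1 / np.sqrt(6)]]),
    # Dummy coding with the last level as the reference level.
    "dummy": np.array([[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]]),
    # Sum coding.
    "sum": np.array([[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]]),
}

estimates = {}
for name, Xj in codings.items():
    X = np.column_stack([np.ones(3), Xj])      # intercept plus coded columns
    b, *_ = np.linalg.lstsq(X, y, rcond=None)  # exact fit: 3 runs, 3 parameters
    estimates[name] = np.round(b, 2)
```

This reproduces (5, −5.66, 2.45), (2, 8, 1), and (5, 5, −2); the fitted values are identical under every coding, and only the coefficients change, which is why a coding-invariant analysis method is attractive.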
Recall that screening experiments aim to identify a few important factors out of the many potential factors. Factor j is considered "active" if at least one of the entries of b_j has a large absolute magnitude. We evaluate the screening performance of SSDs using two metrics: power and type 1 error. Power measures the proportion of important factors correctly identified, while type 1 error measures the proportion of unimportant factors incorrectly identified as important. High power and low type 1 error characterize a good design. It is worth noting that identifying the exact support of the true model is not our priority as long as at least one column of a truly active factor is identified as active; our metrics reflect this approach. We now provide brief reviews of existing design criteria and model selection methods.
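As a concrete illustration of these factor-level metrics, here is a small helper (an illustrative sketch, not from the paper) that scores a screening analysis given the sets of truly active and declared-active factors:

```python
def screening_metrics(true_active, declared_active, m):
    """Power and type 1 error for a screening analysis of m factors.

    A factor counts as identified if any of its constituent columns is
    declared active; exact support recovery within a factor is not
    required, matching the factor-level metrics described in the text.
    """
    truth, declared = set(true_active), set(declared_active)
    power = len(truth & declared) / len(truth)
    n_inactive = m - len(truth)
    type1 = len(declared - truth) / n_inactive if n_inactive else 0.0
    return power, type1
```

For example, with factors {1, 2} truly active out of m = 10 and factors {2, 3} declared active, power is 0.5 and type 1 error is 1/8.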

Optimality criteria
There are three popular criteria for multi- and mixed-level designs: the χ², E(f_NOD), and generalized minimum aberration (GMA) criteria. These criteria require the design to be balanced in that the v_j levels appear equally often in the jth factor, j = 1, ..., m. The set of all such balanced designs is more commonly known as U-type designs (Fang and Hickernell 1995). Two columns are orthogonal if all of their level combinations appear equally often. Achieving pairwise orthogonality between all columns would be ideal, but it is not possible because the design is supersaturated. Therefore, similar to two-level designs, the optimality criteria aim to maximize the average pairwise orthogonality. Another important consideration when constructing SSDs is that no two columns should be completely aliased, where two columns are fully aliased if one can be obtained from the other by level permutations. This condition is difficult to validate for designs with more than two levels. The multi- and mixed-level optimal designs available in the literature and used in this article are level-balanced, and no two of their columns are fully aliased.

χ²-optimality
A level-balanced design d is χ²-optimal (Yamada et al. 1999; Yamada and Lin 1999) if it minimizes

χ²(d) = Σ_{1 ≤ i < j ≤ m} Σ_{u=0}^{v_i − 1} Σ_{v=0}^{v_j − 1} [n_uv^(ij) − n/(v_i v_j)]² / [n/(v_i v_j)],

where n_uv^(ij) is the number of (u, v)-pairs in the ith and jth factors of the design and n/(v_i v_j) is the average frequency of level combinations in each pair of the ith and jth factors.
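A direct, unoptimized computation of this criterion might look as follows (a sketch; the triple sum is written out exactly as in the display above):

```python
import itertools

import numpy as np

def chi2_criterion(d, levels):
    """chi^2(d) for an n x m design d whose jth column takes values
    0, ..., levels[j]-1: summed squared deviations of the observed
    level-combination counts from the count n/(v_i v_j) expected under
    pairwise orthogonality, scaled by the expected count."""
    n, m = d.shape
    total = 0.0
    for i, j in itertools.combinations(range(m), 2):
        expected = n / (levels[i] * levels[j])
        for u in range(levels[i]):
            for v in range(levels[j]):
                observed = np.sum((d[:, i] == u) & (d[:, j] == v))
                total += (observed - expected) ** 2 / expected
    return total
```

Two orthogonal columns contribute zero, so a full 3² factorial has χ²(d) = 0, while two fully aliased three-level columns in three runs give χ²(d) = 6.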

E(f_NOD)-optimality
A level-balanced design is E(f_NOD)-optimal (Fang, Lin, and Ma 2000) if it minimizes

E(f_NOD) = [ Σ_{1 ≤ i < j ≤ m} f_NOD^(ij) ] / [ m(m − 1)/2 ],  with  f_NOD^(ij) = Σ_{u=0}^{v_i − 1} Σ_{v=0}^{v_j − 1} [n_uv^(ij) − n/(v_i v_j)]²,

where f_NOD^(ij) is a non-orthogonality measure and the other notation is as defined above. The subscript "NOD" in the criterion indicates that it is a non-orthogonality measure. Note that for multi-level designs, an E(f_NOD)-optimal design is also χ²-optimal. Between χ²- and E(f_NOD)-optimality, the latter is more popular, since E(f_NOD)-optimal designs minimize non-orthogonality and maximize space-filling uniformity (Fang, Lin, and Liu 2003). Most optimal designs in the SSD literature, and hence the ones used in this article, are E(f_NOD)-optimal. Construction methods are proposed in Yamada et al. (1999), Fang, Lin, and Ma (2000), Fang, Ge, and Liu (2002b), Fang, Lin, and Liu (2003), Lu, Hu, and Zheng (2003), Fang et al. (2004a), S. Georgiou and Koukouvinos (2006), and Sun, Lin, and Liu (2011). For a more comprehensive review of SSDs, we refer to S. D. Georgiou (2014) and the references therein.
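The criterion differs from χ²(d) only in dropping the expected-count denominator and averaging over column pairs, which the following sketch makes explicit:

```python
import itertools

import numpy as np

def e_fnod(d, levels):
    """E(f_NOD): the average over the m(m-1)/2 column pairs of
    f_NOD^{(ij)} = sum_{u,v} (n_uv^{(ij)} - n/(v_i v_j))^2."""
    n, m = d.shape
    total = 0.0
    for i, j in itertools.combinations(range(m), 2):
        expected = n / (levels[i] * levels[j])
        for u in range(levels[i]):
            for v in range(levels[j]):
                observed = np.sum((d[:, i] == u) & (d[:, j] == v))
                total += (observed - expected) ** 2
    return total / (m * (m - 1) / 2)
```

As with χ², an orthogonal pair of columns contributes zero; two fully aliased three-level columns in three runs give E(f_NOD) = 2.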

GMA-optimality
A level-balanced design is GMA-optimal (Xu and Wu 2005) if it sequentially minimizes the generalized wordlength pattern A_1(d), A_2(d), A_3(d), ..., where, for a design d,

A_j(d) = n^{-2} Σ_k | Σ_{i=1}^{n} x_ik^(j) |²,   [2]

x_ik^(j) is the entry in the ith row and kth column of X_j, and X_j is the matrix of orthonormal contrasts for all j-factor interactions. For balanced SSDs, A_1 = 0; therefore, the GMA criterion minimizes A_2, then A_3, A_4, and so on. The quantity A_2 measures the overall aliasing between all pairs of columns, and Xu and Wu (2005) showed that optimizing A_2(d) is equivalent to optimizing ave(χ²) and E(f_NOD) for multi-level designs and to optimizing ave(χ²) for mixed-level designs. It is worth noting that the GMA criterion is invariant to the choice of orthonormal contrasts (Wu and Xu 2001). Xu and Wu (2005) constructed several classes of multi-level SSDs that minimize the maximum aliasing between columns in addition to minimizing the overall aliasing between columns via Eq. [2]. Therefore, GMA-optimal designs can be considered superior to ave(χ²)- and E(f_NOD)-optimal designs. We refer the reader to Xu and Wu (2005) for details. Several constructions have been proposed for GMA-optimal designs (for example, Fang et al. 2004b; Xu and Wu 2005).
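Eq. [2] can be evaluated numerically. The sketch below assumes orthonormal polynomial contrasts scaled, as in Xu and Wu (2005), so that each contrast column applied to a balanced design column has squared entries summing to n; interaction contrasts are elementwise products of main-effect contrasts:

```python
import itertools

import numpy as np

def level_contrasts(v):
    """Orthonormal polynomial contrasts for a v-level factor, scaled so
    the squared entries of each contrast sum to v (the Xu-Wu scaling)."""
    P = np.vander(np.arange(v), v, increasing=True).astype(float)
    Q, _ = np.linalg.qr(P)            # Gram-Schmidt on 1, x, x^2, ...
    return Q[:, 1:] * np.sqrt(v)      # drop the constant column, rescale

def gwlp_A(d, levels, j):
    """A_j(d) = n^{-2} sum_k |sum_i x_ik^{(j)}|^2 over all j-factor
    interaction contrast columns of the design d (n x m, levels 0..v-1)."""
    n, m = d.shape
    # Main-effect contrast columns evaluated at the design, per factor.
    C = [level_contrasts(levels[f])[d[:, f]] for f in range(m)]
    A = 0.0
    for fac in itertools.combinations(range(m), j):
        prod = C[fac[0]]
        for f in fac[1:]:             # all products of one column per factor
            prod = np.einsum('ia,ib->iab', prod, C[f]).reshape(n, -1)
        A += np.sum(prod.sum(axis=0) ** 2)
    return A / n ** 2
```

For a 3² full factorial, both A_1 and A_2 are zero, while two fully aliased three-level columns give A_2 = 2, illustrating that A_2 captures pairwise aliasing.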

Analysis methods
In this section, we discuss our implementation strategies for the following four analysis methods:

LASSO
For a centered response vector y and Z, the LASSO (Tibshirani 1996) estimate is the solution to

min_b ||y − Zb||₂²  subject to  ||b||₁ ≤ d_L,   [3]

where Z is a column-centered and column-normalized version of X, ||b||₁ = Σ_j |b_j| is the l_1 norm, and d_L is a tuning parameter. The LASSO estimate in Eq. [3] is equivalently (James, Radchenko, and Lv 2009), and more commonly, written as the solution of the corresponding penalized least-squares problem with an l_1 penalty on b. We use the R package glmnet (Friedman, Hastie, and Tibshirani 2010) to solve Eq. [3]. The best value of the tuning parameter d_L is typically chosen using cross-validation or a model selection criterion. The estimate obtained from Eq. [3] for the selected value of d_L is known to have far too many active columns (Meinshausen 2007; Roberts and Nowak 2014). We therefore further regularize the LASSO estimate along lines similar to those of GDS. For each choice of d_L, we divide the absolute values of the estimates into two clusters using kmeans, following Singh and Stufken (2022). Effects in the cluster with the larger mean are declared active for that choice of d_L, and a BIC value is computed by running ordinary least squares (OLS) on the selected active effects. From the resulting set of models, one per choice of d_L, the model with the smallest BIC is selected, and its effects are declared active. The factors with at least one active constituent column are then declared active.
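The tuning strategy just described can be sketched as follows; this is an illustrative Python translation using scikit-learn in place of glmnet, not the authors' implementation, and it assumes y is centered and Z is column-centered and normalized:

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.linear_model import LinearRegression, lasso_path

def lasso_kmeans_bic(Z, y):
    """LASSO path, then for each tuning value: split the absolute
    coefficients into two kmeans clusters, call the high-mean cluster
    active, refit by OLS, and keep the model with the smallest BIC."""
    n = len(y)
    _, coefs, _ = lasso_path(Z, y)                 # coefs: p x n_tuning
    best_bic, best_active = np.inf, np.array([], dtype=int)
    for b in coefs.T:
        mags = np.abs(b)
        if not mags.any():
            continue
        km = KMeans(n_clusters=2, n_init=10, random_state=0)
        labels = km.fit_predict(mags.reshape(-1, 1))
        high = np.argmax(km.cluster_centers_.ravel())
        active = np.flatnonzero(labels == high)
        if len(active) >= n:                       # OLS refit must be feasible
            continue
        fit = LinearRegression().fit(Z[:, active], y)
        rss = np.sum((y - fit.predict(Z[:, active])) ** 2)
        bic = n * np.log(rss / n) + np.log(n) * (len(active) + 1)
        if bic < best_bic:
            best_bic, best_active = bic, active
    return best_active
```

On a toy problem with two strong effects, the clustering step separates the large coefficients from the near-zero ones, and the BIC comparison picks out the corresponding model.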

Gauss-Dantzig Selector
For a centered response vector y and Z, the Dantzig Selector (Candès and Tao 2007) estimate is the solution to

min_b ||b||₁  subject to  ||Zᵀ(y − Zb)||_∞ ≤ d_D,   [4]

where ||a||_∞ = max(|a_1|, ..., |a_p|) is the l_∞ norm, ||b||₁ is the l_1 norm, and d_D is a tuning parameter. We use our wrapper to the R package lpSolve (Berkelaar et al. 2023) to solve Eq. [4]. As with LASSO, the tuning parameter d_D is selected by clustering the absolute estimates with kmeans and comparing the BIC values of the corresponding OLS refits; this combination of the Dantzig Selector with OLS refits on the selected effects is the Gauss-Dantzig Selector (GDS).
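The Dantzig Selector in Eq. [4] is a linear program. A minimal sketch (using scipy rather than the lpSolve wrapper mentioned above) splits b into its positive and negative parts to linearize the l_1 objective:

```python
import numpy as np
from scipy.optimize import linprog

def dantzig_selector(Z, y, delta):
    """Dantzig Selector of Eq. [4] as a linear program:
        min ||b||_1  subject to  ||Z'(y - Zb)||_inf <= delta.
    Writing b = bp - bm with bp, bm >= 0 linearizes the l1 objective."""
    n, p = Z.shape
    G, g = Z.T @ Z, Z.T @ y
    c = np.ones(2 * p)                    # objective: sum(bp) + sum(bm)
    A_ub = np.block([[G, -G], [-G, G]])   # encodes |g - G(bp - bm)| <= delta
    b_ub = np.concatenate([delta + g, delta - g])
    res = linprog(c, A_ub=A_ub, b_ub=b_ub,
                  bounds=[(0, None)] * (2 * p), method="highs")
    return res.x[:p] - res.x[p:]
```

With delta = 0 (and Z of full column rank) the constraint enforces the OLS normal equations, while delta at least ||Zᵀy||_∞ shrinks the solution all the way to zero.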

Smoothly clipped absolute deviation (SCAD)
For a centered response vector y and Z, the SCAD (Fan and Li 2001) estimate is the minimizer of

(1/2) ||y − Zb||₂² + n Σ_{j=1}^{p} p_{λ_S}(|b_j|),   [5]

where the first-order derivative of the penalty function is given by

p'_{λ_S}(θ) = λ_S { I(θ ≤ λ_S) + [(aλ_S − θ)₊ / ((a − 1)λ_S)] I(θ > λ_S) },

for some a > 2 and θ > 0, and I(·) is the indicator function. Here, the expression (a)₊ equals a if a ≥ 0 and 0 otherwise. We use the ncvreg() function in the R package ncvreg (Breheny and Huang 2011) with the default settings to solve Eq. [5]. To maintain consistency with the other methods compared, we use the clustering method of Singh and Stufken (2022) and BIC on the corresponding OLS estimates to automatically tune λ_S.
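The shape of the SCAD penalty is easiest to see through its derivative. A small sketch, using the a = 3.7 default recommended by Fan and Li (2001):

```python
import numpy as np

def scad_penalty_deriv(theta, lam, a=3.7):
    """First derivative of the SCAD penalty:
    lam * { I(theta <= lam) + (a*lam - theta)_+ / ((a-1)*lam) * I(theta > lam) }
    for theta > 0; a = 3.7 is the value recommended by Fan and Li (2001)."""
    theta = np.asarray(theta, dtype=float)
    taper = np.maximum(a * lam - theta, 0.0) / ((a - 1) * lam)
    return lam * np.where(theta <= lam, 1.0, taper)
```

The derivative equals lam for small effects (LASSO-like shrinkage), decreases linearly in the middle range, and vanishes for theta ≥ a·lam, so large effects are left essentially unpenalized.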

Group LASSO
For a centered response vector, centered Z, and orthonormalized Z_j for each j, the group LASSO (Yuan and Lin 2006) estimate is the minimizer of

(1/2) ||y − Σ_{j=1}^{m} Z_j b_j||₂² + d_GL Σ_{j=1}^{m} √(v_j − 1) ||b_j||₂.   [6]

The penalty weight √(v_j − 1) can be chosen differently, but we use the original weight suggested in Yuan and Lin (2006). We use the R package gglasso (Yang, Zou, and Bhatnagar 2020) to solve Eq. [6]. By way of its penalty, the group LASSO estimate declares either all variables in a group active or all of them inactive. We select the tuning parameter d_GL by applying kmeans to the resulting group LASSO estimates and then selecting the model corresponding to the smallest BIC. In general, kmeans might regularize away extra coefficients, for example, when one column corresponding to a factor has a very small absolute coefficient value compared to the others. Therefore, to maintain parity with the original intention of group LASSO, we first select active effects using kmeans and then put all columns corresponding to these active effects into the resulting linear model before computing BIC. The BIC values of these models are then compared to find the best value of d_GL. In addition, note that the group LASSO solution is invariant to different ways of orthonormalizing Z (Yuan and Lin 2006). By invariance, we mean that the group LASSO solution declares the same factors active regardless of the chosen parameterization. Before applying group LASSO, we transform the parameterized, or coded, matrix to a group-orthonormal matrix such that each X_j is orthonormalized. By virtue of this transformation, the group LASSO estimate is invariant to different parameterizations. This invariance is desirable, as we will see in the following section.
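A minimal sketch of Eq. [6] (not the gglasso implementation) uses block coordinate descent; with orthonormalized blocks, each group update is a closed-form groupwise soft-thresholding:

```python
import numpy as np

def group_lasso(Zs, y, lam, n_iter=200):
    """Group LASSO of Eq. [6] by block coordinate descent, assuming each
    block Zs[j] (n x (v_j - 1)) has orthonormal columns. With orthonormal
    blocks the update for each group is a closed-form groupwise
    soft-thresholding of the partial residual correlations."""
    bs = [np.zeros(Z.shape[1]) for Z in Zs]
    for _ in range(n_iter):
        for j, Zj in enumerate(Zs):
            # Partial residual leaving out group j.
            r = y - sum(Z @ b for k, (Z, b) in enumerate(zip(Zs, bs)) if k != j)
            s = Zj.T @ r
            w = lam * np.sqrt(Zj.shape[1])     # sqrt(v_j - 1) penalty weight
            norm = np.linalg.norm(s)
            bs[j] = np.maximum(1.0 - w / norm, 0.0) * s if norm > 0 else 0.0 * s
    return bs
```

Because the update depends on each block only through the column space of the orthonormalized Z_j, re-coding a factor (dummy, polynomial, or sum) and re-orthonormalizing leaves the set of active groups unchanged, which is the invariance property exploited in the text.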

Simulation results
We begin by outlining the questions this article addresses, followed by our simulation settings and detailed results.

Objectives
The main objectives, namely finding optimality criteria and analysis methods that yield a satisfactory screening performance for multi- and mixed-level designs, can be accomplished by addressing the following questions.
i. How should the data arising from screening experiments be analyzed; that is, which method among the ones outlined in Section 2.2 results in a satisfactory performance?
ii. Does the choice of parameterization impact the performance? If yes, should it influence the selection of the analysis method?
iii. Can a specific design optimality criterion be identified that yields superior performance in comparison to the alternative criteria discussed in Section 2.1?
iv. Under what conditions should multi- and mixed-level supersaturated designs be used, and can the conclusions drawn by Marley and Woods (2010) for two-level designs be extended?

Simulation settings
We consider the following settings in our designed simulation study to address the questions outlined in Section 3.1.
a. Number of factors, runs, and levels of designs. We consider some three-level, some four-level, and a few mixed-level designs. They are listed in Table 1 for ease of reference. All designs are provided in the Supplementary Material of the article, and their construction sources are mentioned in the subsequent sections.
b. Codings or parameterizations. We use either dummy coding or polynomial coding to generate the data. We use dummy, polynomial, or sum coding while analyzing the data.
c. Magnitude and signs of active effects. Coefficients of active effects are generated from either N(5, 1) or N(3, 1). Once coefficients are generated, they are either assigned a sign of +1 or −1 with equal probability (in "Random Signs" scenarios) or assigned a positive sign with probability 1 (in "Positive Signs" scenarios). Thus, we have a total of four kinds of situations. Error is generated from N(0, 1), and inactive effects have magnitude 0.
d. Frequency and types of active factors. The number of active factors is either ⌊n/4⌋, ⌊n/3⌋, ⌊n/2⌋, or ⌊2n/3⌋. At the same time, we consider different response types, which differ in which columns of an active factor carry active effects, as described in the Results section.
We evaluate different designs and analysis methods by varying the features listed in (a)-(d). Active factors are identified by a given analysis method for a given design and scenario. These identified active factors are then used to calculate power and type 1 error. The results in the following section report power and type 1 error for each situation, averaged over 10,000 iterations.

Results
The results are divided into three subsections, each of which addresses the questions posed in Section 3.1. The first subsection answers questions (i) and (ii), whereas the remaining questions are answered in the subsequent subsections.

Comparisons of different analysis methods and reparameterizations
In this section, we analyze supersaturated designs using the four methods outlined in Section 2.2: LASSO, GDS, SCAD, and group LASSO. We do so for multiple multi- and mixed-level designs, response types, and codings. We will conclude that group LASSO should be preferred for analyzing multi- and mixed-level designs because (a) group-wise screening is more natural for factors with more than two levels, (b) group LASSO has screening performance similar to that of LASSO and GDS, and (c) group LASSO is invariant to the analysis coding.
We use six different optimal designs in this subsection: (9, 8, 3^8), (12, 11, 3^11), (18, 17, 3^17), (8, 7, 4^7), (12, 8, 4^8), and (16, 10, 4^10). The first two three-level designs are E(f_NOD)- and χ²-optimal, whereas the design with 18 runs is GMA-optimal. They are, respectively, from Fang, Ge, and Liu (2004) (labeled LKTS), Lu, Hu, and Zheng (2003) (labeled LHZ), and Fang et al. (2003) (labeled GMA). For the four-level designs, the design (8, 7, 4^7) is E(f_NOD)- and χ²-optimal from Lu, Hu, and Zheng (2003) (labeled LHZ). The (12, 8, 4^8) design is E(f_NOD)-optimal from https://drs.icar.gov.in/Supersaturated_Design/SSD/Supersaturated.html, whereas the (16, 10, 4^10) design is GMA-optimal from Fang et al. (2004b) (labeled GMA). Throughout this section, we first convert the design to a polynomially coded model matrix and then center and scale each column of the transformed matrix to length one. This new matrix is then used to generate an appropriate response. For f active factors, the response labeled Y1 is generated such that only the first column (linear effect) per active factor is active. Similarly, Y2 corresponds to only the second column (quadratic effect) being active, whereas Y3 corresponds to all columns of an active factor being active. We consider a few three-level designs in Figure 1 and four-level designs in Figure 2. The first, second, and third columns of panels in Figures 1 and 2 correspond to Y1, Y2, and Y3, respectively. The number of active factors f is ⌊n/3⌋. Note that this corresponds to ⌊n/3⌋ active columns for Y1 and Y2, but 2⌊n/3⌋ and 3⌊n/3⌋ active columns for Y3 in Figures 1 and 2, respectively. The magnitudes of active columns follow N(5, 1) with signs chosen randomly from {−1, 1}, and the error follows N(0, 1).
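The response-generation scheme just described can be sketched as follows; the function name and defaults are illustrative rather than taken from the paper, and each element of X_groups stands for one factor's block of coded, centered, and scaled columns:

```python
import numpy as np

def make_response(X_groups, f, rng, kind="Y1", mean=5.0):
    """Generate a response with the first f factors active, for a list of
    per-factor coded blocks X_groups (each n x (v_j - 1)).
    kind='Y1': only the linear (first) column of each active factor is
    active; 'Y2': only the quadratic (second) column; 'Y3': every column.
    Coefficient magnitudes ~ N(mean, 1) with random signs; errors ~ N(0, 1)."""
    n = X_groups[0].shape[0]
    y = rng.standard_normal(n)                     # N(0, 1) error
    for Xj in X_groups[:f]:
        cols = {"Y1": [0], "Y2": [1], "Y3": range(Xj.shape[1])}[kind]
        for c in cols:
            coef = rng.choice([-1.0, 1.0]) * rng.normal(mean, 1.0)
            y += coef * Xj[:, c]
    return y
```

A positive-signs scenario would simply drop the random sign, and smaller effect sizes correspond to mean=3.0.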
Across the different situations in Figures 1 and 2, LASSO seems to have the best power, closely followed by GDS, then group LASSO and SCAD (labeled "SC" in the figures). However, when type 1 error is compared, the ranking is group LASSO, LASSO, GDS, and SCAD, with group LASSO being the best. Second, in Figure 1, sum coding seems a worse way to analyze the data for Y1, whereas dummy and polynomial coding seem worse for Y2 and Y3 in the second and third columns. Similarly, in Figure 2, sum coding seems worse for Y1 and Y2, and polynomial coding seems worse for Y3. Also note that the screening performance of LASSO, GDS, and SCAD differs across analysis codings. This finding can also be seen in the additional simulations shown in the Supplementary Material, which correspond to different designs, different numbers of active effects, magnitudes of active effects from N(3, 1), and positive signs for active coefficients. Different analysis codings for the same data yielding distinct screening performances present a problem. One way to deal with this problem is to find a coding that yields the best screening performance across response types and numbers of active effects. A quick scan of the results in the two figures and in the Supplementary Material shows that no such analysis method and coding combination is uniformly best. Therefore, we resort to the next best alternative of using an analysis method that is invariant to reparameterizations. As mentioned in Section 2.2 and observed in the figures, group LASSO is invariant to the coding used to analyze the data.
It is evident from Figures 1 and 2 that sum coding should never be used to analyze the data. It is also clear that dummy coding sometimes gives poor results (such as in the middle column of both figures). However, polynomial coding with either LASSO or GDS still performs better than group LASSO. We now assess whether polynomial coding remains reasonable when the data-generating mechanism differs. Figure 3 shows the screening performance when data are generated using dummy coding; that is, a design is converted to a dummy-coded matrix before centering and scaling each column to length one. The left and right panels of Figure 3 correspond, respectively, to the designs in Figures 1 and 2. Figure 3 also uses f = ⌊n/3⌋ active factors with coefficients of active effects from N(5, 1), signs chosen randomly from {−1, 1}, and errors following N(0, 1). Figure 3 depicts that neither polynomial nor sum coding is good here. However, dummy coding with LASSO or GDS yields good screening performance. LASSO and GDS yield the best results when the data-generation coding and the analysis coding coincide. In practice, however, the coding underlying the data-generating mechanism is unknown. Therefore, we conclude that group LASSO is a reasonable method for analyzing multi- and mixed-level designs.

Comparisons of different designs
The next question we answer is whether a particular design criterion is better than the others discussed in Section 2.1. Note that the class of designs considered by any of these criteria contains only balanced designs, in which the levels occur equally often for each factor. The balance restriction and the desire for orthogonality between every pair of columns make constructing optimal supersaturated designs challenging, and the factor-level combinations of a design impose restrictions on the number of runs. Therefore, such optimal designs are only sparingly available. We show the performance of the selected designs in Figures 4 and 5. For the rest of the article, we generate data through polynomial coding and use group LASSO with polynomial coding to analyze it.
Figure 4 shows the screening performance of three and four different three-level designs with (n, m) = (9, 8) and (9, 12), respectively. The first row corresponds to (9, 8), whereas the second corresponds to designs with (9, 12). The first panel of columns uses random signs for the coefficients in the data generation, and the second uses positive signs. The FLM, LKTS, and YIHN designs are obtained, respectively, from Fang, Lin, and Ma (2000), Fang, Ge, and Liu (2004), and Yamada et al. (1999). Each of them is E(f_NOD)-optimal with the same value of max f_NOD^(ij), but the number of column pairs attaining max f_NOD^(ij) is 4, 4, and 6, respectively (Fang, Ge, and Liu 2004). For nine runs and twelve factors, we additionally consider a GMA-optimal design of Xu and Wu (2005). Each of the FLM, LKTS, and YIHN designs for n = 9 and m = 12 is E(f_NOD)-optimal with the same value of max f_NOD^(ij), but the number of column pairs attaining max f_NOD^(ij) is 12, 12, and 14, respectively (Fang, Ge, and Liu 2004). All designs considered in Figure 4 have similar screening performance (except the FLM design in the first panel) when random signs are chosen. However, the performance differs significantly when all coefficient signs are positive, with the GMA-optimal and YIHN designs having the best screening performance.
In Figure 5, we consider two four-level designs for each of (n, m) = (8, 7) and (12, 11). As in Figure 4, the first column panel uses random signs for the coefficients, and the second uses positive signs. The LHZ designs from Lu, Hu, and Zheng (2003) are E(f_NOD)-optimal. The FGLQ design from Fang et al. (2004b) is GMA-optimal, whereas the FGL design for seven factors is E(f_NOD)-optimal and from Fang, Ge, and Liu (2002a). Figure 5 shows that the performances of the designs remain the same when random signs are used. However, when the coefficients are positive, the LHZ design performs best among the 7-factor designs, whereas the GMA-optimal design performs best for the 11-factor designs. Overall, the GMA-optimal designs, wherever available, have the best screening performance when effect signs are assumed to be positive. For random effect signs, all designs have similar screening performance. The latter observation, that different designs have equivalent screening performance when effect signs are randomly chosen, parallels the corresponding finding for two-level designs (Singh and Stufken 2023; Weese et al. 2021).
As mentioned, most of the multi- and mixed-level optimal designs available in the literature are balanced. The designs with the best screening performance proposed in Weese et al. (2021) and Singh and Stufken (2023) are often unbalanced for two-level factors. Unbalanced designs are appealing because they can be easily constructed using algorithms for any combination of runs, factors, and levels. But do unbalanced designs have equivalent or better performance in the multi- and mixed-level case, and how should such designs be constructed? While this is a subject for future investigation, we provide the performance of a few unbalanced three-level designs with nine runs and twelve factors in Figure 6. These designs are constructed by deleting the last column and 18 runs from a strength-two orthogonal array (Hedayat, Sloane, and Stufken 1999) with 27 runs and 13 factors obtained from http://neilsloane.com/oadir/oa.27.13.3.2.txt. The selected designs are the ones that optimize the two criteria discussed in Singh and Stufken (2023) for two-level designs. Note that this is just an exploratory example, and further research is needed to find a suitable criterion for optimal unbalanced multi- and mixed-level designs. Figure 6 indicates that some unbalanced designs can perform better than others and that unbalanced designs can perform just as well as balanced designs. Therefore, good non-level-balanced designs can be potentially useful.

Comparisons to find suitable combinations of n, m, and f
We now compare the designs identified as the best in the previous section. Figure 7 compares the screening performance of three-level and four-level designs in the left and right panels, respectively, whereas Figure 8 does the same for mixed-level designs. The two mixed-level designs for n = 8 are from https://drs.icar.gov.in/Supersaturated_Design/SSD/Supersaturated.html, with E(f_NOD) efficiencies of 0.87 and 1, respectively. The E(f_NOD)-optimal mixed-level designs for n = 12 are from Liu and Liu (2012).
First, observe that the screening performance deteriorates as the number of active factors increases and as m increases. For the left panel, the ratio of m to n is 0.9, 1.3, 1.8, and 2.2, whereas, for the right panel, the ratios are 0.6, 0.9, and 1.9. However, the ratio of m to n does not tell us much in the case of multi- and mixed-level designs. For example, the n = 9, m = 8 three-level design and the n = 16, m = 10 four-level design both have m/n = 0.9, but the former has far better screening performance than the latter. The ratio of the number of columns to runs in the left panel is 1.8, 2.7, 3.6, and 4.4; similarly, the right panel has ratios equal to 1.9, 2.8, and 5.6. The smaller this ratio, the better the screening performance of the corresponding designs. In Figure 8, the ratio is 1.9 and 3.5 for the 8-run designs and 1.3 and 1.9 for the 12-run designs; again, a smaller ratio is indicative of good performance. Based on the results in Figures 7 and 8, and additional results in the Supplementary Material, we arrive at a conclusion similar to that of Marley and Woods (2010) for two-level designs: the ratio of the number of columns to runs should be at most 2 for group LASSO to perform satisfactorily. In other words, the number of runs in a given design should be at least half the total number of columns to be screened.
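The column counts behind these ratios follow from the fact that a v-level factor contributes v - 1 coded columns under any full-rank coding. A small helper reproduces the ratios quoted above:

```python
def n_columns(levels):
    """Main-effect model columns: a v-level factor contributes v - 1
    coded columns under any full-rank coding."""
    return sum(v - 1 for v in levels)

def columns_to_runs_ratio(n, levels):
    return n_columns(levels) / n

# Three-level designs with n = 9 runs and m = 8, 12, 16, 20 factors
left_panel = [round(columns_to_runs_ratio(9, [3] * m), 1) for m in (8, 12, 16, 20)]
# -> [1.8, 2.7, 3.6, 4.4], the column-to-run ratios for the left panel

# The 12-run mixed-level design with levels 2^2 3^9 4^1 (Table 1): 23 columns
mixed_ratio = round(columns_to_runs_ratio(12, [2] * 2 + [3] * 9 + [4]), 1)
# -> 1.9
```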
The group LASSO declares all corresponding columns of each group (or factor) to be either active or inactive. Therefore, a suitable definition of sparsity should be formulated in terms of the number of active groups or factors, f. Figures 7 and 8 indicate that, irrespective of the design or the ratio of the number of columns to runs, the screening performance is excellent, satisfactory, and poor when the number of active groups is ⌊n/4⌋, ⌊n/3⌋, and more than ⌊n/3⌋, respectively. Therefore, fewer active factors, with a cap at ⌊n/3⌋, would result in exemplary screening performance.
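The grouped, all-in-or-all-out selection just described can be sketched as follows. This is a minimal proximal-gradient implementation with a fixed penalty λ (in practice λ is chosen by cross-validation or an information criterion, and established implementations such as R's grpreg would be preferred); the simulated data are hypothetical.

```python
import numpy as np

def group_lasso(X, y, groups, lam, n_iter=5000):
    """Group LASSO via proximal gradient descent: minimizes
    (1/2n)||y - Xb||^2 + lam * sum_g sqrt(|g|) * ||b_g||_2."""
    n, p = X.shape
    b = np.zeros(p)
    step = 1.0 / np.linalg.eigvalsh(X.T @ X / n).max()  # 1 / Lipschitz constant
    for _ in range(n_iter):
        z = b - step * X.T @ (X @ b - y) / n            # gradient step
        for g in groups:                                # block soft-thresholding
            thr = lam * step * np.sqrt(len(g))
            norm = np.linalg.norm(z[g])
            z[g] = 0.0 if norm <= thr else (1 - thr / norm) * z[g]
        b = z
    return b

# Hypothetical data: 6 two-column groups (three-level factors), groups 0 and 3 active
rng = np.random.default_rng(0)
X = rng.standard_normal((30, 12))
beta = np.zeros(12)
beta[[0, 1, 6, 7]] = [3.0, -2.0, 2.0, 2.0]
y = X @ beta + 0.01 * rng.standard_normal(30)
groups = [np.arange(2 * k, 2 * k + 2) for k in range(6)]
b_hat = group_lasso(X, y, groups, lam=0.5)
active = [k for k, g in enumerate(groups) if np.linalg.norm(b_hat[g]) > 0.5]
```

A factor is then declared active when the fitted norm of its group of coefficients is nonzero, which is why sparsity is naturally counted in factors rather than columns.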

Concluding remarks
Screening experiments help provide a preliminary understanding of a complicated business process involving multiple factors. SSDs are typically capable of meeting that goal and are employed in the first stage of screening experiments. Often, practitioners like to study the effect of at least some factors at more than two levels, indicating the need for multi- and mixed-level supersaturated designs. The literature offers plenty of optimal designs, best analysis methods, and prescriptive guidance for using SSDs when all factors are at two levels. Different optimality criteria and corresponding constructions exist for multi- and mixed-level SSDs. However, a detailed investigation of analysis methods and of the conditions for using SSDs with reasonable confidence will further enrich the literature. This article addresses these gaps by strengthening our understanding of the appropriate use of multi- and mixed-level SSDs.
The first set of analysis results shows that different parameterizations can yield very different results, depending on the coding used to generate the data and on the analysis method. Only group LASSO is invariant to reparameterizations by design. Moreover, group LASSO performs on par with the other analysis methods, especially when compared across scenarios, codings, and designs. Furthermore, positive coefficients associated with active effects in polynomial coding do not inherently imply that the corresponding coefficients in sum or dummy coding will also be positive, and vice versa. Consequently, it is advisable to employ a coding-invariant analysis method when dealing with multi- and mixed-level designs. For all these reasons, we propose using group LASSO to analyze multi- and mixed-level SSDs, since in practice the data are not generated using a particular coding and no single choice of parameterization is best. Extending the observations of Marley and Woods (2010) for two-level designs and using group LASSO for analysis, we make the following observations.
1. Designs for unknown signs: Different optimality criteria result in designs with equivalent screening performance when the effect directions are unknown (or the signs of the coefficients are randomly generated). This observation is similar to that for two-level designs by Marley and Woods (2010) and Weese, Smucker, and Edwards (2015).
2. Designs for known signs: When the effect directions are known (or all signs of the coefficients are positive), the GMA-optimal designs seem to perform the best. Future investigations are required to understand the reasons behind the superior performance of GMA-optimal designs. Such results also have precedence in the two-level SSD literature; for example, designs by Weese et al. (2021) and Singh and Stufken (2023) perform better when effect directions are known in advance. Knowing the signs in advance for our setup is more challenging, especially for ordinal categorical predictors. However, since designs that work well for positive signs also work on par for random signs, investigations of new design criteria that utilize the sign information might be in order.
3. Non-level-balanced designs: The optimal multi- and mixed-level SSDs available in the literature are all level-balanced. However, Figure 6 shows that some non-level-balanced designs can perform just as well as balanced designs. Further investigations to find such designs and the corresponding optimality criteria would extend the current restricted class of designs, implying the general availability of designs for any n, m, and v_i's.

4. Ratio of the number of active factors to runs: SSDs show extremely poor performance if more than ⌊n/3⌋ factors are active. Therefore, the ratio of active factors to runs should be as small as possible and should not exceed 1/3. This finding extends the result in Marley and Woods (2010), where the same ratio was prescribed for a two-level SSD to have effective performance. Note that our ratio for group LASSO is defined in terms of the number of active factors and not in terms of the number of columns, owing to the inherent group structure. For two-level designs, the number of active columns is the same as the number of active factors since each group has only one column.
5. Ratio of the number of columns to runs: SSDs perform well if the number of runs is at least half the total number of columns to be evaluated. This finding also parallels the corresponding finding of Marley and Woods (2010), except that the defining criterion is now the ratio of the number of columns to runs. The result is intuitive because any method should have more difficulty identifying active factors in a design with 11 six-level factors than in a design with 11 three-level factors in 18 runs (say).
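The two rules of thumb above can be packaged as a simple feasibility check for a proposed screening study; the 18-run comparison mirrors the example in the text.

```python
def ssd_screening_feasibility(n, levels, f):
    """Both rules of thumb for group-LASSO screening with an SSD:
    (i) at most floor(n/3) active factors;
    (ii) at least half as many runs as model columns."""
    cols = sum(v - 1 for v in levels)          # each v-level factor: v - 1 columns
    return {
        "active_factors_ok": f <= n // 3,
        "columns_ok": n >= cols / 2,
    }

# 18 runs, 11 three-level factors (22 columns), 4 active factors
three_level = ssd_screening_feasibility(18, [3] * 11, 4)
# -> {'active_factors_ok': True, 'columns_ok': True}

# The same runs and factors at six levels (55 columns) violate rule (ii)
six_level = ssd_screening_feasibility(18, [6] * 11, 4)
# -> {'active_factors_ok': True, 'columns_ok': False}
```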
A consequence of observation 3 above is that practitioners would benefit from better design criteria and corresponding optimal designs that do not impose the balance restriction, in particular because such designs can be readily constructed algorithmically for any given factor-level-run combination. Several optimal SSDs with a large m and fewer than half as many runs as columns have been constructed in the literature. However, in light of observation 5, such designs seem to have merely mathematical benefits. Moreover, an engineer might be interested in investigating the effects of a large number of factors in a relatively small number of runs. These designs could be utilized if a better analysis method that is invariant across different parameterizations could be introduced. Future work will investigate these two directions.
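The invariance property that motivates group LASSO can be seen concretely: the polynomial, dummy, and sum codings of one factor span the same column space, so a method that selects all of a factor's columns together fits the same model under any of them (for the penalty itself to be identical, each group's columns are orthonormalized, as common implementations do). A small check, for a hypothetical nine-run column of a three-level factor:

```python
import numpy as np

x = np.tile([0, 1, 2], 3)          # a 9-run column of a three-level factor

# Row-lookup tables: one row of coded values per factor level
dummy = np.array([[0., 0.], [1., 0.], [0., 1.]])      # treatment (dummy) coding
sum_c = np.array([[1., 0.], [0., 1.], [-1., -1.]])    # sum (deviation) coding
poly  = np.array([[-1., 1.], [0., -2.], [1., 1.]])    # linear/quadratic contrasts

def projector(Z):
    """Orthogonal projector onto span([1, Z])."""
    q, _ = np.linalg.qr(np.column_stack([np.ones(len(Z)), Z]))
    return q @ q.T

P_dummy = projector(dummy[x])
same_span = [np.allclose(P_dummy, projector(c[x])) for c in (sum_c, poly)]
# Both comparisons hold: all three codings (plus intercept) span the space of
# the three level indicators, so group-level selection is coding-invariant.
```

Column-wise methods such as LASSO or GDS lack this property, since an individual coded column (e.g., the quadratic contrast) has no counterpart in another coding.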

Figure 1. Comparison of different analysis methods for three-level designs, with different response types in columns and different designs in rows. Data are generated using polynomial coding and ⌊n/3⌋ active factors with magnitudes from N(5, 1) and random signs. Different analysis methods are on the x-axis, and different colors represent the codings used to analyze the data: polynomial (black), dummy (blue), and sum coding (cyan). Dotted lines represent type 1 error, whereas solid lines correspond to power. "SC" = SCAD.

Figure 2. Comparison of different analysis methods for four-level designs, with different response types in columns and different designs in rows. Data are generated using polynomial coding and ⌊n/3⌋ active factors with magnitudes from N(5, 1) and random signs. Different analysis methods are on the x-axis, and different colors represent the codings used to analyze the data: polynomial (black), dummy (blue), and sum coding (cyan). Dotted lines represent type 1 error, whereas solid lines correspond to power. "SC" = SCAD.

Figure 3. Comparison of different analysis methods for three-level (left panel) and four-level (right panel) designs, with different response types in columns and different designs in rows. Data are generated using dummy coding and ⌊n/3⌋ active factors with magnitudes from N(5, 1) and random signs. Different analysis methods are on the x-axis, and different colors represent the codings used to analyze the data: polynomial (black), dummy (blue), and sum coding (cyan). Dotted lines represent type 1 error, whereas solid lines correspond to power. "SC" = SCAD.

Figure 4. Comparison of different designs for three-level designs with (n, m) = (9, 8) (first row) and (9, 12) (second row), with different response types and signs in columns. Data are generated using polynomial coding and active factors with magnitudes from N(5, 1) and random signs (first and second columns) or positive signs (third and fourth columns). Data are analyzed through group LASSO using polynomial coding, and different colors represent different designs (explained in the text). Dotted lines represent type 1 error, whereas solid lines correspond to power.

Figure 5. Comparison of different designs for four-level designs with (n, m) = (8, 7) (first row) and (12, 11) (second row), with different response types and signs in columns. Data are generated using polynomial coding and active factors with magnitudes from N(5, 1) and random signs (first vertical panel) or positive signs (second vertical panel). Data are analyzed through group LASSO using polynomial coding, and different colors represent different designs (explained in the text). Dotted lines represent type 1 error, whereas solid lines correspond to power.

Figure 6. Comparison of unbalanced three-level designs for 9 runs and 12 factors, with different response and sign types in columns. Data are generated using polynomial coding and active factors with magnitudes from N(5, 1) and random signs (left panel) or positive signs (right panel). Data are analyzed through group LASSO using polynomial coding, and different colors represent different designs (explained in the text). Dotted lines represent type 1 error, whereas solid lines correspond to power.

Figure 7. Comparison of three-level and four-level designs for Y3 responses and sign types in columns. Data are generated using polynomial coding and active factors with magnitudes from N(5, 1) and random signs (first vertical panel) or positive signs (second vertical panel). Data are analyzed through group LASSO using polynomial coding, and different colors represent different designs (explained in the text). Dotted lines represent type 1 error, whereas solid lines correspond to power.

Figure 8. Comparison of four mixed-level designs for Y3 responses and sign types in columns. Data are generated using polynomial coding and active factors with magnitudes from N(5, 1) and random signs (first vertical panel) or positive signs (second vertical panel). Data are analyzed through group LASSO using polynomial coding, and different colors represent different designs (explained in the text). Dotted lines represent type 1 error, whereas solid lines correspond to power.

Table 1. Designs considered in the simulation study. For the designs …, and (12, 12, 2^2 3^9 4^1), two scenarios are considered: one where one column of a certain type is active for each active factor, and another where all columns of the active factors are active.