A minimal switching procedure for constrained ranking and selection under independent or common random numbers

Constrained Ranking and Selection (R&S) aims to select the best system according to a primary performance measure, while also satisfying constraints on secondary performance measures. Several procedures have been proposed for constrained R&S, but these procedures seek to minimize the number of samples required to choose the best constrained system without taking into account the setup costs incurred when switching between systems. We introduce a new procedure that minimizes the number of such switches, while still making a valid selection of the best constrained system. Analytical and experimental results show that the procedure is valid for independent systems and efficient in terms of total cost (incorporating both switching and sampling costs). We also investigate the use of the Common Random Numbers (CRN) approach to improve the efficiency of our new procedure. When implementing CRN, we see a significant decrease in the samples needed to identify the best constrained system, but this is sometimes achieved at the expense of a valid Probability of Correct Selection (PCS) due to the comparison of systems with an unequal number of samples. We propose four variance estimate modifications and show that their use within our new procedure provides good PCS under CRN at the cost of some additional observations.


Introduction
Ranking and Selection (R&S) procedures are statistical tools for selecting the best system out of a finite number of simulated alternatives. Commonly, the best system is the one with the highest or lowest mean performance measure. Since outputs from each system are stochastic and possibly expensive, decision makers need to be concerned with both computational efficiency and validity, as expressed by the probability of correctly selecting the best system. Many approaches exist to address this general problem, either through the indifference-zone framework (e.g., Rinott (1978); Kim and Nelson (2006)), the Bayesian framework (e.g., Chick (2006) and Frazier (2012)), or the Optimal Computing Budget Allocation (OCBA) framework (e.g., Chen et al. (2000)).
We consider a more complicated form of R&S, namely, selecting the best system that satisfies constraints on one or more secondary performance measures. This problem is known as constrained R&S. To accomplish this, we adopt the framework of Andradóttir and Kim (2010), which involves a feasibility check phase to ensure that the chosen system meets the required constraints and a comparison phase to determine the best feasible system. Within this framework, we seek to ensure a desired Probability of Correct Selection (PCS) of the best feasible system.
The problem of constrained R&S has attracted some attention lately, including the development of the fully sequential indifference-zone procedures of Andradóttir and Kim (2010) and Healey et al. (2013, 2014), the OCBA methods of Pujowidianto et al. (2009) and Lee et al. (2012), the multiple attribute theory of Morrice and Butler (2006), an indifference-zone approach by Kabirian and Ólafsson (2009), which considers the probability that several stochastic constraints are feasible, and asymptotic rules for sampling within a budget by Pasupathy (2010, 2013). Butler et al. (2001), Lee et al. (2006), Chen and Lee (2009), Lee et al. (2010), and Teng et al. (2010) address multiple objective problems in general. In addition, some research has been dedicated solely to the feasibility check, as in Szechtman and Yücesan (2008) and Batur and Kim (2010).
The previously mentioned procedures for constrained R&S aim for efficiency in terms of observations required to find the best feasible system, but there are no procedures that we know of that explicitly address the cost of switching between systems in this context. Though it is common to compare procedures based on the required number of samples to achieve a nominal PCS, the possibly high cost (in both time and storage) of stopping and restarting complex simulations should also be considered. Hong and Nelson (2005) and Osogami (2009) present two fully sequential procedures that perform valid comparisons while limiting the number of switches (for a pure R&S problem).
As pointed out by Hong and Nelson (2005) and Osogami (2009), fully sequential R&S procedures that require frequent switching among systems, such as the KN procedure of Kim and Nelson (2001), may become inefficient if the penalties for switching are large. More specifically, Hong and Nelson (2005) state that switching from one simulated system to another usually includes storing state information about the current system (including values of the random number seeds); saving all relevant output data; swapping the executable code for the current system out of, and the code for the next system into, active memory; and restoring the state information for the next system to the values it had on the last call.
They test relative switching-to-sampling cost ratios ranging from 1 to 1000 and point out that even a switching cost 1000 times that of sampling is not excessive. These costs would also be incurred by any constrained R&S procedure utilizing similar fully sequential algorithms for comparison. In this article, we present a new fully sequential indifference-zone procedure, named the Constrained Minimal Switching (CMS) procedure, designed to reduce the total (sampling and switching) cost when switching costs are significant, while identifying the best feasible system.
In recent years, parallel computing environments have become more readily available. Simulating systems in parallel could reduce switching costs; however, this is not always advisable as parallel computing involves its own complexities (including coordination among processors). Even when parallel computing is available, the number of processors for a typical individual PC or laptop is limited and thus one processor needs to run simulations of multiple systems when comparing hundreds or thousands of systems, requiring switching. Our research would facilitate minimizing the total cost of each of these processors. Research on an efficient way of distributing systems among processors would be useful, but is outside the scope of this article.
Minimal switching procedures reduce the cost of stopping and restarting simulations but often require extra samples to ensure that the number of switches in the comparison phase does not exceed the number of systems. We investigate the use of Common Random Numbers (CRN), a variance reduction technique, to reduce the number of required samples for CMS. Healey et al. (2014) study the use of CRN in constrained R&S. They prove the validity of two procedures that always compare systems with equal sample sizes but express concerns about the validity of comparing systems with unequal sample sizes under CRN. We provide experimental results that show PCS can be significantly degraded under high correlation. As unequal sample sizes commonly occur within our new minimal switching procedure (and other procedures), we present four variance estimate modifications and show that their use within CMS under CRN captures savings in the required number of observations until a decision is made, while still providing a good PCS.
This article is organized as follows. Section 2 outlines the problem of constrained R&S, details notation, and sets assumptions for the validity of our procedure. Section 3 introduces the CMS procedure and includes a proof of its validity for independently simulated systems. In Section 4, we discuss the motivations for the use of CRN, discuss its effects within CMS, and propose modifications to address its challenges. Section 5 features experimental results, followed by conclusions in Section 6. An earlier version of this article was presented at the Winter Simulation Conference.

Background
The goal of constrained R&S is the selection of the best system based on a primary performance measure out of a fixed number of alternatives, k, with constraints on s secondary performance measures. In this section, we outline the problem, while also introducing notation and assumptions needed for our procedure and its validity proof.
Let (X_in, Y_i1n, ..., Y_isn) be the nth observation of the ith system for the primary performance measure and the s secondary performance measures. The set of all possible systems is S = {1, 2, ..., k}. Let x_i = E[X_in] and y_iℓ = E[Y_iℓn] be the mean values of the primary and secondary performance measures for each system i ∈ S and constraint ℓ = 1, 2, ..., s. Therefore, our objective is to determine which system has the best primary performance measure, while also satisfying all constraints: maximize x_i subject to y_iℓ ≤ q_ℓ for all ℓ = 1, 2, ..., s.
We let σ²_xi = Var[X_in] for all i and σ²_yiℓ = Var[Y_iℓn] for all i and ℓ. Moreover, the relationship between performance measures is governed by the following assumption.

Assumption 1. For each i ∈ S, the observation vectors (X_in, Y_i1n, ..., Y_isn), n = 1, 2, ..., are iid ∼ MN((x_i, y_i1, ..., y_is), Σ_i), where iid denotes independent and identically distributed, MN denotes multivariate normal, and Σ_i is the (s + 1) × (s + 1) covariance matrix of the vector (X_in, Y_i1n, ..., Y_isn).
The normality of data is a common assumption within R&S, achieved through within-replication averages, including batched means in steady-state simulation (Law and Kelton, 2000). When the basic observations X_in and Y_iℓn are non-normal, observations from different replications may be batched, which would of course affect the sampling cost. Furthermore, data points can be correlated across systems (due to CRN) and across performance measures.
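The batching step described above can be sketched in a few lines; a minimal illustration, where the exponential outputs and the batch size of 10 are arbitrary choices for the example, not values from the paper:

```python
import numpy as np

rng = np.random.default_rng(1)

# Raw, non-normal replication outputs for one system (exponential, mean 2.0).
raw = rng.exponential(scale=2.0, size=1000)

def batch_means(data, batch_size):
    """Average consecutive non-overlapping batches; by the central limit
    theorem the batch means are approximately normal."""
    n_batches = len(data) // batch_size
    trimmed = data[:n_batches * batch_size]
    return trimmed.reshape(n_batches, batch_size).mean(axis=1)

obs = batch_means(raw, batch_size=10)  # 100 approximately normal observations
```

Note the trade-off mentioned in the text: each batch mean consumes `batch_size` raw replications, so batching multiplies the sampling cost per usable observation.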
The procedure detailed in this article utilizes the indifference-zone method for both the feasibility check and comparison phases. For all systems involved in the simulation, we designate the indifference-zone parameter, δ > 0, as the smallest significant difference among systems' primary performance measures. Thus, we are indifferent between systems that have means within δ of each other, as long as the systems have not been found infeasible.
Likewise, we consider the tolerance level ε_ℓ > 0 to be the smallest significant difference between y_iℓ and q_ℓ. Therefore, we can place all systems into three sets in terms of feasibility. If system i is in S_D, the set of desirable systems, then y_iℓ ≤ q_ℓ − ε_ℓ for all ℓ = 1, 2, ..., s. S_U is the set of unacceptable systems where at least one secondary performance measure y_iℓ is infeasible, so that y_iℓ > q_ℓ + ε_ℓ. All systems not in S_D or S_U fall into S_A, the set of acceptable systems.

Assumption 2. For all i ∈ (S_D ∪ S_A) \ {[b]}, we have x_[b] ≥ x_i + δ, where [b] ∈ S_D is the index of the best desirable system.
Under Assumption 2, we let CS be the correct selection event that system [b] is declared feasible and all systems in S \ {[b]} are eliminated. If all systems are unacceptable, then CS is the event that all systems in S are eliminated. Finally, if S D = ∅ and S A = ∅, then CS occurs both when all systems are eliminated and also when any system in S A is selected. We desire to ensure a nominal PCS of at least 1 − α.
We need some additional notation:
n_0 = the first-stage sample size;
S²_Xij = the sample variance of the paired difference of {X_i1, X_i2, ..., X_in_0} and {X_j1, X_j2, ..., X_jn_0};
S²_Xi = the sample variance of {X_i1, X_i2, ..., X_in_0};
R(r; a, b, d) = max{0, bd/(2a) − (a/2)r} for a, b, d ∈ R+ and a ≠ 0;
CS_i = the event that system i is eliminated in pairwise comparison of systems i and [b], for any i ∈ S with x_[b] ≥ x_i + δ;
CD_i = the event that a correct decision is made on the feasibility of system i ∈ S (when i ∈ S_A, any feasibility decision is CD_i);
β_1 = the error of an individual feasibility check for one performance measure of one system;
β_2 = the error of a pairwise comparison between two systems.
Remark 1. Note that R(r ; a, b, d) involves a decreasing linear function of the number of observations r , and originates from Fabian (1974). Our procedures require a so-called continuation region, defined as (−R(r ; a, b, d), R(r ; a, b, d)). Sampling continues when an observed statistic stays within the region and an elimination occurs when the statistic exits the region.
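As an illustration of Remark 1, the sketch below implements a triangular continuation region of this type; the specific form R(r; a, b, d) = max{0, bd/(2a) − (a/2)r} is our assumption for the c = 1 case of a Fabian (1974)-style region, not a formula quoted verbatim from the paper:

```python
def R(r, a, b, d):
    """Half-width of a triangular continuation region (c = 1 case);
    decreasing linearly in the number of observations r."""
    return max(0.0, b * d / (2.0 * a) - (a / 2.0) * r)

def screen(diff_sum, r, a, b, d):
    """Continue sampling while the comparison statistic stays inside
    (-R, R); an exit on either side triggers an elimination."""
    radius = R(r, a, b, d)
    if diff_sum > radius:
        return "eliminate_other"   # statistic favors the first system
    if diff_sum < -radius:
        return "eliminate_self"
    return "continue"
```

With a = δ, b = h², and d = S², the region closes (reaches half-width zero) at r = h²S²/δ², which is the usual worst-case sample size for regions of this shape.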
With this notation, we now present assumptions that govern good feasibility check and comparison phases. Assumptions 3 and 4 ensure that feasibility check and comparison are handled in a valid manner when systems are independently simulated. Assumptions 5 and 6 similarly ensure a valid feasibility check and comparison under CRN. Therefore, Assumptions 3 to 6 will not all be required simultaneously.
Assumption 3. The feasibility check phase guarantees Pr{∩_{i∈S'} CD_i} ≥ (1 − sβ_1)^t for any 1 ≤ t ≤ k and any subset S' ⊆ S with cardinality t (i.e., |S'| = t) under s constraints.

Assumption 4. The comparison phase guarantees Pr{∩_{i∈S'} CS_i} ≥ (1 − β_2)^t for any 1 ≤ t ≤ k − 1 and any subset S' ⊆ S \ {[b]} with cardinality t.
Assumption 5. The feasibility check phase guarantees Pr{∩_{i∈S'} CD_i} ≥ 1 − tsβ_1 for any 1 ≤ t ≤ k and any subset S' ⊆ S with cardinality t under s constraints.

Assumption 6. The comparison phase guarantees Pr{∩_{i∈S'} CS_i} ≥ 1 − tβ_2 for any 1 ≤ t ≤ k − 1 and any subset S' ⊆ S \ {[b]} with cardinality t.
Assumptions 3 to 6 can be verified for a certain class of procedures in which (i) the feasibility check of each constraint for each system only depends on parameters of the constraint of the system and (ii) a comparison is performed pairwise with parameters dependent only on the two systems involved. We refer to Hong and Nelson (2005), Pichitlamken et al. (2006), Osogami (2009), and Batur and Kim (2010) for results that show how these assumptions may be verified for particular procedures when Assumption 1 is true.
For constrained R&S in the presence of multiple stochastic constraints, we consider three procedures as competitors to our new procedures: HAK, HAK+, and MD_R. Healey et al. (2014) present HAK as a sequentially running procedure and HAK+ as a simultaneously running procedure. More specifically, HAK performs a complete feasibility check of all systems and then a comparison on the systems found feasible. This procedure is most efficient when the feasibility check is quick to finish. HAK+ performs both a feasibility check and comparison on all systems remaining in contention after each stage of sampling. Healey et al. (2013) propose another simultaneously running procedure, MD_R. It is an improved version of HAK+ where a system is allowed to go dormant, halting sampling, if it does not seem promising. These three procedures are compared to our proposed procedures in the experimental study.

The CMS procedure
In this section, we present a new method for constrained R&S, which we call the CMS procedure. It minimizes the cost of switching from one system to another, a cost that is often not factored into R&S studies but can comprise a large portion of the computation time.
Our new constrained R&S approach will utilize existing fully sequential feasibility check and comparison procedures but requires additional steps to ensure minimal switching.
The outline of this section is as follows. We first describe our approach in general terms in Section 3.1 and provide results that are useful for proving the statistical validity of our proposed procedures. Then we present our CMS approach in detail and prove its validity in Section 3.2.

General approach
Our minimal switching procedure for constrained selection consists of two steps, finding an incumbent best feasible system and then eliminating systems by comparing each remaining, available system with the incumbent best until one system remains (while also performing feasibility checks on the available systems). It resembles a procedure by Hong and Nelson (2005), but our procedure is designed to solve constrained R&S problems. We now outline the general approach.

General Constrained Switching Approach
Setup and Initialization: Set necessary parameters and take first-stage samples.
Finding a Feasible System: Sort systems based on primary performance measure sample means. Perform feasibility check on systems according to their sorted order to find the initial guess for the best feasible system (B). Once a feasible system is found, take additional samples (if necessary) for the system to be able to compare it against all remaining systems and go to Feasibility and Comparison of A with B.

Feasibility and Comparison of A with B:
Find the next best available system (A). While sampling only from system A, perform both feasibility check and comparison against B. If A is feasible and superior to B, then eliminate system B, take additional samples (if necessary) from A to be able to compare it to all remaining systems, replace B with A, and find the next A. If A is either infeasible or inferior to B, eliminate A and find the next A. This repeats until all systems (except for B) are eliminated.
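The loop in the outline above can be sketched as follows; a simplified illustration in which the hypothetical oracle `decide` stands in for the combined feasibility check and comparison of A against B, and the search for the initial feasible incumbent is collapsed into taking the first system in sorted order:

```python
def minimal_switching(sorted_systems, decide):
    """Sketch of the General Constrained Switching Approach.
    `sorted_systems` lists system ids by first-stage sample mean (best
    first); `decide(a, b)` is a hypothetical stand-in that samples only
    from system `a` and returns 'a_wins' (a feasible and superior to b)
    or 'a_out' (a infeasible or inferior).  Each system is visited at
    most once after the first stage."""
    remaining = list(sorted_systems)
    b = remaining.pop(0)          # incumbent best feasible system
    switches = 0
    while remaining:
        a = remaining.pop(0)      # next best available system
        switches += 1             # switch sampling to system a
        if decide(a, b) == "a_wins":
            b = a                 # a replaces the incumbent
    return b, switches
```

The switch counter makes the key property visible: with k systems, the loop performs at most k − 1 switches after the first stage, regardless of how many samples each visit consumes.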
The procedure will visit each system at most once after the first stage. To achieve this limited number of switches, at least one system must receive a large number of samples, the maximum necessary to complete comparison against all other systems. Therefore, we expect this framework to be conservative in terms of the number of observations but a good choice if switching costs are high. Since the feasibility check of system A and the comparison of systems A and B occur at the same step, this framework falls within the set of simultaneously running constrained R&S procedures. (A simultaneously running procedure implements both feasibility check and comparison procedures at the same time but eliminates a system only when it is declared infeasible or inferior to a system already declared to be feasible; see Andradóttir and Kim (2010).)

To prove the validity of the General Constrained Switching Approach, we note the following two lemmas. Lemma 1 allows us to present the main result of Section 3.2, a proof of the validity of our CMS procedure for independently simulated systems. Lemma 2 provides a method to ensure the validity of simultaneously running procedures under correlated systems.

Lemma 1. (Healey et al. (2014), Lemma 4.2). When the systems are simulated independently and Assumptions 2, 3, and 4 hold, a simultaneously running procedure guarantees Pr{CS} ≥ (1 − sβ_1)^j + (1 − sβ_1) + (1 − β_2)^(k−j−1) − 2 when the number of unacceptable systems j is less than k, and Pr{CS} ≥ (1 − sβ_1)^k when the number of unacceptable systems is equal to k.

Lemma 2. (Healey et al. (2014), Lemma 4.6). When the systems are simulated using CRN and Assumptions 2, 5, and 6 hold, a simultaneously running procedure guarantees Pr{CS} ≥ 1 − (j + 1)sβ_1 − (k − j − 1)β_2 when the number of unacceptable systems j is less than k, and Pr{CS} ≥ 1 − ksβ_1 when the number of unacceptable systems is equal to k.

Detailed procedure
In this section, we present and analyze our CMS procedure for solving constrained R&S problems. We choose to feature fully sequential procedures for the feasibility check and comparison phases in CMS. Fully sequential procedures have been shown to be efficient in many configurations, as the comparison and feasibility check can be re-evaluated after every stage of sampling, possibly with as little as one additional observation.
The feasibility check phase of CMS is performed by the FIB procedure of Batur and Kim (2010), a fully sequential and valid method for determining the feasibility of multiple constrained performance measures. Hong and Nelson (2005) propose two fully sequential R&S procedures that minimize the number of switches; we choose their MSS procedure for the implementation of comparison in our CMS, modified as described in their Remark 3. The MSS procedure first takes the maximum number of samples from a potential best system, N_B in Equation (1), and compares the potential best with only one alternative by obtaining one sample at a time from the current alternative system. When one of the two systems is eliminated, another system comes into comparison. The FIB procedure identifies a set of feasible or near-feasible systems by utilizing a triangular-shaped continuation region defined by R(r; a, b, d). Note that their function for the continuation region has one more parameter, c, and our R(r; a, b, d) is a special case of it with c = 1.

CMS Procedure for Multiple Constraints
Setup: Select the overall confidence level 1/k ≤ 1 − α < 1 and the first-stage sample size n_0 ≥ 2. Choose δ, ε_ℓ, and q_ℓ for ℓ = 1, 2, ..., s. Let η_1 and η_2 be the constants that set the feasibility check and comparison error levels β_1 and β_2, respectively.

Initialization: Let h²_1 = 2η_1(n_0 − 1) and h²_2 = 2η_2(n_0 − 1). Obtain n_0 observations X_in and Y_iℓn from each system i ∈ S. For all i and ℓ, compute S²_Yiℓ. Similarly, for all i and j ≠ i, compute S²_Xij. Let SI_i = ∅ be the set of systems inferior to system i ∈ S in terms of the primary performance measure. Let K_i = ∅ be the set of constraints found to be feasible for system i ∈ S, and let the set of contending systems include all systems, M = S. Set the observation counters r_i = n_0 for all i ∈ S.

Finding a Feasible System:
Initial Feasibility Check: Using the first-stage samples, begin the feasibility check of each system in M, updating the sets K_i and removing any system declared infeasible.

Initial Screening for Comparison: Compare all systems in M pairwise based on the first-stage samples, updating the sets SI_i.

Initial Sorting: Sort the systems in M based on the first-stage sample means X̄_i = (1/n_0) Σ_{n=1}^{n_0} X_in. Let B and A be the systems in M with the best and second-best first-stage sample means. Let N_B be as defined in Equation (1), where ⌈·⌉ is the ceiling function and max ∅ = −∞. Then N_B is the maximum number of samples required for system B to complete comparison with all systems remaining in contention.
Feasibility Check for Best: Continue the feasibility check of B while sampling, and perform the feasibility check and the comparison of A against B according to Equation (2). If B ∈ SI_A and A is feasible, then remove B from M and let A become the new B. If B ∉ SI_A, then A is eliminated by Equation (2) or declared infeasible and is removed from M.

Taking N_B samples for the current guess for the best feasible system allows the procedure to make statistically valid decisions while minimizing the number of switches. Each system is visited for sampling at most twice: once for first-stage sampling and sorting and once for feasibility check and comparison. The procedure utilizes only N_B samples for comparison, even if more samples are obtained in a long feasibility check. This is desirable because Healey et al. (2014) show that primary performance measure sample means may be biased at the completion of the feasibility check if primary and secondary performance measures are correlated, so observations past N_B may be harmful.
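Since Equation (1) is not reproduced in this excerpt, the sketch below shows one plausible form of N_B under the triangular continuation region with c = 1: the region R(r; δ, h²_2, S²_XBi) closes at r = h²_2 S²_XBi / δ², so the ceiling of the largest such value over the systems still in contention bounds the samples B needs (the formula is our assumption, not the paper's Equation (1)):

```python
import math

def max_comparison_samples(h2_sq, delta, s2_diff_with_b):
    """Plausible sketch of N_B: for each remaining system i, the triangular
    region with parameters (delta, h2_sq, S2_XBi) closes once r reaches
    h2_sq * S2_XBi / delta**2; the ceiling of the largest such value bounds
    the samples B needs.  The max over an empty set is taken to be -inf,
    matching the convention max emptyset = -infinity in the text."""
    if not s2_diff_with_b:
        return -math.inf
    return max(math.ceil(h2_sq * s2 / delta ** 2)
               for s2 in s2_diff_with_b.values())
```

For example, with h²_2 = 4, δ = 1, and paired-difference variances {2.5, 1.0} for the two contenders, B would need at most ⌈4 · 2.5 / 1⌉ = 10 samples.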
We now present the main theorem for CMS.
Theorem 1. When the systems are simulated independently and Assumptions 2, 3, and 4 hold, CMS guarantees Pr{CS} ≥ 1 − α.

The proof of Theorem 1 is given in the Online Supplement. Note that for fixed k and α, 2(1 − β_2)^((k−1)/2) − β_2 − 1 monotonically decreases from one to −2 as β_2 increases from zero to one, guaranteeing a unique solution to the equation in Theorem 1.

CRNs and two-sample comparison
In our new CMS procedure, we require the current best system B to take N_B samples, the maximum number of samples necessary to make a decision against any remaining system. To reduce this large number of observations, we turn to a popular variance reduction technique, namely, the CRN approach, which Nelson and Matejcik (1995), Kim and Nelson (2001), and Healey et al. (2014), among others, show can be used to improve the efficiency of both unconstrained and constrained R&S procedures.
Proper implementation of CRN can result in quicker decisions by inducing a positive correlation across systems, which can significantly reduce the value of S²_Xij, the sample variance of the difference of paired samples from systems i and j. In procedures that compare systems at balanced sample sizes, such as KN of Kim and Nelson (2001), only a simple parameter adjustment to β_2 is needed to make a valid selection under CRN. Unfortunately, for two-sample procedures that compare systems with unequal sample sizes, valid decisions cannot be guaranteed under the CRN method.
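The variance reduction CRN provides to a paired-difference estimate such as S²_Xij can be checked numerically; a minimal sketch, where the common-shock construction below is only one of many ways to induce correlation and is not the paper's experimental design:

```python
import numpy as np

rng = np.random.default_rng(7)
n, rho = 100_000, 0.8

# Correlated pairs modeling CRN: each observation index shares a common
# shock across the two systems, plus idiosyncratic noise (unit variances).
common = rng.normal(size=n)
x_b = np.sqrt(rho) * common + np.sqrt(1 - rho) * rng.normal(size=n)
x_a = np.sqrt(rho) * common + np.sqrt(1 - rho) * rng.normal(size=n)

var_diff = np.var(x_b - x_a, ddof=1)           # estimates 2(1 - rho) = 0.4
var_sum_marginals = np.var(x_b, ddof=1) + np.var(x_a, ddof=1)  # about 2.0
```

With ρ = 0.8 the paired-difference variance is roughly a fifth of the sum of the marginal variances, which is exactly why CRN shrinks the continuation region and speeds up eliminations.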
We show how variance estimates in CMS under CRN negatively impact the validity of the procedure in Section 4.1 and propose four modifications to improve the estimates in Section 4.2.

Comparison with positive correlation
Two-sample procedures that estimate variance with S²_Xij can underestimate the observed variability of the screening process if data points across systems are correlated (due to the use of CRN). In our minimal switching procedure, screening is performed, and a decision to continue sampling is made, based on a function of ((r_A − n_0)/(N_B − n_0)) Σ_{n=n_0+1}^{N_B} X_Bn + Σ_{n=1}^{n_0} X_Bn and Σ_{n=1}^{r_A} X_An, together with an estimate of the variance of the difference of these two sums. Note that the sample variances S²_Xij are computed only from the initial n_0 observations from all systems.
If r_B = r_A = r, then r S²_XBA is an estimator of r(σ²_xB + σ²_xA − 2ρ_x σ_xB σ_xA), where ρ_x is defined as the correlation between X_Bn and X_An.
However, if we assume that r_A ≪ r_B = N_B, we create a situation commonly encountered in the CMS procedure. Here, the sum for system B is computed at the random time N_B, a sample size many times larger than r_A, so as we increment r_A, only part of the CRN-induced covariance between the two sums is realized (Equation (4)). When ρ_x is large, the quantity estimated by r_A S²_XBA in Equation (3) could be smaller than the variability observed by the process in Equation (4); see also Lesnevski et al. (2007) for a similar argument. This underestimation of variability can cause a premature decision when the best system appears to be inferior by chance, hurting PCS. The biased variance estimate creates a continuation region, R(r; δ, h²_2, S²_XBA), that is too small to make a valid decision.
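A small Monte Carlo consistent with this argument can be written directly; the parameter values below are illustrative, and the statistic mirrors the scaled-sum difference described in this section:

```python
import numpy as np

rng = np.random.default_rng(11)
n0, r_a, n_b, rho = 20, 30, 500, 0.9
c = (r_a - n0) / (n_b - n0)   # weight on B's post-first-stage sum
reps = 5_000

# CRN: observation n of each system shares a common shock (unit variances).
common = rng.normal(size=(reps, n_b))
x_b = np.sqrt(rho) * common + np.sqrt(1 - rho) * rng.normal(size=(reps, n_b))
x_a = (np.sqrt(rho) * common[:, :r_a]
       + np.sqrt(1 - rho) * rng.normal(size=(reps, r_a)))

# Screening statistic: scaled B-sum minus A-sum, with B fixed at N_B samples.
d = (c * x_b[:, n0:].sum(axis=1) + x_b[:, :n0].sum(axis=1)
     - x_a.sum(axis=1))

actual_var = d.var(ddof=1)            # true variability of the process
balanced_est = r_a * 2 * (1 - rho)    # what r_A * S2_XBA estimates
```

With these values the realized variance of the statistic is more than twice the balanced-sample quantity, illustrating how a continuation region built from r_A S²_XBA can be far too narrow under high ρ_x.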
We present an empirical study where we compare two systems, separated by the distance of the indifference zone δ, with system 1 being the preferable choice. Table 1 shows the observed Pr{CS_2} of a two-sample comparison under varying correlation ρ_x ∈ {0.0, 0.5, 0.6, 0.7, 0.8, 0.9} and initial sample size differences. The two-sample procedure implemented for Table 1 is the SSM procedure of Pichitlamken et al. (2006), the underlying approach for the more efficient version of MSS incorporating Fabian's bound that is implemented within CMS. To simulate two-sample comparison, let r_1 ∈ {20, 30, 45, 70, 120, 200, 300, 500} and r_2 = 20, so that we give system 1 more samples than system 2. Comparison is performed with a nominal confidence level of 0.95. Table 1 shows that for correlation ρ_x greater than 0.5, we can see degradation of Pr{CS_2} from the independent, balanced-sample case (ρ_x = 0 and r_1 = r_2 = 20) as r_1 increases. For ρ_x ≥ 0.7, we can no longer expect Pr{CS_2} to always meet nominal levels.

Heuristic modifications
We introduce four heuristic modifications that attempt to provide the desired Pr{CS} for two-sample comparisons. Among the four procedures we consider, CMS and HAK experience two-sample comparisons, while HAK+ and MD_R always compare systems at balanced sample sizes. We test the modifications within the CMS procedure and the HAK procedure of Healey et al. (2014), but the modifications should also prove useful for any general R&S or constrained R&S procedure that utilizes a two-sample comparison.
The approaches require the computation of the first-stage marginal sample variances for each system. Recall that for system i, this quantity is S²_Xi. Also note that, when incorporated in CMS, these approaches will change not only the variance estimates in comparison screening but also the N_B values in Equation (1) that represent the maximum number of samples needed to complete comparison of the remaining systems with B. The first modification, TS_1, replaces S²_Xij by S²_Xi + S²_Xj; it is simple but conservative in the sense that S²_Xi + S²_Xj bounds S²_Xij in expectation. It restores the observed PCS but will perform similarly to the case when systems are simulated independently.
Our second modification, TS_2, uses S²_Xij when r_i = r_j and the conservative estimate otherwise, so it benefits from the variance reduction CRN can provide, but only when sample sizes are equal. This idea would be a natural choice to preserve PCS and utilize correlation but, unfortunately, when CMS is implemented, r_i ≠ r_j for almost all samples. Therefore, the results obtained for the first two modifications, applied within CMS, will be virtually identical. However, for other procedures such as HAK, this may still be a desirable, simple modification.
The discussion in Section 4.1 suggests that the continuation region is corrupted when S²_Xij < S²_Xi and r_i < r_j. Instead of reverting to the conservative estimate of variability when sample sizes are not equal, the third modification, TS_3, uses S²_Xi as a bound on variability when r_i is less than r_j. The last modification, TS_4, is more aggressive. We note that when r_j is large and r_i ≈ 0, S²_Xi would dominate the variability of the process in Equation (4). However, when r_i and r_j are close, we would see variance closer to S²_Xij. This suggests that when r_i < r_j, the variance estimate Ŝ²_Xij should be close to S²_Xi for small r_i, and Ŝ²_Xij should approach S²_Xij as r_i approaches r_j. Of all of the proposed modifications, it is reasonable to expect TS_4 to approximate the variability of Equation (4) the best, making it the most promising heuristic. Healey (2010) empirically confirms this point and thus we present results only with TS_4 in Section 5.2.2.
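The four modifications can be sketched as follows; the formulas are reconstructed from the prose descriptions above (in particular, the TS_4 interpolation weight is our assumption), so the paper's exact published definitions may differ:

```python
def ts_variance(mod, s2_ij, s2_i, s2_j, r_i, r_j, n0):
    """Heuristic variance estimates for the two-sample comparison of
    systems i and j (r_i <= r_j assumed); reconstructed from the prose,
    not the paper's displayed equations."""
    if mod == "TS1":    # conservative: sum of marginals, ignores CRN
        return s2_i + s2_j
    if mod == "TS2":    # CRN estimate only at balanced sample sizes
        return s2_ij if r_i == r_j else s2_i + s2_j
    if mod == "TS3":    # marginal variance of i as a floor when r_i < r_j
        return max(s2_ij, s2_i) if r_i < r_j else s2_ij
    if mod == "TS4":    # slide from s2_i toward s2_ij as r_i approaches r_j
        w = (r_i - n0) / (r_j - n0) if r_j > n0 else 1.0
        return (1.0 - w) * s2_i + w * s2_ij
    raise ValueError(f"unknown modification: {mod}")
```

Note how TS_4 reproduces the behavior described above: at r_i = n_0 it returns the marginal variance S²_Xi, and at r_i = r_j it returns the CRN estimate S²_Xij.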

Experimental results
In this section, we evaluate the performance of our new CMS procedure compared to the performance of other R&S procedures capable of handling multiple constraints, namely, HAK, HAK+, and MD_R of Healey et al. (2013, 2014), in terms of the number of switches, number of required observations, and observed PCS. In Section 5.1, we discuss the experimental setup. In Section 5.2, we provide an analysis of CMS, with and without the heuristic modifications to incorporate CRN.

Setup
The mean and variance configurations for our experiments attempt to provide results and analysis analogous to the experimental studies of previous related fully sequential indifference-zone R&S studies, namely, Kim and Nelson (2001), Hong and Nelson (2005), Pichitlamken et al. (2006), Andradóttir and Kim (2010), and Healey et al. (2013, 2014). Our experiments test the procedures with 10 000 macro-replications. In our tables, the first three digits of the number of observations and the first three decimal places of PCS are meaningful. For all tests, we set n_0 = 20 and δ = ε_ℓ = 1/√20 (the sample standard deviation of an average of n_0 = 20 independent samples with a variance of one) for all ℓ = 1, 2, ..., s. We set a nominal PCS of 1 − α = 0.95. We set the number of acceptable systems in S_A to be zero, as Andradóttir and Kim (2010) show that the existence of acceptable systems does not significantly affect the results.
We utilize the generalized means configurations of Healey et al. (2013). We consider only one infeasible constraint (v = 1) because determining feasibility is more difficult when there is only one infeasible constraint. The Difficult Means (DM) configuration attempts to test the validity of the procedures by assigning system means in a challenging setup. For our experiments, the DM configuration sets x_[b] = δ and x_i = 0 for i ≠ [b], with each violated secondary performance measure set to q_ℓ + ε_ℓ and every other secondary performance measure set to q_ℓ − ε_ℓ.
We set the constraint levels q_ℓ to zero. Similarly, we also consider the Monotonically Increasing Means (MIM) configuration of Healey et al. (2013) with v = 1, which allows us to determine the efficiency with which the procedures determine the feasibility of clearly infeasible or feasible systems and compare substantially distant systems. The MIM configuration tested sets x_i = E[X_in] = (i − 1)δ for i = 1, 2, ..., k, with the secondary performance measure means for constraint ℓ = 1 spaced increasingly far from the constraint level across systems, where again we set q_ℓ = 0.
For the experiments, we examine a combination of variance configurations to test the procedures under different levels of difficulty for the feasibility check and comparison. We consider a setup similar to Healey et al. (2014), as we include low (L) and high (H) variances for the primary and secondary performance measures, σ²_xi and σ²_yiℓ, respectively. For simplicity, all secondary performance measures ℓ = 1, 2, ..., s are assigned identical variances. High variance sets σ²_xi = 10 or σ²_yiℓ = 10 and low variance sets σ²_xi = 1 or σ²_yiℓ = 1. We consider in total four variance configurations: L/L, L/H, H/L, and H/H. For example, L/H implies that the primary performance measure has low variance and the secondary performance measures have high variances, and thus the difficulty of the feasibility check dominates.
We perform our tests with two system sizes, either 15 systems with eight feasible or 101 systems with 51 feasible, in addition to three constraints and a combination of mean and variance configurations. Each infeasible system violates only one of the constraints. This setup challenges the PCS of the procedures, as shown by Andradóttir and Kim (2010) and Healey et al. (2013): half of the systems must be eliminated by comparison and half by the feasibility check. The feasibility check is also difficult, as screening must catch the single violated constraint.
As in Section 4, we let ρ_x be the correlation across systems' primary performance measure samples. We consider both independently simulated systems and systems with induced ρ_x > 0, modeling CRN. Andradóttir and Kim (2010) and Healey et al. (2014) present empirical results showing that correlation between the primary and secondary performance measures does not have a major impact on performance. Similarly, Batur and Kim (2010) show that correlation across secondary performance measures does not greatly affect the performance of the feasibility check procedure F_I^B. We expect similar conclusions to hold here. Finally, we assume that the secondary performance measures are not correlated across systems under the CRN approach. In practice, secondary performance measures will likely be correlated across systems, but this correlation is unlikely to have a major impact on performance, since the feasibility check is performed separately for each system. We confirm that this is indeed the case by performing experiments with correlation between the primary and secondary performance measures ranging from −0.75 to 0.75 and ρ_x = 0, 0.25, 0.5, 0.9. Hence, in this section we present experimental results with (i) independent primary and secondary performance measure samples and (ii) secondary performance measure samples that are independent across systems.
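As a concrete illustration of this correlation model, one simple way to induce a common correlation ρ_x across systems' primary samples is to add a shared "common random number" shock to each observation. The sketch below is our own illustration under that assumption; the function name and parameters are hypothetical and not part of the procedures studied in this paper:

```python
import math
import random

def correlated_primary_samples(k, rho, n, mu, sigma2, seed=0):
    """Generate n observations per system with cross-system correlation rho.

    Each system i receives
        X_ij = mu[i] + sigma * (sqrt(rho) * Z0_j + sqrt(1 - rho) * Z_ij),
    where Z0_j is a shock shared by all systems on replication j, so that
    Var(X_ij) = sigma2 and Corr(X_ij, X_lj) = rho for i != l, mimicking CRN.
    """
    rng = random.Random(seed)
    sigma = math.sqrt(sigma2)
    out = [[0.0] * n for _ in range(k)]
    for j in range(n):
        z0 = rng.gauss(0.0, 1.0)          # common shock shared by all systems
        for i in range(k):
            zi = rng.gauss(0.0, 1.0)      # system-specific noise
            out[i][j] = mu[i] + sigma * (math.sqrt(rho) * z0
                                         + math.sqrt(1.0 - rho) * zi)
    return out
```

For example, `correlated_primary_samples(3, 0.5, 40_000, [0.0, 0.0, 0.0], 1.0)` yields three sample paths whose pairwise sample correlation is close to 0.5.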

Results
In our experimental results, we display the effectiveness of constrained R&S procedures with respect to observed PCS, average number of required samples, and average number of switches. We define a switch as a change of sampling from one system to another. A two-stage procedure for k systems (all feasible) requires at most 2k − 1 switches: two passes of sampling over the systems, one to gather first-stage samples and one to complete the comparison. Fully sequential procedures register a switch after each stage of sampling for every system remaining in contention. In Section 5.2.1, we consider independent systems. Section 5.2.2 discusses how the use of CRN affects the performance of HAK and CMS and analyzes how our heuristic modifications can produce good PCS, even under high correlation. The performance of MD_R (the best performer among the three considered competitors) is compared to that of CMS under correlated systems in Section 5.2.3.
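To make the switch count concrete, the sketch below (our own illustration; the schedule encoding is hypothetical) counts switches in a sampling schedule and verifies the 2k − 1 bound for a two-stage schedule:

```python
def count_switches(schedule):
    """Count switches in a sampling schedule: a switch occurs whenever the
    next observation comes from a different system than the previous one."""
    return sum(1 for prev, cur in zip(schedule, schedule[1:]) if prev != cur)

# Two-stage schedule for k = 4 systems with n0 = 3 first-stage samples:
# one full pass to gather first-stage samples, then a second full pass.
k, n0 = 4, 3
two_stage = [i for i in range(k) for _ in range(n0)] + \
            [i for i in range(k) for _ in range(n0)]

# k - 1 switches within each pass, plus 1 switch between the passes:
print(count_switches(two_stage))          # -> 7, i.e., 2*k - 1
```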

Systems under independent sampling
To evaluate the performance of CMS under independent sampling of systems, we compare it to three procedures for constrained R&S, namely, the HAK, HAK+, and MD_R procedures.
We operate the four procedures under similar setups. For example, we choose α_1 = α_2 in HAK and β_1/s = β_2 in HAK+, MD_R, and CMS, so that error is allocated equally between the feasibility check and the comparison. Healey et al. (2014) experimentally show that this allocation is a robust choice. For consistency with CMS, all procedures are implemented with the feasibility check procedure F_I^B, although other methods could be utilized (see, e.g., Batur and Kim (2010)). We provide results for all combinations of the DM and MIM means and the L/L, L/H, H/L, and H/H variance configurations. Tables 2, 3, and 4 display the observed PCS, average number of observations, and average number of switches, respectively.
The performance in Table 2 is expected to be better than the nominal 0.95, and we observe this to be true in all cases. Moreover, CMS commonly provides a higher PCS than the other procedures, a result of the extra samples needed to limit switches during the procedure's comparison phase. The observed PCS is higher for k = 101 than for k = 15 (and for MIM as opposed to DM) for all four procedures.

Table 3. Average number of required samples for procedures with k independent systems, s = 3 constraints, and b feasible systems.
The comparison phase of CMS can make this procedure less attractive than the others in terms of the number of required observations. Still, Table 3 shows this is not always the case: when only the feasibility check is difficult (L/H), CMS can be relatively efficient, bettering the totals of all procedures except MD_R. Table 3 also indicates that more observations are required for DM than for MIM, as expected. Table 4 shows why CMS is a competitive procedure when the cost of switches is counted; this cost is incurred when systems are simulated one at a time rather than in parallel. For every configuration, CMS requires 2k − 1 or fewer switches when simulating k systems. The other procedures that run systems simultaneously, HAK+ and MD_R, can require thousands of switches, as every stage of sampling consists of as few as one observation from each system in contention. HAK is a special exception: when the feasibility check is difficult and no additional samples are needed to complete the comparison (L/H), HAK can achieve as few as k − 1 switches. However, this performance is not seen in hard comparison configurations, where CMS clearly outperforms HAK. Not surprisingly, Table 4 also indicates that more switches are needed for a larger number of systems k and in the DM configuration. Figure 1 provides the combined cost of sampling and switching under the L/L and L/H variance configurations with switching cost factors of one and 10, respectively, for k = 15 and 101 systems. Hong and Nelson (2005) analyze total costs when switching costs are a factor of 1, 10, 100, and 1000 times larger than the sampling cost per observation. We feature experimental results for factors from zero (no switching cost) to 10 (switching takes 10 times more time than sampling); a factor larger than 10 would yield results that are even more favorable to CMS.
Figure 1 shows that the relative efficiency of CMS improves compared with the other procedures as the cost of switching increases. In fact, even when the switching cost factor is only one, our switching procedure is the best performer for the L/H variance configuration, where the feasibility check is hard. As switching becomes more expensive, the results substantially favor our switching procedure, since the average total cost increases linearly with a slope equal to the number of switches. Thus, under a large factor, say 10, CMS is clearly the efficient choice for all mean and variance configurations, significantly improving on the other procedures in all cases and incurring as little as a quarter of the combined sampling and switching costs in the best case (H/L). Even when HAK requires 2k − 1 switches, we still find CMS to be the best performer, as CMS requires fewer samples in these cases. As the switching costs are multiplied by an even larger factor (e.g., 100 or 1000; see Hong and Nelson (2005)), we expect an even wider advantage for CMS with any number of systems.
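The total cost curves in Figure 1 follow a simple linear model in the switching cost factor. The sketch below illustrates it; the cost numbers are hypothetical, chosen only to show how the ranking flips as the factor grows:

```python
def total_cost(avg_samples, avg_switches, switch_factor):
    """Average combined cost when one switch costs 'switch_factor' samples."""
    return avg_samples + switch_factor * avg_switches

# Hypothetical numbers (not measured values): a CMS-like procedure with
# k = 15 systems makes at most 2k - 1 = 29 switches but needs extra samples,
# while a fully sequential competitor samples less but switches often.
few_switches = {"samples": 6000, "switches": 29}
many_switches = {"samples": 5000, "switches": 2000}

for factor in (0, 1, 10):
    c_few = total_cost(few_switches["samples"], few_switches["switches"], factor)
    c_many = total_cost(many_switches["samples"], many_switches["switches"], factor)
    print(factor, c_few, c_many)
# With factor 0 the many-switch procedure is cheaper (5000 < 6000);
# with factor 10 the few-switch procedure wins (6290 < 25000).
```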

Systems under CRN
In this section, we examine the performance of procedures that compare systems with uneven sample sizes under CRN, specifically HAK and CMS, together with our new modified variance estimates. HAK+ and MD_R are not considered in this section because they are already statistically valid under CRN. We compare HAK and CMS applied to independent systems with versions of HAK and CMS modified for correlated systems with induced correlation ρ_x ∈ {0, 0.1, 0.25, 0.5, 0.75, 0.9}. We denote the procedures with these parameters as HAK(ρ_x) and CMS(ρ_x). In these procedures, we utilize adjusted parameters β_2 = sβ_1 = α/k. These parameters guarantee valid selection of the best feasible system under correlation in similar procedures (see Lemma 2), but not in HAK and CMS, as we show in the following tables. The parameter adjustment means that ρ_x = 0 produces a slightly higher PCS and number of required observations than the independent case.
Similar to the results for the S_SM comparison procedure in Table 1, we see that PCS suffers in configurations with correlation of 0.9 and above in HAK and 0.75 and above in CMS for most variance configurations. It is also noteworthy that PCS is much lower when σ²_{x_i} is high: for a given ρ_x, the underestimation of variability becomes more pronounced when σ²_{x_i} is large, as one can see from Equations (3) and (4).
In terms of sampling, Table 6 shows that the use of CRN significantly reduces the number of observations needed.
The new values of β_1 and β_2 used for correlated systems cause the procedures to perform slightly worse on truly independent systems than the procedures designed for independent systems. As correlation increases, the procedures exploiting CRN require fewer observations, and even at modest levels of correlation we see significant improvement over the independent case. Savings due to CRN are restricted to the comparison phase, so L/H configurations feature only a small advantage from implementing CRN, whereas the other variance configurations display larger savings; not surprisingly, H/L configurations feature the largest savings. Difficult feasibility check configurations also require higher levels of correlation before substantial savings appear.

Tables 7 and 8 present the effectiveness of our heuristic variance modification TS_4 under ρ_x = 0.9 for k = 15 systems, with HAK(ρ_x) + TS_4 and CMS(ρ_x) + TS_4 denoting implementations of HAK and CMS with the variance modification TS_4 and induced correlation ρ_x. For brevity, we feature only configurations with k = 15 systems, as most of the conclusions reached in Tables 5, 6, A.3, and A.4 were similar for k = 15 and k = 101. Table 7 displays the observed PCS for our procedures with and without the heuristic modification TS_4 for ρ_x = 0.9. Although we do not report the results for the other three heuristic modifications, we observe that all four modifications display a marked improvement in PCS, raising observed values above 0.988 in all configurations. The TS_4 modification tends to provide the smallest PCS, which experimentally confirms it to be the most aggressive modification, as we conjectured in Section 4.2. Table 8 displays the average number of required samples for our procedures with and without the heuristic modification TS_4 for ρ_x = 0.9. Whereas the PCS results are similar for all four modifications, the additional observations required to secure PCS depend heavily on the modification implemented.
With at most 54% more samples (Table 8) than the original procedure for correlated systems, TS_4 provides efficiency and good PCS. This modification sacrifices only a small number of samples to secure a good PCS and still significantly outperforms the independently sampled case.

Comparison of MD_R and CMS + TS_4 under CRN
In this section, we compare MD_R and CMS with TS_4 under the CRN approach. Although we do not report PCS due to page constraints, we observe that both procedures achieve PCS over the nominal level in all cases tested. Average numbers of required samples and switches are reported in Tables A.5 and A.6 of the Online Supplement for k = 15 systems with induced correlation ρ_x. For all ρ_x tested, MD_R always requires fewer samples on average, but CMS + TS_4 requires a significantly smaller number of switches than MD_R. Table A.7 in the Online Supplement reports the combined cost when switches are 10 times as costly as samples for various levels of ρ_x. CMS + TS_4 achieves smaller average total costs than MD_R, with negligible degradation in PCS, when the cost of switching is relatively large compared with that of sampling.

Conclusions
We present a procedure, CMS, for constrained R&S that minimizes the number of switches between simulated systems while finding the best constrained system. This is desirable because switching between systems can be expensive. We prove the validity of this procedure, guaranteeing a nominal probability of selecting the best feasible system for independently sampled systems.
To improve the efficiency of the procedure, we also wish to utilize CRN to reduce variance within comparisons. We show how strong positive correlation can adversely affect the PCS for procedures, such as CMS, that use two-sample comparison, because of the underestimation of the variance during the comparison. To achieve the nominal PCS while still increasing efficiency, we propose four variance modifications.
Our experiments show that CMS is an efficient option if the cost of switching is larger than the cost of sampling or the feasibility check phase is difficult, and that the savings in total costs brought by CMS can be substantial. If the cost of switching is not significant compared with the cost of sampling, MD_R usually performs best under both independent and correlated systems. Ensuring a minimal number of switches requires extra observations, but CRN can reduce the number of necessary samples. Our experiments show that the heuristic variance modifications provide good PCS, and some of them also preserve most of the savings due to CRN.