Maximizing quantitative traits in the mating design problem via simulation-based Pareto estimation

ABSTRACT Commercial plant breeders improve economically important traits by selectively mating individuals from a given breeding population. Potential pairings are evaluated before the growing season using Monte Carlo simulation, and a mating design is created to allocate a fixed breeding budget across the parent pairs to achieve desired population outcomes. We introduce a novel objective function for this mating design problem that accurately models the goals of a certain class of breeding experiments. The resulting mating design problem is a computationally burdensome simulation optimization problem on a combinatorially large set of feasible points. We propose a two-step solution to this problem: (i) simulate to estimate the performance of each parent pair and (ii) solve an estimated version of the mating design problem, which is an integer program, using the simulation output. To reduce the computational burden when implementing steps (i) and (ii), we analytically identify a Pareto set of parent pairs that will receive the entire breeding budget at optimality. Since we wish to estimate the Pareto set in step (i) as input to step (ii), we derive an asymptotically optimal simulation budget allocation to estimate the Pareto set that, in our numerical experiments, out-performs Multi-objective Optimal Computing Budget Allocation in reducing misclassifications. Given the estimated Pareto set, we provide a branch-and-bound algorithm to solve the estimated mating design problem. Our approach dramatically reduces the computational effort required to solve the mating design problem when compared with naïve methods.


Introduction
Plant breeding is concerned with the evolution of plant populations under selection pressure imposed by humans (Allard, 1999). This process has proceeded for thousands of years as breeders purged undesirable plants and preferentially propagated individuals with desirable characteristics (Simmonds, 1979;Allard, 1999). In a given breeding population, the space of potential outcomes (i.e., progeny) is often too large to produce and evaluate empirically (Simmonds, 1979). Thus, historically, breeders have optimized traits over a small subset of feasible progeny. Modern computing allows breeders to explore the space of candidate matings using Monte Carlo simulation, the results of which can accelerate the process of improving economically important traits during natural breeding. Thus, the question of how to conduct matings to achieve "maximal" improvement in the progeny is a question of decision-making under uncertainty. Unlike traditional operations research application areas such as manufacturing, finance, healthcare, reliability, and transportation systems, plant breeding and agriculture have received less attention from the operations research community. Traditionally the domain of statisticians, agriculture is rife with decisions that must be made under uncertainty. Thus, agriculture and plant breeding offer impactful and interesting opportunities for the operations research community and, more broadly, the industrial engineering community-one of which we explore in this article, called the mating design problem.
CONTACT Susan R. Hunter susanhunter@purdue.edu Supplemental data for this article can be accessed at www.tandfonline.com/uiie The mating design problem arises when commercial breeders selectively mate individuals to improve quantitative traits, such as height or yield. The mating design problem can be informally described as follows: Given a finite parent population and limited resources for breeding-that is, a set number of plants that can be planted, called the breeding budget-find an assignment of the breeding budget to parent pairs that maximizes the expected value of some statistic of the trait observed in the progeny.
A solution to the mating design problem, called a mating design, allocates the finite breeding budget to specific mate pairs. Given a predetermined population of parent pairs, our primary objectives in this work are (i) to identify optimal mating designs while (ii) avoiding excessive cloud computing costs due to largescale scientific simulations and (iii) to produce accurate predictions of progeny traits expected from the selected mating designs.
Researchers have used operations research techniques such as linear programming (Jansen and Wilton, 1985), integer programming (McConnel and Galligan, 2004), and genetic algorithms (Kinghorn, 2011) to solve variants of the mating design problem. This line of work measures the quality of a mating researchers began using quantiles of the progeny distribution to measure mating design quality (Schnell, 1983;Smith and Hammond, 1987;Zhong and Jannink, 2007). We offer a fundamental improvement to the mating design problem by introducing a novel objective function that directly maximizes the expected maximum progeny trait, rather than the mean or a quantile.
To reduce the computational burden associated with optimizing the new objective function, we analytically characterize and discard the subset of parent pairs guaranteed to receive zero allocation at optimality in the mating design problem, regardless of the breeding budget. The remaining parent pairs are characterized as the solution to a bi-objective Simulation Optimization (SO) problem, called the "Pareto set. " Our simulation experiments indicate that this Pareto set is typically small enough to allow efficient construction of the optimal mating design via a branch-and-bound approach. Therefore, our solution approach effectively reduces the mating design problem to a problem solved in two steps. First, we estimate the Pareto set of parent pairs using Monte Carlo simulation. Then, we solve an estimated version of the mating design problem using a branch-and-bound algorithm on the estimated Pareto set. This procedure dramatically reduces the search space of potential progeny and prevents the need to re-simulate if the breeding budget changes. We present simulation evidence to suggest that our approach does, in fact, identify near-optimal solutions with high predictive accuracy. We view predictive accuracy as a vital performance requirement as predictions of trait selection gains are often used to justify the resource expenditure incurred by proceeding with empirical experiments.
Plant breeding is an application of great consequence, since … the planet's population [is] predicted to reach 9 billion by 2050.
To feed everyone, we must grow plants that deliver higher yield, are more nutritious, use water and nutrients more effciently, and tolerate more environmental variations that today's agricultural crops. (Purdue University, 2013) This work illustrates a successful theoretical advancement and implementation in plant breeding, an important but less visible domain for the operations research and industrial engineering communities.

Problem statement and overview of approach
We now consider the mating design problem and our solution approach in more detail. For each candidate parent pair, suppose that the trait expression in a child is one realization from the progeny trait distribution corresponding to that parent pair. For example, the height of a child plant is one realization from the distribution of the height of child plants produced by its parent pair. We assume that the distributional family for the trait is known, but the parameters of the distribution are unknown. We now define the new max breeding problem.
Max breeding problem: Consider a finite number r > 0 of breeding parent pairs producing a fixed, finite sample of b > 0 children, and let the vector of breeding allocations to the parent pairs be b = (b 1 , b 2 , . . . , b r ). For random observations of the children's traits Y ij , j = 1, . . . , b i from parent pair i, we wish to solve: . . , r, where Z + is the set of non-negative integers. That is, we wish to find the allocation of the breeding budget b that maximizes the expected maximum of the trait observed in the progeny. The solution to this problem, b , is a vector containing the number of child plants to breed from each of the parent pairs in the population.
Since breeders must construct the entire mating design prior to the growing season, at which time the children's traits are realized simultaneously, we cannot use "real life" sequential methods to solve this problem. Indeed, the problem stated as Equation (1) resembles the max k-armed bandit problem described by Cicirello and Smith (2005) and Smith (2006a, 2006b), but simultaneous observation of the children's traits, or "rewards, " in the growing season precludes the application of such sequential methods. Instead, we use Monte Carlo simulation to assess the trait distributions of the parent pairs (see Li et al. (2012) for a discussion of simulation in breeding). Thus, problem (1) can be considered a Ranking and Selection (R&S) problem over the space of all possible breeding budget allocations (see, e.g., Kim and Nelson (2006) for an overview of R&S methods). However, the set of feasible points is combinatorially large, rendering R&S methods impractical.
Thus, we propose a different, two-step approach. In the first step, called the simulation step, we simulate progeny from each parent pair to estimate the trait distribution parameters. In the second step, called the IP step, we solve an estimated version of the integer program in problem (1), using estimates of the trait distributions for each parent pair obtained as output from the simulation step. For clarity, we emphasize here that the computational budget in the simulation step, n, is separate and distinct from the physical breeding budget b in problem (1). The breeding budget b is part of the problem, whereas the simulation budget n is part of our solution. The final output of the IP step, the estimated optimal breeding allocation, is a function of the simulation budget n. Under mild assumptions, as the simulation budget n tends to infinity, the estimated solution to the mating design problem that results from the IP step converges to the true solution to problem (1).
Two potential issues remain when solving the max breeding problem with this two-step approach. First, if the number of parent pairs r is large, both the simulation step and the IP step may still require significant computing effort. Second, the solution to the max breeding problem sometimes allocates all children to a single parent pair, which is unlikely to be implemented in practice. Thus, we wish to return to the user a reduced set of parent pairs likely to produce extrema, in addition to the allocation resulting from the IP step. The breeder then acts as a human-inthe-loop during the "optimization" process, accounting for factors external to the genetic model.
We address these two issues by analytically identifying and eliminating from consideration parent pairs that will not receive any of the breeding budget at optimality in the mating design problem (1). In particular, when the trait distributions are normal with finite mean μ i and finite variance σ 2 i for all i = 1, . . . , r, the only parent pairs that receive non-zero breeding budget at optimality in problem (1) are members of a Pareto set that is characterized by parent pairs with non-dominated means and variances, P = {parent pairs i : , for all k = 1, . . . , r; k = i}. (We further discuss this set in Section 2.) Then instead of solving problem (1) for r parent pairs, it is equivalent to solving problem (1) for only parent pairs in P and allocating none of the breeding budget for parent pairs outside the Pareto set P. We then focus our simulation effort on identifying the Pareto set in the simulation step since only parent pairs that are estimated as Pareto will be included in the IP step. Furthermore, we can return the estimated Pareto set to the breeder as part of the mating design process. We note here that the assumption of normal trait distributions is typical for quantitative traits such as yield (Falconer and Mackay, 1996;Lynch and Walsh, 1998) and is based on the infinitesimal model of genetic architecture. Although the infinitesimal model is not taken as an exact description of biological reality, countless experiments have led the scientific community to accept the model as a useful and practical approximation of quantitative trait expression. Unless otherwise stated, henceforth we consider normal trait distributions.

... Search space reduction in the IP step
How much easier is the max breeding problem (1) on the reduced search space of parent pairs in P? For randomly generated populations of parent pairs using the standard maize genetic map ; see Appendix A of the online supplement for simulation details), we calculated sample quantiles of the cardinality of the estimated Pareto set across different parent pair population sizes. Given a randomly generated set of parent pairs, 500 children were simulated from each parent pair to estimate the mean and variance parameters. The cardinality of the estimated Pareto set (| P|) was calculated for this parent pair population, using the estimated means and variances. This process was repeated 3000 times for each parent pair population size. Sample quantiles of | P| are presented in Fig. 1.
The data in Fig. 1 appear to follow a logarithmic law, and the cardinality of the estimated Pareto set rarely exceeds 25 parent pairs-even when the original population is 20 000 parent pairs. Thus, we have drastically reduced the search space of the max breeding problem. For example, assuming b = 50 children, r = 100 potential parent pairs, and | P| = 15, the number of feasible solutions is reduced from approximately 1.34 × 10 40 to 4.78 × 10 13 -a far smaller number but still too large to solve problem (1) with available R&S methods. Furthermore, since the estimated Pareto set is "small, " we can present the entire set to the breeder.

... Simulation effort reduction in the simulation step
In addition to reducing the computational burden of the IP step, a focus on identifying the Pareto set in the simulation step can save significant simulation effort. In the case of normal trait distributions, where the Pareto set is characterized by parent pairs with non-dominated means and variances, estimating the Pareto set is a bi-objective SO problem on an unordered finite set.
The body of work on SO in the presence of multiple performance measures is relatively new, with most work posing additional performance measures as constraints (see, e.g., Szechtman and Yücesan (2008), Andradóttir and Kim (2010), Batur and Kim (2010), Park and Kim (2011), Lee et al. (2012), Healey et al. (2013), Hunter and Pasupathy (2013), Luo and Lim (2013), Nagaraj and Pasupathy (2013), Healey and Andradóttir (2014), Pasupathy et al. (2014)). Very little work has been done in the area of simultaneous stochastic objectives, apart from that of Ryu et al. (2009), Ryu (2011a, 2011b) in continuous spaces, and Butler et al. (2001) and Lee et al. (2010) on finite spaces. Since Butler et al. (2001) used utility functions to turn the multi-objective problem into a single-objective problem, it is not an appropriate method for our context. Thus, the only appropriate method is Multi-objective Optimal Computing Budget Allocation (MOCBA) by Lee et al. (2010), a heuristic sampling framework developed along the lines of the popular Optimal Computing Budget Allocation framework (Chen et al., 2000) for multi-objective SO on finite sets. MOCBA requires the assumption that each objective is estimated by a sample mean constructed from independent and identically distributed (i.i.d.) normal random variables.
We implement and evaluate MOCBA's performance in estimating the Pareto set in the context of our plant breeding application. We also derive, implement, and evaluate a new sequential sampling scheme that is based on the asymptotically optimal sampling allocation. To derive the sampling allocation in a bi-objective context, we use a large deviation (LD) framework. Such a framework has previously been employed in the unconstrained, single-objective context (Glynn and Juneja, 2004) and in the constrained, single-objective context (Hunter and Pasupathy, 2013;Pasupathy et al., 2014). The asymptotically optimal sampling allocation is characterized as the solution to a concave maximization problem, which we progressively estimate through a sequential sampling framework. Sampling terminates when the simulation budget has been expended or some amount of wall-clock time has passed. This article is the first to derive an asymptotically optimal sampling allocation in a biobjective context and to show successful implementation of an LD-based allocation when objective estimates are non-normal. Specifically, our second objective is a sample variance, which can be written as a sum of i.i.d. squared normal random variables having a chi-square distribution.
Our implementation reveals that our sequential sampling framework out-performs MOCBA on a variety of performance measures, in part because of the way our sampling framework controls for misclassification errors. Two types of error can occur in estimating a Pareto set: Misclassification by Exclusion (MCE), where a truly Pareto parent pair is falsely estimated as non-Pareto, and Misclassification by Inclusion (MCI), where a non-Pareto parent pair is falsely estimated as Pareto. MOCBA controls MCE and MCI errors by first creating a bound on each type of error and then allocating to control the type of error whose estimated bound is the largest in each step of the sequential algorithm. As the estimated bounds converge to their true values, the allocation will be based on minimizing the larger bound. In contrast, the sampling framework we present allocates to simultaneously control both types of error, MCE and MCI, at each step in the sequential implementation. Furthermore, as the sampling budget tends to infinity, our algorithm will continue to allocate in a way that simultaneously controls both types of error.
The estimated Pareto set resulting from the simulation step is used in the IP step to estimate the optimal breeding budget allocation. Since the estimated Pareto set is constant as a function of the breeding budget, we also save simulation resources in the event that the breeding budget changes. For normal trait distributions, we use a branch-and-bound method for solving the resulting nonlinear integer program.

Notation and assumptions
We assume that an appropriate genetic simulation model, the desired parent pairs, and the trait distributional family are provided as input to the methods we develop. Unless otherwise stated, we consider traits for which the trait distribution is assumed to be normal. Formally, we make the following assumptions; for brevity, we write i ࣘ r as shorthand for i = 1, 2, . . . , r.
Assumption 1: (i) For each parent pair i ࣘ r, the observations of the children's traits are i.i.d. copies from a normal distribution with unknown finite mean μ i and unknown finite variance σ 2 i > 0; (ii) observations of the children's traits are mutually independent across parent pairs; Assumption 1(iv) is standard in optimal allocation literature, since it ensures that all parent pairs are distinguishable on each objective with a finite sample size.
The probability space we consider is conditional on the fixed and unknown mean and variance of the trait distribution for each parent pair. It may be possible to develop a higher-level Bayesian model that accounts for some belief about the population structure due to pedigree. There are instances where the population structure is minimally informative-it may be the case that all parent pairs share a single common parent or that each parent pair is distinct, as may result from each parent breeding with itself. However, population structure may be highly informative, as when half the population shares one parent while the other half does not. The methods we present are general in that they can be used for any population structure, do not require a detailed population model, and provide significant computational gains over previously used naive methods. Exploiting information about population structure may result in methods that further reduce the computational effort of the simulation step. We consider such methods as future research.

The max breeding problem
In this section, we first provide intuition on the nature of optimal breeding budgets. We then formulate the objective function for the max breeding problem under the normality assumption. Throughout this section, we assume that the parameters of the trait distributions are known.
First, consider the case in which the trait is either present or not present in a child. For example, a child may either be resistant or susceptible to a particular disease, and suppose that the resistant state is desirable. Thus, the trait distributions under consideration are Bernoulli with parameter θ i for each parent pair i ࣘ r, where θ i represents the probability that the child of parent pair i is resistant to the disease. Then the max breeding problem is equivalent to the following problem that minimizes the probability of observing no children with the desirable trait: The solution to this problem gives all of the breeding budget to the parent pair with the largest probability of producing a child with the desirable trait, which we assume is unique. Thus, let- The Pareto set of parent pairs is P = {parent pairs i : i = arg max i≤r θ i }, and identifying the Pareto set is an R&S problem.
Unlike with Bernoulli trait distributions, when the trait distribution is normal, one will not necessarily allocate all of the breeding budget to one parent pair in the max breeding problem. In Example 1 we show that allocating all of the breeding budget to one parent pair may be suboptimal.
When one child is observed (b = 1), intuitively, the child should be bred from the parent pair with the largest mean. When the breeding budget is infinite, the children should be bred from the parent pair with the largest variance-as the breeding budget grows, the largest-variance parent pair will overcome the largest-mean parent pair in producing extrema. For finite budgets larger than one, there is a balance in allocating to high-mean and high-variance parent pairs. In this example, allocating one sample to the higher-mean, lower-variance parent pair places a high probability on observing at least one child with a large trait value. The remainder of the samples are then allocated to capitalize on the high-variance parent pair's ability to produce extrema.
In Proposition 1, we specify the max breeding problem as an integer program, assuming that the means and variances are To optimally allocate samples to maximize the expected maximum child characteristic observed, we wish to solve the max breeding problem: Solving Problem MB may be unwieldy for large b and r. Thus, we identify and discard parent pairs that receive a sample size of zero at optimality. Loosely speaking, to generate extrema from two normal distributions, where one distribution has larger mean and variance than another distribution, all random variables should be generated from the distribution with the larger mean and variance. Therefore, intuitively, all parent pairs that are dominated in both mean and variance by some other parent pair should not receive any of the final breeding budget. This intuition is correct: for any breeding budget b ࣙ 1, the only parent pairs that receive nonzero breeding budget in the max breeding problem are a subset of a Pareto set of increasing means and decreasing variances. Theorem 1 states this result.
Theorem 1. Let the Pareto set P be the set of all nondominated parent pairs, P = {parent pairs i : Then for all b ࣙ 1, the set of parent pairs receiving non-zero sample at optimality in the max breeding problem, Problem MB, is a subset of P.
Proof. See Appendix B of the online supplement.
Although in Example 1 two parent pairs share the allocation, it is not intuitively obvious from Theorem 1 that three or more parent pairs may share the allocation at optimality in Problem MB. Thus, there will not necessarily be a "best mean" parent Table . For b = , the expected maximum is largest when all three parent pairs share the budget.
pair and a "best variance" parent pair that will together receive all of the breeding budget, as shown in Example 2.
Example 2: Let three parent pairs have normal trait distributions with means and standard deviations μ 1 = 0.80, σ 1 = 0.58; be the breeding allocation. For budget b = 3, at optimality in Problem MB, each of the three parent pairs receives an allocation of one child, as shown in Table 2.

Allocating the simulation budget in the simulation step
We now consider the problem of efficiently allocating a simulation budget to estimate the Pareto set. For normal trait distributions, identifying the Pareto set P through Monte Carlo simulation is a bi-objective SO problem. The asymptotically optimal sampling allocation that maximizes the rate of decay of the probability of a misclassification is derived in Appendix C of the online supplement, which we outline in Section 3.1 before presenting a sequential implementation in Section 3.2. For brevity, the results in Section 3.1 are presented with an emphasis on knowledge required for implementation, with full details and all proofs in the online supplement.

Asymptotically optimal allocation strategy
Consider a procedure to estimate the Pareto set that consists of expending a simulation budget n to estimate the means and variances of the trait distributions for each parent pair and returning to the user the estimated set of Pareto parent pairs. Let α = (α 1 , α 2 , . . . , α r ) be the proportional allocation of the simulation budget n to the parent pairs. For now, we ignore that nα i is not necessarily an integer. For each parent pair producing nα i children with characteristics Y ij observed as i.i.d. copies from a normal distribution with mean μ i and variance σ 2 i for all j = 1, . . . , nα i , let

Then the estimated Pareto set is
Under this sampling paradigm, as the simulation budget tends to infinity, the probability of a Misclassification (MC) event-that is, misclassifying a parent pair on its status as Pareto or non-Pareto-decays to zero. In this section, we present the asymptotically optimal sampling allocation that maximizes the rate of decay of the probability of misclassification (P{MC}) as a function of the proportional sampling allocation, α.
In the context of our estimation procedure, an MC event can occur in one of two ways, which we call MCE and MCI. MCE is the event in which a truly Pareto parent pair is falsely excluded from the estimated Pareto set by being estimated as dominated by another parent pair, whether Pareto or non-Pareto. MCI is the event in which a truly non-Pareto parent pair is falsely included in the estimated Pareto set by being estimated as nondominated. That is, some i ∈ P dominated by some k and some j / ∈ P not dominated by any k A straightforward way of writing the MC event is MC := MCE Þ MCI, such that the probability of an MC event is P { MC } = P { MCE Þ MCI }, but this probabilistic statement is difficult to analyze due to dependence in the MCI term. Thus, we reformulate the MC event for easier analysis. First, let us label the true Pareto parent pairs by their means, so that μ 0 := −∞ < μ 1 < . . . < μ p−1 < μ p and σ 2 1 > σ 2 2 > . . . > σ 2 p > σ 2 p+1 := 0, where p is the cardinality of P. Then the true Pareto parent pairs are (μ , σ 2 ) for = 1, . . . , p, where uniquely indexes the Pareto parent pairs. We also define the true phantom parent pairs, which are the coordinates (μ , σ 2 +1 ) for = 0, 1, . . . , p, where we place phantom parent pairs at (−∞, σ 2 1 ) and (μ p , 0). There are p + 1 phantom parent pairs. Assuming that the true Pareto set is known, Fig. 2 displays the Pareto and phantom Pareto parent pairs. Figure . If the Pareto set were known and, then to be falsely estimated as Pareto, the non-Pareto parent pairs must be falsely estimated as being in the MCI or MCE "regions, " thereby dominating a phantom parent pair.
Since the true Pareto set is actually unknown, the phantom Pareto parent pairs must be estimated.
Since Theorem 2 holds, henceforth we use the notation P{MC} to refer to the probability of a misclassification event.
Recall that to efficiently identify the Pareto set, we wish to identify the sample allocation vector that maximizes the rate of decay of the P{MC}. Before presenting the rate of decay of P{MC} in Theorem 3, we first require additional notation and results. By Glynn and Juneja (2004), under Assumption 1, for all i ∈ P, k ≤ r such that μ k < μ i , we have .
We present an expression for the large deviations rate function corresponding to the event σ 2 i ≤ σ 2 k in Proposition 2.
Proposition 2. Under Assumption 1, for all i ∈ P, k ≤ r such that σ 2 k < σ 2 i : Since the estimated mean and estimated variance for a normal distribution are independent, the objectives Y i and σ 2 i are independent for all i ࣘ r. This fact, along with the additional material provided in the online supplement, yields the following Theorem 3. Thus, the overall rate of decay of the probability of a misclassification is the slowest among the rates of decay of many pairwise misclassification events.
To maximize the rate of decay of P{MC}, we allocate the sample by solving: Note that the first set of constraints in Problem Q hold for all i, k ∈ P, k = i instead of for all i ∈ P, k ≤ r, k = i. This simplification occurs because anytime some j / ∈ P falsely excludes i ∈ P, it must also dominate a phantom parent pair, an event whose rate function already appears in the second set of constraints in Problem Q. Thus, Problem Q has p(p − 1) constraints corresponding to controlling the rate of decay of P{MCE} and (r − p)(p + 1) constraints corresponding to controlling the rate of decay of P{MCI ph }, where each constraint has at least one non-zero term in the sum. Problem Q is a concave maximization problem in α, which follows from a proof similar to that in Glynn and Juneja (2006).
Since Slater's condition holds for the concave maximization Problem Q, the Karush-Kuhn-Tucker (KKT) conditions are necessary and sufficient for global optimality (see, e.g., Boyd and Vandenberghe (2004)). The KKT conditions for Problem Q imply that α i > 0 for all i ࣘ r. Although in the case of Glynn and Juneja (2004) and Hunter and Pasupathy (2013) the KKT conditions required equating the rates of the suboptimal systems, the KKT conditions here are somewhat less insightful: the optimal allocation in Problem Q may be determined by various combinations of the pairwise "contests" between the systems. Thus, we solve Problem Q with numerical optimization.
Example 3: One hundred parent pairs were randomly generated using the standard maize genetic map. Each parent pair received a sample of 500 children to estimate the mean and variance, which we assumed are the true mean and variance in this example. Figure 3 shows the locations of the means and standard deviations for the 100 parent pairs. The size of the circle is proportional to the amount of sample received when the allocation is optimal under equal allocation and the proposed allocation scheme, respectively. That is, Fig. 3b provides a pictorial representation of the solution to Problem Q.

Sequential algorithm for estimating the Pareto set
Since μ i and σ 2 i must be estimated for all i ࣘ r, Problem Q cannot be implemented as written. Define the following estimated rate functions, in which we replace all unknown quantities with their corresponding estimators: .
We now define the estimated Problem Q, called Problem Q: where [ ] denotes the th order statistic, and α [ ] is the allocation corresponding to the th ordered estimated Pareto parent pair. We propose Algorithm 1 to iteratively estimate the Pareto set and update the optimal allocation scheme by solving Problem Q. The sequential allocation scheme in Algorithm 1 first takes an initial amount of sample δ 0 from each parent pair and initializes all of the estimators. Then the estimated optimal allocation problem, Problem Q, is solved. The resulting estimated optimal allocation is used as a sampling distribution from which to choose the index of the next parent pair to sample, denoted by the random variable X i in the algorithm. The vector ensures that each parent pair is sampled infinitely often and should be small relative to 1/r. Once the sample δ has been expended, Algorithm 1 Sequential algorithm for implementation of the optimal allocation scheme in the mating design problem with normal trait distributions. Require: Number of pilot samples δ 0 > 0; number of samples between allocation vector updates δ > 0; and a minimum-sample vector = { 1 , . . . , r } > 0. 1: Initialize: collect δ 0 samples from each parent pair i ≤ r. 2: Set n = rδ 0 , n i = δ 0 . {Initialize total simulation effort and effort for each parent pair.} 3: Update the sample means Y i and sample variances σ 2 i for all parent pairs i ≤ r. Update the estimated Pareto set P. 4: Solve Problem Q to obtain the estimated optimal allocationα * (n) = (α 1 (n), . . . ,α r (n)).

11:
Update n i = n i + 1 for all i ∈ I n . Set δ + = |I n |. 12: end if 13: Set n = n + δ + and go to Step 3. all estimators are updated and the process is repeated. Algorithm 1 is nonterminating; however, the user may wish to terminate the algorithm after some total budget has been expended or some overall amount of computing time has passed. We present Example 4 to demonstrate the sequential algorithm.
Example 4: Let the true means and standard deviations for 100 parent pairs be as specified in Example 3, with the optimal allocation shown in Fig. 3b. We implement Algorithm 1 by assuming that the (μ i , σ i ) pairs in Fig. 3a are true and sequentially generating children from the appropriate normal distributions. The parameters for Algorithm 1 are δ 0 = 100, δ = 2500, and ϵ i = 10 −10 for all i ࣘ r. A description of the solver used to find the estimated optimal allocation in Step 4 of Algorithm 1 is given in Appendix D.1 of the online supplement. Sample quantiles of the optimality gap of the rate of decay of P{MC} for Algorithm 1, z(α * ) − z(ᾱ n ), calculated across 100 sample paths, are shown in Fig. 4. The optimality gap for equal allocation is also shown. We also implement MOCBA on this problem. To use MOCBA, we approximate the distribution of the sample variance as where n i is the sample received by parent pair i and N( ·, ·) indicates a normal distribution with mean and variance parameters. Therefore, if we were to observe i.i.d. copies of a random variable "σ 2 i " from which we could construct a sample mean, we would assume that where we scale the distribution of σ 2 i by n i . We set the MOCBA parameters as N 0 = 100 (initial sample size), = 2500 (increased sample per iteration), and δ = 1250 (the maximum number of samples a parent pair can receive in a given iteration; we set δ = /2 as in the numerical section of Lee et al. (2010)). Lee et al. (2010) also suggested that the allocation calculations be simplified by using the α i from the previous iteration. Since it is possible that α i = 0 for some i ࣘ r in the previous iteration, resulting in operations that divide by zero, we use the fraction of sample allocated to system i so far,ᾱ i,n = n i /n. Sample quantiles of the optimality gap of the rate of decay of P{MC} for MOCBA, z(α * ) − z(ᾱ MOCBA,n ), calculated across 100 sample paths, are also shown in Fig. 4.
As expected, in Fig. 4, the optimality gap for the implementation of Algorithm 1 reduces as the sample size increases. The optimality gap for MOCBA appears to have a larger initial variance than than that of the proposed procedure. For larger sample sizes, MOCBA appears to be a competitive alternative to equal allocation.
Note that in Algorithm 1, the choice of δ 0 and δ will affect how quickly the optimality gap converges to zero. Methods for choosing δ 0 and δ are future research; we suspect that the rates of convergence of the rate functions G(·) and H(·) to their true values G( · ) and H( · ) may provide guidance in this respect. We also speculate that, due to autocorrelation inᾱ n , it may be better to err on the side of "larger" δ 0 so that initial poorer estimates of the optimal allocation do not result in a sample path that initially performs worse than equal allocation.

Solving the estimated max breeding problem in the IP step
Having estimated the Pareto set, the breeder may wish to create the mating design by solving the max breeding problem on the estimated Pareto set. In this section, we develop a branch-andbound algorithm for constructing the optimal mating design, given the estimated Pareto set. Thus, we consider Problem MB : We wish to find the optimal mating designb * (n) that solves this modified max breeding problem, which is a nonlinear integer program. As the simulation budget n tends to infinity, assuming that we sample from each parent pair infinitely often (as in Algorithm 1), the solution to Problem MB will tend to the solution of Problem MB, b * . Henceforth, we suppress the explicit notation that the solution to Problem MB depends on the simulation budget n.

Lower and upper bounds
To prune the search space in the branch-and-bound algorithm, we need lower and upper bounds on the performance of designs encountered during enumeration. The max breeding problem is a maximization problem, so any feasible solution produces a valid lower bound on the optimal objective value. We construct an initial feasible solution by finding the mate pair producing the largest objective value when receiving the entire breeding budget b. Thus, we compute for each i ∈ P. We then define LB := max i∈ P LB i .
At a given node in the search tree, we test whether further branching can produce a solution that exceeds the best solution so far. We propose an upper bound for the performance of designs encountered through further branching in the search space. Let Z + denote the set of non-negative integers and let B := {b ∈ Z | P| + : | P| i=1 b i ≤ b} be the set of feasible mating designs that breed from the parent pairs in the estimated Pareto set. Given a set of ordered pairs F = {(i 1 , k 1 ), (i 2 , k 2 ), . . .} where i 1 , i 2 , . . . are parent pairs in P and k 1 , k 2 , . . . are the fixed breeding budget variables at the current node of the search tree, define B(F ) := {b ∈ B : b i = k for all (i, k) ∈ F} as the set containing all mating designs that can be encountered through further branching. To prune the search space, we need an upper bound on the performance of designs in B(F ). Let L(F ) = { j ∈ P : j = i for all (i, k) ∈ F} be the set of indices of free variables-that is, variables not fixed by F-and let d be the breeding budget remaining after fixing the elements of F. Suppose L(F ) = ∅. Consider The following proposition states that the expected maximum achieved through further branching is less than or equal to the upper bound.

Proposition 3. Let E[M(b)] denote the objective value of Problem MB as a function of b. Then E[M(b)] ≤ U B for all b ∈ B(F ).
Proof. Let b ∈ B(F ). By analysis similar to that in equation (B.2) (see Appendix B of the online supplement): By construction, Thus, the quantity in line (2) is positive.

Branch-and-bound algorithm
We present a branch-and-bound algorithm, Algorithm 2, that uses the bounds derived in Section 4.1. To initialize the algorithm, we first evaluate LB i for all i ∈ P, sort the parent pairs by descending order of LB i , and store the list as L. The set L is an ordered set of mate pairs whose allocation is not fixed. Then an initial feasible solution and lower bound is found by taking LB = max i∈ P LB i , as discussed in Section 4.1. The variable b LB stores the breeding budget allocation that produces the bound LB; that is, the best feasible breeding budget found so far. Then the implicit enumeration function is called.
Algorithm 2 Branch and bound algorithm to solve the estimated max breeding problem. 1: Initialize: Evaluate LB i for all i ∈ P. Sort parent pairs by descending LB i and store list as L, where ties are broken arbitrarily. 2: Initialize global variables: lower bound LB, the current best solution b LB as the breeding budget that produces LB.

Implementation
To assess the optimality gap of our methods, we require a set of test problems to which we know the solutions-both to the true max breeding problem (1), as well the true optimal allocation problem, Problem Q. We use simulation to obtain realistic populations for which we know the mean and variance parameters for all of the parent pairs and that span a wide range of quantitative trait models. Specifically, we first randomly place loci throughout the genome and sample from an empirically derived distribution of allelic effects (see Buckler et al. (2009)). Then, using a standard model of meiosis, we simulate progeny to estimate the population mean and variance, which we assume to be the fixed "true" population values. (See Appendix A of the online supplement for additional simulation details.) Then, we conduct our experiments by generating i.i.d. normal random variates from the "true" trait distributions of the parent pairs to simulate the progeny.
In what follows, we fix 98 randomly generated populations of r = 100 parent pairs each. The population size r = 100 was chosen as the largest number of parent pairs for which we could reliably run numerical experiments. Some plant breeding applications will require techniques for larger populations; however, this size is large enough to be useful (Simmonds, 1979). Appendix D.2 of the online supplement shows the 98 populations and their proposed allocations.
Ideally, we would always numerically solve the allocation Problem Q to optimality. However, some of the randomly generated populations resulted in an ill-conditioned Problem Q, likely resulting from parent pairs on or near the Pareto front lying too close to each other (such that, for example, the rate functions become shallow and the solver has difficulty estimating a valid Hessian). Rather than eliminate such populations for being illconditioned, which a practitioner would not know in advance, we implemented a series of heuristics to find better solutions to these difficult problems. All numerical experiments presented in this section use the solver outlined in Appendix D.2; thus, in Step 4 of the sequential Algorithm 1, the estimated optimal allocation is the best solution returned by the solver using the techniques in Appendix D.2.

Performance of the sequential algorithm to estimate the Pareto set
To find the estimated Pareto set using the sequential Algorithm 1 for each of the 98 parent pair populations, we fixed the algorithm parameters δ 0 = 100, δ = 2500, and ϵ i = 10 −10 for all i ࣘ r and ran one sample path for each population. Every 10 000 samples, the percentage of parentage pairs misclassified, percentage of Pareto parent pairs falsely excluded, and percentage of non-Pareto parent pairs falsely included were recorded. Figure 5 shows the average of each of these quantities, taken across the 98 populations, as a function of the simulation budget n. Note that reported average percentages are correlated across the simulation budget values. For comparison, we ran MOCBA and equal allocation in a similar fashion. Parameters for MOCBA were N 0 = 100, = 2500, and δ = 1250 (as in Example 4). From Fig. 5a, when considering the metric of the overall percent of parent pairs misclassified, the proposed algorithm performs better than MOCBA, which in turn performs better than equal allocation. However interestingly, in Figs. 5b and 5c, whereas the proposed allocation performs better than equal allocation, MOCBA appears to perform similar to the proposed allocation in Fig. 5b corresponding to false exclusions and similar to equal allocation in Fig. 5c corresponding to false inclusions. This behavior occurs because, unlike the proposed algorithm, MOCBA does not allocate in a way that simultaneously controls the probability of MCE and MCI events. Instead, MOCBA initially alternates between allocating to minimize bounds on the probability of MCE and allocating to minimize bounds on the probability of MCI, depending on which estimated bound is larger. At any one iteration, MOCBA allocates to minimize the largest estimated bound. Since the estimated values of the bounds will converge to their true values, in the limit MOCBA allocates to minimize one bound. For the parent pair populations we consider, the estimated bound for the probability of MCE is routinely larger than the estimated bound for the probability of MCI, which causes the MOCBA algorithm to allocate by minimizing the bound on the probability of MCE.
We note that MOCBA is a "closed form" allocation scheme, requiring very little time to implement, whereas the proposed allocation scheme requires solving a convex optimization problem at each iteration in the sequential algorithm. Thus, the tradeoff of computation time required to solve for the estimated optimal allocation versus computation time required to simulate a child may affect a practitioner's choice between the proposed allocation and MOCBA. The benefit of using the proposed allocation scheme may increase as the computation time required to produce a simulated child grows. Further research is required to derive a large-population approximation to the proposed allocation scheme like the one for constrained SO in Pasupathy et al. (2014). Table 3 reports how quickly the proposed branch-and-bound algorithm finds the optimal solution to Problem MB for the 98 populations described in Appendix D.2.

Performance of the branch-and-bound algorithm
From Table 3, the average number of branch and bound nodes (BBNs) visited by Algorithm 2 generally increases as b increases. However, Table 3 also shows that, as b increases, the percentage of populations with BBNs larger than one seems to decrease, whereas the average number of BBNs visited given that the number of BBNs is greater than one increases. This result is expected: as b tends to infinity, giving all of the breeding budget to the parent pair with the largest variance becomes optimal. However, for populations where the "all to one" strategy is not immediately declared optimal, large b increases the number of feasible solutions, making the optimal solution harder to find.

Performance of two-step procedure
Throughout this article we have proposed a two-step procedure to solve the mating design problem, consisting of a simulation step and an IP step. In Fig. 6, we evaluate this procedure under different simulation step policies by presenting the optimality gap of the solution to Problem MB as a function of sample size for various breeding budgets. Figure 6 required solving 7056 branch-and-bound problems (in addition to the 588 solved in Table 3), of which eight (0.1%) required significant computational time. Seven are still running at the time of writing. These problems correspond to runs for populations 11, 21, 25, 49, and 73, which are excluded from the figures in this section. Future research may result in good heuristics for these difficult-to-solve  problems. The simulation sample paths for the IP step in Fig. 6 are the same as in Fig. 5. Although equal allocation initially performs worst at simulation budget n = 20 000, perhaps surprisingly, equal allocation eventually becomes a competitive algorithm in Fig. 6. For smaller b, equal allocation becomes competitive by simulation budget n = 100 000. This result emphasizes the fact that our objective in proposing a simulation step that maximizes the rate of decay of the P{MC} is to classify the Pareto parent pairs correctly, which happens at an exponential rate, whereas the IP step requires correct estimation of the means and variances of parent pairs in the Pareto set, which happens at rate O(1/ √ n i ) for parent pair i receiving n i samples (see also Glynn and Juneja (2004)). Easily classified parent pairs may receive insufficient simulation budget to meet the precision requirements of the IP step. Although the proposed algorithm consistently performs well, one may imagine further splitting the simulation step into two: a first step for Pareto set classification and a second step to ensure that the means and variances of the Pareto set are estimated with high precision. This procedure will also save computational resources, as the estimation step can be conducted with naive allocations on only the estimated Pareto set. In practice, the estimate of the expected maximum that breeding decision-makers will have is the objective value of Problem  )] holds (see Mak et al. (1999)). The proposed allocation often yields better predictions than MOCBA and equal allocation, particularly for smaller sample sizes. As in Fig. 6, the predictive value will be better for more accurate mean and variance estimates.
The results in this section suggest that the use of the proposed algorithm can provide substantial gains in the performance of the max breeding problem, especially for smaller sample sizes.

Concluding remarks
This article provides an example of the collaboration opportunities available between the plant breeding and industrial engineering communities. Fruitful collaborations provide an avenue for advances and insights in both domains. Furthermore, our article contributes to the literature as follows.
1. We introduce the breeding objective of maximizing the expected trait value of the single best individual in the progeny population, a beneficial objective for plant breeders, and we provide implementable methods to design a breeding allocation based on this objective. 2. We show that when the trait distributions are normal, only parent pairs with non-dominated means and variances will receive non-zero sample at optimality in the max breeding problem. Thus, we make the max breeding problem tractable by exploiting its structure to reduce the search space. 3. We provide an implementable sequential sampling framework based on the asymptotically optimal sample allocation that maximizes the rate of decay of the probability of misclassification. Through numerical experiments, we show that this framework outperforms the current standard allocation scheme, MOCBA, in our application. 4. We provide the first known application to successfully implement an LD-based sequential sampling scheme with non-normal rate functions. We also re-emphasize the following practical insight from this work. A broad class of popular efficient sampling methods, including OCBA-like methods (Chen et al., 2000;Lee et al., 2010;Lee et al., 2012) and LD-based methods (Glynn and Juneja, 2004;Szechtman and Yücesan, 2008;Hunter and Pasupathy, 2013;Pasupathy et al., 2014), capitalize on the fact that the probability of misclassification decays to zero at an exponential rate as sample size increases. However, practitioners may also wish to accurately estimate the performance measure of interest for the best systems, which happens at a slower rate. Thus, one might initially simulate using an OCBA-like or LDbased method to quickly classify systems but then employ naïve sampling allocations on the estimated-best systems to ensure an accurate solution to a second optimization step.
Finally, we observe that the branch-and-bound algorithm converges extremely fast in practice and allows the user to assert optimality of the resulting mating design, subject to estimation errors. Extensions of this work in plant breeding include developing analogous methods that (i) exploit a population structure based on pedigree and (ii) consider multiple traits; for example, maximizing expected yield subject to fruit weight remaining in a desired range.