Optimal Portfolio Strategies for New Product Development

We study the portfolio selection problem in a new product development setting with many projects in parallel, each lasting several stages, in the face of uncertainty. Each stage of the process performs an experiment on a selected number of projects in the stage, depending on the amount of (scarce) budget allocated to the stage. Projects become differentiated through their experimental results, and all available results for a project determine its category. We model the problem as a Markov decision process. We seek an optimal policy that specifies, for every configuration of projects in categories, which projects to test and/or terminate. For two special cases we characterize the optimal project promotion policy as following a new type of strategy, state-dependent non-congestive promotion (SDNCP). SDNCP implies that a project with the highest expected reward in any stage is advanced to the next stage if and only if the number of projects in each successor category is below a congestion-dependent threshold. For the general problem, numerical experiments reveal the outstanding performance of SDNCP (optimal in 72 of 77 instances with maximum deviation from optimal of 0.67%), highlighting when and how a fixed non-congestive promotion policy, which is easier to implement, may fall short.


Introduction
Developing new products and swiftly launching them successfully into the market have become critical in today's competitive environment. For this reason firms often pursue many new product development (NPD) projects in parallel (e.g., Ding and Eliashberg 2002, Ulrich and Eppinger 2003, and Girotra et al. 2007). But this can be a very challenging task from the operations management perspective: Projects generally compete for access to scarce resources in the absence of precise knowledge of their outcomes (Kavadias and Chao 2008). In many settings, uncertainty can be reduced only through experimentation: A typical NPD process consists of a series of distinct experimental stages, allowing information generated in each stage to be incorporated into portfolio decisions in later stages (Cooper 2008 and Artmann 2009). Firms terminate product candidates with little promise before testing them in expensive downstream stages, focusing available resources on much stronger candidates (Thomke 2008).
Information gathered as a project progresses through experimental stages is typically imperfect, with potentially varying levels of accuracy across stages (e.g., Gino and Pisano 2005). In addition, projects may face uncertainty about not only their outcomes but also their experimentation times (e.g., Loch and Terwiesch 1999, and Sommer et al. 2008). Another significant complication is that each stage may require a unique specialized resource, may have a fixed budget, or may share a fixed budget with other stages (e.g., Blau et al. 2004). Taken together, these elements require concurrent capacitated resource allocation decisions, across stages, under uncertainty.
Improperly managing such constrained resources may lead to significant delays in project completion times, reducing the firm's profitability (Adler et al. 1995). One practical solution to prevent such delays is to limit the number of projects in each stage (Thomke and Reinertsen 2012), but it is not clear how to best implement such a policy or how effective it would be. Part of the problem is that, to our knowledge, the NPD portfolio selection literature has not yet developed a comprehensive modeling framework that explicitly captures all these aspects of the problem. Our paper is the first attempt to fill this gap: By employing the theory of Markov decision processes (MDPs), we study the NPD portfolio selection problem under imperfect information across stages, random experimentation times, and stage-dependent resources.
Dynamic portfolio management has received much attention in the NPD literature; we refer the reader to Kavadias and Loch (2004) and Kavadias and Chao (2008) for comprehensive reviews.
Many authors have viewed the project prioritization problem as a multi-armed bandit (MAB) problem in which projects compete for a critical resource that can be utilized by only one project at a time. A project utilizing the resource undergoes Markovian transitions and returns an immediate state- and time-dependent reward. Gittins and Jones (1972) introduced the Gittins index, a number that can be assigned to each project at any time and that corresponds to the reward that would make the controller indifferent between working on the project and terminating it in exchange for that reward: It is always optimal to work on the project with the highest Gittins index. See Gittins (1979, 1989), Whittle (1980, 1981, 1988), Ross (1982), Weber (1992), Banks and Sundaram (1994), Bertsimas and Niño-Mora (1996), Kavadias and Loch (2003), and Bertsekas (2007) for further analysis of the MAB problem. In variants of the MAB problem, multiple resources are dynamically allocated to several projects, new projects may arrive, all projects may change state after utilization of a resource, and a common fixed changeover cost and/or delays may be incurred by reallocation of resources. But, unlike the MAB problem, in our study projects use different (scarce) resources in different stages of the NPD process, leading to concurrent resource allocation decisions across stages.
The project initiation and prioritization problems can also be viewed through the lens of queueing theory: The problem of when to initiate a project by testing a new product idea is similar to the admission control problem in which different customers (or projects) may bring different rewards if accepted. As the number of customers in the system nears capacity, the controller tends to reject customers with smaller rewards in anticipation of future customers with larger rewards. See Miller (1969), Lippman and Ross (1971), Stidham (1985), Haviv and Puterman (1998), Lewis et al. (1999), and Lewis and Puterman (2000) for analysis of queues with admission control. Similarly, the project prioritization problem can be modeled as a multiclass queue with stochastic completion times for each class of job (or project). The cµ rule is optimal for linear delay costs: It is always optimal to give priority to the job with the highest delay cost divided by the expected processing time. For non-linear delay costs, the generalized cµ rule is asymptotically optimal in heavy traffic and in due date scheduling. See Harrison (1975), Wein (1992), Ha (1997a), van Mieghem (1995, 2000), and Mandelbaum and Stolyar (2004) for analysis of multiclass queues. Unlike the above papers, our study takes into account the evolution of projects through experimental results in project promotion, allowing projects to be delayed or even terminated based on this evolution.
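For concreteness, the cµ priority index just described can be sketched in a few lines; the job classes and numbers below are hypothetical, but the rule itself is the standard one.

```python
# The c-mu rule: serve the job class with the largest index
#   (delay cost c) x (service rate mu) = c / E[processing time].
# The job classes and numbers below are hypothetical.
def cmu_priority(classes):
    """Return class names ordered by descending c*mu index."""
    return sorted(classes, key=lambda name: -classes[name]["c"] * classes[name]["mu"])

jobs = {
    "A": {"c": 4.0, "mu": 0.5},   # index 2.0
    "B": {"c": 1.0, "mu": 3.0},   # index 3.0 -> served first
    "C": {"c": 2.0, "mu": 1.0},   # index 2.0
}
```

Class B, despite its low delay cost, gets priority because it is quick to serve.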
Another stream of research has studied the dynamic and stochastic knapsack problem: Projects arrive over time according to a stochastic process, each placing a demand on a limited resource. The demands and their rewards are random, and become known upon arrival. If a demand is accepted, the reward is received; otherwise a penalty is paid. The objective is to maximize the expected reward over a finite or infinite horizon. See Papastavrou et al. (1996) and Kleywegt and Papastavrou (1998, 2001). These papers again fail to incorporate intermediate project reviews into decision-making.
Several other authors have incorporated learning into the NPD process, allowing for information updating: Roberts and Weitzman (1981) consider a single project that must pass through a number of experimental stages, each incurring a random cost. The reward is received after all stages are complete. The state of the project is described by the number of stages from the end. The reward is a state-dependent random variable, which becomes less variable as the state decreases (and thus more information becomes available). The authors establish the optimal stopping rule when the state variable is continuous, the reward is normal, and the standard deviation of the reward in any stage is proportional to the expected cost of carrying the project through to completion from that stage. The authors consider neither a multi-project environment under scarce resources nor random experimentation times. Vishwanath (1992) considers a finite number of projects, each yielding an unknown reward at an uncertain time. Each project has its own independent joint probability distribution for the reward and the time to collect the reward (if selected). Several projects may be undertaken in parallel, and the projects may be selected sequentially in any order desired. The highest expected utility (assessed for each project in isolation) rule is the best order for parallel project selection to maximize the total expected discounted utility. Although Vishwanath (1992) allows for information acquisition as a selected project proceeds, resource scarcity is ignored.
Finally, Adler et al. (1995) model the NPD process as a queueing network: Resources are "workstations," each consisting of one or more identical servers in parallel, and projects are "jobs," each consisting of several activities to be performed by specified workstations in specified orders. The authors develop a simulation model for their queueing network under exponentially distributed activity times. Their simulation experiments, using data from real projects in a real organization, indicate that restricting the total number of projects under way at any time may greatly reduce project completion times. But this comes at a cost: Reduced throughput. Adler et al. (1995) thus argue that limiting the number of projects in the network would be particularly attractive if projects can be selected according to their probabilities of success. (Projects are not dynamically prioritized during execution in their model.) In this paper we develop a mathematical abstraction that rigorously incorporates projects' probabilities of being in any state (based on the latest information) into project promotion decisions. This enables us to show that a congestion-dependent critical level on the number of projects of each type proves more effective than a fixed critical level on the total number of projects in the entire system. See also Blau et al. (2000), Blau et al. (2004), Repenning (2001), Subramanian (2003), and Verma et al. (2011) for simulation-based methods.
Specifically, we model the portfolio selection problem as an infinite-horizon MDP under the total expected discounted reward criterion: An ample source of new product ideas is available. Each project can be initiated by testing a new product idea, and undergoes a different experiment in each stage of the NPD process. Experiments generate signals about the true nature of a project; the quality of these signals can differ across stages.
Beliefs about the true nature of a project are updated after each observed signal according to a Bayesian rule. Projects thus become differentiated through their signals; all available signals for a project determine its category. Experimentation times in any stage are independent and exponentially distributed with finite rate, due to constrained resources. Each experiment incurs a fixed cost upon completion; both experimentation rates and costs can also differ across stages. Returns of a project are earned only after the resulting product is launched into the market. But some projects with little promise may not pay off due to high experimentation and/or delay costs; thus projects in any stage may also be terminated at no cost.
The state of the above system consists of the numbers of projects in each category. Given the system state, a control policy specifies whether or not to test a project in each stage, and which project to select if a project is to be tested (i.e., project promotion); and whether or not to terminate a project, and which project to select if a project is to be terminated (i.e., project termination).
We characterize the structure of the optimal stationary promotion policy via a new type of policy, state-dependent non-congestive promotion (SDNCP), in two special cases of the general problem: (a) when there is a single experimental stage and projects are never terminated (Theorem 1), or (b) when there are multiple mandatory stages that do not provide information (Theorem 2). We also prove that projects are never terminated at optimality in case (b). An SDNCP policy implies that, in each stage, it is optimal to advance a project with the highest expected reward to the next stage if and only if the number of projects in each successor category is less than a congestion-dependent threshold. Thus such a policy is "forward looking," i.e., taking into account downstream congestion.
We show that threshold values weakly decrease as a later stage becomes more congested or as an earlier stage becomes less congested. Thus the SDNCP policy is also "backward looking," i.e., considering upstream congestion. Importantly, a stage becomes more congested with an increase in the number of projects, but also in the expected reward of any project, in the stage.
We adapt the SDNCP policy to our general model, evaluating its use as a heuristic project promotion policy. We compare the SDNCP policy to two other heuristics: a fixed non-congestive promotion (FNCP) policy with fixed thresholds across stages, and a continuous promotion (CP) policy. The FNCP policy limits the number of active projects in each stage, one of the solutions proposed by Thomke and Reinertsen (2012). The CP policy initiates a project only when another project has been completed or terminated; such a policy is equivalent to the "pull" strategy in Adler et al. (1995).
Because our optimal structural results provide no insight into project termination decisions in the general problem, we do not impose any policy structure for project termination in our heuristics.
Taking the average reward rate as our optimization criterion, we thus formulate linear programs to find the globally optimal policy and the optimal CP policy, and mixed integer programs to find the optimal SDNCP and FNCP policies. (The optimal CP policy always promotes projects if feasible, which leads to the linear program.) As we impose a deterministic policy structure for project promotion in each of our heuristics, the optimal SDNCP, FNCP, and CP policies can potentially yield fewer randomized decisions, in comparison with the globally optimal policy, making their implementation relatively simple.
We analytically show that SDNCP outperforms FNCP and CP with respect to objective value (Proposition 1). We then generate 77 instances of the general NPD problem: Remarkably, we find that SDNCP yields the globally optimal reward in 72 of these instances and its maximum distance from optimal reward is only 0.67% in the other 5 instances. We also find that SDNCP performs better than FNCP and CP by up to 7.98% and 64.61% of the globally optimal reward, respectively, on the same test bed. The average distances of FNCP and CP from optimal reward are 1.62% and 6.74%, respectively. Numerical results indicate that SDNCP has the greatest benefit over FNCP and CP (i) when downstream stages are slower than upstream stages, (ii) when project holding and/or experimentation costs are higher, or (iii) when experimental results are less accurate. However, the computation times of FNCP and CP are several orders of magnitude lower than those of SDNCP.
Thus the controller might prefer to use FNCP if none of conditions (i)-(iii) holds.
We contribute to the NPD literature in several important ways: First, we develop a novel MDP formulation for the portfolio selection problem: Managing scarce resources effectively under uncertainty is rather difficult in an NPD process. Our MDP model enables a clean analytical formulation to handle such complexity, allowing us to generate structural insights into optimal policies. Second, we show that optimal project promotion decisions are congestion-dependent, and that "congestion" is driven by not only the number of projects but also the breakdown of projects by stage and expected reward. Last, we prove that common heuristics limiting the number of projects in the entire system or in each stage are inferior to our SDNCP policy in the general problem. Numerical experiments reveal when FNCP and CP significantly deviate from the optimal policy, producing high-level guidelines for control policy choices in different environments.
The rest of the paper is organized as follows: Section 2 formulates our general NPD model under the discounted reward criterion. Section 3 establishes the optimal policy structure when there is a single experimental stage and projects are never terminated. Section 4 establishes the optimal policy structure when there are multiple non-experimental stages. Section 5 presents our heuristic policies for the general model in Section 2. Section 6 presents our numerical results for the heuristics under the average reward criterion. Section 7 offers a summary and concludes. Proofs of all analytical results (except Proposition 1) are contained in an online appendix.

Problem Formulation
We consider the problem of project selection and resource allocation in a continuous-time NPD process (e.g., a new drug development process, cf. Figure 1). Each NPD project passes through a finite number of experimental stages (e.g., safety, efficacy, and general tests; see DiMasi et al. 2003 and Girotra et al. 2007) before the resulting product (e.g., "Crestor") is placed on the market.
Define M = {1, 2, .., m} as the set of experimental stages, and i as the index for the stage (e.g., m = 3 in Figure 1). The true ultimate nature of a project falls into one of a number of states (e.g., "success" or "failure"), and initial expectations about the nature are the same across all projects.
Each experiment generates a piece of new information (e.g., "good" or "bad" signal) about the nature of the project; uncertainty pertaining to the ultimate outcome of the project is further resolved in each stage. Define K as the number of possible signals that can be generated in each stage for each project, and k as the index for the signal (e.g., K = 2). There exists a one-to-one correspondence between the set of signals in each stage and the set of states for the true nature.
Both sets consist of integers from 1 to K such that a lower integer indicates a project with higher return (e.g., k = 1 means a "good" signal that corresponds to a "success", and k = 2 means a "bad" signal that corresponds to a "failure").
All available signals for a project determine its category; projects become differentiated through their categories. Define N = {0, 1, .., n} as the set of project categories, and j as the index for the category. Note that K^i is the number of categories in stage i, and n = K + K^2 + ... + K^m (e.g., n = 14 in Figure 1). Different stages of the NPD process utilize different resources (e.g., specialized testing equipment or specialists with unique areas of expertise), and all these resources are limited. Define W_i as the set of project categories waiting for access to resources of stage i for experimentation (e.g., W_1 = {0}, W_2 = {1, 2}, and W_3 = {3, 4, 5, 6}). Also, define W_{m+1} as the set of project categories that have completed all stages except the product launch stage m + 1 (e.g., W_4 = {7, 8, .., 14}).

Figure 1: A new drug development process. Each node represents a different project category, and each arc in a given stage represents a different signal.

Experimental results imperfectly reveal the true nature of a project throughout the NPD process. Define Φ^(i) as a K × K conditional probability matrix in stage i, with the rows and the columns indicating the states of the true nature and the signals of the test, respectively: φ^(i)_{k,k'} is the probability that the experiment in stage i generates signal k' for projects with true nature k.
Note that if φ^(i)_{k,k} = 1, ∀k, then the experiment in stage i perfectly reveals the true nature. Thus we assume φ^(i)_{k,k} < 1, ∀i, k. Also, note that we may have φ^(i)_{k,k'} = 0 for some nature k and some signal k' in some stage i. Thus the total number of signals that can actually be observed can differ across states of the true nature and stages.
Beliefs about the true nature of a project undergo Bayesian updating after each experiment. Define p_j = (p_{j,1}, .., p_{j,K}) as the probability distribution for the true nature of a project in category j ∈ W_i. Suppose that a project in category j ∈ W_i becomes category j' ∈ W_{i+1}, returning signal k' in stage i. Then the posterior probability distribution for the true nature of the project, p_{j'} = (p_{j',1}, .., p_{j',K}), is calculated as

p_{j',k} = p_{j,k} φ^(i)_{k,k'} / f_{j→j'}, ∀k,

where f_{j→j'} is the probability that a project in category j ∈ W_i falls into category j' ∈ W_{i+1}, returning signal k' in stage i, i.e.,

f_{j→j'} = Σ_k p_{j,k} φ^(i)_{k,k'}.

The state of the system at time t is the vector X(t) = (X_1(t), .., X_n(t)), where X_j(t) is a nonnegative integer denoting the number of projects in category j at time t. Projects held in the NPD process incur a holding cost per unit time which is convex and strictly increasing in the total number of projects. Denote by h(X(t)) = h(Σ_j X_j(t)) the holding cost rate at state X(t). Also, denote by r_k the reward that can be obtained from the launch of a new product with ultimate outcome k.
The expected reward for a project in category j is calculated by ρ_j = r · p_j, where r = (r_1, .., r_K) is a K-dimensional nonnegative vector whose elements are in descending order. Thus ρ_j ≥ 0, ∀j. The returns of a project are earned only after the project is complete and the resulting new product is launched.
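The belief update and the resulting expected reward can be sketched as follows; this is a minimal illustration in which the prior, the signal matrix Φ, and the reward vector r are hypothetical (K = 2), not taken from the paper.

```python
# Bayesian belief update after a stage-i experiment returns signal k':
#   p'_k = p_k * phi[k][k'] / f,  where  f = sum_k p_k * phi[k][k']
# f is the probability of observing k', i.e., the category-transition
# probability f_{j -> j'} in the paper's notation.
def update_belief(p, phi, signal):
    f = sum(p[k] * phi[k][signal] for k in range(len(p)))
    return [p[k] * phi[k][signal] / f for k in range(len(p))], f

def expected_reward(p, r):
    # rho_j = r . p_j: the expected launch reward of a category-j project
    return sum(rk * pk for rk, pk in zip(r, p))

# Hypothetical numbers: a 50/50 prior over {success, failure} and a test
# that reports the true nature correctly with probability 0.8.
prior = [0.5, 0.5]
phi = [[0.8, 0.2],   # true success -> signal "good" w.p. 0.8
       [0.2, 0.8]]   # true failure -> signal "good" w.p. 0.2
posterior, f_good = update_belief(prior, phi, signal=0)  # observe "good"
```

Here posterior = [0.8, 0.2] and f_good = 0.5; with r = (10, 0) the expected reward of the resulting category is 8.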
The scarce resources in any stage can be utilized by at most one project at any time. This assumption is benign when there are diseconomies of scope in resource sharing. Both experimentation and product launch times are independent and exponentially distributed. Define µ_i as the experimentation rate in stage i, and µ_{m+1} as the product launch rate. The system incurs an experimentation cost c_i upon completion of an experiment, but there are no costs associated with interrupted experiments. This assumption is not restrictive in the two special cases of the model introduced in Sections 3 and 4, as it is never optimal to interrupt any experiment once it has been initiated in those cases. Projects may also be terminated, only one at a time, at no cost.
Termination time for a project is again exponentially distributed with finite mean 1/λ (which may be arbitrarily small). Although ρ j ≥ 0, ∀j, note that it might be desirable to terminate a project due to high experimentation and holding costs, outweighing its expected reward.
Since all inter-event times are exponentially distributed, the system retains no memory, and decision epochs can be restricted to times when the state changes. Using the memoryless property, we can formulate the problem as an MDP and focus on Markovian policies, for which actions at each decision epoch depend solely on the current state. A control policy specifies, for each state x = (x_1, .., x_n), the action u(x) = (y_0, .., y_n, z), y_j ∈ {0, 1}, ∀j, and z ∈ {0, 1, .., n}, where y_j = 1 means advance a project from category j to the next stage, y_j = 0 means do not advance a project from category j, z > 0 denotes the category from which a project is terminated, and z = 0 means do not terminate any project. Denote by U(x) the set of admissible actions in state x. The action u(x) = (y_0, .., y_n, z) ∈ U(x) must satisfy the following conditions:
• A project in any category can be tested or terminated only if at least one project exists in this category, i.e., y_j = 0 and z ≠ j if x_j = 0, ∀j > 0.
• At most one project can be tested at any time in any stage, i.e., Σ_{j∈W_i} y_j ≤ 1, ∀i.
• At most one new product can be introduced at any time, i.e., Σ_{j∈W_{m+1}} y_j ≤ 1.
• A project cannot be terminated if it is to be tested, i.e., z ≠ j if y_j = 1 and x_j = 1, ∀j > 0.
• A project cannot be tested if it is to be terminated, i.e., y_j = 0 if z = j and x_j = 1, ∀j > 0.
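The admissibility conditions above can be collected into a single feasibility check. The sketch below is ours; in particular, the encoding of states and actions as Python dictionaries is an assumption of the illustration, not the paper's.

```python
# Feasibility check for an action u = (y_0,..,y_n, z) in state x.
# x: dict category -> project count (categories 1..n; category 0, the idea
#    pool, is assumed always available); y: dict category -> 0/1 promotion
#    decision; z: category to terminate (0 = terminate nothing);
# stages: list of the waiting sets W_1,..,W_{m+1}.
def is_admissible(x, y, z, stages):
    for j, yj in y.items():
        # a category must be nonempty to test or terminate from it
        if j > 0 and x.get(j, 0) == 0 and (yj == 1 or z == j):
            return False
    for W in stages:
        # at most one project tested (or launched) at a time per stage
        if sum(y.get(j, 0) for j in W) > 1:
            return False
    # the sole project in a category cannot be both tested and terminated
    if z > 0 and x.get(z, 0) == 1 and y.get(z, 0) == 1:
        return False
    return True
```

For instance, with W_1 = {0} and W_2 = {1, 2}, testing a new idea while also testing the single category-1 project is admissible, but terminating that same project while testing it is not.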
Let v denote a real-valued function defined on N_0^n (N_0 is the set of nonnegative integers and N_0^n is its n-dimensional cross product). Also define 0 < α < 1 as the discount parameter. For a given policy π and a starting state X(0) = x, the expected discounted reward over an infinite planning horizon, v_π(x), can be written as

v_π(x) = E_π [ Σ_{j∈W_{m+1}} ∫_0^∞ e^{−αt} ρ_j dN_j(t) − Σ_{i∈M} ∫_0^∞ e^{−αt} c_i dN^(i)(t) − ∫_0^∞ e^{−αt} h(X(t)) dt | X(0) = x ],   (1)

where N_j(t) is the cumulative number of product launches from projects in category j ∈ W_{m+1} at time t and N^(i)(t) is the cumulative number of experiments performed in stage i up to time t.
Define ν_{x̃|x,u} as the rate at which the system moves from state x to state x̃ if action u = (y_0, .., y_n, z) ∈ U(x) is chosen in state x. The time between the transition to state x and the transition to the next state is exponentially distributed with rate ν_{·|x,u} = Σ_{x̃} ν_{x̃|x,u} if action u ∈ U(x) is chosen in state x. Define t_k as the time of occurrence of the kth transition.
Also let t_0 = 0. The state of the system stays constant between transitions, i.e., X(t) = X(t_k) for t_k ≤ t < t_{k+1}. Following Lippman (1975), we consider a uniformized version of the problem where the rate of transition ν is an upper bound for all states and controls, i.e., ν ≥ ν_{·|x,u}, ∀x, u. Specifically, we will formulate the problem for the choice ν = λ + Σ_{i∈M} µ_i + µ_{m+1}.
Thus the kth transition time interval (t_{k+1} − t_k) is exponentially distributed with rate ν, ∀k. With the uniform transition rate, we are able to transform the continuous-time control problem into an equivalent discrete-time control problem.
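Uniformization itself is mechanical and can be sketched in a few lines; the event names and rates below are illustrative placeholders, not the paper's parameters.

```python
# Uniformization: choose nu at least as large as the total event rate; each
# event with rate mu then occurs w.p. mu/nu at every transition epoch of a
# single Poisson clock of rate nu, and the leftover mass nu - sum(rates)
# becomes a fictitious self-transition.
def uniformize(rates, nu=None):
    total = sum(rates.values())
    if nu is None:
        nu = total
    assert nu >= total, "nu must dominate the total transition rate"
    probs = {event: rate / nu for event, rate in rates.items()}
    probs["self"] = 1.0 - total / nu
    return nu, probs

# Illustrative rates (not the paper's): lambda = 0.5, mu_1 = 2.0, mu_2 = 1.5.
nu, probs = uniformize({"terminate": 0.5, "stage1": 2.0, "launch": 1.5})
```

Here nu = 4.0 and, e.g., a stage-1 experiment completes with probability 0.5 at each transition epoch.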
If action u = (y_0, .., y_n, z) ∈ U(x) is selected in state x, the next state is x̃ with probability P_{x,x̃}(u).

Thus

P_{x,x̃}(u) =
• (µ_1/ν) f_{0→j'} if x̃ = x + e_{j'} for some j' ∈ W_2, when y_0 = 1 (a new product idea is tested);
• (µ_i/ν) f_{j→j'} if x̃ = x − e_j + e_{j'} for the category j ∈ W_i with y_j = 1 and some j' ∈ W_{i+1}, 2 ≤ i ≤ m;
• µ_{m+1}/ν if x̃ = x − e_j for the category j ∈ W_{m+1} with y_j = 1 (a new product is launched);
• λ/ν if x̃ = x − e_z, when z > 0 (a project is terminated);
with the remaining probability mass assigned to the self-transition x̃ = x,

where e_j is the jth unit vector of dimension n. In this discrete-time framework, N_j(t_k) is the cumulative number of product launches from projects in category j ∈ W_{m+1} at the time of the kth transition, N^(i)(t_k) is the cumulative number of experiments performed in stage i at the time of the kth transition, and h(X(t_k)) is the holding cost rate during the time interval [t_k, t_{k+1}). Then, v_π(x) in (1) can be rewritten as follows:

v_π(x) = E_π [ Σ_{k=0}^∞ (ν/(α + ν))^{k+1} ( Σ_{j∈W_{m+1}} ρ_j (N_j(t_{k+1}) − N_j(t_k)) − Σ_{i∈M} c_i (N^(i)(t_{k+1}) − N^(i)(t_k)) − h(X(t_k))/ν ) | X(0) = x ].   (2)

Our objective is to identify a policy π* that maximizes the expected discounted reward. We below formulate the optimality equation that holds for the optimal reward function v* = v_{π*} (see Section EC.1 of the online appendix for a discussion of the existence of the optimality equation):

v*(x) = max_{u∈U(x)} { g(x, u) + (ν/(α + ν)) Σ_{x̃} P_{x,x̃}(u) v*(x̃) }.   (3)

Therefore, our continuous-time control problem is equivalent to a discrete-time control problem with discount factor ν/(α + ν) and reward per stage given by

g(x, u) = (1/(α + ν)) ( µ_{m+1} Σ_{j∈W_{m+1}} y_j ρ_j − Σ_{i∈M} µ_i c_i Σ_{j∈W_i} y_j − h(x) ).

As it is always possible to redefine the time scale, without loss of generality we assume α + ν = 1.
Letting e_0 denote a zero vector of dimension n, the optimality equation in (3) can be simplified as

v(x) = −h(x) + λ max_{z∈{0,1,..,n}: z=0 or x_z>0} v(x − e_z) + Σ_{i∈M} µ_i T_i v(x) + µ_{m+1} T_{m+1} v(x),   (4)

where the operator T_i for project promotion decisions in stage i is defined as

T_i v(x) = max { v(x), max_{j∈W_i: j=0 or x_j>0} [ Σ_{j'∈W_{i+1}} f_{j→j'} v(x − e_j + e_{j'}) − c_i ] },

and the operator T_{m+1} for product launch decisions in stage m + 1 is defined as

T_{m+1} v(x) = max { v(x), max_{j∈W_{m+1}: x_j>0} [ ρ_j + v(x − e_j) ] }.

For a given state x, the operator T_i, i ≤ m, specifies whether or not to test a project in stage i, and which project to select if a project is to be tested, and the operator T_{m+1} specifies whether or not to launch a new product, and which project to select if a new product is to be launched.

The Case with a Single Experimental Stage
In this section we assume that the NPD process consists of (i) an experimental stage that generates one out of the K signals about the true nature of each project, and (ii) a product launch stage.
We also assume that projects are never terminated. Thus:

Assumption 1. m = 1 and z = 0.
Assumption 1 implies that the number of project categories equals the number of signals that can be observed, i.e., n = K. Let c denote the experimentation cost in stage 1. Under Assumption 1, setting the uniform transition rate equal to µ_1 + µ_2, the optimality equation in (4) can be written as follows:

v(x) = −h(x) + µ_1 T_1 v(x) + µ_2 T_2 v(x),   (5)

where the operators T_1 and T_2 are defined as

T_1 v(x) = max { v(x), Σ_{j=1}^n f_{0→j} v(x + e_j) − c },

T_2 v(x) = max { v(x), max_{j: x_j>0} [ ρ_j + v(x − e_j) ] }.

The operator T_1 specifies when to test a new product idea. The operator T_2 specifies when to launch a new product, and which project to select when a new product is to be launched.
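For intuition, the single-stage optimality equation can be solved by value iteration on a truncated state space. The sketch below uses purely illustrative parameters and a crude truncation; it is not the solution method used in the paper.

```python
import itertools

# Tiny single-stage instance (m = 1, K = 2) with illustrative parameters;
# alpha + nu = 1 with nu = mu1 + mu2 = 0.85, as in the uniformized model.
mu1, mu2, alpha = 0.45, 0.40, 0.15
c, h_rate = 1.0, 0.5          # experimentation cost, holding cost per project
rho = (10.0, 2.0)             # expected rewards of categories 1 and 2
f = (0.5, 0.5)                # signal probabilities f_{0->1}, f_{0->2}
N = 4                         # truncation: at most N projects per category

states = list(itertools.product(range(N + 1), repeat=2))
v = {x: 0.0 for x in states}

def T1(v, x):
    # test a new idea (blocked at the truncation boundary) or do nothing
    best = v[x]
    if all(xj < N for xj in x):
        best = max(best, sum(f[k] * v[(x[0] + (k == 0), x[1] + (k == 1))]
                             for k in range(2)) - c)
    return best

def T2(v, x):
    # launch one of the available projects, or do nothing
    best = v[x]
    for j in range(2):
        if x[j] > 0:
            best = max(best, rho[j] + v[(x[0] - (j == 0), x[1] - (j == 1))])
    return best

for _ in range(500):          # value iteration; the map is a nu-contraction
    v = {x: -h_rate * sum(x) + mu1 * T1(v, x) + mu2 * T2(v, x) for x in states}
```

On this instance the computed values exhibit Property 1 discussed below: e.g., v[(1, 0)] ≥ v[(0, 1)], since a category-1 project carries the higher expected reward.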
The lemma below shows that the optimal reward function in (5) satisfies Properties 1-7; letting V denote the set of functions satisfying Properties 1-7, the optimal reward function v* is furthermore an element of V.
We now consider the implications of Properties 1-7: Property 1 states that the optimal reward function weakly increases as the number of projects with higher expected reward increases, keeping the total number of projects fixed. Property 2 shows that a new product launch is always beneficial no matter what category is chosen, as long as it is feasible (recall that ρ j ≥ 0, ∀j). However, Property 3 says that choosing the category with higher expected reward is more desirable for product launch. Property 4 states that the incentive to replace a project in category l with one having higher expected reward weakly increases as the number of projects with expected rewards lower than ρ l increases. Property 5 shows that the incentive to replace a project in category l with one having higher expected reward weakly increases when a project with expected reward less than ρ l is replaced with one having higher expected reward (but lower than ρ l ). Property 6 says that, when l = 1, the desirability of testing a new product idea weakly decreases as the number of projects in any category increases. Finally, Property 7 implies that, when l = 1, the desirability of testing a new product idea weakly decreases as the number of projects with higher expected reward increases, keeping the total number of projects fixed.
The intuition behind Properties 1-3 is straightforward: It is more desirable to have a project with higher expected reward, which will conceivably take priority over those projects with lower expected rewards in resource utilization for product launch. More critically, Properties 4-7 enable us to uncover the role congestion plays in promotion decisions for new product ideas:
• As the number of projects in category l increases, the system becomes more congested from the perspective of a new product idea: If a new product idea is tested in the experimental stage, it might become a project with expected reward less than ρ_l, taking a lower priority than all projects in category l in the queue for the second stage. Such a project waits longer for access to resources of the second stage when there are more projects in category l. Consequently, since any delay in the project completion time is costly, it is less desirable to test a new product idea (Property 6).
• Likewise, when a project is replaced with one having higher expected reward, the system again becomes more congested from the same perspective: A new product idea, once tested, is more likely to see a greater number of high priority projects in the queue for the second stage if a low value project is replaced with a high value project. Thus it is less desirable to test a new product idea (Property 7).
• As the system becomes more congested due to an increase in the number of low value projects, the system anticipates a lower throughput rate in the experimental stage in the future (due to Property 6), and eventually a small number of high value projects. To hedge against future scarcity of high value projects, the controller would prefer to trade a project in category l with one having higher expected reward, and such a trade becomes more desirable as the number of projects with expected rewards less than ρ l increases (Property 4).
• Likewise, as the system becomes more congested due to a rise in the expected reward of a low value project, the controller would prefer to trade a project in category l with one having higher expected reward in anticipation of a small number of high value projects (due to Property 7). Such a trade becomes more desirable when a project with expected reward less than ρ l is replaced with one having higher expected reward (but less than ρ l ) (Property 5).
The structural properties of our optimal reward function allow us to establish the optimality of a new type of policy: state-dependent noncongestive-promotion (SDNCP) policy.
Theorem 1. Under Assumption 1, the optimal stationary project promotion policy is a state-dependent noncongestive-promotion policy with state-dependent promote-up-to levels S * j (x −j ): It is optimal to test a new product idea if and only if x j < S * j (x −j ) for every category j, where x −j = (x 1 , . . . , x j−1 , x j+1 , . . . , x n ) is an (n − 1)-dimensional vector of the numbers of projects in categories k ≠ j.
The optimal policy has the following additional properties: i. The optimal promote-up-to level S * j (x −j ) weakly decreases as the number of projects in category k ≠ j increases, ∀j.
ii. The optimal promote-up-to level S * j (x −j ) weakly decreases as the expected reward of a project in category k ≠ j increases, ∀j.
iii. It is always optimal to launch a new product if there are projects available for product launch.
iv. It is always optimal to choose a project with highest expected reward for product launch.
v. It is never optimal to interrupt any experiment.
Using Property 6, Theorem 1 establishes the optimality of a state-dependent non-congestive promotion policy. Such a policy protects the system against congestion, restricting the number of projects that can be held in each category. Points (i) and (ii) show that the promote-up-to levels weakly decrease as the system becomes more congested with an increase in the number of projects in any category (due to Property 6), or with an increase in the expected reward of any project in the system (due to Property 7). Points (iii) and (iv) state that it is always optimal to launch a new product if it is feasible (due to Property 2), and it is optimal to choose a project with highest expected reward for product launch (due to Property 3). Lastly, point (v) says it is never optimal to interrupt an experiment once it has been initiated.
To our knowledge, we are the first to characterize the optimal project promotion policy for a Markovian NPD process with stage-dependent resources. We also significantly extend the existing NPD literature by showing that optimal promotion decisions depend on both the number of projects and the breakdown of projects into categories if an imperfectly informative stage exists. Now suppose that the experimental stage is uninformative, implying that there is only one project category. Then the problem becomes similar to the single-item inventory model introduced by Ha (1997b). A non-congestive promotion policy remains optimal in this special case of our model: it is optimal to promote a new product idea if and only if the number of projects in the system is less than a fixed promote-up-to level. This concurs well with the optimal replenishment policy in Ha (1997b): it is optimal to order an item if and only if the inventory level is less than a fixed base-stock level. But what if there are multiple non-experimental stages? We consider that next.

The Case with Multiple Non-Experimental Stages
In this section we assume that the NPD process consists of multiple non-experimental stages that do not provide any information about the true nature of any project, but which the project must complete before launch. In the literature there are many examples of non-experimental stages. See, for instance, the queueing network in Adler et al. (1995) and the project scheduling problem in Brucker et al. (1999). We also assume that a project may be terminated even if it is in the process of being tested, and vice versa. (The latter assumption is benign since the optimal policy in this section never terminates a project, cf. Theorem 2.) Thus: Assumption 2. K = 1, and it is feasible to have z = j and y j = 1 when x j = 1.
Assumption 2 implies that the number of project categories equals the number of stages, i.e., n = m. Let ρ denote the expected reward for a new product. Under Assumption 2, the optimality equation in (4) can be written in terms of three operators: the operator T 0 for project termination decisions, the operator T i for project promotion decisions in stage i, and the operator T m+1 for product launch decisions in stage m + 1. We proceed to characterize the structural properties of our optimal reward function in (6). We will use the indices l, q, and w for the stages; the alphabetical ordering l → q → w corresponds to progression from earlier to later stages. Let e m+1 denote a zero vector of dimension n. We define V as the set of real-valued functions g on N n 0 that satisfy Properties 8-13, which hold ∀x ∈ N n 0 and ∀(q, w) s.t. q ≠ w and 0 < q, w ≤ m + 1. The lemma below shows that the optimal reward function in (6) satisfies Properties 8-13; that is, the optimal reward function v * is an element of V .
We now consider the implications of Properties 8-13: Property 8 implies that the optimal reward function weakly increases as projects move from one stage to the next stage. Property 9 shows that a new product launch is always beneficial as long as it is feasible (recall that ρ ≥ 0). Property 10 says that the incentive to promote a project from one stage to the next stage weakly increases if any project in another stage gets promoted. Property 11 states that promoting a project from stage l − 1 to stage l is more desirable when a project in stage q ≥ l is replaced with another project in stage w > q. Furthermore, Property 12 states that promoting a project from stage w − 1 to stage w is more desirable when a project in stage l < q is replaced with another project in stage q ≤ w − 1.
Conversely, Property 13 shows that the incentive to promote a project from one stage to the next stage weakly decreases if another project in the same stage gets promoted.
The intuition behind Properties 8 and 9 is as follows: Since delays in project completion times are costly, it is more desirable to have projects that are closer to the product launch stage, as well as to launch a new product immediately if it is feasible. We show in the online appendix that Property 10 implies Properties 11-13, which enable us to uncover the role congestion plays in promotion decisions in each stage:
• From the perspective of a project in stage l − 1, a project in stage q ≥ l causes more congestion than one in stage w > q: A project in stage l − 1, if promoted along the NPD process, is more likely to catch up with the project in stage q than with the one in stage w. Hence, if a project in stage w is replaced with a project in stage q, it becomes more likely that projects accumulate in the same stage and create a bottleneck, leading to investments with a lower rate of return for some projects.
Thus it becomes less desirable to promote the project in stage l − 1 (Property 11).
• Conversely, there is a greater benefit in promoting the project in stage w if a project in stage l < q is replaced with a project in stage q < w (Property 12). The bottleneck of the NPD process is more likely to occur in stage w when projects in stage l < w get closer to stage w.
Promoting a project in stage w might help us avoid such an occurrence of the bottleneck.
• Although the incentive to promote a project increases as projects in later or earlier stages get promoted (Properties 11 and 12), the incentive to promote a project decreases if another project in the same stage gets promoted (Property 13): Successively advancing projects from the same stage may increase the risk of creating a bottleneck in a further stage.
Using the structural properties of our optimal reward function in (6), we again establish the optimality of an SDNCP policy:
Theorem 2. Under Assumption 2, the optimal stationary project promotion policy in each stage i is a state-dependent noncongestive-promotion policy with state-dependent promote-up-to levels S * i (x −i ). The optimal policy has the following additional properties: i. The optimal promote-up-to level S * i (x −i ) weakly increases as the number of projects in stage j > i decreases.
ii. The optimal promote-up-to level S * i (x −i ) weakly increases as the number of projects in stage j < i increases.
iii. The optimal promote-up-to level S * i (x −i ) weakly increases as projects in stage j ≠ i − 1 move along the process.
iv. It is always optimal to launch a new product if there are projects available for product launch.
v. It is never optimal to interrupt any experiment.
vi. It is never optimal to terminate any project.
Using Property 11, Theorem 2 proves the optimality of a state-dependent non-congestive promotion policy: Each stage of the NPD process is protected against congestion through promote-up-to levels. Points (i) and (ii) state that the promote-up-to level in a given stage weakly increases as a later stage becomes less congested (due to Property 11), or as an earlier stage becomes more congested (due to Property 12). Point (iii) shows that the promote-up-to level in stage i weakly increases as projects in stage j ≠ i − 1 get promoted to the next stage (due to Property 10). Point (iv) shows that it is always optimal to launch a new product if it is feasible (due to Property 9). Point (v) says it is never optimal to interrupt an experiment once it has been initiated. Point (vi) says it is never optimal to terminate any active project.
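The monotonicity in points (i)-(iii) can be illustrated with a hypothetical threshold function (the functional form below is ours, for illustration only): the promote-up-to level for a stage rises with upstream congestion and falls with downstream congestion.

```python
# Illustrative stand-in for S*_i(x_{-i}) in Theorem 2 (stages are
# 0-indexed here): downstream projects lower the level and upstream
# projects raise it, matching points (i) and (ii); the form itself is
# an assumption, not the paper's.

def promote_up_to(i, x, base=3):
    upstream = sum(x[:i])        # projects in stages j < i
    downstream = sum(x[i + 1:])  # projects in stages j > i
    return max(base + upstream - downstream, 0)

x = [1, 2, 0, 1]                 # number of projects per stage
print(promote_up_to(2, x))       # 3 + 3 - 1 = 5
# Point (i): removing a downstream project weakly raises the level.
print(promote_up_to(2, [1, 2, 0, 0]) >= promote_up_to(2, x))  # True
# Point (ii): adding an upstream project weakly raises the level.
print(promote_up_to(2, [2, 2, 0, 1]) >= promote_up_to(2, x))  # True
```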
Theorem 2 significantly broadens current knowledge of NPD processes by revealing the impacts of both upstream and downstream projects on optimal promotion decisions when different stages require different resources. A threshold policy similar to the one described in Theorem 2 has been studied in the Markovian inventory literature. See, for instance, Benjaafar et al. (2011) who show the optimality of a state-dependent base-stock policy for an assembly system with multiple stages.

Heuristic Policies and Their MIP Formulations
In this section, we take as our optimization criterion the average reward rate over an infinite planning horizon. (Our structural results in Sections 3 and 4 continue to hold in the average reward case; we provide a proof of this result in the online appendix.) Section 5.1 formulates a linear program to find a global optimal solution to our general problem of Section 2. Sections 5.2 and 5.3 develop two heuristic project promotion policies for our general problem, formulating mixed integer programs to find the optimal average reward within each heuristic class. Section 5.4 considers a naïve project promotion policy that always promotes projects if feasible, formulating a linear program to find the optimal average reward in this case. Finally, Section 5.5 ranks our heuristic policies in terms of their optimal rewards.

Global Optimal Solution
Define π x,u as the limiting probability that the system is in state x and action u ∈ U(x) is chosen.
As a computational requirement, we restrict the state space to be finite: Define x as an upper bound for the total number of projects in the system, and S as the set of all states x in which the total number of projects does not exceed x. (The upper bound should be sufficiently high that the globally optimal reward does not change with a further increase in the upper bound.) The globally optimal average reward Z * can be found by solving the following linear program (see Puterman 1994): The first term of the objective function corresponds to the time-average reward for completed projects, the second term corresponds to the time-average experimentation costs, and the last term corresponds to the time-average project holding cost. Constraints (C.1) and (C.2) are the balance equations and normalization constraint that together yield the limiting probability values.
Notice that the above linear program may yield a randomized policy as the global optimal solution, i.e., there may exist a state x ∈ S such that π x,u 1 > 0 and π x,u 2 > 0, where u 1 , u 2 ∈ U(x).
Below we construct three heuristic policies with a deterministic policy structure for project promotion decisions, in contrast to the global optimal solution. Because our structural results in Sections 3 and 4 apply only to project promotion decisions, we do not impose any structure on project termination decisions. Thus our heuristics may still yield a randomized policy (for termination), though with potentially many fewer states having randomized decisions.
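For intuition about the limiting-probability variables π x,u , consider a toy single-category system under a fixed promote-up-to level. The sketch below (all parameter values hypothetical) solves the analogue of the balance and normalization constraints (C.1)-(C.2) for a birth-death chain and evaluates the resulting average reward rate:

```python
# Toy analogue of (C.1)-(C.2): a single-category system tested up to a
# fixed level S, with projects launched one at a time. All rates,
# rewards, and costs are hypothetical.

def stationary(up, down):
    """Stationary distribution of a birth-death chain via detailed
    balance: pi[k+1] * down[k] = pi[k] * up[k]."""
    pi = [1.0]
    for u, d in zip(up, down):
        pi.append(pi[-1] * u / d)
    total = sum(pi)
    return [p / total for p in pi]

S = 2                     # fixed promote-up-to level
mu0, mu1 = 1.0, 0.8       # test-completion rate, product launch rate
r, h = 40.0, 1.0          # launch reward, per-project holding cost rate

# Birth rate is mu0 while x < S (a new idea is tested), 0 otherwise,
# so only the states 0..S are reachable.
pi = stationary([mu0] * S, [mu1] * S)

# Average reward rate: launch throughput times reward, minus holding.
avg_reward = sum(p * (r * mu1 * (k > 0) - h * k) for k, p in enumerate(pi))
print(round(avg_reward, 3))  # -> 22.459
```

In the LP above, the distribution is not computed for one fixed policy but optimized over the π x,u ; the toy merely shows how an average reward emerges from the balance equations.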

State-Dependent Non-Congestive Promotion (SDNCP)
Based on our structural results in Sections 3 and 4, we develop an SDNCP policy for our general problem in Section 2: Define x −j = (x 1 , . . . , x j−1 , x j+1 , . . . , x n ) as a vector of the numbers of projects in categories k ≠ j. A project in category j ∈ W i (that is not to be terminated) is advanced to stage i if and only if:
• The number of projects in each successor category is less than a state-dependent promote-up-to level; and
• There is no project with higher expected return in stage i − 1 (if i > 1).
In addition, a new product is launched from a project in category j ∈ W m+1 (that is not to be terminated) if and only if there is no project with higher expected return in stage m. We also enforce the following additional properties on the SDNCP policy: (a) the promote-up-to level in one category is nondecreasing in the number of projects in any other category in an earlier stage; (b) the promote-up-to level in one category is nondecreasing in the expected reward of a project in any other category in an earlier stage; (c) the promote-up-to level in one category is nonincreasing in the number of projects in any other category in the same stage or a later stage; and (d) the promote-up-to level in one category is nonincreasing in the expected reward of a project in any other category in the same stage or a later stage.
We proceed to the MIP formulation of this heuristic class. First, for b ∈ N 0 , define S j (x −j , b) as the set of state-action pairs (x, u) such that the limiting probability that the system is in state x and action u is chosen should be zero if the promote-up-to level S j (x −j ) equals b, and define z S j (x −j ) b as a binary variable that equals one if and only if S j (x −j ) = b. The optimal solution of the MIP problem should satisfy constraints (C.1)-(C.3) of the LP formulation in Section 5.1. Constraint (C.4) selects exactly one promote-up-to level in each category, given the numbers of projects in all other categories. Constraints (C.5)-(C.8) enforce properties (a)-(d), respectively. Constraint (C.9) links our binary variables to the appropriate limiting probability variables: if z S j (x −j ) b is one, then all limiting probability variables corresponding to the state-action pairs in set S j (x −j , b) are forced to equal zero; otherwise, this constraint becomes redundant. Constraint (C.10) always launches a new product if feasible. Lastly, constraint (C.11) ensures that, if a project in a given stage is to be promoted, it is selected from the most valuable category in that stage. The optimal average reward of this policy, Z SDN CP , can be found by maximizing the objective of Section 5.1 subject to constraints (C.1)-(C.11). See Bhandari et al. (2008) and Nadar et al. (2015) for similar MIP formulations in different contexts.

Fixed Non-Congestive Promotion (FNCP)
We next consider a simpler non-congestive promotion policy with fixed promote-up-to levels across states, to evaluate the importance of the more complex SDNCP policy. Specifically, a project in category j ∈ W i (that is not to be terminated) is advanced to stage i if and only if:
• The total number of projects in stage i is less than a fixed promote-up-to level S i , i.e., Σ j∈W i+1 x j < S i ; and
• There is no project with higher expected return in stage i − 1 (if i > 1).
Product launch decisions have the same structure as those in the SDNCP policy: A new product is launched from a project in category j ∈ W m+1 (that is not to be terminated) if and only if there is no project with higher expected return in stage m. Thus, unlike SDNCP, FNCP takes into account only the total number of projects in stage i in promotion decisions in stage i − 1, ignoring the number of projects in any other stage as well as the breakdown of projects into categories throughout the NPD process. Such simplification over SDNCP conceivably alleviates the computational burden, as verified by our numerical experiments in Section 6.
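The qualitative difference between the two promotion rules can be sketched as follows; the SDNCP threshold form is again a hypothetical stand-in, and both rules are reduced here to the congestion check only (the "most valuable project" condition is common to both and omitted).

```python
# FNCP: compare the total count in the target stage to a FIXED level.
def fncp_promote(next_stage_counts, S_i):
    return sum(next_stage_counts) < S_i

# SDNCP (illustrative form): the level itself shrinks as the rest of
# the system becomes congested.
def sdncp_promote(next_stage_counts, other_counts, base=3):
    level = max(base - sum(other_counts), 0)
    return sum(next_stage_counts) < level

# Same load in the target stage, different congestion elsewhere:
print(fncp_promote([1, 0], S_i=3))              # True, regardless of elsewhere
print(sdncp_promote([1, 0], other_counts=[0]))  # True
print(sdncp_promote([1, 0], other_counts=[4]))  # False: reacts to congestion
```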
We proceed to the MIP formulation of this heuristic class. Define S i (b) as the set of state-action pairs (x, u) such that the limiting probability that the system is in state x and action u is chosen should be zero if the promote-up-to level S i equals b. Also, define z S i b as a binary variable that equals one if and only if S i = b. The optimal solution of the MIP problem should satisfy constraints (C.1)-(C.3) of the LP formulation in Section 5.1, and constraints (C.10)-(C.11) of the MIP formulation of the SDNCP policy.
Constraint (C.12) selects exactly one promote-up-to level in each stage, and constraint (C.13) links our binary variables to the appropriate limiting probability variables. The optimal average reward of this policy, Z F N CP , can be found by maximizing the objective of Section 5.1 subject to (C.1), (C.2), (C.3), (C.10), (C.11), (C.12), and (C.13).

Continuous Promotion (CP)
In this heuristic, which maximizes throughput, a new product idea is always tested as long as the total number of active projects is below x, and a project in category j ∈ W i (that is not to be terminated) is advanced to stage i > 1 if and only if there is no project with higher expected return in stage i − 1. The CP policy should satisfy constraints (C.1)-(C.3) of the LP formulation in Section 5.1, and constraint (C.11) of the MIP formulation of the SDNCP policy. In addition, constraint (C.14) always tests a new product idea if feasible, and constraint (C.15) always promotes a project in each stage and always launches a new product if feasible. The optimal average reward of this policy, Z CP , can be found by maximizing the objective of Section 5.1 subject to (C.1), (C.2), (C.3), (C.11), (C.14), and (C.15).

Analytical Comparison of Heuristic Policies
The proposition below ranks our heuristic policies in terms of their optimal average rewards:
Proposition 1. Z * ≥ Z SDN CP ≥ Z F N CP .
Proof of Proposition 1. The first inequality holds since the LP in Section 5.1 is a relaxation of the MIP formulation of SDNCP. The second inequality holds since FNCP is a subclass of SDNCP: SDNCP reduces to FNCP when its promote-up-to levels are held constant across states.

Numerical Experiments
We numerically compare the globally optimal policy to our heuristic policies, investigating how system parameters affect the relative rewards of each policy. We confine our analysis to NPD processes that involve (i) two experimental stages, each generating one out of two signals; and (ii) a product launch stage (i.e., m = 2 and K = 2). We construct our numerical instances by varying values of µ 1 , µ 2 , µ 3 , h, c 1 , c 2 , φ (1) , and φ (2) . We assume in all instances that holding cost rates are linear (i.e., h(x) = h Σ j x j ), initial beliefs for a new product idea are evenly distributed between the two possible states of the true nature (i.e., p 0,1 = p 0,2 = 0.5), and the probability that an experiment reveals the true nature of a project is independent of the true nature in each experimental stage (i.e., φ (1) 1,1 = φ (1) 2,2 = φ (1) and φ (2) 1,1 = φ (2) 2,2 = φ (2) ). Lastly, in all instances, we impose x = 5, r 1 = 40, r 2 = 0, and λ = 100.
For each numerical instance we solve the LP and MIP problems in Section 5 to find the maximum average rewards. We compare the heuristic policies in terms of (i) their percentage differences from the optimal reward Z * , calculated as 100 × (Z * − Z H )/Z * , where H ∈ {SDNCP, FNCP, CP}; and (ii) their computation times. We coded the LP and MIP formulations in the Java programming language, incorporating the CPLEX 12.5 optimization package, and used a dual-processor WinNT server with an Intel Core i7 2.67 GHz processor and 8 GB of RAM. We restricted the computation time of any instance to be no more than one hour. (Table 1 parameters: h = 1, c1 = c2 = 4, r1 = 40, r2 = 0, λ = 100, p0,1 = p0,2 = 0.5, φ (1) = φ (2) = 0.75, x = 5.)
We first vary experimentation and product launch rates, in Table 1. SDNCP yields the globally optimal reward in each of the 27 compiled instances. FNCP yields the globally optimal reward in 18 instances and CP yields the globally optimal reward in 14 instances. The average distances of FNCP and CP from optimal reward are 0.73% and 3.91%, respectively.
For FNCP and CP, the largest optimality gaps occur when the experimentation rate in stage 1 is 1.5 and the product launch rate is 0.5: If the NPD process slows down in further stages, it is more crucial to protect the system against congestion in a sophisticated manner, which can be achieved by SDNCP but not the other heuristics. Thus the optimality gaps are higher under (monotonically) decreasing rates of experimentation and product launch.
Conversely, if the NPD process speeds up in further stages it is more desirable to aggressively promote projects. The optimality gaps are therefore lower for CP under (monotonically) increasing rates of experimentation and product launch. But it is important to note that CP yields the globally optimal reward even when (i) µ 1 = 0.5, µ 2 = 1.5, and µ 3 = 0.5, and (ii) µ 1 = 1.5, µ 2 = 1.5, and µ 3 = 1. In case (i), new product ideas should always be tested in the initial stage as the later stages  are no slower, and upon completion of the initial stage, projects should always be tested in the second stage as its throughput rate, which is constrained by the initial stage, is no greater than the product launch rate. In case (ii), since projects with very low expected returns are likely to be terminated prior to the product launch stage, the effective throughput rates in stages 1 and 2 are sufficiently small so that projects should always be tested in stages 1 and 2 in order to keep up with the speed of the product launch stage.
We next vary holding and experimentation costs, in Table 2. Again, SDNCP yields the globally optimal reward in each of the 27 compiled instances. FNCP yields the globally optimal reward in 4 instances and CP yields the globally optimal reward in 3 instances. The average distances of FNCP and CP from optimal reward are 2.27% and 8.97%, respectively.
Under high holding and/or experimentation costs, it becomes preferable to launch new products only from projects with sufficiently high expected returns. Those categories from which many projects are to be terminated have little impact on congestion, whereas the others contribute significantly to congestion. The system is thus better protected against congestion by taking into consideration the breakdown of projects into categories. Also, under high holding and/or experimentation costs, notice that the benefit of a new product launch relies heavily on cost savings from significantly reduced time-to-market (or congestion), and thus protecting the system against congestion is crucial. As a result, SDNCP performs substantially better than both FNCP and CP when holding and/or experimentation costs are high.
We also observe that the performances of FNCP and CP tend to deteriorate more rapidly with an increment in the experimentation cost of stage 1 than in the experimentation cost of stage 2: If the experimentation cost is higher in stage 1, an investment in a new product idea has a much lower rate of return than an investment in a project that shows promise upon completion of stage 1. Thus further caution must be taken when promoting new product ideas, by taking into account the projects not only in stage 1 but also in stage 2. Since this is not possible with FNCP and CP, their optimality gaps are larger in this case.
Last, we vary values of φ (1) and φ (2) , in Table 3. SDNCP yields the globally optimal reward in 20 of the 25 compiled instances. (We examined the globally optimal policy in the instance with the largest optimality gap for SDNCP, i.e., when φ (1) = 0.55 and φ (2) = 0.95: Properties (c) and (d) of SDNCP are violated by optimal actions in several states.) FNCP yields the globally optimal reward in only one instance. The average distances of SDNCP, FNCP, and CP from optimal reward are 0.05%, 1.92%, and 7.12%, respectively.
For FNCP and CP, the largest optimality gaps occur when experimental results are less accurate, i.e., when φ (1) ≤ 0.65 and φ (2) ≤ 0.75: The NPD system has two important goals: (i) choosing the right projects for promotion and termination; and (ii) reducing time-to-market (or congestion).
If experimental results are more accurate, the cost savings from (i) outweigh those from (ii).
Otherwise, the performance of the system is mainly driven by the cost savings from (ii), and congestion should be reduced in a more sophisticated manner, leading to larger optimality gaps for FNCP and CP. Also note that CP performs significantly worse (by more than 25%) when φ (1) = 0.55 and φ (2) = 0.65 than when φ (1) = 0.65 and φ (2) = 0.55: Since projects with lower expected rewards in stage 1 are less likely to be terminated when φ (1) = 0.55, the incentive to test a new product idea is lower, and a much larger optimality gap results.
Our overall conclusion is that SDNCP substantially outperforms FNCP and CP (i) when the NPD process slows down in downstream stages, (ii) when holding and/or experimentation costs are higher, or (iii) when experiments are less reliable. In addition, CP performs significantly worse if initial testing has little accuracy but is very expensive.
We list computation times for the heuristics in the last three columns of Tables 1-3: Computation times of CP are significantly shorter than those of SDNCP and FNCP, by up to three orders of magnitude and one order of magnitude, respectively. (Computation times for the global optimal solution are only slightly greater than those of CP.) We also considered instances with larger values of m, K, and x to draw more general conclusions about our heuristics: We could not solve instances for SDNCP and FNCP when m = 3 or K = 3, since the MIP solver runs out of memory. However, we could solve instances for SDNCP and FNCP when x = 6. Because the SDNCP computation times exceed one hour in many instances, and the basic insights about FNCP and CP when x = 5 continue to hold, we relegate our numerical results for x = 6 to the online appendix.

Concluding Remarks
We have studied the NPD portfolio selection problem under Markovian assumptions. We show the optimality of SDNCP (a) when there is a single experimental stage and projects are not terminated, or (b) when there are multiple non-experimental stages. These findings are the first to reveal the impact of congestion on optimal project promotion decisions under imperfect information and scarce resources across stages. We also prove that SDNCP outperforms both FNCP and CP, which are simpler versions of SDNCP whose thresholds are constant across states, with respect to objective value in the general problem. In addition, SDNCP yields the globally optimal reward in the vast majority of the instances in Section 6, and its average distance from the optimal reward is virtually zero. However, the average distances of FNCP and CP are 1.62% and 6.74%, respectively. The strong numerical performance of SDNCP demonstrates that project promotion decisions should be based on a broader monitoring of projects across all categories. Our numerical results also indicate that the easy-to-implement FNCP policy may be a very good choice (i) when upstream stages are slower than downstream stages, (ii) when projects have higher expected margins, or (iii) when experiments are more reliable.
An important avenue for future research is to explore the optimal policy in the general problem.
Further optimality results may require alternate metrics for congestion, extending our structural results: Section 3 revealed that a stage becomes more congested with an increase in the expected reward of any project. Section 4 revealed that it becomes more desirable to promote a project in a particular stage when projects in earlier or later stages get promoted. Therefore one might intuitively expect the incentive to promote projects to be more subtle (i) if a project in any later stage is promoted to the next stage, returning a signal that increases its expected reward above a certain threshold (thus both reducing and increasing congestion), or (ii) if a project in any earlier stage is promoted to the next stage, returning a signal that decreases its expected reward below a certain threshold (again both increasing and reducing congestion).
Future research could also extend our model to phase-type or even general experimentation times. (For example, it appears straightforward to prove that SDNCP remains optimal under Erlang experimentation times when there are multiple non-experimental stages.) Another direction is to study the portfolio selection problem when different stages share a single resource. The problem could then be viewed as a multi-armed bandit (MAB) problem, with the states of the projects described by the categories, and thus an index rule may be optimal. Also, extending our Bayesian framework to different conjugate priors is an interesting problem to pursue. Lastly, it would be more realistic to use experimental results for one project to update beliefs about other (related) projects.

EC.1. Existence of the Optimality Equation
Suppose that Assumptions 6.10.1 and 6.10.2 in Puterman (1994) hold. Then, by Theorem 6.10.4 in Puterman (1994), we can establish the existence of a solution to the optimality equation and the convergence of value iteration. We below show that Assumptions 6.10.1 and 6.10.2 hold for our NPD model in Section 2. Recall that the holding cost per unit time (h(x)) is convex and strictly increasing in the total number of projects. Let ρ denote the maximum expected reward that can be obtained from a product launch, and r(x, u) denote the reward per stage when the system is in state x and action u is chosen.
(1) We can easily verify that Assumption 6.10.1 holds. (2) Next we want to prove that Assumption 6.10.2 holds. Suppose that a new project is initiated in state x at the beginning of period k. Thus the discounted holding cost in period k increases by h(x + e 1 ) − h(x) (recall that without loss of generality we assume α + ν = 1). The return from this project can be earned no earlier than the beginning of period k + 1. Thus the total discounted expected return (from all active projects) cannot increase by more than νρ, i.e., the discounted value of the maximum expected reward obtained at the beginning of period k + 1.
For all the projects included in state x, moving from x to x + e 1 does not change the action space, or any of the transition probabilities for actions then or in the future. We conclude that if h(x + e 1 ) − h(x) > νρ, it is not optimal to initiate a new project in state x.
(a) Consider the states x for which h(x + e 1 ) − h(x) ≤ νρ. (b) Consider the states x for which h(x + e 1 ) − h(x) > νρ. Without loss of optimality, we assume that the system never moves from state x to x + e 1 under any action u, i.e., P x,x+e 1 (u) = 0, ∀u. Hence, following Proposition 6.10.5 in Puterman (1994), we conclude that Assumption 6.10.2 holds.
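Given Assumptions 6.10.1 and 6.10.2, Theorem 6.10.4 in Puterman (1994) yields convergence of value iteration. The sketch below runs value iteration on a two-state discounted MDP of our own invention (it is not the NPD model) to illustrate the contraction argument:

```python
# Value iteration on a tiny hypothetical discounted MDP: with discount
# nu < 1, the Bellman operator is a contraction, so the iterates
# converge to the unique fixed point (Theorem 6.10.4 in Puterman 1994).

nu = 0.9  # discount factor (alpha + nu = 1 in our setting)
# states 0 and 1; each action gives (immediate reward, next state)
mdp = {
    0: {"wait": (0.0, 0), "go": (-1.0, 1)},   # pay to start a project
    1: {"wait": (-0.5, 1), "go": (10.0, 0)},  # launch for a reward
}

def bellman(v):
    return [max(r + nu * v[s2] for r, s2 in mdp[s].values()) for s in mdp]

v = [0.0, 0.0]
for _ in range(500):
    v_new = bellman(v)
    if max(abs(a - b) for a, b in zip(v_new, v)) < 1e-10:
        break
    v = v_new

print([round(x, 4) for x in v])  # approximate fixed point of the Bellman operator
```

For this example the exact fixed point is v(1) = 9.1/0.19 ≈ 47.8947 and v(0) = 0.9 · v(1) − 1 ≈ 42.1053, which the iterates approach geometrically.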

EC.2. Proofs of the Results in Section 3
Proof of Lemma 1. Define V as the set of real-valued functions on N n 0 that satisfy Properties 1-7. Also, define the operator T on the set of real-valued functions v. We show below that T 1 : V → V , T 2 : V → V , and −h ∈ V . We will then prove that T : V → V .
T 1 : V → V . Assume that v satisfies Properties 1-7. We below show T 1 v satisfies Properties 1-7. Property 1. We will prove T 1 v satisfies Property 1, i.e., T 1 v(x + e w ) ≤ T 1 v(x + e q ), ∀(q, w) s.t. 0 < q < w. We consider the following two scenarios depending on the optimal action at T 1 v(x + e w ) (if this inequality holds under a suboptimal action of T 1 v(x + e q ), it also holds under the optimal action of this operator, and thus we do not enforce the optimal action at this operator): (1) Suppose that T 1 v(x + e w ) = v(x + e w ). As we assume v satisfies Property 1, the following inequalities hold: (2) Suppose that T 1 v(x+e w ) = j v (x + e w + e j ) f 0→j −c. As we assume v satisfies Property 1, the following inequalities hold: Therefore T 1 v satisfies Property 1.
Property 2. We will prove T 1 v satisfies Property 2, i.e., T 1 v(x + e w ) ≤ T 1 v(x) + ρ w , ∀w > 0. We consider the following two scenarios depending on the optimal action at T 1 v(x + e w ): (1) Suppose that T 1 v(x + e w ) = v(x + e w ). As we assume v satisfies Property 2, the following inequalities hold: (2) Suppose that T 1 v(x + e w ) = j v (x + e w + e j ) f 0→j − c. As we assume v satisfies Property 2, the following inequalities hold: Therefore T 1 v satisfies Property 2.
Property 3. We will prove T_1v satisfies Property 3, i.e., T_1v(x + e_q) + ρ_w ≤ T_1v(x + e_w) + ρ_q, ∀(q, w) s.t. 0 < q < w. We consider the following two scenarios depending on the optimal action at T_1v(x + e_q): (1) Suppose that T_1v(x + e_q) = v(x + e_q). As we assume v satisfies Property 3, the following inequalities hold: (2) Suppose that T_1v(x + e_q) = ∑_j v(x + e_q + e_j)f_{0→j} − c. As we assume v satisfies Property 3, the following inequalities hold: Therefore T_1v satisfies Property 3.
Property 4. We will prove T_1v satisfies Property 4, i.e., T_1v(x + e_q + e_l) − T_1v(x + e_q + e_d) ≤ We consider the following scenarios depending on the optimal actions at T_1v(x + e_q + e_l) and T_1v(x + e_d) (if the inequality holds under suboptimal actions of T_1v(x + e_q + e_d) and/or T_1v(x + e_l), it also holds under the optimal actions of these operators, and thus we do not enforce the optimal actions there): (1) Suppose that T_1v(x + e_q + e_l) = v(x + e_q + e_l) and T_1v(x + e_d) = v(x + e_d). As we assume v satisfies Property 4, (2) Suppose that T_1v(x + e_q + e_l) = v(x + e_q + e_l) and As we assume v satisfies Properties 4 and 7, the following inequalities hold: (3) Suppose that T_1v(x + e_q + e_l) = ∑_j v(x + e_q + e_l + e_j)f_{0→j} − c and T_1v(x + e_d) = v(x + e_d). As we assume v satisfies Properties 4 and 6, the following inequalities hold: As we assume v satisfies Property 4, the following inequalities hold: Therefore T_1v satisfies Property 4.
Property 5. We will prove T_1v satisfies Property 5, i.e., T_1v(x + e_q + e_l) − T_1v(x + e_q + e_d) ≤ We consider the following scenarios depending on the optimal actions at T_1v(x + e_q + e_l) and T_1v(x + e_w + e_d): (1) Suppose that T_1v(x + e_q + e_l) = v(x + e_q + e_l) and T_1v(x + e_w + e_d) = v(x + e_w + e_d). As we assume v satisfies Property 5, (2) Suppose that T_1v(x + e_q + e_l) = v(x + e_q + e_l) and T_1v(x + e_w + e_d) = ∑_j v(x + e_w + e_d + e_j)f_{0→j} − c. As we assume v satisfies Properties 5 and 7, the following inequalities hold: As we assume v satisfies Properties 5 and 7, the following inequalities hold: As we assume v satisfies Property 5, the following inequalities hold: Therefore T_1v satisfies Property 5.
Property 6. We will prove T_1v satisfies Property 6, i.e., We consider the following scenarios depending on the optimal actions at T_1v(x + e_q + e_l), T_1v(x + e_q + e_j) for j > l, and T_1v(x) (if the inequality holds under suboptimal actions of T_1v(x + e_q), T_1v(x + e_l), and/or T_1v(x + e_j) for j > l, it also holds under the optimal actions of these operators, and thus we do not enforce the optimal actions there): (1) Suppose that T_1v(x) = v(x). As we assume v satisfies Property 6, T_1v(x + e_q + e_l) = v(x + e_q + e_l) and T_1v(x + e_q + e_j) = v(x + e_q + e_j) for j > l. As we assume v satisfies Property 6, the following inequalities hold: As we assume v satisfies Properties 4 and 6, the following inequalities hold: (3) Suppose that T_1v(x + e_q + e_l) = ∑_j v(x + e_q + e_l + e_j)f_{0→j} − c. As we assume v satisfies Property 7, T_1v(x + e_q + e_j) = ∑_{j′} v(x + e_q + e_j + e_{j′})f_{0→j′} − c for j > l. As we assume v satisfies Property 6, T_1v(x) = ∑_j v(x + e_j)f_{0→j} − c. As we assume v satisfies Property 6, the following inequalities hold: Therefore T_1v satisfies Property 6.
Property 7. We will prove T_1v satisfies Property 7, i.e., ∀(l, q, w) s.t. 0 < l ≤ q < w. We consider the following scenarios depending on the optimal actions at T_1v(x + e_q + e_l), T_1v(x + e_q + e_j) for j > l, and T_1v(x + e_w): (1) Suppose that T_1v(x + e_w) = v(x + e_w). As we assume v satisfies Properties 6 and 7, T_1v(x + e_q + e_l) = v(x + e_q + e_l) and T_1v(x + e_q + e_j) = v(x + e_q + e_j) for j > l. As we assume v satisfies Property 7, the following inequalities hold: As we assume v satisfies Properties 5 and 7, the following inequalities hold: (3) Suppose that T_1v(x + e_q + e_l) = ∑_j v(x + e_q + e_l + e_j)f_{0→j} − c. As we assume v satisfies Property 7, T_1v(x + e_q + e_j) = ∑_{j′} v(x + e_q + e_j + e_{j′})f_{0→j′} − c for j > l. As we assume v satisfies Properties 6 and 7, T_1v(x + e_w) = ∑_j v(x + e_w + e_j)f_{0→j} − c. As we assume v satisfies Property 7, the following inequalities hold: Therefore T_1v satisfies Property 7. Hence T_1v satisfies Properties 1-7, i.e., T_1 : V → V.
T_2 : V → V. Assume that v satisfies Properties 1-7. Below we show that T_2v satisfies Properties 1-7.
Property 1. We will prove T_2v satisfies Property 1, i.e., T_2v(x + e_w) ≤ T_2v(x + e_q), ∀(q, w) s.t. 0 < q < w. As we assume v satisfies Properties 2 and 3, it is always optimal to launch a new product by choosing a project with highest expected reward: T_2v(x + e_w) = v(x + e_w − e_b) + ρ_b, where b is the smallest j such that x_j + I_{j=w} ≥ 1 (I_{j=w} = 1 if j = w, and I_{j=w} = 0 otherwise). Suppose that b = w. Then it is easy to verify that T_2v(x + e_w) = v(x) + ρ_w ≤ v(x) + ρ_q ≤ T_2v(x + e_q). Now suppose that b ≠ w. As we assume v satisfies Property 1, . Therefore T_2v satisfies Property 1.
Property 2. We will prove T_2v satisfies Property 2, i.e., T_2v(x + e_w) ≤ T_2v(x) + ρ_w, ∀w > 0. As we assume v satisfies Properties 2 and 3,
Property 3. We will prove T_2v satisfies Property 3, i.e., T_2v(x + e_q) + ρ_w ≤ T_2v(x + e_w) + ρ_q, ∀(q, w) s.t. 0 < q < w. As we assume v satisfies Properties 2 and 3, where b is the smallest j such that x_j + I_{j=q} ≥ 1. Suppose that b = q. Then it is easy to verify
Property 4. We will prove T_2v satisfies Property 4, i.e., Recall that as we assume v satisfies Properties 2 and 3, it is always optimal to launch a new product by choosing a project with highest expected reward. We consider the following scenarios: (1) Suppose that x_j = 0, ∀j ≤ l. Then (2) Suppose that x_j = 0, ∀j ≤ d, and x_j ≥ 1 for some j ∈ {d+1, d+2, …, l}. Define b as the smallest j such that x_j ≥ 1. As we assume v satisfies Property 4, (3) Suppose that x_j ≥ 1 for some j ≤ d. Again define b as the smallest j such that x_j ≥ 1. As we assume v satisfies Property 4, Therefore T_2v satisfies Property 4.
Property 5. We will prove T_2v satisfies Property 5, i.e., T_2v(x + e_q + e_l) − T_2v(x + e_q + e_d) ≤ Recall that as we assume v satisfies Properties 2 and 3, it is always optimal to launch a new product by choosing a project with highest expected reward. We consider the following scenarios: (1) Suppose that x_j = 0, ∀j ≤ l. Then (2) Suppose that x_j = 0, ∀j ≤ d, and x_j ≥ 1 for some j ∈ {d+1, d+2, …, l}. Define b as the smallest j such that x_j ≥ 1. As we assume v satisfies Property 5, (3) Suppose that x_j ≥ 1 for some j ≤ d. Again define b as the smallest j such that x_j ≥ 1. As we assume v satisfies Property 5, Therefore T_2v satisfies Property 5.
Property 6. We will prove T_2v satisfies Property 6, i.e., Recall that as we assume v satisfies Properties 2 and 3, it is always optimal to launch a new product by choosing a project with highest expected reward. Taking q = n, we consider the following scenarios: (1) Suppose that x_j = 0, ∀j ≤ q = n. As we assume v satisfies Property 2, the following inequality holds: (2) Suppose that x_j = 0, ∀j ≤ l, and x_j ≥ 1 for some j ∈ {l+1, l+2, …, q}. Define b as the smallest j such that x_j ≥ 1. As we assume v satisfies Property 6, the following inequality holds: (3) Suppose that x_j ≥ 1 for some j ≤ l. Again define b as the smallest j such that x_j ≥ 1. As we assume v satisfies Property 6, the following inequality holds: Therefore T_2v satisfies Property 6 if q = n. Below we prove that T_2v also satisfies Property 7. Thus the following inequalities hold:

Summation of the above inequalities implies ∑_{1≤j≤l} T_2v(x + e_q + e_l)f_{0→j} + ∑_{n≥j>l} T_2v(x + e_q + e_j)f_{0→j} − T_2v(x + e_q) ≤ ∑_{1≤j≤l} T_2v(x + e_l)f_{0→j} + ∑_{n≥j>l} T_2v(x + e_j)f_{0→j} − T_2v(x) for l ≤ q ≤ n. Therefore T_2v satisfies Property 6.
Property 7. We will prove T_2v satisfies Property 7, i.e., ∑_{1≤j≤l} T_2v(x + e_q + e_l)f_{0→j} + Recall that as we assume v satisfies Properties 2 and 3, it is always optimal to launch a new product by choosing a project with highest expected reward. Taking q = w − 1, we consider the following scenarios: (1) Suppose that x_j = 0, ∀j ≤ q = w − 1. As we assume v satisfies Property 3, the following inequality holds: (2) Suppose that x_j = 0, ∀j ≤ l, and x_j ≥ 1 for some j ∈ {l+1, l+2, …, q}. Define b as the smallest j such that x_j ≥ 1. As we assume v satisfies Property 7, the following inequality holds: (3) Suppose that x_j ≥ 1 for some j ≤ l. Again define b as the smallest j such that x_j ≥ 1. As we assume v satisfies Property 7, the following inequality holds: Therefore T_2v satisfies Property 7 if q = w − 1. This implies that T_2v also satisfies Property 7 for q < w − 1. Therefore T_2v satisfies Property 7. Hence T_2v satisfies Properties 1-7, i.e., T_2 : V → V.
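The launch rule invoked throughout the T_2 analysis — launch from the smallest nonempty category index b, i.e., the project with the highest expected reward — can be sketched in code. This is an illustrative fragment; the list-based state encoding and the function name are choices of ours, not part of the formal model.

```python
# Sketch of the launch rule used by the operator T_2: categories are indexed
# 1..n in decreasing order of expected reward rho_1 >= ... >= rho_n, and the
# state x counts projects per category. Launching removes one project from
# b, the smallest index j with x_j >= 1. (Index 0 here stands for category 1.)

def launch(x, rho):
    """Apply the launch rule to state x (list of per-category counts).

    Returns (new_state, reward): the post-launch state and the collected
    reward rho_b, or (x, 0.0) if the system is empty and launching is
    infeasible.
    """
    for j, count in enumerate(x):
        if count >= 1:  # b = smallest index with an available project
            new_x = list(x)
            new_x[j] -= 1
            return new_x, rho[j]
    return list(x), 0.0  # empty system: nothing to launch
```

For example, in state (0, 2, 1) the rule launches from category 2, the most promising nonempty category.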
−h ∈ V. Property 1. As h is constant as long as the total number of projects remains the same, −h(x + e_w) = −h(x + e_q).
Property 2. As h is increasing in the total number of projects, −h(x + e_w) ≤ −h(x) + ρ_w.
T : V → V. Assume that v satisfies Properties 1-7, i.e., v ∈ V. We proved that T_1v, T_2v, and −h satisfy Properties 1-7. Notice that Tv satisfies Properties 1, 4, 5, 6, and 7, as these properties are preserved by linear transformations. Below we show that Tv also satisfies Properties 2 and 3.
• We will prove Tv satisfies Property 2. Notice that (i) h is increasing in the total number of projects, (ii) T_1v and T_2v satisfy Property 2, and (iii) Therefore Tv satisfies Properties 1-7, i.e., Tv ∈ V. Thus T : V → V. Following Theorem 6.10.4 in Puterman (1994), we verify that lim_{k→∞} (T^k v_0)(x) = v^*(x), where v_0 is the zero function, v^* is the optimal reward function, and T^k refers to k compositions of the operator T. Since v_0 ∈ V and T : V → V, we have T^k v_0 ∈ V, and thus v^* ∈ V.
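The convergence argument lim_{k→∞} T^k v_0 = v^* can be illustrated numerically. The sketch below uses a deliberately tiny two-state discounted MDP as a stand-in (it is not the NPD model, and all numbers are illustrative): repeated application of the Bellman operator to the zero function converges to the unique fixed point v^*.

```python
# Toy illustration of value iteration: T is a contraction with modulus
# alpha, so T^k v0 -> v* for v0 = 0. Two states, two actions, deterministic
# transitions for brevity.
alpha = 0.9
rewards = [[1.0, 0.0], [0.0, 2.0]]   # rewards[s][a]
nxt = [[0, 1], [0, 1]]               # nxt[s][a] = successor state

def T(v):
    """One application of the Bellman optimality operator."""
    return [max(rewards[s][a] + alpha * v[nxt[s][a]] for a in (0, 1))
            for s in (0, 1)]

v = [0.0, 0.0]          # v0: the zero function
for _ in range(500):    # compute T^k v0
    v = T(v)
# v now approximates the fixed point v* (here v* = [18, 20]), and T(v) ≈ v.
```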
Proof of Theorem 1. By Lemma 1, we know v^* ∈ V. Define, for v^* ∈ V, Since v^* satisfies Property 6, ∑_{1≤k≤n} v^*(z + e_k)f_{0→k} − v^*(z) is nonincreasing in z_j. As z_j increases, since the holding cost rate h is strictly increasing, this difference will eventually cross c. Hence the optimal stationary policy is a state-dependent noncongestive promotion policy with state-dependent promote-up-to levels S_j^*(x_{−j}). Next we will prove properties (i)-(v):
i. Pick arbitrary j and k such that j ≠ k. We will show that the optimal promote-up-to level for . By definition, it is optimal to initiate an experiment at z if z_j < S_j^*(z_{−j}), and it is not optimal to do so at x if x_j = S_j^*(x_{−j}) < S_j^*(z_{−j}). But we have a contradiction when If it is optimal to initiate an experiment at z, it should also be optimal to do so at state x (due to Property 6). Thus we must have S
ii. Pick arbitrary j, k, and k′ such that j ∉ {k, k′} and k′ < k. We will show that the optimal promote-up-to level for category j obeys S . By definition, it is optimal to initiate an experiment at z if z_j < S_j^*(z_{−j}), and it is not optimal to do so at . But we have a contradiction when z_j = x_j = S_j^*(x_{−j}): If it is optimal to initiate an experiment at z, it should also be optimal to do so at x (due to Property 7). Thus we must
iii. Suppose that ∃k s.t. x_k > 0. Property 2 implies it is always optimal to launch a new product: Property 3 implies that it is always optimal to choose a project with highest expected reward:
v. Lastly, we will prove it is never optimal to interrupt any experiment. Assume that an experiment is optimally initiated at a given state x. We then consider the following two cases:
• Suppose that ∃j s.t. x_j > 0. Point (iii) implies that it is optimal to launch a new product at state x. Suppose that a new product is placed on the market before the experiment is complete.
Thus the system moves to a state z where z_{j*} = x_{j*} − 1 and z_j = x_j, ∀j ≠ j* (j* is the category with the highest expected reward among available categories). Property 6 implies that it is optimal to initiate an experiment at state z: The experiment initiated at x can be resumed at z. Now suppose that a new product is placed on the market after the experiment is complete. But then the experiment is not interrupted.
• Suppose that x_j = 0, ∀j. The system can move to a new state only after the experiment is complete. Therefore the experiment is not interrupted.
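The structure established in Theorem 1 can be phrased operationally. The sketch below uses a hypothetical threshold function of our own choosing (Theorem 1 guarantees existence and monotonicity of the promote-up-to levels S_j^*, not this particular form): an SDNCP policy initiates an experiment toward category j exactly when x_j is below a level that depends on the rest of the state.

```python
# Sketch of a state-dependent noncongestive promotion (SDNCP) rule.
# S_star below is an illustrative stand-in for the optimal level S*_j(x_-j):
# it is nonincreasing in the other components, mirroring the congestion
# dependence proved in the theorem, but its exact form is hypothetical.

def S_star(j, x_others, base=4):
    # hypothetical promote-up-to level: shrinks as the rest of the
    # system becomes more congested
    return max(0, base - sum(x_others))

def initiate_experiment(x, j):
    """True iff this sketch policy tests toward category j in state x."""
    x_others = x[:j] + x[j + 1:]
    return x[j] < S_star(j, x_others)
```

In an empty system the rule keeps testing; once the other categories are congested, the threshold drops to zero and testing stops.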

EC.3. Proofs of the Results in Section 4
We will use the following auxiliary lemma to prove Lemma 2: Lemma EC.1. A real-valued function on N_0^n satisfying Property 10 also satisfies Properties 11-13.
Thus g satisfies Property 13.
Proof of Lemma 2. Define V as the set of real-valued functions on N_0^n that satisfy Properties 8-13. Also, define the operator T on the set of real-valued functions v as follows: We will then prove that T : V → V.
T_0 : V → V. Assume that v satisfies Properties 8-13. Below we show that T_0v satisfies Properties 8-13.
Property 8. We will prove T_0v satisfies Property 8, i.e., T_0v(x + e_q) ≤ T_0v(x + e_w), ∀(q, w) s.t. 0 < q < w ≤ m. We consider the following two scenarios depending on the optimal action at T_0v(x + e_q) (if the inequality holds under a suboptimal action of T_0v(x + e_w), it also holds under the optimal action of that operator, and thus we do not enforce the optimal action there): (1) Suppose that T_0v(x + e_q) = v(x + e_q). As we assume v satisfies Property 8, (2) Suppose that T_0v(x + e_q) = v(x + e_q − e_l) where l ≥ 1. This scenario is possible only when x + e_q ≥ e_l. Also, suppose that l ≠ q. Thus we should have x_l > 0. As we assume v satisfies Property 8, Now suppose that l = q. Then it is easy to verify that Therefore T_0v satisfies Property 8.
Property 9. We will prove T_0v satisfies Property 9, i.e., T_0v(x + e_m) ≤ T_0v(x) + ρ. We consider the following two scenarios depending on the optimal action at T_0v(x + e_m): (1) Suppose that T_0v(x + e_m) = v(x + e_m). As we assume v satisfies Property 9, (2) Suppose that T_0v(x + e_m) = v(x + e_m − e_l) where l ≥ 1. This scenario is possible only when x + e_m ≥ e_l. Also, suppose that l ≠ m. Thus we should have x_l > 0. As we assume v satisfies Property 9, Now suppose that l = m. Then it is easy to verify that Therefore T_0v satisfies Property 9.
Property 10. We will prove T_0v satisfies Property 10, i.e., T_0v(x + e_q + e_{w−1}) − T_0v(x + e_{q−1} + e_{w−1}) ≤ T_0v(x + e_q + e_w) − T_0v(x + e_{q−1} + e_w), ∀(q, w) s.t. q ≠ w and 0 < q, w ≤ m + 1. We consider the following four scenarios depending on the optimal actions at T_0v(x + e_q + e_{w−1}) and T_0v(x + e_{q−1} + e_w) (if the inequality holds under suboptimal actions of T_0v(x + e_{q−1} + e_{w−1}) and/or T_0v(x + e_q + e_w), it also holds under the optimal actions of these operators, and thus we do not enforce the optimal actions there): (1) Suppose that T_0v(x + e_q + e_{w−1}) = v(x + e_q + e_{w−1}) and T_0v(x + e_{q−1} + e_w) = v(x + e_{q−1} + e_w). As we assume v satisfies Property 10, (2) Suppose that T_0v(x + e_q + e_{w−1}) = v(x + e_q + e_{w−1}) and T_0v(x + e_{q−1} + e_w) = v(x + e_{q−1} + e_w − e_l) where l ≥ 1. This scenario is possible only when x + e_{q−1} + e_w ≥ e_l. Also, suppose that l < w. Thus we should have x + e_{q−1} ≥ e_l. As we assume v satisfies Properties 10 and 12, the following inequalities hold: Now suppose that l ≥ w. As we assume v satisfies Property 8, if a project is to be terminated, it is optimal to choose it from the earliest possible stage. Thus we should have l = w. As we assume v satisfies Property 8, (3) Suppose that T_0v(x + e_q + e_{w−1}) = v(x + e_q + e_{w−1} − e_l) where l ≥ 1, and T_0v(x + e_{q−1} + e_w) = v(x + e_{q−1} + e_w). This scenario is possible only when x + e_q + e_{w−1} ≥ e_l. Also, suppose that l < q. Thus we should have x + e_{w−1} ≥ e_l. As we assume v satisfies Properties 10 and 12, the following inequalities hold: Now suppose that l ≥ q. As we assume v satisfies Property 8, we should have l = q. Again as we assume v satisfies Property 8, T_0v(x + e_q + e_{w−1}) − T_0v(x + e_{q−1} + e_{w−1}) ≤ v(x + e_{w−1}) − v(x + e_{w−1}) = 0 ≤ v(x + e_q + e_w) − v(x + e_{q−1} + e_w) ≤ T_0v(x + e_q + e_w) − T_0v(x + e_{q−1} + e_w).
(4) Suppose that T_0v(x + e_q + e_{w−1}) = v(x + e_q + e_{w−1} − e_l) where l ≥ 1, and T_0v(x + e_{q−1} + e_w) = v(x + e_{q−1} + e_w − e_d) where d ≥ 1. This scenario is possible only when x + e_q + e_{w−1} ≥ e_l and x + e_{q−1} + e_w ≥ e_d. Without loss of generality we assume q < w. First, suppose that l ≠ q. As we assume v satisfies Property 8, we should have l < q and x_l > 0. Also we should have l = d. As we assume v satisfies Property 10, . Second, suppose that l = q. As we assume v satisfies Property 8, we should have x_i = 0, ∀i < q. Also we should have d = q − 1. Then it is easy to verify the desired inequality. Therefore T_0v satisfies Property 10. By Lemma EC.1, T_0v also satisfies Properties 11-13. Hence we showed that T_0v satisfies Properties 8-13, i.e., T_0 : V → V.
T_i : V → V. Assume that v satisfies Properties 8-13. Below we show that T_iv satisfies Properties 8-13, ∀i.
Property 8. We will prove T_iv satisfies Property 8, i.e., T_iv(x + e_q) ≤ T_iv(x + e_w), ∀(q, w) s.t. 0 < q < w ≤ m. We consider the following two scenarios depending on the optimal action at T_iv(x + e_q) (if the inequality holds under a suboptimal action of T_iv(x + e_w), it also holds under the optimal action of that operator, and thus we do not enforce the optimal action there): (1) Suppose that T_iv(x + e_q) = v(x + e_q). As we assume v satisfies Property 8, (2) Suppose that T_iv(x + e_q) = v(x + e_q − e_{i−1} + e_i) − c_i. This scenario is possible only when x + e_q ≥ e_{i−1}. Also, suppose that i ≠ q + 1. Since x + e_q ≥ e_{i−1}, we should have x ≥ e_{i−1}. As we assume v satisfies Property 8, . Now suppose that i = q + 1. As we assume v satisfies Property 8 and q < w, Therefore T_iv satisfies Property 8.
Property 9. We will prove T_iv satisfies Property 9, i.e., T_iv(x + e_m) ≤ T_iv(x) + ρ. We consider the following two scenarios depending on the optimal action at T_iv(x + e_m): (1) Suppose that T_iv(x + e_m) = v(x + e_m). As we assume v satisfies Property 9, (2) Suppose that T_iv(x + e_m) = v(x + e_m − e_{i−1} + e_i) − c_i. This scenario is possible only when x + e_m ≥ e_{i−1}. Since i ≤ m, we should have x ≥ e_{i−1}. As we assume v satisfies Property 9, Therefore T_iv satisfies Property 9.
Property 10. We will prove T_iv satisfies Property 10, i.e., T_iv(x + e_q + e_{w−1}) − T_iv(x + e_{q−1} + e_{w−1}) ≤ T_iv(x + e_q + e_w) − T_iv(x + e_{q−1} + e_w), ∀(q, w) s.t. q ≠ w and 0 < q, w ≤ m + 1. We consider the following four scenarios depending on the optimal actions at T_iv(x + e_q + e_{w−1}) and T_iv(x + e_{q−1} + e_w) (if the inequality holds under suboptimal actions of T_iv(x + e_{q−1} + e_{w−1}) and/or T_iv(x + e_q + e_w), it also holds under the optimal actions of these operators, and thus we do not enforce the optimal actions there): (1) Suppose that T_iv(x + e_q + e_{w−1}) = v(x + e_q + e_{w−1}) and T_iv(x + e_{q−1} + e_w) = v(x + e_{q−1} + e_w). As we assume v satisfies Property 10, (2) Suppose that T_iv(x + e_q + e_{w−1}) = v(x + e_q + e_{w−1}) and T_iv(x + e_{q−1} + e_w) = v(x + e_{q−1} + e_w − e_{i−1} + e_i) − c_i. This scenario is possible only when x + e_{q−1} + e_w ≥ e_{i−1}. Also, suppose that i ≠ q. Since x + e_{q−1} + e_w ≥ e_{i−1}, we should have x + e_w ≥ e_{i−1}. As we assume v satisfies Property 10, the following inequalities hold: Now suppose that i = q. Then it is easy to verify that (3) Suppose that T_iv(x + e_q + e_{w−1}) = v(x + e_q + e_{w−1} − e_{i−1} + e_i) − c_i and T_iv(x + e_{q−1} + e_w) = v(x + e_{q−1} + e_w). This scenario is possible only when x + e_q + e_{w−1} ≥ e_{i−1}. Also, suppose that i ≠ w. Since x + e_q + e_{w−1} ≥ e_{i−1}, we should have x + e_q ≥ e_{i−1}. As we assume v satisfies Property 10, the following inequalities hold: Now suppose that i = w. Then it is easy to verify that (4) Suppose that T_iv(x + e_q + e_{w−1}) = v(x + e_q + e_{w−1} − e_{i−1} + e_i) − c_i and T_iv(x + e_{q−1} + e_w) = v(x + e_{q−1} + e_w − e_{i−1} + e_i) − c_i. This scenario is possible only when x + e_q + e_{w−1} ≥ e_{i−1} and x + e_{q−1} + e_w ≥ e_{i−1}. Thus, since q ≠ w, we should have x ≥ e_{i−1}. As we assume v satisfies Property 10, the following inequalities hold: Therefore T_iv satisfies Property 10. By Lemma EC.1, T_iv also satisfies Properties 11-13.
Hence we showed that T_iv satisfies Properties 8-13, i.e., T_i : V → V.
T_{m+1} : V → V. Assume that v satisfies Properties 8-13. Below we show that T_{m+1}v satisfies Properties 8-13.
Property 8. We will prove T_{m+1}v satisfies Property 8, i.e., T_{m+1}v(x + e_q) ≤ T_{m+1}v(x + e_w), ∀(q, w) s.t. 0 < q < w ≤ m. We consider the following two scenarios depending on the optimal action at T_{m+1}v(x + e_q): (1) Suppose that T_{m+1}v(x + e_q) = v(x + e_q). As we assume v satisfies Property 8, (2) Suppose that T_{m+1}v(x + e_q) = v(x + e_q − e_m) + ρ. This scenario is possible only when x + e_q ≥ e_m. Since q < w ≤ m, note that x_m > 0. As we assume v satisfies Property 8, Therefore T_{m+1}v satisfies Property 8.
Property 9. We will prove T_{m+1}v satisfies Property 9, i.e., T_{m+1}v(x + e_m) ≤ T_{m+1}v(x) + ρ. We consider the following two scenarios depending on the optimal action at T_{m+1}v(x + e_m): (1) Suppose that T_{m+1}v(x + e_m) = v(x + e_m). As we assume v satisfies Property 9, Therefore T_{m+1}v satisfies Property 9.
Property 10. We will prove T_{m+1}v satisfies Property 10. We consider the following four scenarios depending on the optimal actions at T_{m+1}v(x + e_q + e_{w−1}) and T_{m+1}v(x + e_{q−1} + e_w): (1) Suppose that T_{m+1}v(x + e_q + e_{w−1}) = v(x + e_q + e_{w−1}) and T_{m+1}v(x + e_{q−1} + e_w) = v(x + e_{q−1} + e_w). As we assume v satisfies Property 10, (2) Suppose that T_{m+1}v(x + e_q + e_{w−1}) = v(x + e_q + e_{w−1}) and T_{m+1}v(x + e_{q−1} + e_w) = v(x + e_{q−1} + e_w − e_m) + ρ. This scenario is possible only when x + e_{q−1} + e_w ≥ e_m. As we assume v satisfies Property 9, it is always optimal to launch a new product if it is feasible. Thus, since T_{m+1}v(x + e_q + e_{w−1}) = v(x + e_q + e_{w−1}), we should have x_m = 0, q ≠ m, and w ≠ m + 1. Since x + e_{q−1} + e_w ≥ e_m, we should also have q = m + 1 and/or w = m. First, suppose that q = m + 1 and w = m. Then the following inequalities hold (recall that e_{m+1} is a zero vector of dimension m): Second, suppose that q ≠ m + 1 and w = m. Thus q < m. As we assume v satisfies Property . Lastly, suppose that q = m + 1 and w ≠ m. Then the following inequalities hold: (3) Suppose that T_{m+1}v(x + e_q + e_{w−1}) = v(x + e_q + e_{w−1} − e_m) + ρ and T_{m+1}v(x + e_{q−1} + e_w) = v(x + e_{q−1} + e_w). This scenario is possible only when x + e_q + e_{w−1} ≥ e_m. As we assume v satisfies Property 9, it is always optimal to launch a new product if it is feasible. Thus, since T_{m+1}v(x + e_{q−1} + e_w) = v(x + e_{q−1} + e_w), we should have x_m = 0, q ≠ m + 1, and w ≠ m. Since x + e_q + e_{w−1} ≥ e_m, we should also have q = m and/or w = m + 1. First, suppose that q = m and w ≠ m + 1. Thus w < m. As we assume v satisfies Property 11, T_{m+1}v(x + e_q + e_{w−1}) − T_{m+1}v(x + e_{q−1} + e_{w−1}) ≤ v(x + e_{w−1}) + ρ − v(x + e_{q−1} + e_{w−1}) ≤ v(x + e_w) + ρ − v(x + e_{q−1} + e_w) ≤ T_{m+1}v(x + e_q + e_w) − T_{m+1}v(x + e_{q−1} + e_w). Second, suppose that q ≠ m and w = m + 1.
−h ∈ V. Property 8. As h is constant as long as the total number of projects remains the same, −h(x + e_q) = −h(x + e_w).
Property 9. As h is increasing in the total number of projects, −h(x + e_m) ≤ −h(x) + ρ.
• Pick arbitrary j ≥ 1 such that j + 1 ≠ i and j + 1 ≤ m. Suppose that the system moves to a state z such that z_j + 1 = x_j, z_{j+1} = x_{j+1} + 1, and z_{j′} = x_{j′}, ∀j′ ∉ {j, j + 1}. Point (iii) implies that it is optimal to promote a project to stage i at z: Again, the experiment in stage i, which has been initiated at x, can be resumed at z.
• Suppose that the system moves to a state z such that z_m + 1 = x_m and z_j = x_j, ∀j < m. Also, suppose that i < m. Point (i) implies that it is optimal to promote a project to stage i at state z: Once again, the experiment in stage i, which has been initiated at x, can be resumed at z. Next suppose that i = m: It is optimal to promote a project to stage m at state z, since z_m < x_m < S_m^*(x_{−m}) = S_m^*(z_{−m}). Once again, the experiment in stage i, which has been initiated at x, can be resumed at z. Therefore the experiment in stage i is never interrupted.
vi. As v^* satisfies Property 8, if a project is to be terminated, it is optimal to choose it from the earliest possible stage. Pick an arbitrary state x. Let i* denote the earliest stage with at least one available project. Thus x_i = 0, ∀i < i*. Suppose that it is not optimal to terminate a project from category i* at state x, i.e., v^*(x) ≥ v^*(x − e_{i*}). Then we consider the following scenarios: • Suppose that the system moves to a state z such that z_1 = x_1 + 1 and z_j = x_j, ∀j > 1. This implies that v^*(x + e_1) − c_1 ≥ v^*(x). Since c_1 ≥ 0, we should have v^*(x + e_1) ≥ v^*(x). Thus it is not optimal to terminate a project in stage 1. As v^* satisfies Property 8, we should also have v^*(x + e_1) − v^*(x + e_1 − e_{i*}) ≥ v^*(x + e_1) − v^*(x) ≥ 0. Thus it is not optimal to terminate any project in stage i*.
• Suppose that the system moves to a state z such that z_{i*} + 1 = x_{i*}, z_{i*+1} = x_{i*+1} + 1, and z_j = x_j, ∀j ∉ {i*, i* + 1}. This implies that v^*(x) ≤ v^*(z) − c_{i*+1}. Also, suppose that x_{i*} ≥ 2. Since v^*(x − e_{i*}) ≤ v^*(x) and v^* satisfies Property 12 (take l = 0 and q = w − 1), Thus it is not optimal to terminate any project in stage i*. Now suppose that x_{i*} = 1. Notice that z_i = 0, ∀i ≤ i*. If a project is to be terminated at z, it is optimal to select it from stage i* + 1. But it is not optimal to terminate such a project: Since v^*(x − e_{i*}) ≤ v^*(x) and v^*(x) ≤ v^*(z) − c_{i*+1}, we should have v^*(z − e_{i*+1}) = v^*(x − e_{i*}) ≤ v^*(z).
• Pick arbitrary j such that j > i* and j < m. Suppose that the system moves to a state z such that z_j + 1 = x_j, z_{j+1} = x_{j+1} + 1, and z_{j′} = x_{j′}, ∀j′ ∉ {j, j + 1}. Since v^*(x − e_{i*}) ≤ v^*(x) and v^* satisfies Property 12 (take l = 0 and q < w − 1), we should have v^*(z) − v^*(z − e_{i*}) ≥ v^*(x) − v^*(x − e_{i*}) ≥ 0. Thus it is not optimal to terminate any project in stage i*.
• Suppose that the system moves to a state z such that z_m + 1 = x_m and z_j = x_j, ∀j < m. Also, suppose that either i* < m, or i* = m and x_m ≥ 2. Since v^*(x − e_{i*}) ≤ v^*(x) and v^* satisfies Property 12 (take l = 0 and w = m + 1), we should have v^*(z) − v^*(z − e_{i*}) ≥ v^*(x) − v^*(x − e_{i*}) ≥ 0. Thus it is not optimal to terminate any project in stage i*. Now suppose that i* = m and x_m = 1. But then z_i = 0, ∀i, i.e., there is no project in the system.
Therefore it is never optimal to terminate any project during the NPD process.
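The three event operators analyzed in this section — terminate (T_0), promote (T_i), and launch (T_{m+1}) — have the one-step forms that appear throughout the proofs above. The sketch below writes them out for an arbitrary value function v; the list-based state encoding is our own, and promotion into stage 1 from the idea pool is omitted for brevity (only promotions between stages i−1 ≥ 1 and i are shown).

```python
# Sketch of the serial-stage event operators. State x = [x_1, ..., x_m]
# counts projects per stage (index 0 stores stage 1). Termination picks a
# project from the earliest nonempty stage (consistent with Property 8),
# promotion from stage i-1 to i costs c_i, and launching a stage-m project
# earns reward rho.

def T0(v, x):
    """Terminate-or-not: max over doing nothing and terminating a project
    from the earliest nonempty stage."""
    best = v(tuple(x))
    for l, count in enumerate(x):
        if count >= 1:  # earliest nonempty stage
            y = list(x); y[l] -= 1
            best = max(best, v(tuple(y)))
            break
    return best

def Ti(v, x, i, c):
    """Promote-or-not from stage i-1 to stage i (stages are 1-indexed,
    2 <= i <= m in this sketch), at promotion cost c = c_i."""
    best = v(tuple(x))
    if x[i - 2] >= 1:  # a project is available in stage i-1
        y = list(x); y[i - 2] -= 1; y[i - 1] += 1
        best = max(best, v(tuple(y)) - c)
    return best

def Tlaunch(v, x, rho):
    """Launch-or-not a finished (stage-m) project for reward rho."""
    best = v(tuple(x))
    if x[-1] >= 1:
        y = list(x); y[-1] -= 1
        best = max(best, v(tuple(y)) + rho)
    return best
```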

EC.4. Proofs of the Results in Section 5
In Section 5, as our optimization criterion, we take the average reward rate over an infinite planning horizon. Given a policy π, the average reward rate v_π is given by The objective is to identify a policy π* that yields v^*(x) = sup_π v_π(x) for all states x. The following proposition shows that our structural results in Sections 3 and 4 carry over to the average-reward case: Proposition EC.1. Suppose that Assumption 1 (or Assumption 2) holds. Then there exists a stationary policy that is optimal under the average reward criterion. This policy retains all the properties of the optimal policy under the discounted reward criterion, as introduced in Theorem 1 (or Theorem 2). Also, the optimal average reward is finite and independent of the initial state; there exists a finite constant v^* such that v^*(x) = v^*, ∀x.
Proof of Proposition EC.1. Suppose that Assumption 1 holds. We first prove the following conditions: (i) there exists a stationary policy that induces an irreducible positive recurrent Markov chain with finite average reward v_π, and (ii) the number of states for which −h(x) ≥ γ is finite for any negative value γ. Consider a policy where a new product idea is tested if and only if the number of projects in the system is less than a fixed threshold, and a new product is always launched from the most promising project available. Note that we have a finite-state Markov chain under this policy. Thus this policy yields a finite average reward, and condition (i) holds. Since the holding cost rate is increasing convex in the number of projects in the system, condition (ii) holds as well. Under conditions (i) and (ii), there exists a constant v^* and a function f(x) such that f(x) + v^* = μ_1 T_1 f(x) + μ_2 T_2 f(x) − h(x) (Weber and Stidham 1987). The stationary policy that maximizes the right-hand side of the above equation for each state is an optimal policy in the average reward case, yielding a constant average reward v^*. Thus properties of the optimal policy in the average reward case are determined through f(x), whereas properties of the optimal policy in the discounted reward case are determined through v^*(x). Since the same event operators appear in either case, the optimal policy for the average reward retains the same structure as in Theorem 1. Now suppose that Assumption 2 holds. We again prove conditions (i) and (ii). Consider a policy where a project is advanced to the next stage if and only if the number of projects in the next stage is less than a fixed threshold, projects are never terminated, and a new product is always launched if feasible. Again, we have a finite-state Markov chain under this policy.
Thus this policy yields a finite average reward, and condition (i) holds. Since the holding cost rate is increasing convex in the number of projects in the system, condition (ii) holds as well. Hence there exists a constant v^* and a function f(x) satisfying the corresponding average-reward optimality equation. The stationary policy that maximizes the right-hand side of this equation for each state is an optimal policy in the average reward case, yielding a constant average reward v^*. Thus properties of the optimal policy in the average reward case are determined through f(x), whereas properties of the optimal policy in the discounted reward case are determined through v^*(x). Since the same event operators appear in either case, the optimal policy for the average reward retains the same structure as in Theorem 2.
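The average-reward optimality equation f(x) + v^* = μ_1 T_1 f(x) + μ_2 T_2 f(x) − h(x) can be illustrated numerically with relative value iteration. The chain below is a tiny stand-in of our own (not the NPD model); it only shows that the iteration pins down a bias function f and a constant gain v^* that is independent of the state, as Proposition EC.1 asserts.

```python
# Relative value iteration on a toy two-state average-reward MDP:
# iterate the Bellman operator and renormalize at a reference state, so
# that f converges to a bias function and the offset converges to the
# constant optimal gain v*.
rewards = [[1.0, 0.0], [0.0, 2.0]]   # rewards[s][a]
nxt = [[0, 1], [0, 1]]               # deterministic transitions for brevity

def bellman(f):
    return [max(rewards[s][a] + f[nxt[s][a]] for a in (0, 1))
            for s in (0, 1)]

f = [0.0, 0.0]
for _ in range(200):
    g = bellman(f)
    f = [g[s] - g[0] for s in (0, 1)]   # renormalize at reference state 0
gain = bellman(f)[0] - f[0]             # v*: the optimal average reward
# Here the optimal policy parks in state 1 forever, so gain = 2 regardless
# of the starting state.
```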
Computation times equal to 3600 seconds indicate that the algorithm reached the time limit and was terminated.