Production planning and inventory control for a two-product recovery system

The significance of product recovery through remanufacturing has been widely recognized and has compelled manufacturers to incorporate product recovery activities into normal manufacturing processes. Consequently, increasing attention has been paid to production and inventory management of the product recovery system where demand is satisfied through either manufacturing brand-new products or remanufacturing returned products into new ones. In this work, we investigate a recovery system with two product types and two return flows. A periodic-review inventory problem is addressed in the two-product recovery system and an approximate dynamic programming approach is proposed to obtain production and recovery decisions. A single-period problem is first solved and the optimal solution is characterized by a multilevel threshold policy. For the multi-period problem, we show that the threshold levels of each period are solely dependent on the gradients of the cost-to-go function at points of interest after approximation. The gradients are estimated by an infinitesimal perturbation analysis–based method and a backward induction approach is then applied to derive the threshold levels of each period. Numerical experiments are conducted under different scenarios and the threshold policy is shown to outperform two other heuristic policies.


Introduction
In recent years, product recovery has emerged as an important field in reverse logistics and supply chain management, with increasing attention on the management of return flows in parallel with the conventional manufacturing process. Motivations behind the product recovery activities mainly include economic incentives and growing environmental concerns from society. Originally, engagement in recovery activities was often driven by government legislation that forced manufacturers to take responsibility for the whole product life cycle and waste reduction. However, direct and indirect economic benefits have now become the main reason, as raw material savings and reduction in disposal cost have gradually been recognized as significant. Furthermore, an environmentally friendly image also provides manufacturers a business advantage in the market.
Returns of used products complicate product recovery systems compared with traditional inventory systems as there are two options to fulfill customer demand: manufacturing brand-new products from virgin components by normal production and remanufacturing returned products into new ones by recovery. In other words, both virgin components and returns are available resources that can be used to make products. Thus, production planning and inventory management become a challenging issue for manufacturers as they need to coordinate between these two options in order to optimize the system's performance. In current practice, virgin components are usually purchased from upstream suppliers and returned products can be acquired through a variety of reverse channels (Pokharel and Mutha, 2009;Ilgin and Gupta, 2010). In some cases, manufacturers collect the returned products directly from customers, whereas in some other cases manufacturers rely on retailers to return their used products (Wojanowski et al., 2007). In other cases, it is also very common for manufacturers to outsource used-product collection activities to independent third parties who are solely engaged in collecting and reprocessing used products from the market, allowing the manufacturers to focus on their core business 0740-817X C 2015 "IIE" Optimal control for a two-product recovery system 1343 (Krumwiede and Sheu, 2002;Savaskan et al., 2004;Serrato et al., 2007).
Our work is motivated by a real case in which a manufacturer uses third parties as suppliers of returns for remanufacturing together with normal production. The manufacturer does not own any returned products and needs to acquire them from various third-party suppliers. Usually, the condition of returned products is highly variable, so different returns may require different responses and thus incur different remanufacturing costs (Blackburn et al., 2004;Zhou et al., 2011). The problem the manufacturer faces is how to jointly make remanufacturing decisions on different sources of returns and normal production decisions to minimize its total cost.
Considerable attention from academia has been paid to the study of product recovery; published papers are generally listed under the categories of reverse logistics and closed-loop supply chain. We refer readers to Prahinski and Kocabasoglu (2006), Pokharel and Mutha (2009), Ferguson and Souza (2010), and Ilgin and Gupta (2010) for complete reviews. Our work draws on and contributes to the primary stream of literature concerning the management of production and inventory of recovery systems that has been attracting growing research efforts. Many articles have been published that explore the structure of the optimal policy or propose a well-designed heuristic approach to draw near-optimal solutions (Thierry et al., 1995;Fleischmann et al., 1997;Ilgin and Gupta, 2010).
Product recovery systems can be classified by the number of return flows and demand flows. In practice, returns may be different in terms of type and quality condition, and thus fall into different groups. On the other hand, demand can also be divided based on different customer segments, service levels, etc. Specific recovery options are then applied to each different demand flow. For example, replacing a PC machine's hard disk or processor or adding some extra space memory or a modem may be reasonable ways to exploit most of the value a used machine still contains . Single-return-flow and singledemand-flow recovery systems have been widely studied, whereas there have only been a few studies on single-return-flow and multi-demand-flow recovery systems.
For recovery systems with one return and one demand flow, both deterministic and stochastic models have been developed. In deterministic models, the demand and return are known a priori for the entire planning horizon. Schrady (1967) extends the classic Economic Order Quantity (EOQ) model by considering return flow and develops a model on a system with constant demand and return rate. Similar models are proposed by Richter (1994Richter ( , 1996 and Teunter (2001). Koh et al. (2002) consider a joint EOQ and Economic Production Quantity (EPQ) model and assume a proportion of the used products are returned. Their approach is improved by Konstantaras and Papachristos (2008).
Continuous-and periodic-review policies are studied in stochastic models and can be further divided into two groups, with one considering a single aggregated stock point and the other distinguishing recoverable and serviceable inventories. Within the former class, Yuan and Cheung (1998) model dependent demand and return by assuming an exponentially distributed market sojourn time. Van der Laan and Teunter (2006) consider a recovery system controlled by certain extensions of the (s, Q) policy. The traditional (s, Q) model is extended into an order-disposal strategy to control a hybrid system by Ouyang and Zhu (2008). Some other researchers distinguish serviceable and recoverable inventory in their works. Different continuousreview policies for controlling serviceable and recoverable inventory are analyzed in  and van der Laan, Salomon, Dekker, and van Wassenhove (1999). Inderfurth and van der Laan (2001) further improve this model by including a modified inventory position. As for periodic-review policies, Simpson (1978) proposes an inventory model based on a fixed periodic-review policy and obtains the optimal solution structure for the multi-period problem. Inderfurth (1997) extends this model by considering the impact of non-zero lead times for production and recovery. Kiesmüler (2003) presents a method for the exact computation of the parameters that determine the optimal policy in Simpson (1978). DeCroix (2006) extends the studies of Simpson (1978) and Inderfurth (1997) by identifying the structure of the optimal remanufacturing/ordering/disposal policy. DeCroix et al. (2005) propose a stochastic model of a multistage system with stationary costs and stochastic demand over an infinite horizon. Ahiska and King (2010) discuss inventory optimization of a single-product stochastic system with two stocking points and find optimal inventory policies by using Markov decision processes.
A key assumption in the above cited publications on recovery systems is that only one remanufacturing option is considered, which indicates the number of demand flows is one. However, there exist different options of reusing returns in practice; e.g., the manufacturer may be required to make different products by recovery that are of different types or have different qualities. Some works have addressed this problem.  present a periodic-review model for product recovery in a system with returns of a single product and multiple alternating reuse options. A fairly simple near-optimal policy, characterized by a dispose-down-to level and a recover-up-to level for each option is obtained by taking advantage of a linear allocation rule on returns. Kleber et al. (2002) propose a continuous model of a recovery system with multiple recovery options, with each corresponding to a different demand class.
To the best of our knowledge, no work has been published to investigate a product recovery system that involves multiple demand flows or multiple return flows. This is due to this type of problem being significantly more complex than its single-demand-flow and single-return-flow counterparts. However, studies on systems with multiple streams of demands and returns is of practical importance. For example, returns of even the same product type may vary in quality condition. Furthermore, different providers of returned products may charge differently. Aras et al. (2004) assess the impact of quality-based categorization of returned products and show that incorporating returned product quality into the remanufacturing and disposal decisions can lead to significant cost savings. Behret and Korugan (2009) construct a simulation model to analyze the effect of quality classification of returns and indicate that quality-based classification of returns can result in significant cost savings, especially when return rates are high. Zhou et al. (2011) study an inventory system with multiple types of remanufacturable products and characterize a manufacturing-remanufacturing-disposal policy.
In this article we investigate a two-product recovery system, in which two types of products are demanded and two return flows are involved. Returns can be remanufactured into new products to satisfy two respective demand flows in parallel with normal production over a finite horizon. Our proposed periodic-review inventory model is used to determine the production quantities of each manufacturing/remanufacturing channel and characterize the inventory policy for the system with uncertain returns and demands. A dynamic programming model is developed to maximize the expected total profit and determine the optimal production and inventory decisions. Since the last period of multi-period problem actually forms a single-period problem, it is natural to investigate the single-period problem first. We prove the concavity of the single-period model and obtain global optimum by solving Karush-Kuhn-Tucker (KKT) conditions. The optimal solution contains 21 cases in total that are further analyzed and represented by a multilevel threshold policy. Although this threshold policy is not necessarily optimal for the multi-period problem, it is instructive and easy to extend. An approximate dynamic programming approach is then proposed for the multi-period problem and we show through approximation that the threshold levels of each period depend only on the gradients of the cost-to-go function at the points of interest to free us from assuming any explicit form for the cost-to-go function. The gradients are estimated by an Infinitesimal Perturbation Analysis (IPA)-based method and a backward induction approach is then conducted from the secondto-last period to the first period to sequentially derive the threshold policy of each period. The threshold policy is further experimentally proven to outperform two other heuristic policies under a wide range of settings and numerical examples are tested to assess the impact of uncertainty.
The rest of the article is organized as follows. Section 2 describes the two-product recovery system over a finite horizon. A dynamic programming model of the multiperiod problem is developed. Section 3 studies the system in a single period. An optimal multilevel threshold policy is obtained and managerial insights into the policy are pro-vided. In Section 4, we focus on the study of multi-period problem. The multilevel threshold policy of single-period system is extended and a backward induction approach is proposed to derive the threshold levels of each period. Numerical experiments are conducted in Section 5 to show the performance of our policy and the impact of uncertainties is investigated. We conclude our work in Section 6.

Problem description
A periodic-review two-product recovery system with a planning horizon of T periods indexed by t ∈ {1, . . . , T} is considered. Two product flows are provided to fulfill stochastic demand following independent stationary general distributions. Products can be either manufactured from virgin components or remanufactured from returned products and are assumed to be indistinguishable between manufacturing and remanufacturing. Two groups of returns, which can be acquired from the suppliers, are available to the manufacturer for remanufacturing. Each type of returns can be recovered to either of the two products and one unit of returns is remanufactured to one unit of new product. Returns of the same group are recovered at the same remanufacturing cost. Moreover, normal initial inventory of product j during period t s j : selling price of product j rc i j : remanufacturing cost of one unit of returned products from group i to produce j pc j : manufacturing cost for one unit of product j h j : holding cost for one unit of product j v j : penalty cost of one unit of shortage of product j R it : available returns in group i during period t {R it }: s e t o f r e t u r n s R it D jt : demand for product j during period t μ jt : the mean of demand for product j during period t f jt (·), F jt (·): pdf and cdf of the distribution of D jt for product j during period t F −1 jt (·): inverse function of F jt (·) Decision variables: p jt : amount of product j produced during period t r i jt : amount of returned products of {R it } recovered to produce product j during period t production is more costly than recovery, so normal production is used only when returns for recovery are insufficient. The capacity for normal production is unlimited. A simplified operation flow of the manufacturer at each period is described as follows. After observing the on-hand inventory, the manufacturer contacts the suppliers to check on the availability of the two groups of returned products. Based on the information about inventory and availability of returns, the manufacturer makes recovery and normal production decisions and then orders a certain amount of returned products of each group from the suppliers. Inventory is replenished instantly and is used to satisfy demand later in the same period. If demand cannot be fully satisfied, unsatisfied demand is lost forever and penalty cost on the shortage is incurred. Otherwise, remaining inventory is carried to subsequent periods and holding cost is counted. Revenue of the system is generated by selling new products and the total cost consists of manufacturing cost, remanufacturing cost, holding cost, and penalty cost. Since the manufacturer does not actually own the returns, it does not need to keep inventory for the returns. Moreover, we assume that the remanufacturing lead times for different groups of returns are identical and identical to the normal production lead time (see Inderfurth (1997), DeCroix (2006, and Zhou et al. (2011) for similar assumptions). Since the model under these assumptions is equivalent to the one with zero lead times, in this work we focus on the problem with zero lead times for ease of exposition.
The two products are denoted as product 1 and product 2 and the two groups of returns are denoted as group 1 and group 2. We use indexes i ∈ {1, 2} and j ∈ {1, 2} to distinguish between two groups of returns and two types of products, respectively. Parameters and decision variables are specified in Table 1. The structure of the two-product recovery system is depicted in Fig. 1 where index t is omitted for simplicity.
In addition, some restrictions on cost parameters are imposed to ensure the economic validity of this study. First, for each product, the selling price is higher than the manufacturing cost, and the penalty cost is higher than the profit from normal production: The manufacturing cost is higher than remanufacturing cost or else recovery is unnecessary: pc 1 > rc 11 , pc 1 > rc 21 , pc 2 > rc 12 , pc 2 > rc 22 . (2) Finally, without loss of generality, the recovery using returns of {R 2 } is assumed to be cheaper than using returns of {R 1 }: Therefore, to fulfill the demand, group 2 always has the highest priority to be selected to make products. Group 1 takes the second place and normal production is used only when returns in groups 1 and 2 are used up.

Model formulation
A dynamic programming model of the two-product recovery system is developed with the aim of maximizing the expected profit over a finite time horizon. In practice, once production and recovery decisions have been made, inventory gets replenished instantly and is used to satisfy product demands later in the same period. We define I 1t and I 2t as the inventory level after replenishment at period t and thus we have I 1t = I 1t + p 1t + r 11t + r 21t , I 2t = I 2t + p 2t + r 12t + r 22t . Therefore, the expected revenue of period t (denoted as E R t ) is formulated as follows: The total cost includes manufacturing cost, remanufacturing cost, holding cost, and penalty cost of shortage. Hence, the expected total cost of period t is formulated as Expected profit of period t is defined as EP t . It can be expressed as follows: The optimal cost-to-go function from period t until the final period is denoted as Q t (I 1t , I 2t ). For the multi-period problem, Bellman's equation of a dynamic programming model can be written as follows (t = 1, 2, . . . , T): Consumption of the returns of the two groups is no more than availability. Thus, Finally, all decision variables should be non-negative; i.e., Due to the curse of dimensionality, a dynamic programming problem involving more than two states is intractable, although for some problems where there is enough structure, the multidimensional dynamic programming problem can be solved. Unfortunately, our problem does not exhibit enough structure to avoid the curse of dimensionality because the state space of each state variable being too large and the probability distributions of the random variables are unknown. Thus, in our work, we seek to use an approximate dynamic programming technique to solve the problem. We will first investigate the single-period problem as a starting point and then apply the properties found to solve the multi-period dynamic programming model.

Solving the single period problem
At the beginning of a single period, information about returns and the initial inventory of the two products are known. Based on this information, production and recovery decisions are made to maximize the expected profit. As recovery decisions are subject to the availability of the returns, the model for the single-period problem is formulated as subject to where EP = s 1 μ 1 + s 2 μ 2 − pc 1 p 1 − pc 2 p 2 − rc 11 r 11 − rc 12 r 12 − rc 21 r 21 − rc 22 r 22 −h 1 and I j denotes the inventory level of each product type after replenishment: To simplify the expression, let L 1 ( p 1 , r 11 , r 21 ) and L 2 ( p 2 , r 12 , r 22 ) denote the terms related to stochastic demands for products 1 and 2, respectively; that is, By substituting Equation (13) into Equation (12), the objective function can then be rewritten as E P = s 1 μ 1 + s 2 μ 2 − pc 1 p 1 − pc 2 p 2 − rc 11 r 11 − rc 12 r 12 − rc 21 r 21 − rc 22 r 22 + L 1 ( p 1 , r 11 , r 21 ) + L 2 ( p 2 , r 12 , r 22 ).
As decision variables p 1 , p 2 , r 11 , r 12 , r 21 , r 22 act similarly in the functions, the first-order partial derivatives of L 1 and L 2 with respect to them are equal to each other: Therefore, we have Lemma 1. The objective function of the dynamic programming model (9)-(11) is jointly concave on all the decision variables for the single-period two-product recovery system if the salvage value of the unused returned products is ignored.
Proof. Lemma 1 holds if and only if the nonlinear part of the objective function-i.e., functions L 1 and L 2 -is concave on all of the decision variables. Take L 1 for example. The second-order partial derivative of function L 1 with respect to production decision p 1 is calculated as follows (let θ denote its value): The inequality above also holds for variables r 11 and r 12 . Thus, the concavity of L 1 on decision variables is then proven. Similarly, concavity of L 2 can also be proven in the same way. Therefore, Lemma 1 is proven.
The method of Lagrange multipliers is then applied to find the maximum of the model (9)-(11). The Lagrangian function (denoted as L) is expressed as follows: In order to obtain the optimal solution, we need to consider the KKT conditions for the maximum of L. Lemma 1 has shown the concavity of the objective function (9). In addition, constraint (10) is linear on decision variables.
Therefore, the solution to the KKT conditions is also the global maximum of the model. By solving these KKT conditions, we can then obtain the optimal production and recovery decisions for the two products. The values of the decision variables are dependent on the availability of each resource, initial inventory position, and cost structure. Under a certain cost structure, the optimal solution of the model usually contains a number of cases given different inventories and returns. After analyzing the optimal solution cases, we realize that they can be represented by an optimal multilevel threshold policy that is characterized by a set of threshold levels. Therefore, we can employ these threshold levels to help make optimal decisions once they are determined. In the following, we will explain how to determine the multilevel threshold policy and make optimal decisions with its help.

Optimal multilevel threshold policy
There exist three channels to make a product; i.e., normal production and recovery from returns of {R 1 } or {R 2 }. To fulfill the demand and maximize the expected profit, we need to decide when to choose each channel and how many units of each product to make from each channel. In other words, recovery from returns provides alternative options to make products in the product recovery system. Resources of each channel need to be sequentially allocated between two products based on the cost structure.
As different cost structures usually result in different forms of production and recovery decisions, we choose one structure as an example to illustrate how the optimal solution is obtained. For other cost structures, the problem can be solved by adapting the modeling and solution process presented here. Note that we have already made assumptions (2) and (3) in Equations (2) and (3), which indicate that among the three channels, the sequence of being selected for replenishment is recovery from {R 2 }, recovery from {R 1 }, and normal production. Furthermore, to simplify the illustration we impose the following restrictions on the cost parameters: For instance, the first inequality above indicates that replenishment for one unit of product 1 from {R 1 } instead of normal production brings more cost saving than replenishment for one unit of product 2 from {R 1 } instead of normal production. Therefore, the three inequalities in Equation (20) indicate that if returns from {R 2 } or {R 1 } are not enough, the manufacturer will allocate them to product 1 rather than product 2. At the beginning of the period, information about initial inventories and the availability of resources in each channel is known. Each resource is then sequentially allocated between two products until the corresponding resource is used up or the amount of the product reaches a certain threshold level beyond which any further replenishment is not economically worthwhile any longer (i.e., order-upto level) or replenishment for the other product becomes more profitable (i.e., switching level) so that the replenishment should switch. These threshold levels are used to control the process of resource allocation. Under the given cost structure, the optimal solution has 21 cases in total based on the different combinations of initial inventories and amount of returns. Details of the 21 cases are given in Appendix A1.
Among these threshold levels, there are three order-upto levels for each product corresponding to three different resources. The order-up-to level is the highest inventory level when the product is replenished by a certain resource. It means any further replenishment after the order-up-to level is not profitable and the replenishment should stop. These order-up-to levels can be obtained by solving the related KKT conditions as follows.
1. Order-up-to level by production.
For each product, the order-up-to level by production is defined as the maximum inventory level by normal production. At this order-up-to level, the marginal profit of further replenishment is equal to zero. Let AL 0 and BL 0 denote the order-up-to levels by production for products 1 and 2, respectively. We have Similarly, for product 2, the order-up-to level by production can be determined as 2. Order-up-to level by group 1. For each product, the order-up-to level by group 1 is defined as the maximum inventory level by replenishment using returns of {R 1 }. If they are enough for the recovery of two products, their inventories will be replenished until the order-up-to levels (denoted as AL 1 and BL 1 ), at which the marginal profits of further replenishment are equal to zero. Thus, we have 3. Order-up-to level by group 2. Similarly, for each product, the order-up-to levels by group 2 (denoted as AL 2 and BL 2 ) are defined as the maximum inventory level by replenishment using {R 2 }. Thus, we have If there is only one resource available, the above order-up-to level is the highest level to which the resource can replenish the inventory of a product to. Since {R 2 } is the cheapest resource, {R 1 } is the second and normal production is more expensive than recovery, according to Equations (21) to (24), we have AL 2 > AL 1 > AL 0 , BL 2 > BL 1 > BL 0 . The replenishment procedure first chooses {R 2 } to make products. If the returns of {R 2 } are enough to replenish products 1 and 2 to more than AL 1 and BL 1 , recovery from {R 1 } and normal production will be unnecessary because replenishment by these two channels is not profitable. If the inventory levels of the products by recovery from {R 2 } are between AL 0 (BL 0 ) and AL 1 (BL 1 ) after {R 2 } is used up, {R 1 } will be used, since the inventory levels have not reached their order-up-to levels and normal production is avoided. If the inventory levels are still lower than AL 0 and BL 0 after {R 2 } and {R 1 } is used up, normal production will be needed to replenish the inventories. Since two products are considered, if a certain channel is not able to replenish the inventories of both products to their corresponding order-up-to levels, they will compete for the resource. Therefore, in addition to the order-up-to levels, three switching levels, denoted as SW 1 , SW 2 , and RP, are used to control the allocation of possibly limited resources between the two products. Allocation of a limited resource involves the comparison of marginal profits with respect to each channel. Returns of {R 2 } or {R 1 } will be assigned to the product with the highest marginal profit which means replenishment for this product is more profitable than the other one under the current situation. Thus, the inventory of the product with the highest marginal profit increases and its marginal profit is decreasing at the same time. On the other hand, the inventory of the other product remains unchanged. The recovery process continues until the two products have equal marginal profits. This is when the recovery process switches from one product to the other since the marginal profit of the product is no longer higher than the other. After this, the recovery process simultaneously allocates the resource between two products and maintains the equality of the two marginal profits until the resource is used up or the inventories reach their corresponding order-up-to levels. This type of allocation policy is defined as the fair allocation rule, which aims to balance the marginal profits of the two products during replenishment by a recovery channel. The introduction of the switching levels helps determine at which point recovery switches from one product to the other and the process of fair allocation begins.
In the following, we will illustrate how to determine the switching levels in detail by using the "replacement" concept. Suppose that the initial inventories of products 1 and 2 are below AL 0 and BL 0 and normal production is the only channel. Due to an unlimited production capacity, products 1 and 2 can be replenished to AL 0 and BL 0 . When the two recovery resources become available, they will be sequentially used to replace the products, which are originally made by normal production. Since the remanufacturing cost of {R 2 } is lower than {R 1 }, {R 2 } will be used first. Details of these switching levels are further explained as in the following: 4. Switching level SW 2 .
The switching level SW 2 is the inventory level of product 1 and corresponds to the inventory level BL 0 of product 2. It denotes the inventory level of product 1 at which the recovery process from group 2 switches from product 1 to product 2. When recovery resources are available, {R 2 } will first be used to replace the new items of product 1 by normal production. During this replacement process, the recovery amount of product 1 from {R 2 }; i.e., r 21 is increasing and the amount by normal production-i.e., p 1 -decreases. In other words, a portion of the new items originally completed by normal production is now made by {R 2 } instead. Before the new items produced by normal production are all replaced, its inventory-i.e., I 1 -remains unchanged. Therefore, according to Equations (17) and (21), the marginal profit with respect to r 21 (denoted as ∂EP/∂r 21 | I 1 ) does not change and we have ∂EP/∂r 21 | I 1 = AL 0 = pc 1 − rc 21 . On the other hand, the inventory of product 2-i.e., I 2 -also stays unchanged and the marginal profit with respect to r 22 at BL 0 (denoted as ∂EP/∂r 22 | I 2 =BL 0 ) is pc 2 − rc 22 . According to Equation (20), we have Therefore, if {R 2 } is sufficient to replace all new items of product 1, the replenishment process will continue to make product 1. r 21 and I 1 will increase at the same time. ∂EP/∂r 21 | I 1 decreases and the process continues until I 1 reaches a certain level-i.e., SW 2 -at which ∂EP/∂r 21 | I 1 =SW 2 is equal to ∂EP/∂r 22 | I 2 =BL 0 . After that, the process will switch to replenish product 2 in order to replace the new items originally made by normal production. During this replacement process, r 22 is increasing and p 2 decreases. I 2 remains unchanged and ∂EP/∂r 22 | I 2 does not change. If there are still items of {R 2 } left after all new items of product 2 by normal production have been replaced, the remaining {R 2 } will be simultaneously allocated between two products following the fair allocation rule. Therefore, when the inventory levels of products 1 and 2 are at SW 2 and BL 0 , respectively, the two products have equal marginal profits with respect to the recovery amount from {R 2 }. Thus, we have ∂EP ∂r *

21
5. Switching level SW 1 . The switching level SW 1 is also the inventory level of product 1, which corresponds to the inventory level BL 0 of product 2. It denotes the inventory level of product 1, at which recovery process from group 1 switches from product 1 to product 2. When the inventory levels of products 1 and 2 are at SW 1 and BL 0 , respectively, the two products have equal marginal profits with respect to the recovery amount from {R 1 }. Thus, we have ∂EP ∂r *

11
6. Switching level RP. The switching level RP is the inventory level of product 1 that corresponds to the inventory level BL 1 of product 2. It denotes the inventory level of product 1 at which recovery process from group 1 switches from product 1 to product 2. When the inventory levels of products 1 and 2 are at RP and BL 1 , the two products will have equal marginal profits from recovering the returns of {R 2 }. Thus, we have ∂EP ∂r *

21
All threshold levels of products 1 and 2 are illustrated in Fig. 2. Order-up-to levels AL 1 , AL 2 , BL 1 , and BL 2 are the highest replenishment levels for products 1 and 2 when recovery resources are used. AL 0 and BL 0 are the highest replenishment levels when normal production is used and they are also the lowest replenishment levels, as production capacity is unlimited. Among the order-up-to levels of product 1 (product 2), AL 2 (BL 2 ) is the highest, whereas AL 0 (BL 0 ) is the lowest. AL 1 (BL 1 ) locates between them. SW 1 , SW 2 , and RP are involved to control the allocation of recovery resources. SW 1 is between AL 0 and AL 1 , whereas RP is between AL 1 and AL 2 . SW 2 is between SW 1 and RP. In Fig. 2, SW 2 is shown to be lower than AL 1 . However, SW 2 may be higher than AL 1 under other cost structures.

1350
Pan et al.

Recovery system over a finite horizon
In the course of solving the single-period problem, an optimal multilevel threshold policy, which is characterized by six order-up-to levels and three switching levels, has been obtained to help make optimal production and inventory decisions. In addition, it can guide us to address the dynamic programming model of the multi-period problem. We assume that this threshold policy is also applicable to the multi-period problem. As the last period of the multiperiod problem is actually a single-period problem, we can start from the last period and take advantage of the threshold levels previously obtained. In the following, an approximate dynamic programming approach is applied and a backward induction method, which is conducted from the last period to the first period, is used to determine the threshold levels.

An approximate dynamic programming approach
Recall that Bellman's equation of dynamic programming is We cannot expect to solve the above cost-to-go function to optimality. However, we assume that the multilevel threshold policy can also be obtained here and nine threshold levels can be determined to help make decisions for the multi-period problem. In order to solve Equation (28), we take a first-order partial derivative of it with respect to each decision variable. For example, to decide p 1t , its first-order partial derivatives are expressed as According to Equation (17), the first term of Equation (29) can be calculated as ∂EP t ∂ p 1t = −pc 1 + s 1 + v 1 − (s 1 + v 1 + h 1 )F 1t (I 1t ). (30) Since there is no closed-form solution for E [Q t + 1 (I 1,t + 1 , I 2,t+1 )], we approximate its partial derivative with respect to p 1t as the gradient of the inventory level at period t (denoted as u 1t (I 1t , I 2t )): Let the approximation of partial derivative (29) be equal to zero. Thus, we have By solving Equation (32), we can obtain the inventory level of product 1 when normal production is involved. This inventory level is called the order-up-to level, which was defined in Section 3. The order-up-to level of product 1 by normal production at period t is calculated as Similarly, the other threshold levels of period t (i.e., AL 1t , AL 2t , BL 0t , BL 1t , BL 2t , SW 1t , SW 2t , RP t ) can be determined in the same way. The results are presented in Appendix A2. It can be seen that the threshold levels of the multiperiod problem are only dependent on the gradients of the cost-to-go function at the corresponding threshold levels after approximation. Therefore, unlike usual approaches that represent the cost-to-go function by a single function (or piecewise function) across the whole state space, we only need to estimate the gradients at the points of interest. Hence, the performance of the solution is not affected by the function we assume, which can pose a challenge for most approximate dynamic programming approaches.
If the cost-to-go function is separable with the inventory levels of the two products, the gradient u 1t is independent of I 2t . However, this is not always possible. We have performed some numerical experiments on a number of samples and the results show that the position of I 2t does not have a major impact on u 1t . Therefore, we use the average of the gradients at the three order-up-to levels of product 2 as the approximation of u 1t at AL 0t . Thus, in our approach, u 1t is estimated as In addition, to determine a threshold level, say AL 0t , the gradient at AL 0t needs to be obtained first. However, the gradient at AL 0t can be computed only after AL 0t is decided. Therefore, an iterative algorithm is employed to search for the threshold levels starting with the predetermined threshold levels of period t + 1. The threshold levels of t + 1 are obtained in advance through a backward induction approach that is elaborated in next subsection.
Initially, u 1t is obtained by taking predetermined threshold levels of t + 1 as initial values; i.e., I 1t = AL 0,t+1 , I 2t = BL l,t+1 (l = 0, 1, 2). u 1t is then smoothed with the previous value to get a weighted average gradient, which is used to calculate the corresponding threshold level. After that, the newest threshold level is also smoothed by certain rule to obtain a weighted average. The procedure is repeated until it converges. Due to the time-consuming computation, the algorithm generally stops if the total absolute change of the threshold level over a certain number of iterations is small. For example, if m i =m−L+1,m>L ||AL i 0t − AL i −1 0t || < δ (L: the number of consecutive results used for calculating the absolute change; δ: a small number), the algorithm steps. In addition, as sampling is involved, stochastic gradients can be quite different from iteration to iteration. The instability of stochastic gradients makes the solution obtained in Step 5 fluctuating. Thus, we need the averaging steps-i.e., Steps 4 and 6-to help stabilize the solution. According to Gupal and Bazhenov (1972), the solution converges to an optimal solution under certain conditions on the step sizes α m and β m , such as The procedure for determining AL 0t is summarized below. All the other order-up-to levels can be obtained in a similar way: Algorithm 1 Procedure of determining order-up-to levels.
Step 1. Let m be the index for an iteration, u m 1t be a stochastic gradient obtained at iteration m, u 1t m be a weighted average of gradient up to iteration m, AL 0t m be the threshold level corresponding to a weighted average of gradient at iteration m, and AL m 0t be a weighted average of the threshold level at iteration m.
Step 5. Obtain AL 0t m by Step 6. Update AL m 0t by Step 7. Set m = m + 1. If the stopping criterion is not met, go to Step 3.

Determination of gradients in the multi-period context
To solve the multi-period problem, we start from period T and then take advantage of a backward induction approach to determine the threshold levels of each period from the second-to-last period; i.e., period T − 1 until period 1. Since the threshold levels of each period, except period T, are actually dependent on their corresponding gradients, they can be obtained by determining the gradients at points of interest through the backward induction approach from period T − 1 to period 1. Threshold levels of period T are obtained by solving a single-period problem. Then, for period T − 1, the threshold levels can be determined after the gradients of the optimal cost-to-go function-i.e., the gradients of the maximum expected profit of period T-have been estimated. Suppose we are now at period t. Up to now, the threshold levels from period t + 1 until T have been decided and replenishment decisions have been made. In order to determine its threshold levels, we need to estimate the gradients of the optimal cost-to-go function; i.e., the gradients of maximum expected total profit earned from period t + 1 until T. After that, replenishment decisions are made under the optimal policy and the algorithm moves to t − 1.
The gradients u jt are estimated by Monte Carlo simulation. N sets of random realizations of stochastic returns and demands in each period from t until T are generated. By looking into the realization of the return at each period, we can tell which one out of the 21 cases presented in Appendix A1 it matches and then calculate the derivatives of the optimal replenishment decisions as well as optimal cost-to-go function of period t together with the realization of the demand. The value of the optimal cost-to-go function of sample k is obtained by summing the profit of the realization from period t + 1 until period T − 1 after applying the optimal policy for these periods and the expected profit of period T. Let P k * τ denote the maximum profit of period τ of sample k, which is ([X] + := max{X, 0}): Let EP k * T denote the maximum expected profit of period T of sample k. Thus, the gradient with respect to p 1t of sample k (denoted as gr ad k 1t (I 1t , I 2t )) is expressed as ∂ I jt gr ad k 1,t+1 (I * 1,t+1 , I * 2,t+1 ) ∂ I jt gr ad k 2,t+1 (I * 1,t+1 , I * 2,t+1 ) Since the sample gradients gr ad k 1,t+1 (I * 1,t+1 , I * 2,t+1 ) and gr ad k 2,t+1 (I * 1,t+1 , I * 2,t+1 ) are computed during the process of determining the sample gradients of period t + 1, gr ad k jt (I 1t , I 2t ) can be easily calculated. Through this backward induction approach, the sample gradients at orderup-to levels of each period are obtained period by period in reverse order from period T − 1 to period 1. The gradient u jt is approximated by sample average of the gradients over all N realizations. As I 1t and I 2t are assumed to be independent of each other, the perturbation of I jt is only propagated to I j,t+1 if there is no shortage of product j at period t. Then, Equation (40) can be further expressed as ∂ I j,t+1 gr ad k 1,t+1 (I * 1,t+1 , I * 2,t+1 ) ∂ I j,t+1 gr ad k 2,t+1 (I * 1,t+1 , I * 2,t+1 ) Partial derivatives of the optimal replenishment decisions with respect to the initial inventory are involved in the calculation of Equation (41). Therefore, IPA is applied to compute the derivatives. The perturbation of the initial inventory of product j of period t + 1 is propagated to the final inventory of product i in the same period (i, j = 1, 2; i = j ): Suppose the current condition at period t + 1 matches case S7 in Appendix A1, which is R k 2,t+1 + I 1,t+1 < SW 1,t+1 , R k 1,t+1 + R k 2,t+1 + I 1,t+1 > SW 1,t+1 , Then, the optimal replenishment decisions for sample k are as follows: p * 1,t+1 = 0, r * 11,t+1 = SW 1,t+1 − R k 2,t+1 − I 1,t+1 , r * 21,t+1 = R k 2,t+1 , p * 2,t+1 = SW 1,t+1 + BL 0,t+1 − R k 1,t+1 − R k 2,t+1 − I 1,t+1 − I 2,t+1 , r * 12,t+1 = R k 1,t+1 + R k 2,t+1 + I 1,t+1 − SW 1,t+1 , r * 22,t+1 = 0, and I * 1,t+1 = SW 1,t+1 , I * 2,t+1 = BL 0,t+1 .
We can tell from Equation (44) that the inventory levels of the two products after replenishment have reached the threshold levels SW 1,t+1 and BL 0,t+1 , respectively. Therefore, the perturbation on the initial inventory of the two products is not propagated to the order-up-to levels of the two products; that is, However, the perturbation affects the related replenishment decisions and the impact can be determined as Thus, we conclude that if the initial inventory of product 1 is increased by a small amount , the same amount of normal production for product 2 will be saved and the recovery amount from R 1 for product 2 will be increased by the same amount . Despite the impact on related replenishment decisions, the order-up-to levels of each product remain unaffected.

Numerical experiments
Extensive numerical experiments are conducted to validate the effectiveness of the multilevel threshold policy. Our approach to solve the two-product recovery problem over a finite horizon is run on a PC (Intel Core 2 Duo, 2.66 GHz; Memory, 4G). Replications of stochastic demands and returns are generated by assuming both of demands and returns follow normal distributions in our experiments. Let E(X) and StDev(X) denote the expectation and standard deviation of X, respectively. Note that while deriving the threshold levels of each period we do not need to assume the stochastic demands and returns follow any distribution.   We perform some pre-experiments to verify the convergence of the threshold levels obtained from our solution approach. Once the differences of all of the threshold levels between two consecutive periods are no more than 1%, convergence is assumed to have been achieved. Our pre-experiments show that under a certain cost structure, the threshold levels converge faster when the holding cost is higher. Therefore, in practice, the converged threshold levels can be applied while making production and recovery decisions of each period over a relatively long finite horizon.

Impact of stochastic returns and demands
We investigate the impact of stochastic returns and demands on the threshold levels, which have converged in the multiperiod context. We first assess how the expected value of returned items affects the threshold levels. Second, we evaluate the impact of demand variability on the threshold levels. Experiments are conducted on three sets of system parameters, which are shown in Table 2.

Impact of stochastic returns on threshold levels
Eight instances are generated and tested for each set of system parameters. Among the eight instances, demand for products 1 and 2 and returns of product 2 are the same. They are generated according to the following parameters: Furthermore, eight sets of returns of group 1, one for each of the eight instances, are generated with E(R 1 ) varying from 15 to 210 and StDev(R 1 ) from five to 70, respectively.
Threshold levels for the two products obtained on three test sets are shown in Tables 3, 4, and 5, where the first and second columns present eight scenarios of the returns of group 1 and the next nine columns report the converged threshold levels of each scenario. Furthermore, the trends of the threshold levels with each parameter set are illustrated in Figs. 3, 4, and 5. The results show that for the three parameter sets, all threshold levels decrease with the increasing expected value of R 1 . For product 1, the threshold levels AL 1 , AL 2 , and RP decrease faster than other threshold levels. And for product 2, the threshold levels BL 1 and BL 2 decrease faster than BL 0 .
Threshold levels of each period decrease when there are more returns available. As R 1 is involved in recovery process, it affects the threshold levels related to recovery; i.e., AL 1 , AL 2 , BL 1 , and BL 2 . On the other hand, AL 0 and BL 0 are not affected by R 1 . Consequently, R 1 has a significant impact on RP, which is obtained by the comparison of marginal profits of the recovery using R 2 while the inventory of product 2 is at the threshold level BL 1 . However, SW 1 and SW 2 are obtained by the comparison of marginal profits of the recovery using R 1 and R 2 while the inventory of product 2 at the threshold level BL 0 is only slightly impacted by R 1 .

Impact of demand variability on threshold levels
Six instances are generated and tested for each set of parameters. Among the six instances, demand of product 2 and returns of products 1 and 2 are the same. They are generated according to the following parameters: E(D 2 ) = 100, StDev(D 2 ) = 30; E(R 1 ) = 90, StDev(R 1 ) = 30; E(R 2 ) = 45, StDev(R 2 ) = 15. Furthermore, six sets of the demand for product 1, one for each of the six instances, are generated with StDev(D 1 ) varying from 20 to 200 and the coefficient of variation of D 1 (denoted as CV 1 ) from 0.1 to 1.0, respectively, where E(D 1 ) is fixed at 200. Threshold levels of the two products obtained on three test sets are shown in Tables 6, 7, and 8, where the first and second columns present six scenarios of the demand for product 1 and the next nine columns report the converged threshold levels of each scenario. Furthermore, the trends of the threshold levels with each test set are illustrated in Figs. 6, 7, and 8. The results show that for the three test sets, all threshold levels related to product 1 increase with the increasing demand variability of product 1, whereas the threshold levels related to product 2 stay unaffected. As the demands for the two products are independent of each other, the impact of demand variability of product 1 will only affect the threshold levels related to product 1. Furthermore, higher demand variability results in higher threshold levels, so that possible stock shortage can be avoided. Note that in some cases the threshold level does not follow exact increasing trend; e.g., AL 0 in Fig. 8. This is because when the demand variability reaches a very high level, there is high chance that values of some demand realizations may be negative and thus need to be adjusted to zero to ensure the meaningfulness of the experiments. As a result, the expected demand increases but the demand variability drops and may be less than its previous scenario. Then, the corresponding threshold level may be lower than the previous one.

Comparison with other heuristic policies
To assess the benefit of our proposed approach, we compare it with two heuristic methods, both of which consider each period of the multi-period problem as a series of single-period problems and solve them to optimality period by period. The first threshold policy, denoted as H 1 , is derived from the first method and disregards scrap value    of remaining products, and the second threshold policy, denoted as H 2 , is derived from the second method and assumes the scrap value of remaining products 1 and 2 to be rc 21 and rc 22 , respectively. We denote our threshold policy as H 3 and average expected profit over all periods of each method as AE P H1 , AE P H2 , and AE P H3 , respectively. The performance of the three policies is experimentally compared in the last scenario of Table 3, where E(R 1 ) = 210 and StDev(R 1 ) = 70. The threshold levels of the three poli-cies are shown in Table 9. It can be seen that the threshold levels of H 3 are the highest, whereas the threshold levels of the H 1 are the lowest. The difference in corresponding threshold levels between H 1 and H 2 is small, whereas the difference between H 1 and H 3 is relatively large.
The three methods help make production and recovery decisions of the two-product recovery system in a relatively long horizon, and average expected profit over all the periods is also calculated and reported in Table 10.    Our method outperforms H 1 by 7.19% and H 2 by 3.35%. The threshold levels obtained by our approach perform the best among three policies. Since the optimal solution to the multi-period two-product recovery problem is difficult to obtain, our approach provides a promising way to obtain a good solution.

Conclusions
In this study, we consider a two-product recovery system over a finite horizon. Based on the multilevel threshold policy derived from solving the single-period problem, an approximate dynamic programming approach is proposed to solve the multi-period problem, which only needs to estimate the gradients of the cost-to-go function at points of interest and frees us from assuming a form for the cost-to-go function. Numerical experiments are conducted to validate the proposed approach. Compared with existing researches in related areas, this work contributes the following: 1. Our article considers production planning and inventory control of a recovery system with two products and two respective return flows, which, as far as we know, is the first study on recovery systems involving multiple products and multiple returns. 2. A single-period two-product recovery problem is solved to optimality and a multilevel threshold policy is derived to help make production and inventory decisions. This threshold policy serves as a good starting point for solving the multi-period problem, 3. An approximate dynamic programming approach is used to address the multi-period two-product recovery problem. We show that the threshold levels of each period only depend on the gradients of cost-to-go function at the points of interest through a simple approximation. Therefore, we do not need to assume any explicit form of the cost-to-go function, which can be a challenge in most approximate dynamic programming approaches. 4. To solve the multi-period problem, it is shown that the gradients of each period can be sequentially calculated period by period through a backward induction approach starting from the last period and then moving backwards to the first period.
We note that the multilevel threshold policy and the proposed solution approach are also applicable to general multi-product recovery systems. Detailed discussion is provided in Appendix A3. However, there are limitations of the current study. First, we assume that there is no setup cost and backorders are not allowed, both of which are commonly seen in industry. Also, due to the high complexity of the studied multi-period two-product recovery system, the sub-optimality of the proposed approach needs to be explored in the future.

A2: Threshold levels of the multi-period problem
All the threshold levels are listed in Table A2

A3: The extension to a general multi-product recovery system
After studying the two-product recovery system, we extend our work to a general multi-product recovery system. The structure of a multi-period system is depicted in Fig. A1. Similar to the two-product problem, some restrictions on cost parameters are also imposed to ensure the economic validity of the study on the N-product recovery system. First, for each product, the selling price is higher than the production cost, and the penalty cost of shortage is higher than the profit from production. Therefore, s j > pc j and v j > s j − pc j ( j = 1, 2, . . . , N). Second, for each product, the production cost is higher than recovery cost. Otherwise, recovery is unnecessary. Therefore, pc j > rc 1 j and pc j > rc 2 j ( j = 1, 2, . . . , N). Lastly, the recovery using returns in group 2 is cheaper than that using returns in group 1. Therefore, rc 2 j < rc 1 j ( j = 1, 2, . . . , N). The N-product recovery system is first studied in a single period by referring to the two-product recovery system. Similarly, the optimal solution can be found by solving the KKT conditions of a Lagrangian function. The optimal solution to the single-period problem on the two-product recovery system has been shown in Appendix A1 based on a choice of cost structure: pc 1 − rc 11 > pc 2 − rc 12 , pc 1 − rc 21 > pc 2 − rc 22 , and rc 11 − rc 21 > rc 12 − rc 22 . For the N-product recovery system, the optimal solution can be similarly shown based on the cost structure ( j = 1, 2, . . . , N − 1): pc j − rc 1 j > pc j +1 − rc 1, j +1 , pc j − rc 2 j > pc j +1 − rc 2, j +1 and rc 1 j − rc 2 j > rc 1, j +1 − rc 2, j +1 . For the N-product recovery system, there are three order-up-to levels for each product with respect to the three replenishment sources. Based on the selected cost structure, there are three additional switching levels to control the interactive allocation of possibly limited recovery sources between product j and product j + 1. Therefore, the total number of threshold levels for the N-product recovery system is calculated as 3 × N + 3 × (N − 1) = 6 × N − 3. Although the complexity of the optimal solution will increase with the number of products in the recovery system, the N-product case can still be formulated as a nonlinear programming problem and solved using the KKT conditions. When the recovery system is extended from the single-period context to the multi-period problem, we can still consider the use of the multi-level threshold policy, as it is easy to implement. For example, for an M-period problem of an N-product recovery system, the threshold levels of each period can be determined by referring to the work on the two-product recovery system. However, the computational complexity of the heuristic algorithm would increase with the number of products in the system.