A cross-decomposition scheme with integrated primal–dual multi-cuts for two-stage stochastic programming investment planning problems

We describe a cross-decomposition algorithm that combines Benders and scenario-based Lagrangean decomposition for two-stage stochastic programming investment planning problems with complete recourse, where the first-stage variables are mixed-integer and the second-stage variables are continuous. The algorithm is a novel cross-decomposition scheme that fully integrates primal and dual information in the form of primal–dual multi-cuts added, for each scenario, to the Benders and the Lagrangean master problems. The potential benefits of the scheme are demonstrated with numerical experiments on a number of instances of a facility location problem under disruptions. In the original formulation, where the underlying LP relaxation is weak, the cross-decomposition method outperforms multi-cut Benders decomposition. If the formulation is improved with the addition of tightening constraints, the performance of both decomposition methods improves, but cross-decomposition clearly remains the best method for large-scale problems.


Motivation
Two-stage stochastic programming investment planning problems [5] can be hard to solve since the resulting deterministic equivalent programs can be very large-scale. There are two main, complementary approaches to address this computational challenge. First, sampling methods [34] and scenario reduction techniques [22] can be used to limit the number of scenarios. Second, decomposition schemes, such as Benders decomposition, also known as the L-shaped method [3,17,44], or Lagrangean decomposition [9,20], can be applied to exploit decomposable problem structure. In the following, we focus on decomposition schemes for solving two-stage stochastic programming problems.
Benders decomposition solves the original problem by iterating between a master problem and subproblems, which are obtained by fixing the complicating variables, i.e. the investment decisions. Dual information from the subproblems, which describes the sensitivity of the second-stage decisions with respect to the first-stage decisions, is fed back to the master problem, which in turn guides the primal search. The convergence rate of Benders decomposition is good if the underlying relaxation is tight [35,39]. However, the bound provided by the Benders master problem might initially be weak, and a large number of iterations may be required. Therefore, tightening techniques to speed up the convergence of Benders decomposition, such as multi-cuts [4], Pareto-optimal cuts [35] and cut bundle generation methods [38] (among others), have been proposed.
Lagrangean decomposition applied to stochastic programming problems is a special form of Lagrangean relaxation that relaxes the so-called non-anticipativity constraints, which enforce the same first-stage decisions across scenarios [9]. Lagrangean decomposition typically provides a strong relaxation, since only the non-anticipativity constraints are relaxed and the Lagrangean dual therefore remains close to the original problem. However, the classical Lagrangean decomposition approach suffers from two weaknesses. First, it might be hard to generate good first-stage feasible solutions from the Lagrangean dual subproblems through heuristics. Second, the update of the multipliers by subgradient optimization [13,23,24] or cutting planes [11,30] can be a bottleneck that slows down the overall convergence of the algorithm. Techniques to speed up the convergence of the multiplier update include the bundle method [31,32,46], the volume algorithm [2] and the analytic center cutting plane method [18]. There are also strategies to combine bounds from subgradients with cutting planes [36,37], and to update the multipliers based on dual sensitivity analysis [21,42].
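As context for the multiplier update discussed above, a minimal subgradient step for non-anticipativity multipliers might look as follows. This is a generic sketch, not the paper's implementation: the Polyak-type step size and the pairwise coupling x_s = x_{s+1} are our assumptions.

```python
import numpy as np

def subgradient_step(mu, x_sc, ub, lb, alpha=1.0):
    """One subgradient update for non-anticipativity multipliers.

    mu   : list of multiplier vectors, one per coupling x_s = x_{s+1}
    x_sc : per-scenario first-stage solutions from the Lagrangean
           subproblems (they disagree until convergence)
    ub,lb: best known upper/lower bounds (used in the Polyak step)
    """
    # Subgradient of the dual function: residuals of x_1 = x_2 = ... = x_|S|
    g = [x_sc[s] - x_sc[s + 1] for s in range(len(x_sc) - 1)]
    norm2 = sum(float(gi @ gi) for gi in g)
    if norm2 == 0.0:  # non-anticipativity already satisfied: keep multipliers
        return mu
    step = alpha * (ub - lb) / norm2  # classical Polyak-type step length
    return [m + step * gi for m, gi in zip(mu, g)]
```

In practice this simple rule is exactly the scheme whose unreliable convergence motivates the bundle and cutting-plane alternatives cited above.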
However, while many papers focus on improvements to one of the two decomposition schemes, only a few research efforts address combining their complementary strengths. First proposed by Van Roy [43], the cross-decomposition algorithm is a framework that unifies Benders decomposition and Lagrangean relaxation, which can be seen as duals of each other. The cross-decomposition algorithm iterates between the primal and dual subproblems, where each subproblem yields the input for the other. Convergence tests are used to determine when the iteration between primal and dual subproblems needs to be augmented with the solution of a primal or dual master problem in order to prevent cycling of the algorithm and restore convergence. Holmberg [25] generalizes the idea of cross-decomposition and introduces a set of enhanced convergence tests. One of the main ideas in cross-decomposition is to avoid solving the master problems, since the solution of these problems, potentially MIPs, is seen as a hard task. Some variations of the method, e.g. mean value cross-decomposition [26,28], in which a new solution is obtained by averaging over a set of solutions from previous iterations, even completely eliminate the use of master problems at the cost of potentially slow convergence.
To the best of our knowledge, there is almost no work on cross-decomposition for stochastic programming problems.^1 On the one hand, Cerisola et al. [10] use dual information obtained from a component-based Lagrangean dual relaxation in a nested Benders approach for the unit commitment problem. On the other hand, Sohn et al. [41] derive a mean value cross-decomposition approach for two-stage stochastic linear programming problems based on Holmberg [27], in which the use of any master problem is eliminated, and apply the algorithm to a set of random problem instances. They claim to solve the instances faster than Benders decomposition and ordinary cross-decomposition, but from their presentation it is hard to judge to what extent their algorithm avoids cycling behavior.
At the same time, two paradigm shifts that have occurred over the last 20 years influence the way we might perceive cross-decomposition:
1. MIP solver improvements and faster computers have led to significantly lower CPU times for MIP problems.^2 Therefore, there is no longer a need to avoid solving master problems in cross-decomposition schemes. While the Benders (primal) master problem is a potentially hard MIP, it needs to be solved anyway in the regular Benders approach. Furthermore, the Lagrangean dual master problem is an LP, or a QP with stabilization, if cutting planes are employed. Hence, we would like to solve both master problems to obtain primal and dual guidance for the search with information obtained from both subproblems.
2. Growing grid computing infrastructure is leading to even more parallelization. Hence, it is desirable to use these computing resources to compute strong lower and upper bounds that rely on parallelization. Benders primal subproblems and Lagrangean dual subproblems are both well suited for solution in parallel.
Hence, in this paper, we describe a novel cross-decomposition scheme for two-stage stochastic investment planning programs with complete recourse, in which first-stage investment decisions are mixed-integer and second-stage decisions are continuous. Once first-stage decisions are fixed, the problem decomposes into scenarios (Benders primal subproblems). The Lagrangean dual subproblem for each scenario is obtained by dualizing the non-anticipativity constraints. Both the primal and the dual subproblems are solved in parallel. The primal search is guided by a multi-cut Benders master problem, which is augmented with optimality cuts obtained from the Lagrangean dual subproblems. The dual search is guided by a disaggregated version of the Lagrangean dual master problem, which is bounded with cuts obtained in previously solved Benders primal subproblems. The main motivation in both cases is to strengthen the bounds predicted by both master problems.

Footnote 1: Interestingly, only 49 results are found by a Google Scholar search on "cross-decomposition" + "stochastic programming" as of January 18th, 2014. Thereof, only two publications seem to actively combine Benders and Lagrangean decomposition in a way somewhat similar to our proposed scheme.

Footnote 2: Although the importance of a good formulation should not be neglected; see Bixby and Rothberg [6] and Lima and Grossmann [33] for further details and examples.

Problem statement
We consider a two-stage stochastic programming problem (SP) of the following form (with column vectors). The objective function (1) minimizes the total expected cost TC, which consists of the investment cost (c^T x) and the expected operational cost (Σ_{s∈S} τ_s d_s^T y_s), where τ_s is the probability of scenario s ∈ S with Σ_{s∈S} τ_s = 1. The first-stage decisions x, which are of dimension m, are mixed-integer and correspond to discrete choices for investments and associated capacities. All second-stage decisions y_s, which correspond to operational decisions in scenario s, are continuous. In Eq. (2), constraints on the investment decisions are specified, e.g. logic constraints on investment combinations. Equation (3) links the investments with the operational decisions, e.g. if capacity is expanded, additional operational decisions become available. In Eq. (4), operational constraints are specified.
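The formulation referred to by (1)–(6) does not survive in this excerpt; a plausible reconstruction consistent with the symbols used in the text (the matrices A, b, T_s, W_s, h_s are our placeholder names) is:

```latex
\begin{aligned}
(\mathrm{SP}):\quad \min\ & TC = c^{T}x + \sum_{s\in S}\tau_{s}\, d_{s}^{T} y_{s} && (1)\\
\text{s.t. } & A x \le b && (2)\\
& T_{s}\, x + W_{s}\, y_{s} \le h_{s} \quad \forall s\in S && (3),(4)\\
& x \in X \quad \text{(mixed-integer, dimension } m\text{)} && (5)\\
& y_{s} \ge 0 \quad \forall s\in S && (6)
\end{aligned}
```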
Note that the problem naturally decomposes into scenarios once the investment decisions x are fixed. We can explicitly formulate so-called non-anticipativity constraints, which are derived by duplicating the investment decisions for each scenario (x_s) and enforcing equality across all scenarios. We use the formulation by Caroe and Schultz [9] and rewrite problem (SP) in the following way (SPNAC), where Eq. (12) represents the non-anticipativity constraints (x_1 = x_2 = · · · = x_|S|) with suitable matrices (H_1, . . . , H_|S|), where each H_s has dimension (|S| − 1)m × m. The second-stage non-negativity constraints, y_s ≥ 0 ∀s ∈ S, appear as Eq. (14), and in Eq. (13), X_s is defined analogously to X.
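For illustration, one standard choice of the matrices H_s (our example; the text only states their dimension (|S| − 1)m × m) chains pairwise equalities. For |S| = 3:

```latex
\sum_{s\in S} H_s x_s = 0,\qquad
H_1 = \begin{pmatrix} I_m \\ 0 \end{pmatrix},\quad
H_2 = \begin{pmatrix} -I_m \\ I_m \end{pmatrix},\quad
H_3 = \begin{pmatrix} 0 \\ -I_m \end{pmatrix}
```

which encodes x_1 − x_2 = 0 and x_2 − x_3 = 0, i.e. x_1 = x_2 = x_3, with each H_s of dimension 2m × m.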

Ingredients of the decomposition algorithm
In the following, we describe the ingredients of our cross-decomposition scheme, which is based on Benders and Lagrangean decomposition; both exploit the decomposable problem structure while being strengthened with additional cuts generated from each other. For clarity of presentation, we assume complete recourse, which means that all scenarios are feasible independent of the first-stage decisions. This assumption can be relaxed, but doing so requires adequate handling of infeasible primal subproblems in the decomposition algorithms, e.g. by including primal feasibility cuts and addressing dual unboundedness.

Benders (primal) subproblem
Once the first-stage decisions x are fixed, each scenario s can be solved individually.
In order to facilitate full primal–dual integration, it is important that we assign the investment cost in the Benders primal subproblems in the same way we assign it in the Lagrangean dual subproblems. Therefore, we build our Benders primal subproblems from (SPNAC) but remove the explicit non-anticipativity constraints. When we refer to a given vector of investment decisions x̂^k in iteration k, we assume that it is feasible with respect to constraints (9) and (12). The Benders subproblem (BSP^k) based on (SPNAC) can be written in two ways for each scenario s: a primal formulation (BSPp_s^k) and a dual formulation (BSPd_s^k), which are equivalent since (BSP_s^k) is an LP that can be solved with zero duality gap. We present both in the following.
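The primal–dual pair itself is lost from this excerpt; under a generic two-block subproblem structure (our matrix names T_s, W_s, r_s, V_s, q_s, not necessarily the paper's), it might read:

```latex
(BSPp_s^k):\quad z_{P,s}^{k} = \tau_s c^{T}\hat{x}^{k} + \min_{y_s \ge 0}\ \tau_s d_s^{T} y_s
\quad \text{s.t.}\quad W_s y_s \ge r_s - T_s \hat{x}^{k}\ \ (17),\qquad V_s y_s \ge q_s\ \ (18)
```

```latex
(BSPd_s^k):\quad z_{P,s}^{k} = \tau_s c^{T}\hat{x}^{k} + \max_{u_s,\,v_s \ge 0}\ u_s^{T}\!\left(r_s - T_s \hat{x}^{k}\right) + v_s^{T} q_s
\quad \text{s.t.}\quad W_s^{T} u_s + V_s^{T} v_s \le \tau_s d_s
```

Here u_s and v_s are the multipliers of constraints (17) and (18), respectively, matching the definitions in the next paragraph.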
Let u_s be the dual multipliers associated with constraints (17) and v_s the dual multipliers of constraints (18). Then, the dual of (BSPp_s^k) can be written with these multipliers. Let ŷ_s^k be the optimal solution of (BSPp_s^k) and (u_s^k, v_s^k) the solution of (BSPd_s^k). Let us define z_{P,s}^{*k} in the following way (the two expressions being equivalent due to strong duality): with z_P^{*k} = Σ_{s∈S} z_{P,s}^{*k}, we obtain a valid upper bound on (SPNAC)'s objective function value TC, since (BSP^k) is a restriction of (SPNAC):

Lagrangean (dual) subproblem
A valid relaxation of (SPNAC) can be obtained by formulating the Lagrangean dual of (SPNAC), in which the non-anticipativity constraints (12) are dualized in order to apply the framework of Lagrangean decomposition [9,20]. The Lagrangean dual (LD^k) is formulated in the following for given fixed Lagrange multipliers μ^k (for convenience defined as row vectors) and decomposes, for each scenario s, into subproblems denoted (LD_s^k).
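The subproblem formulation is lost here; a reconstruction consistent with the dualization of (12), with the scenario constraint sets written generically, is:

```latex
(LD_s^k):\quad z_{LD,s}^{*k} = \min_{x_s,\,y_s}\ \tau_s\!\left(c^{T}x_s + d_s^{T}y_s\right) + \mu_s^{k} H_s x_s
\quad \text{s.t.}\quad x_s \in X_s,\ \ (x_s, y_s)\ \text{satisfy the scenario constraints},\ \ y_s \ge 0
```

Note that the investment cost carries the same weighting as in the Benders primal subproblems, which is exactly the property the primal–dual cut exchange relies on.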
Let (x̃_s^k, ỹ_s^k) be the optimal solution of (LD_s^k). Let us define z_{LD,s}^{*k} in the following way: with z_{LD}^{*k} = Σ_{s∈S} z_{LD,s}^{*k}, we obtain a valid lower bound on (SPNAC)'s objective function value TC, since (LD^k) is a relaxation of (SPNAC):

Lagrangean (primal) subproblem
The previously described Lagrangean dual subproblem yields a solution in which some of the original non-anticipativity constraints (12) are most likely violated. Therefore, a primal solution needs to be obtained, which also provides a valid upper bound. In the framework of Lagrangean decomposition, "some heuristic" needs to be applied to generate a first-stage feasible x̂^{k+1}, which is used in (SPNAC) in order to obtain the primal solution. We would like to highlight that the Lagrangean primal subproblem for scenario s is equivalent to the Benders subproblem for scenario s in its primal form (BSPp_s).

Benders (primal) master problem (update of first-stage variables)
Dual optimality cuts from Benders primal subproblems. Using the dual solution, we can derive two types of dual optimality cuts for (SPNAC). The optimality cuts are based on weak duality: the value of any dual feasible solution of the subproblems (BSPd_s^k) (across all scenarios s) is a lower bound on the optimal value of the primal (BSPp_s^k), and (u_s^k, v_s^k) remains feasible in (BSPd_s^k) even when x̂^k is replaced by another x. The first cut (33) is the well-known Benders cut [3]. The second cut, presented in Eqs. (34)–(35), is the Benders multi-cut [4], where θ_s is the objective function value for scenario s.
While the multi-cut version (34)–(35) provides faster convergence due to the increased strength of the dual information, it increases the problem size compared to the single-cut version (33) (see [45]). Note that we do not provide feasibility cuts since we assume complete recourse, as previously stated.
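To make the mechanics concrete, the following sketch computes multi-cut coefficients from a subproblem dual solution. It assumes the generic subproblem form min d^T y s.t. W y ≥ h − T x, y ≥ 0 (our notation, not necessarily the paper's matrices):

```python
import numpy as np

def benders_multicut(u, h, T):
    """By weak duality, theta_s >= u^T (h - T x) holds for any
    dual-feasible u and any x, so the cut for scenario s is
    theta_s >= const - coef @ x."""
    const = float(u @ h)
    coef = u @ T  # row vector multiplying x in the master problem
    return const, coef

def is_violated(const, coef, x, theta, tol=1e-8):
    """Add the cut only if the incumbent (x, theta_s) violates it."""
    return theta < const - float(coef @ x) - tol
```

A master problem would collect one such pair (const, coef) per scenario per iteration, which is exactly the growth in problem size mentioned above.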
Lagrangean dual bounds for the Benders primal master problem. We notice that the Lagrangean dual (LD_s) is a relaxation of the primal problem for scenario s. Hence, we can derive a valid lower bound for θ_s based on the previously obtained information from the Lagrangean dual. The proof that (36) is a valid inequality for the Benders primal master problem can be found in Appendix "Derivation of (36) as valid inequality for (BMP)".

No-good cut. If the complicating variables x are all binary, we can also formulate a no-good cut [1] to exclude previously visited solutions from the solution space. In Eq. (37), L_0^k and L_1^k are the index sets defined in (38)–(39), which collect the components of x̂^k at zero and at one, respectively (where x_l is the lth component of x).

Formulation of the Benders (primal) master problem. Using the multi-cut version (34)–(35) of the optimality cuts and the Lagrangean dual bounds (36), restated in (43) below, we write the Benders master problem (BMP^{k+1}) that yields the next primal vector x̂^{k+1}. If X has only integer variables, we can add the no-good cut (37), with the sets L_0^k and L_1^k defined by (38)–(39), to (BMP^{k+1}). Let z_{BMP}^{*k+1} be the optimal objective function value of (BMP^{k+1}). Note that (BMP^{k+1}) is a relaxation of (SPNAC) and thus provides a lower bound on (SPNAC). Note also that by adding Eq. (43) to the Benders master problem, we strengthen its formulation, and we can guarantee that the lower bound obtained from the Benders master problem is at least as tight as the best known bound from the Lagrangean dual.
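The no-good cut (37) for binary x can be sketched as follows. This is the standard construction; the concrete index-set equations (38)–(39) are lost from this excerpt, so the index sets are our reconstruction:

```python
def no_good_cut(x_hat):
    """Cut excluding the binary point x_hat:
       sum_{l in L1} (1 - x_l) + sum_{l in L0} x_l >= 1,
    with L1 = {l : x_hat_l = 1} and L0 = {l : x_hat_l = 0},
    rewritten in the linear form coef @ x >= rhs."""
    coef = [-1 if xl > 0.5 else 1 for xl in x_hat]
    rhs = 1 - sum(1 for xl in x_hat if xl > 0.5)
    return coef, rhs

def satisfies(coef, rhs, x):
    """Check a candidate binary point against the cut."""
    return sum(c * xi for c, xi in zip(coef, x)) >= rhs
```

The cut removes exactly the visited point x̂^k while every other binary point differs in at least one component and therefore satisfies it.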

Lagrangean (dual) master problem (multiplier update)
Once the Lagrangean dual subproblems are solved, a new set of Lagrangean multipliers μ needs to be generated for the next iteration of the algorithm, based on the information obtained in the subproblems. The most commonly used method to update the Lagrangean multipliers is the subgradient method [13,23,24]. Unfortunately, the convergence of the subgradient method is not very reliable, especially for large-scale problems. One alternative to the subgradient algorithm is the use of cutting planes, similar to the ones used in the Benders master problem. In the following, we present the Lagrangean dual master problem, which is also referred to as the "cutting plane method" and can be formally derived from the Dantzig–Wolfe primal master problem [15,19].
Typical cutting plane. The typical form of the cutting plane [11,30] is as follows, in which (x̃_s^k, ỹ_s^k) is the solution of the kth Lagrangean dual subproblem of scenario s, (LD_s^k).
Disaggregated version of the cutting plane (multi-cut). Many publications that employ a cutting plane approach for Lagrangean decomposition use the cutting plane described in Eq. (48). However, it is also possible to derive a disaggregated version of the cutting plane for the case in which the Lagrangean relaxation decomposes into a set of independent subproblems [15,19], in which κ_s is the variable associated with the objective function value in scenario s. The disaggregated version of the cutting plane is analogous to the multi-cut version of the Benders cut, which seems intuitive since Benders decomposition and Lagrangean relaxation can be seen as duals of each other. It can also be shown that (50) corresponds to an inner approximation of the convex hull of the non-complicating constraints for scenario s. Therefore, we obtain a tighter approximation by intersecting these per-scenario approximations (50) across all scenarios s, instead of building a single approximation of the convex hull of the entire set of non-complicating constraints [15]. Note also that, similarly to the Benders multi-cut, there might be a trade-off between tightness of the cuts and problem size.
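The two cut families referenced as (48) and (50) are lost from this excerpt; hedged reconstructions, consistent with the Lagrangean subproblem objective above (η denoting the aggregated dual value), are:

```latex
\text{aggregated (48):}\quad
\eta \le \sum_{s\in S}\Big[\tau_s\big(c^{T}\tilde{x}_s^{k} + d_s^{T}\tilde{y}_s^{k}\big) + \mu_s H_s \tilde{x}_s^{k}\Big]
\qquad \forall k
```

```latex
\text{disaggregated (50):}\quad
\kappa_s \le \tau_s\big(c^{T}\tilde{x}_s^{k} + d_s^{T}\tilde{y}_s^{k}\big) + \mu_s H_s \tilde{x}_s^{k}
\qquad \forall s\in S,\ \forall k
```

Both right-hand sides are affine in μ, which is what makes the master problem an LP (or a QP once the stabilization term is added).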
Benders primal bounds for the Lagrangean master problem. Analogously to (36), it is possible to derive additional bounds from the Benders primal subproblems for the objective function value κ_s of each scenario s in the Lagrangean master problem, since we use the disaggregated version of the Lagrangean master problem. The proof that (51) is a valid inequality for the Lagrangean dual master problem is given in Appendix "Derivation of (51) as valid inequality for (LMP)".

Formulation of the Lagrangean Dual Master Problem
Using the multi-cut version (49)–(50) of the cutting planes and the upper bounds (51) from the Benders primal subproblems, restated in (55) below, the Lagrangean dual master problem (LMP^{k+1}), which yields the next Lagrangean multipliers μ^{k+1}, can be formulated as follows, where |S| is the number of scenarios and m the number of dualized first-stage variables. The objective function (52) contains the additional quadratic stabilization term (δ/2)‖μ − μ̄‖₂², which defines a trust region for the update of the Lagrangean multipliers [14,31,32,46]. The stabilization requires initial values and update strategies for the penalty δ, which defines the size of the trust region, and for the stabilization center μ̄. For a detailed overview of stabilization techniques, we refer to the previously mentioned work.
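A sketch of the stabilized master consistent with this description (the constraint references follow the text; the maximization form is our reading of the dual):

```latex
(LMP^{k+1}):\quad \max_{\mu,\,\kappa}\ \sum_{s\in S}\kappa_s \;-\; \frac{\delta}{2}\,\lVert \mu - \bar{\mu}\rVert_2^2
\quad \text{s.t.}\quad \text{cutting planes (49)–(50)},\ \ \text{primal bounds (55)},\quad
\mu \in \mathbb{R}^{(|S|-1)m}
```

Without the quadratic term this is the plain cutting-plane LP; the term penalizes steps away from the stabilization center μ̄ and is dropped when reporting the bound, as stated below.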
Let z_{LMP}^{*k+1} be the optimal objective function value of (LMP^{k+1}) without the stabilization term. Note that the valid inequalities (55) make the Lagrangean master problem bounded. Furthermore, they guarantee that the Lagrangean master problem yields a bound at least as tight as the best known primal upper bound obtained from the Benders subproblems:

Initialization scheme
In Benders decomposition as well as in Lagrangean decomposition, an initial guess for the first-stage primal variables/Lagrangean multipliers needs to be provided. The solution of (SPNAC)'s LP relaxation is usually hard due to the large number of scenarios. We note that a good guess might speed up convergence and can potentially be derived from in-depth problem knowledge. A generic initialization is to set the first-stage variables and the Lagrangean multipliers to zero. In general, this initialization is reasonable since it corresponds to making no investments (as a baseline guess) and assuming that the value transferred to each scenario by making an investment is zero. Note that this initialization is primal feasible since the problem has complete recourse. Furthermore, initial values for the trust-region penalty δ and for the stabilization center μ̄ need to be specified. Typically, μ̄ is set to μ^{k=0}.
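The generic initialization described above (no investments, zero multipliers, δ = 1, μ̄ = μ^0) might be sketched as follows; the flat array shape for the multipliers is our assumption:

```python
import numpy as np

def initialize(m, n_scenarios):
    """Generic initialization: x = 0 (no investments), mu = 0 (zero
    value transferred to each scenario), trust-region penalty
    delta = 1, and stabilization center mu_bar = mu^0."""
    x0 = np.zeros(m)                          # first-stage guess
    mu0 = np.zeros((n_scenarios - 1) * m)     # one block per dualized coupling
    delta0 = 1.0                              # initial trust-region penalty
    mu_bar = mu0.copy()                       # center at the initial multipliers
    return x0, mu0, delta0, mu_bar
```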

The proposed algorithm
The proposed algorithm, which is illustrated in Fig. 1, consists of the following steps:
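The step list itself does not survive in this excerpt. Based on the ingredients above, the iteration might be sketched as follows; all four solver callables are user-supplied abstractions (our names), not the paper's code:

```python
def cross_decomposition(solve_benders_sub, solve_lagr_sub,
                        solve_bmp, solve_lmp, x0, mu0,
                        tol=1e-4, max_iter=50):
    """Skeleton of the proposed scheme: both master problems are solved
    every iteration, and each family of subproblems feeds cuts to BOTH
    masters (the primal-dual multi-cut exchange).

    solve_benders_sub(x)  -> (upper bound, Benders cuts)   [parallel over s]
    solve_lagr_sub(mu)    -> (lower bound, Lagrangean cuts)[parallel over s]
    solve_bmp(bcuts, lcuts) -> next first-stage vector x
    solve_lmp(lcuts, bcuts) -> next multipliers mu
    """
    x, mu = x0, mu0
    lb, ub = float("-inf"), float("inf")
    for _ in range(max_iter):
        ub_k, bcuts = solve_benders_sub(x)   # Benders primal subproblems
        lb_k, lcuts = solve_lagr_sub(mu)     # Lagrangean dual subproblems
        ub, lb = min(ub, ub_k), max(lb, lb_k)
        if ub - lb <= tol * max(1.0, abs(ub)):
            break                            # gap closed
        x = solve_bmp(bcuts, lcuts)          # Benders master + Lagrangean bounds
        mu = solve_lmp(lcuts, bcuts)         # Lagrangean master + primal bounds
    return x, lb, ub
```

Note that, unlike classical cross-decomposition, no convergence tests are needed: both masters are solved in every pass.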

Comments on the primal and dual bounds
One of the potential strengths of the proposed algorithm is that it is guaranteed to provide a lower bound at least as tight as the best lower bound obtained from the Lagrangean dual, while having a mechanism to generate first-stage solutions that are feasible with respect to the first-stage constraints (in contrast to using a heuristic, as in Lagrangean decomposition). Note that the primal–dual information exchange in terms of bounds for each scenario s is facilitated by the fact that both subproblems, the Benders primal and the Lagrangean dual, have the same investment cost terms for each scenario s. Furthermore, we use a multi-cut approach for both the primal Benders and the dual Lagrangean master problems to derive primal and dual bounds for each scenario s. While we expect tight bounds, the computational performance needs to be investigated carefully for each case individually. We anticipate an impact for cases in which the Benders algorithm converges slowly due to a weak underlying linear relaxation. At the same time, the effort that an additional Lagrangean iteration adds to the total computational time needs to be weighed against the number of saved Benders iterations and the associated computational time.

Comments on the original cross-decomposition
In the original cross-decomposition algorithm [25,43], the master problems are seen as a fall-back option in case one of the convergence tests fails; otherwise, the solution of the master problems is avoided as much as possible. In contrast, our algorithm solves both master problems at each iteration rather than avoiding them. Note that the convergence tests used in the original cross-decomposition scheme [25,43] are no longer needed; convergence towards optimality is guaranteed, as proved in Appendix "Proof of convergence for the enhanced cross-decomposition with primal dual multi-cuts". Additionally, Van Roy [43] and Holmberg [25] do not consider the case of stochastic programming, its decomposable structure, or the associated difficulties in the direct exchange of primal and dual variables between subproblems, which also arise, for instance, in Lagrangean decomposition. Furthermore, in the original cross-decomposition algorithm, no strengthening techniques, such as multi-cuts or valid inequalities [i.e. valid inequalities (36) and (57)], are introduced into the respective master problems (since these are regarded as hard to solve); the latter are a novel contribution of this paper.

Facility location with disruptions
In order to test the proposed algorithm, we apply the cross-decomposition scheme to a number of instances of a facility location problem with distribution centers (DCs) under the risk of disruptions. In the problem, demand points need to be satisfied from a set of candidate DCs in order to minimize the sum of the investment cost and the expected transportation cost. The capacitated reliable facility location problem (CRFLP) is formulated as a two-stage stochastic program. First-stage decisions involve the selection of DCs and their capacities. Second-stage decisions are the demand assignments in the scenarios, which establish the combinations of active and disrupted locations. Penalties are applied to unsatisfied demand, such that the second-stage subproblems have complete recourse.
Two versions of the model are presented in "Appendix 2": CRFLP and CRFLP-t. The second model (CRFLP-t) includes a redundant set of constraints that tightens its linear relaxation. The equations of models CRFLP and CRFLP-t have the same structure as the generic ones described in formulation (1)–(6). Due to its size, the problem is hard to solve with a fullspace approach and requires decomposition techniques for an efficient solution. Furthermore, the problem is numerically challenging because the scenarios with several simultaneous disruptions have very small probabilities.

Description of implementation
We implement the fullspace model, the classical multi-cut Benders decomposition [4] and our cross-decomposition algorithm, as previously described in Sect. 4, in GAMS 24.3.1 [7].
For the decomposition schemes, we employ parallel computing in two different ways. First, Benders and Lagrangean subproblems are solved in parallel in groups of 50 scenarios using the GAMS grid computing capabilities [8] on the 12 processors of an Intel Xeon (2.67 GHz) machine with 16 GB RAM. We solve the subproblems in groups of scenarios to reduce the overhead of data transfer. Second, the Lagrangean and Benders master problems are solved by allowing GUROBI 5.6.3 to use the processors as parallel threads. For the fullspace implementations, we also use GUROBI with parallel threads.
To cope with the numerical difficulties that arise due to small scenario probabilities, we use additional solver settings of GUROBI 5.6.3. We set quad 1 (for quad precision) and numericfocus 3 (for high attention to numerical issues) in all master problems and subproblems. The optimality tolerances for reduced costs are set to 10 −7 . We use barhomogeneous 1 to detect infeasibilities with the barrier method employed to solve the QP Lagrangean master problems. Instances are considered solved when their relative optimality gap is below 10 −4 ; the MILPs inside the Benders and cross-decomposition loops are solved using an optimality tolerance of 10 −6 . We set a wall-clock time limit of 10,000 s for the solution of each instance.
For the Lagrangean master problem, the multipliers are bounded below and above according to their interpretation: the maximum penalties that can be incurred in a scenario. Additionally, the Lagrange multipliers are scaled with their corresponding scenario probabilities in order to maintain the same order of magnitude among the variables of the problem.
The update strategies for the penalty term δ (initial value δ = 1) and for the stability center μ̄ are rather simple for our illustrative case study. In each iteration, the trust-region parameter δ is updated according to the following rule: δ^{k+1} = max{δ^k/2, 10^{−10}}. The initial stability center μ̄ = 0 is never updated. Note that more elaborate update schemes, as described by Kiwiel [31] and Frangioni [14], could further improve the strength of the Lagrangean cuts, and hence the performance of the overall cross-decomposition algorithm.
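The update rule in code form (exactly the stated rule; the function name is ours):

```python
def update_delta(delta_k, floor=1e-10):
    """Halve the trust-region penalty each iteration, bounded below:
    delta_{k+1} = max(delta_k / 2, 1e-10)."""
    return max(0.5 * delta_k, floor)
```

A shrinking δ progressively relaxes the trust region, letting the cutting-plane model rather than the stabilization drive the later multiplier updates.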

Computational results
In the following, we compare the three methods (fullspace model, multi-cut Benders decomposition, and cross-decomposition) with 10 random instances generated for three cases with different number of candidate DCs and scenarios. Both problem formulations (CRFLP and CRFLP-t) presented in "Appendix 2" are tested in order to analyze the impact of the LP relaxation in the performance of the solution methods. The instances are generated based on the data set presented by Snyder and Daskin [40] by sampling transportation costs and demands from uniform distributions bounded between 80 and 120 % of the original values. In particular, we use random transportation costs to include different cost structures in the objective function and random demands to analyze different shapes of the feasible region; the evaluation of the solution methods on a large number of instances with different values for key parameters allows establishing their variability in performance. The details of the methodology used to generate the random instances can be found in "Appendix 2".

Comparison of methods
In Table 1, we report the computational statistics of the fullspace models for the three cases. It can be observed that an increasing number of DCs implies more scenarios, constraints, and variables. The mean of the optimal solutions over the 10 instances, together with the mean of their LP relaxation gaps, is also presented in Table 1. The statistics show a significant increase in the number of constraints for the CRFLP-t formulations as a result of the addition of tightening constraints; these constraints are also responsible for the improvement in the LP relaxation. It is important to note that despite the size of the formulations, the number of discrete variables remains small because there is only one binary variable per candidate DC.

The performance of the solution methods on the instances is presented in Figs. 2, 3, and 4. These performance curves show the percentage of instances that can be solved to optimality within the time limit shown on the horizontal axis. A complete report of the results obtained from the computational experiments can be found in the Supplementary Material. Figure 2 compares the wall-clock time required by the methods to solve the instances with 639 scenarios. It can be observed that cross-decomposition on the CRFLP-t formulation outperforms all other methods by solving all instances in less than 800 s. The fullspace CRFLP formulation also solves all instances in less than 800 s, but it takes more than 600 s for most of them. Benders decomposition on the CRFLP-t formulation is very efficient for some instances (5 instances require less than 500 s), but it takes a long time to solve others. Benders decomposition on the CRFLP formulation does not solve any instance within the time limit in the case with 639 scenarios.
Similar trends can be observed in Fig. 3 for the performance of the solution methods on the case with 1025 scenarios. In this case, cross-decomposition on the CRFLP-t formulation clearly outperforms all methods by solving all instances in less than 3250 s. Cross-decomposition on the CRFLP formulation also shows relatively good performance by solving 7 instances in less than 3000 s and all instances in less than 5200 s. The fullspace CRFLP formulation solves all instances in less than 5100 s, but it requires over 4000 s for most of them. Poor performance is observed for the fullspace CRFLP-t formulation and for Benders decomposition on both the CRFLP and CRFLP-t formulations. For the case with 1587 scenarios, Fig. 4 shows that cross-decomposition is the only solution method capable of solving some instances. In this case, cross-decomposition on the CRFLP formulation solves 3 instances and cross-decomposition on the CRFLP-t formulation solves 8 instances within the time limit. These results are in line with the trends observed in the previous cases: the cross-decomposition method outperforms all other methods on large-scale problems.
The mean solution time over the 10 instances is presented for each method in Table 2. It can be observed that cross-decomposition on the CRFLP-t formulation is on average the fastest method for all cases. It is interesting to note that Benders decomposition on the CRFLP-t formulation shows good results for the instances with 639 scenarios but not for the instances with 1025 and 1587 scenarios. There is also a progressive deterioration of the performance of the fullspace CRFLP formulation from the case with 639 scenarios to the case with 1587 scenarios. This performance confirms that cross-decomposition is the best alternative for large-scale problems. The computational experiments also show that both Benders decomposition and cross-decomposition are affected by the underlying LP relaxation of the models. However, if we compare both decomposition strategies on the CRFLP and CRFLP-t formulations, it is clear that the performance of the cross-decomposition method is less affected by the weakness of the formulation. Hence, we conclude that cross-decomposition is less sensitive to weak formulations due to the presence of strong cuts that originate from the Lagrangean dual.

The time spent by the decomposition methods is used to solve subproblems and master problems. For both implementations (CRFLP and CRFLP-t) of Benders decomposition, the wall-clock time spent in the Benders master problems is over 90 % of the total time. We also conducted a separate analysis to estimate the impact of parallelization for the Benders and Lagrangean subproblems. As a result of parallelization, the time spent in Benders decomposition is significantly reduced: the speed-up for solving the Benders subproblems of the CRFLP formulation in 12 parallel threads is around 8.5 in the case with 639 scenarios, and around 7.0 for the Benders subproblems of the CRFLP-t formulation with 1587 scenarios. A similar situation occurs in the implementation of the cross-decomposition method.
Most of the wall-clock time of the cross-decomposition method is spent in solving the Benders master problems: on average, around 58 % of the time in the case with 639 scenarios and over 75 % in the case with 1587 scenarios. The Lagrangean master problems also require a considerable amount of time: approximately 24 % of the total time in the case with 639 scenarios and 12 % in the case with 1587 scenarios. There is also a significant reduction in the time required to solve the MILP Lagrangean subproblems as a result of parallelization: the speed-up is approximately 5.9 for the Lagrangean subproblems of the CRFLP formulation with 639 scenarios and 6.3 for the Lagrangean subproblems of the CRFLP-t formulation with 1587 scenarios. These results suggest that parallelization of the subproblems yields better speed-ups for LP subproblems, which individual threads can solve efficiently, than for MILP subproblems.

Conclusion
We have described a cross-decomposition algorithm that combines Benders and Lagrangean decomposition for two-stage stochastic MILP problems with complete recourse, where the first-stage variables are mixed-integer and the second-stage variables are continuous. The algorithm fully integrates primal and dual information with multi-cuts that are added to the Benders and the Lagrangean master problems for each scenario. Computational results for an illustrative case study on a facility location problem under the risk of disruptions show evidence of the conceptual strengths of cross-decomposition, such as a reduction in the number of iterations and stronger lower bounds compared to pure multi-cut Benders decomposition. While the computational time per iteration increases due to the solution of the Lagrangean dual subproblems and master problem, cross-decomposition seems to be especially advantageous compared to Benders decomposition when the underlying LP relaxation is weak, as suggested by the facility location problem. Aside from the unique integration of primal–dual multi-cuts in the master problems, the proposed cross-decomposition scheme is also distinguished from previous work in that it has been developed specifically for solving two-stage stochastic programming problems.
Proof Let $z^{*,k}_{LD,s} = \tau_s (c^{\top}\tilde{x}^k_s + d^{\top}\tilde{y}^k_s) + \mu^{k} H_s \tilde{x}^k_s$ be the optimal value of the Lagrangean dual for scenario $s$, $(LD^k_s)$, for a given Lagrangean multiplier $\mu^k$, $k \in K$. We can then formulate the following inequality using the definition of the objective function of $(LD^k_s)$: Since $(LD^k_s)$ is a relaxation for scenario $s$ of (SPNAC), the following inequality holds (a proof can be found in [29]), where $x_s$ is replaced by the original $x$, which is defined as per (SPNAC).
The inequality (61) obviously also holds for the minimization over $y_s$ [subject to (2), (5), (17)–(19)]: which can be reformulated as a maximization over $(u_s, v_s)$ using strong duality [subject to (2), (5), (21)–(22)]: (63) Note that (62) and (63) have exactly the same structure as the Benders subproblems $(BSPp^k_s)$ and $(BSPd^k_s)$, respectively, for a given $x$, with the exception of the constant Lagrangean term $\mu^k H_s x$. Therefore, for a given $x = x^k$ that coincides with the input to the $k$th Benders subproblem, (63) has the same optimal solution as the $k$th Benders subproblem. With $(u^k_s, v^k_s)$ as the optimal solution of the $k$th Benders subproblem, we can rewrite (63) using weak duality (in case of variation in $x$, as in the classical Benders cut) as: which can be reformulated in the following way by applying (35): which proves that (36) is a valid inequality.
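The chain of inequalities underlying this proof can be sketched as follows. This is a reconstruction from the surrounding text, assuming that $\eta_s$ denotes the variable approximating the scenario-$s$ second-stage cost in the multi-cut Benders master problem; the exact symbols and numbering of (61)–(63) and (36) may differ in the original:

```latex
\begin{aligned}
% (LD^k_s) is a relaxation of (SPNAC), so its optimal value bounds the
% cost of any (x, y_s) feasible in (SPNAC):
z^{*,k}_{LD,s} &\le \tau_s \left( c^{\top} x + d^{\top} y_s \right)
                  + \mu^{k} H_s x \\
% minimizing the right-hand side over y_s preserves the bound:
z^{*,k}_{LD,s} &\le \tau_s\, c^{\top} x + \mu^{k} H_s x
                  + \min_{y_s}\ \tau_s\, d^{\top} y_s \\
% at any feasible master solution, \eta_s equals the true scenario-s
% second-stage cost \min_{y_s} \tau_s d^{\top} y_s, so the cut below
% does not exclude it and is therefore valid:
\eta_s &\ge z^{*,k}_{LD,s} - \tau_s\, c^{\top} x - \mu^{k} H_s x
\end{aligned}
```

The last inequality is the form a Lagrangean-based primal multi-cut would take in the Benders master under these assumptions: it transfers the dual bound $z^{*,k}_{LD,s}$ to the primal side while remaining valid for every first-stage decision $x$.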

Proposition 2 Equation (51) is a valid inequality for the Lagrangean dual master problem (LMP).
Proof Using Minkowski's theorem, we can express the primal variables $(x, y)^{\top}$ in the following way using the disaggregated expression in terms of scenarios $s$: with $\sum_{k \in \bar{K}} \bar{\alpha}^k_s = 1$, $\forall s \in S$, $0 \le \bar{\alpha}^k_s \le 1$, where $(\bar{x}^k_s, \bar{y}^k_s)$ are all the extreme points of the convex hull of the set of non-complicating constraints $Z_s$ of (SPNAC), which is assumed to be compact and includes all but the non-anticipativity constraints: While not all of the extreme points are known, we can consider a subset $K \subset \bar{K}$, which can be constructed from previous solutions of the Lagrangean dual $(\tilde{x}^k_s, \tilde{y}^k_s)$ [15]. The set of solutions from the Lagrangean dual is augmented with $|K|$ previous solutions from the Benders primal problem $(\hat{x}^k, \hat{y}^k_s)$, which are also feasible in $Z_s$ but are not necessarily extreme points of $Z_s$. Therefore, (66) can be rewritten as: The restricted Dantzig–Wolfe primal master problem yields an upper bound on the Dantzig–Wolfe primal master problem. With the dual variables $(\mu, \kappa_s)$, which are both unrestricted, the Lagrangean dual master problem can be written as: from which we can extract Eq. (76) as valid inequalities, since we derived the Lagrangean master problem (LMP) in (74)–(76) (without the quadratic trust-region stabilization term).
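One way to make the last step concrete is the following sketch of the restricted Dantzig–Wolfe master and its LP dual. The form of the relaxed non-anticipativity constraints ($\sum_s H_s x_s = 0$) and the sign convention for $\mu$ are assumptions made here to match the Lagrangean term $\mu^k H_s \tilde{x}^k_s$ appearing in the text, so details may differ from (74)–(76):

```latex
% Restricted Dantzig--Wolfe primal master over the known points k in K:
\begin{aligned}
\min_{\bar{\alpha}} \quad
  & \sum_{s \in S} \sum_{k \in K} \bar{\alpha}^k_s\, \tau_s
    \left( c^{\top}\bar{x}^k_s + d^{\top}\bar{y}^k_s \right) \\
\text{s.t.} \quad
  & \sum_{s \in S} H_s \sum_{k \in K} \bar{\alpha}^k_s\, \bar{x}^k_s = 0
    && (\mu) \\
  & \sum_{k \in K} \bar{\alpha}^k_s = 1, \quad \forall s \in S
    && (\kappa_s) \\
  & \bar{\alpha}^k_s \ge 0
\end{aligned}
% Its LP dual, whose constraints are the valid inequalities extracted
% for the Lagrangean master problem:
\begin{aligned}
\max_{\mu,\, \kappa} \quad & \sum_{s \in S} \kappa_s \\
\text{s.t.} \quad
  & \kappa_s \le \tau_s \left( c^{\top}\bar{x}^k_s + d^{\top}\bar{y}^k_s \right)
    + \mu H_s \bar{x}^k_s, \quad \forall k \in K,\ s \in S
\end{aligned}
```

Each dual constraint corresponds to one column $(s, k)$ of the restricted primal master, which is why augmenting $K$ with Benders primal solutions adds cuts to the Lagrangean master.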

Proposition 3
The algorithm converges to the optimal solution of (SPNAC) within ε-tolerance in a finite number of steps.
Proof The algorithm relies on Benders decomposition with multi-cuts to obtain lower and upper bounds. The Benders primal master problem is modified by adding Eq. (36). However, by Proposition 1, Eq. (36) is a valid inequality that does not cut off the optimal solution. Therefore, the algorithm finds the optimal solution within ε-tolerance in a finite number of steps according to the convergence proof by Birge and Louveaux [4].
With $N$ candidate locations, the probability ($\tau_s$) of one scenario with $n$ simultaneous disruptions can be calculated according to Eq. (77).
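Eq. (77) is not reproduced in this excerpt, but under the assumption of independent disruptions with per-facility failure probability $q$, the probability of one specific scenario with $n$ simultaneous disruptions presumably takes the standard Bernoulli form sketched below (the function name is illustrative):

```python
def scenario_probability(n: int, N: int, q: float = 0.05) -> float:
    """Probability of one specific scenario in which exactly n of the N
    candidate DCs are disrupted, assuming independent failures with
    per-facility probability q (q = 0.05 as in Snyder and Daskin [40])."""
    return q ** n * (1.0 - q) ** (N - n)
```

Note that this is the probability of one particular disruption pattern; the total probability that *some* scenario with $n$ disruptions occurs carries an additional binomial coefficient $\binom{N}{n}$.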
The following notation is used in the two-stage stochastic programming formulation. The set of candidate locations for DCs is denoted by $J$, including a fictitious DC with unlimited capacity in position $|J|$ (note that $|J| = N + 1$); the set of demand sites is denoted by $I$; the set of scenarios is denoted by $S$. The decision of whether the DC at candidate location $j$ is selected is represented by the binary variable $x_j$; the fraction of demand $i$ satisfied from location $j$ in scenario $s$ is denoted by $y_{s,j,i}$; the capacity of the DC at location $j$ is denoted by $c_j$. The parameters of the problem are: the fixed charge for the selection of DC $j$ ($F_j$), the unit capacity cost for DC $j$ ($V_j$), the demand at site $i$ ($D_i$), the unit transportation cost from DC $j$ to site $i$ ($A_{j,i}$), the probability of scenario $s$ ($\tau_s$), the maximum capacity of DCs ($C^{\max}$), and the matrix indicating the availability of DC $j$ in scenario $s$ ($T_{s,j}$).

Model
According to Garcia-Herreros et al. [16], the capacitated reliable facility location problem (CRFLP) is formulated as follows. The objective function (78) minimizes the sum of investment cost and expected transportation cost. Constraints (79) ensure that capacity is only allocated to selected DCs. Constraints (80) limit demand satisfaction to the inventory availability according to the binary parameter $T_{s,j}$ that indicates the disrupted locations in each scenario. Constraints (81) enforce demand satisfaction or penalization in every scenario. The domain of the variables is presented in constraints (82) and (83).
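Equations (78)–(83) are not reproduced in this excerpt; a sketch consistent with the verbal description and the notation above is the following. The exact formulation in Garcia-Herreros et al. [16] may differ in details, for instance in how the fictitious DC $j = |J|$ (unlimited capacity, always available, with $A_{|J|,i}$ acting as the penalty cost) is exempted from (79)–(80):

```latex
\begin{aligned}
\min \quad
  & \sum_{j \in J} F_j x_j + \sum_{j \in J} V_j c_j
    + \sum_{s \in S} \tau_s \sum_{i \in I} \sum_{j \in J}
      A_{j,i} D_i\, y_{s,j,i}
  && \text{(78)} \\
\text{s.t.} \quad
  & c_j \le C^{\max} x_j, \quad \forall j \in J
  && \text{(79)} \\
  & \sum_{i \in I} D_i\, y_{s,j,i} \le T_{s,j}\, c_j,
    \quad \forall s \in S,\ j \in J
  && \text{(80)} \\
  & \sum_{j \in J} y_{s,j,i} = 1, \quad \forall s \in S,\ i \in I
  && \text{(81)} \\
  & x_j \in \{0, 1\}, \quad \forall j \in J
  && \text{(82)} \\
  & 0 \le y_{s,j,i} \le 1,\ c_j \ge 0,
    \quad \forall s \in S,\ j \in J,\ i \in I
  && \text{(83)}
\end{aligned}
```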
The formulation (CRFLP) is known to have a weak linear relaxation. In order to strengthen the MILP formulation, a redundant set of constraints can be added to the model. The set of tightening constraints (86) directly prevents demand assignments to DCs that are not selected or that are disrupted.
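A plausible form of the tightening constraints, consistent with this description (the numbering (86) is taken from the text):

```latex
y_{s,j,i} \le T_{s,j}\, x_j,
  \quad \forall s \in S,\ j \in J,\ i \in I
  \qquad \text{(86)}
```

These constraints are redundant for the MILP, since no demand can be served from an unselected or disrupted DC anyway, but they cut off fractional points of the LP relaxation, which is what tightens the bound.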

Data
The data for the instances is taken from Daskin [12]. The original problem considers 49 US cities that simultaneously serve as demand sites and potential distribution centers (DCs). The formulation includes uncapacitated DCs with investment costs estimated from the real-estate market. Variable costs associated with the DC capacities have been added at a rate of $0.0001 per unit of product. Originally, demands for a single commodity are assumed to be proportional to the 1990 state populations, and transportation costs are proportional to the great-circle distance between facilities. The original demands and transportation costs are used to generate the first instance; all other instances are generated randomly by sampling demands and transportation costs from uniform distributions bounded between 80 and 120 % of the original values.
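The instance-generation procedure can be sketched as follows; the function name and data structures are illustrative, not taken from the authors' implementation:

```python
import random

def sample_instance(base_demands, base_costs, low=0.8, high=1.2, seed=None):
    """Generate a random CRFLP instance by sampling each demand D_i and each
    transportation cost A_{j,i} uniformly between 80 % and 120 % of the
    original Daskin data (illustrative sketch, not the authors' code)."""
    rng = random.Random(seed)
    demands = {i: d * rng.uniform(low, high) for i, d in base_demands.items()}
    costs = {ji: a * rng.uniform(low, high) for ji, a in base_costs.items()}
    return demands, costs
```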
Given the very large number of possible scenarios ($2^{49}$), we have selected subsets of 10, 11, and 12 facilities as candidate locations for DCs in the different instances of the problem. The locations included in the smallest instance are: Sacramento (CA), Albany (NY), Austin (TX), Tallahassee (FL), Harrisburg (PA), Springfield (IL), Columbus (OH), Montgomery (AL), Salem (OR), and Des Moines (IA). The additional location included in the instance with 11 candidate DCs is Lansing (MI). The largest instance with 12 candidate DCs also includes Trenton (NJ).
Given the very small probability of scenarios with more than 5 simultaneous disruptions, these scenarios have been grouped into a single scenario in which all demands are penalized. The effect of this approximation is limited by the magnitude of the corresponding probabilities ($< 1.2 \times 10^{-5}$ in all cases). Furthermore, the approximation improves the numerical performance of the algorithm. Despite the reduction in size, the problem still requires minimizing the expected cost over large sets of scenarios. The failure probability of each DC has been kept at the value originally used by Snyder and Daskin [40]: $q = 0.05$.
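The resulting scenario sets can be reproduced with the sketch below: enumerating all disruption patterns of up to 5 facilities and lumping the remaining probability mass into a single all-penalized scenario yields exactly 639, 1025, and 1587 scenarios for $N$ = 10, 11, and 12 candidate DCs, respectively (the function name and the "all_penalized" label are illustrative):

```python
from itertools import combinations

def build_scenarios(N, q=0.05, max_disruptions=5):
    """Enumerate every disruption pattern with up to max_disruptions failed
    DCs (probability q**n * (1-q)**(N-n) each, assuming independent
    failures) and aggregate all remaining patterns into a single scenario
    in which all demands are penalized."""
    scenarios = []
    for n in range(max_disruptions + 1):
        for disrupted in combinations(range(N), n):
            scenarios.append((disrupted, q ** n * (1.0 - q) ** (N - n)))
    # lump the tiny residual probability into one penalty scenario
    tail = 1.0 - sum(p for _, p in scenarios)
    scenarios.append(("all_penalized", tail))
    return scenarios
```

For $N = 12$ the aggregated tail probability computed this way is below $1.2 \times 10^{-5}$, matching the bound reported above.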