Logic-Based Benders Decomposition

Benders decomposition uses a strategy of "learning from one's mistakes." The aim of this paper is to extend this strategy to a much larger class of problems. The key is to generalize the linear programming dual used in the classical method to an "inference dual." Solution of the inference dual takes the form of a logical deduction that yields Benders cuts. The dual is therefore very different from other generalized duals that have been proposed. The approach is illustrated by working out the details for propositional satisfiability and 0-1 programming problems. Computational tests are carried out for the latter, but the most promising contribution of logic-based Benders may be to provide a framework for combining optimization and constraint programming methods.

Benders decomposition [7,17] uses a problem-solving strategy that can be generalized to a larger context. It assigns some of the variables trial values and finds the best solution consistent with these values. In the process it learns something about the quality of other trial solutions. It uses this information to reduce the number of solutions it must enumerate to find an optimal solution. The strategy might be described as "learning from one's mistakes." The central element of Benders decomposition is the derivation of Benders cuts that exclude superfluous solutions. Classical Benders cuts are formulated by solving the dual of the subproblem that remains when the trial values are fixed. The subproblem must therefore be one for which dual multipliers are defined, such as a linear or nonlinear programming problem.
The key to generalizing Benders decomposition is to extend the class of problems for which a suitable dual can be formulated. We depart from previous generalizations of duality by defining an inference dual for any optimization problem. The solution of the inference dual is a proof of optimality within an appropriate logical formalism. Generalized Benders cuts are obtained by determining under what conditions the proof remains valid. Classical Benders decomposition can be seen to be a special case of this approach if it is viewed in a different light than usual.
Logic-based Benders decomposition can be applied to any class of optimization problems, but a proof scheme and a method of generating Benders cuts must be devised for each class. We illustrate the method by applying it to propositional satisfiability, 0-1 programming problems, and a machine scheduling problem. The subproblems in these cases are not the traditional linear or nonlinear programming problems. One can therefore take advantage of special structure in the subproblems that is inaccessible to the traditional Benders method.
For example, a satisfiability or 0-1 programming problem may decouple into several smaller problems when certain variables are fixed. The subproblem can therefore be solved rapidly by solving its individual components, even though the components are themselves general satisfiability or 0-1 problems.
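The effect of decoupling can be sketched in a few lines of Python. The instance, costs, and constraints below are invented for illustration; the point is only that once the linking variable y is fixed, the two components share no variables and can be solved independently, with their values added.

```python
from itertools import product

def solve_component(costs, feasible):
    """Brute-force the cheapest 0-1 assignment satisfying `feasible`."""
    best = None
    for x in product([0, 1], repeat=len(costs)):
        if feasible(x):
            val = sum(c * v for c, v in zip(costs, x))
            if best is None or val < best[0]:
                best = (val, x)
    return best

def solve_subproblem(y):
    # Component 1 over (x1, x2): constraint x1 + x2 >= 2 - y,
    # which relaxes when y = 1.
    comp1 = solve_component([3, 1], lambda x: x[0] + x[1] >= 2 - y)
    # Component 2 over (x3, x4): constraint x3 + x4 >= 1,
    # independent of y and of component 1.
    comp2 = solve_component([2, 4], lambda x: x[0] + x[1] >= 1)
    if comp1 is None or comp2 is None:
        return None
    return comp1[0] + comp2[0]

print(solve_subproblem(0))  # x1 = x2 = 1 forced: 4 + 2 = 6
print(solve_subproblem(1))  # one of x1, x2 suffices: 1 + 2 = 3
```

Each component enumerates only its own variables, so the work grows with the size of the largest component rather than with the full subproblem.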
We address the satisfiability problem because it illustrates logic-based Benders in a lucid way and is an important problem in its own right. We examine the 0-1 programming problem because of the attention it has historically received. None of this should suggest, however, that logic-based Benders is applicable only to these problem classes.
For instance, Jain and Grossmann [31] recently solved machine scheduling problems using a technique that is in effect logic-based Benders decomposition. Because the subproblems are one-machine scheduling problems, classical Benders cuts are unavailable. Jain and Grossmann achieved dramatic speedups in computation by solving the subproblems with constraint technology. This experience suggests that logic-based Benders can provide a natural medium for combining optimization and constraint programming. This idea is discussed in [27,29].

The first section below reviews related work. Section 2 presents the basic idea of logic-based Benders decomposition. Section 3 introduces inference duality, and Section 4 shows how linear programming duality is a special case. The next two sections present logic-based Benders decomposition in the abstract, followed by its classical realization. Sections 7 and 8 apply logic-based Benders to the propositional satisfiability problem and 0-1 programming, respectively. Section 9 presents computational results for 0-1 programming, and Section 10 describes Jain and Grossmann's work. The final section is reserved for concluding remarks.

Related Work
Various types of generalized duality have been proposed over the years. Tind and Wolsey provided a survey in 1981 [43]. Much of this and subsequent work [2,3,8,9,13,48,49] is related to the superadditive duality of Johnson [35]. In 1981 Wolsey [50] used such a notion of duality as the basis for a generalized Benders algorithm in which the classical dual prices are replaced by a price function. Several other duals have been suggested for integer programming [4,6,15,16,40,41]. A recent paper of Williams [47] examines a wide variety of duality concepts. Still more recently, Lagrangean and surrogate duals are interpreted in [27,29] as forms of a relaxation dual.
The inference dual proposed here is fundamentally different from these earlier duals, because it regards the dual as an inference problem. Its solution is in general a proof, rather than a set of prices or a price function. The resulting generalization of Benders decomposition is therefore unlike Wolsey's.
The idea of inference duality might be traced ultimately to Jeroslow and Wang [33]. They showed that when linear programming demonstrates the unsatisfiability of a set of Horn clauses in propositional logic, the dual solution contains information about a unit resolution proof of unsatisfiability. (Unit resolution is defined in Section 7 below.) This introduces the key idea that the dual solution can be seen as encoding (or partially encoding) a proof. It does not, however, show how to generalize the idea beyond a linear programming context, and a subsequent generalization focused on another type of linear programming problem, namely gain-free Leontief flows [32].
Hooker and Yan [30] introduced the logic-based Benders scheme described here, or a special case of it, in the context of logic circuit verification, and they presented computational results. Their method is that of Section 5 below specialized to logic circuits. After the first draft of the present paper was written (1995), Hooker [25] proposed inference duality as a basis for postoptimality analysis, and Dawande and Hooker [14] specialized the approach to sensitivity analysis for mixed integer programming. These ideas are presented in [27] as part of a general theoretical framework.

The Basic Idea
As already noted, Benders decomposition learns from its mistakes. A similar idea has been developed in a general way in the constraint programming literature under the rubric of nogoods ([45], Section 5.4; [34]). When in the process of solving a problem one deduces that a certain partial solution cannot be completed to obtain a feasible solution, one can examine the reasons for this. Often the same reasons lead to a constraint that excludes a number of partial solutions. Such a constraint is a nogood.
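A minimal sketch of nogood recording, with an invented feasibility test: when a trial assignment fails for a reason involving only a few variables, that reason is stored as a nogood, and later candidates matching it are pruned without being re-tested.

```python
from itertools import product

def violates(assignment, nogood):
    """A nogood is a partial assignment that can never appear."""
    return all(assignment[var] == val for var, val in nogood.items())

# Hypothetical feasibility test: (y0, y1, y2) is infeasible whenever
# y0 == y1.  The *reason* involves only y0 and y1, so one failure
# yields a nogood that excludes several full assignments at once.
def reason_for_failure(y):
    if y[0] == y[1]:
        return {0: y[0], 1: y[1]}   # nogood mentioning only y0, y1
    return None

nogoods, tested, feasible = [], 0, []
for y in product([0, 1], repeat=3):
    if any(violates(dict(enumerate(y)), ng) for ng in nogoods):
        continue                    # pruned by a nogood, no test needed
    tested += 1
    ng = reason_for_failure(y)
    if ng is None:
        feasible.append(y)
    else:
        nogoods.append(ng)

print(tested, len(feasible))  # prints 6 4: two of the 8 candidates pruned
```

The smaller the reason set, the more candidates a single nogood excludes; Benders cuts exploit the same leverage in a structured way.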
Benders decomposition uses a more specific strategy. It begins by partitioning the variables of a problem into two vectors x and y. It fixes y to a trial value so as to define a subproblem that contains only x. If the solution of the subproblem reveals that the trial value of y is unacceptable, the solution of the subproblem's dual is used to identify a number of other values of y that are likewise unacceptable. The next trial value must be one that has not been excluded. Eventually only acceptable values remain, and if all goes well, the algorithm terminates after enumerating only a few of the possible values of y. (The method is presented in more detail below.) Because Benders decomposition searches for an optimal as well as a feasible solution, it actually enumerates trial values of (z, y), where z is the objective function value. Its "nogoods" have the form z ≥ β(y), where β(y) is a bound on the optimal value that depends on y. The constraints z ≥ β(y) are known as Benders cuts and can rule out a large number of values of (z, y).
The specialized context of Benders decomposition enhances the general strategy in two ways.
• The pre-arranged partition of variables can exploit problem structure. When y is fixed, the resulting subproblem may simplify substantially or decouple into a number of small subproblems.
• The linearity of the subproblem allows one to obtain a Benders cut in an easy and systematic way, namely by solving the linear programming dual.
The intent here is to generalize a Benders-like strategy while retaining these two advantages to a large degree. The key to doing so is to generalize the notion of a dual. The dual must be definable for any type of subproblem, not just linear ones, and must provide an appropriate bound on the optimal value. Such a dual can be formulated simply by observing that a valid bound β on the optimal value is obtained by inferring it from the constraints. A generalized dual can therefore be defined as an inference dual, which is the problem of inferring a strongest possible bound from the constraint set. The classical linear programming dual can be viewed as the inference dual of a linear programming problem. A solution of the inference dual takes the form of a proof that β is in fact a bound on the optimal value.
In the context of Benders decomposition, the proof that solves the subproblem dual provides a valid bound β on the assumption that y is fixed to some particular value. But the same reasoning may deliver valid bounds when y takes other values. A constraint that imposes these bounds as a function of y becomes the logic-based Benders cut. It plays the same role as the classical Benders cut, although it may not take the form of an inequality constraint.

Inference Duality
Consider a general optimization problem

  min f(x)
  s.t. x ∈ S
       x ∈ D,    (1)

where f is a real-valued function. The domain D is distinguished from the feasible set S. The domain might be the set of real vectors, 0-1 vectors, etc. The feasible set S is generally defined by a collection of constraints, and it need not be a subset of D.

To state the inference dual it is necessary to define a semantic form of implication with respect to a domain. Let P and Q be two propositions whose truth or falsehood is a function of x. Then P implies Q with respect to D (notated P ⊨_D Q) if Q(x) is true for every x ∈ D for which P(x) is true. The inference dual of (1) can now be stated:

  max β
  s.t. (x ∈ S) ⊨_D (f(x) ≥ β).    (2)

The dual seeks the largest β for which f(x) ≥ β is implied by the constraint set. In other words, the dual problem is to find a tightest possible bound on the objective function value. It is convenient to let the optimal value of a minimization problem be respectively ∞ or −∞ when the problem is infeasible or unbounded, and vice-versa for a maximization problem. A strong duality property obviously holds for the inference dual: an optimization problem (1) always has the same optimal value as its inference dual (2).
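On a finite domain the inference dual can be solved by brute force, which makes the trivial strong duality property easy to check. The instance below is invented for illustration; the dual search is restricted to the values f takes on D, which is enough because the strongest inferable bound is itself such a value.

```python
from itertools import product

# Primal: min f(x) s.t. x in S, x in D, with D the 0-1 vectors.
D = list(product([0, 1], repeat=3))
S = [x for x in D if x[0] + x[1] + x[2] >= 2]   # feasible set
f = lambda x: 3 * x[0] + x[1] + 2 * x[2]

primal = min(f(x) for x in S)

# Inference dual: the largest beta such that membership in S implies
# f(x) >= beta over the domain D.
candidates = sorted({f(x) for x in D})
dual = max(b for b in candidates if all(f(x) >= b for x in S))

print(primal, dual)  # prints 3 3: strong duality holds trivially
```

The interesting content of duality theory lies not in this equality but in *how* the bound β is certified, which is what the classical LP dual supplies in the next section.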
Although the inference dual is very simple and satisfies a trivial form of strong duality, it suggests a different way of thinking about duality. It leads to a new generalized Benders algorithm as well as the new approach to sensitivity analysis mentioned earlier [14,25]. It also suggests a different interpretation of the nontrivial strong duality theorems that appear in the optimization literature. Each such theorem identifies a method of proof for a particular logic and asserts that the method is sound and complete. (A proof method is sound when it obtains only valid inferences, and it is complete when it obtains all valid inferences.) Classical linear programming duality is an example of this.

Linear Programming Duality
The inference dual of a linear programming problem

  min cx
  s.t. Ax ≥ a
       x ≥ 0    (3)

is

  max β
  s.t. (Ax ≥ a, x ≥ 0) ⊨_{R^n} (cx ≥ β).    (4)

Linear inequalities over R^n can be regarded as forming a type of logic in which the inequalities are propositions, and nonnegative linear combinations provide a sound and complete proof method. More precisely, a feasible system Ax ≥ a, x ≥ 0 semantically implies cx ≥ β with respect to R^n if and only if there is a real vector u ≥ 0 for which uA ≤ c and ua ≥ β; i.e., if and only if the surrogate inequality uAx ≥ ua dominates cx ≥ β for some u ≥ 0. Another way to state this is that Ax ≥ a, x ≥ 0 implies cx ≥ β if and only if the classical dual problem

  max ua
  s.t. uA ≤ c
       u ≥ 0    (5)

has a solution u for which ua ≥ β. The classical dual therefore has the same optimal value as the inference dual (4) when Ax ≥ a, x ≥ 0 is feasible. This leads immediately (because of strong inference duality) to the classical strong duality theorem for linear programming: (5) has the same optimal value as (3), unless both are infeasible. Classical strong duality is therefore a way of stating the completeness of nonnegative linear combination as a proof method for the logic of linear inequalities. A feasible solution u of the classical dual can be interpreted as encoding a proof within this logic.
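The dominance argument can be checked numerically. The data and the dual-feasible u below are invented; the code verifies the proof chain cx ≥ (uA)x ≥ ua ≥ β at a few feasible points.

```python
# Data for min cx s.t. Ax >= a, x >= 0 (illustrative instance).
A = [[2, 1],
     [1, 3]]
a = [4, 6]
c = [3, 4]

u = [1.0, 1.0]    # a dual-feasible multiplier vector: uA <= c, u >= 0

def dot(p, q):
    return sum(pi * qi for pi, qi in zip(p, q))

# Check dual feasibility of u: uA <= c componentwise and u >= 0.
uA = [dot(u, [row[j] for row in A]) for j in range(2)]
assert all(uA[j] <= c[j] for j in range(2)) and all(ui >= 0 for ui in u)

beta = dot(u, a)  # the bound certified by u: here ua = 10

# Every feasible x must satisfy cx >= (uA)x >= u(Ax) >= ua = beta.
for x in [(2.0, 2.0), (4.0, 1.0), (0.0, 4.0)]:
    assert all(dot(row, x) >= ai for row, ai in zip(A, a)) and min(x) >= 0
    assert dot(c, x) >= beta

print(beta)  # prints 10.0
```

The vector u is thus a compact encoding of a proof: the nonnegative combination of the constraint rows it specifies dominates the bound being claimed.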

Benders Decomposition in the Abstract
Benders decomposition views elements of the feasible set as pairs (x, y) of objects that belong respectively to domains D_x, D_y. So the optimization problem (1) becomes

  min f(x, y)
  s.t. (x, y) ∈ S
       x ∈ D_x, y ∈ D_y.    (6)

A general Benders algorithm begins by fixing y at some trial value ȳ ∈ D_y. This results in the subproblem

  min f(x, ȳ)
  s.t. (x, ȳ) ∈ S
       x ∈ D_x.    (7)

The inference dual of the subproblem is

  max β
  s.t. ((x, ȳ) ∈ S) ⊨_{D_x} (f(x, ȳ) ≥ β).    (8)

The dual problem is to find the best possible lower bound β* on the optimal cost that can be inferred from the constraints, assuming y is fixed to ȳ. The heart of Benders decomposition is somehow to derive a function βȳ(y) that gives a valid lower bound on the optimal value of (6) for any fixed value of y.
Just how this is done varies from one context to another, but essentially the idea is this. Let β* be the value obtained for (8), which means that β* is a lower bound on the optimal value of (6), given that y = ȳ. The solution of (8) is a proof of this fact. One then notes what valid lower bound this same line of argument yields for other values of y. This bound is expressed as a function βȳ(y) of y, yielding a Benders cut z ≥ βȳ(y). The subscript ȳ reflects which value of y gave rise to the bounding function.
The algorithm proceeds as follows. At each iteration the Benders cuts so far generated comprise the constraints of a master problem

  min z
  s.t. z ≥ β_{y^k}(y), k = 1, . . ., K
       y ∈ D_y.    (9)

Here y^1, . . ., y^K are the trial values of y hitherto obtained. The next trial value (z̄, ȳ) of (z, y) is obtained by solving the master problem. If the optimal value β* of the resulting subproblem dual (8) is equal to z̄, the algorithm terminates with optimal value z̄. Otherwise a new Benders cut z ≥ βȳ(y) is added to the master problem. The subproblem dual need not be solved to optimality if a suboptimal β is found that is better than z̄. At this point the procedure repeats.
A precise statement of the algorithm appears in Fig. 1. Note that if the subproblem is infeasible, the dual is unbounded, so that β* = βȳ(ȳ) = ∞.
Choose an initial ȳ ∈ D_y in problem (6). Set z̄ = −∞ and k = 0.
While the subproblem dual (8) has a feasible solution β > z̄:
  Formulate a lower bound function βȳ(y) with βȳ(ȳ) = β.
  Let k = k + 1, let y^k = ȳ, and add the Benders cut z ≥ β_{y^k}(y) to the master problem (9).
  If the master problem (9) is infeasible, then stop; (6) is infeasible.
  Else let ȳ be an optimal solution of (9) with optimal value z̄.
The optimal value of (6) is z̄.

Fig. 1. Generic Benders algorithm.

Theorem 1 Suppose that in each iteration of the generic Benders algorithm, the bounding function βȳ satisfies the following.

(B1) βȳ(y) ≤ f(x, y) for every feasible (x, y) of (6); i.e., every Benders cut is valid.
Then if the algorithm terminates with a finite optimal solution (z, y) = (z̄, ȳ) in the master problem, (6) has an optimal solution (x, y) = (x̄, ȳ) with value f(x̄, ȳ) = z̄. If it terminates with an infeasible master problem, then (6) is infeasible. If it terminates with an infeasible subproblem dual, then (6) is unbounded.
Proof. Suppose first that the algorithm terminates with a finite solution (z, y) = (z̄, ȳ) that is optimal in the master problem. Let β* be the optimal value of the last subproblem dual (8) solved. Then the subproblem primal (7) has optimal value β* and some optimal solution x̄. Clearly β* is an upper bound on the optimal value of (6). But because the algorithm terminated with a finite solution, β* = z̄. Finally, due to (B1), z̄ is a lower bound on the optimal value of (6). It follows that z̄ is the optimal value of (6). Because (x̄, ȳ) is feasible in (6), it is also optimal.
Because all Benders cuts are valid, infeasibility of the master problem implies that (6) is infeasible.
Finally, if the subproblem dual is infeasible, then the subproblem is unbounded. This implies that (6) is unbounded. □

The algorithm need not terminate in general, even if the subproblem is always solved to optimality. Consider for example a problem that has the optimal solution (x, y) = (1, 1). It is consistent with the algorithm for the bounding functions to be chosen so that initially (z̄, ȳ) = (0, 0) and the solutions y^k of the master problem follow the sequence 1 − (1/2)^k for k = 1, 2, . . ., so that the algorithm never terminates.
If the domain of y is finite, however, the algorithm must terminate, because only finitely many subproblems can be defined. Because z̄ must increase in every iteration but the last, the optimal value is reached after finitely many steps.
Theorem 2 Suppose that in each iteration of the generic Benders algorithm, the bounding function βȳ satisfies (B1) and the following.

(B2) βȳ(ȳ) is equal to the optimal value β* of the subproblem dual (8).

Then if D_y is finite and the subproblem dual is solved to optimality, the generic Benders algorithm terminates.
Proof. Due to (B2), the subproblem value β must increase in every iteration but the last. Because D_y is finite and the subproblem dual is solved to optimality, the latter can generate only finitely many values. So the algorithm must terminate. □
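On a finite domain the generic algorithm of Fig. 1 can be sketched directly. The toy instance below is invented, and the bounding function is the weakest one satisfying (B1) and (B2): it returns the subproblem optimum at ȳ and the trivial bound 0 elsewhere, which is valid here because f is nonnegative.

```python
from itertools import product

Dx = list(product([0, 1], repeat=2))
Dy = list(product([0, 1], repeat=2))

def f(x, y):
    return 2 * x[0] + x[1] + 4 * y[0] + 3 * y[1]

def feasible(x, y):
    return x[0] + x[1] + y[0] + y[1] >= 2

def subproblem_value(ybar):
    """Optimal value of subproblem (7), i.e. of its dual (8)."""
    vals = [f(x, ybar) for x in Dx if feasible(x, ybar)]
    return min(vals) if vals else float("inf")

def make_cut(ybar, beta):
    # Weakest cut with (B1) and (B2): tight at ybar, trivial elsewhere.
    return lambda y: beta if y == ybar else 0.0

cuts, ybar, zbar = [], Dy[0], float("-inf")
while True:
    beta = subproblem_value(ybar)   # solve the subproblem dual
    if beta <= zbar:
        break                       # master bound attained: zbar optimal
    cuts.append(make_cut(ybar, beta))
    # Master problem (9): minimize z s.t. z >= cut(y) for every cut.
    zbar, ybar = min((max(c(y) for c in cuts), y) for y in Dy)

brute = min(f(x, y) for x in Dx for y in Dy if feasible(x, y))
print(zbar, brute)  # prints 3 3: Benders value matches brute force
```

Because the cut excludes only the single trial value ȳ, this degenerates into enumeration of D_y; the practical power of the method comes from bounding functions that remain informative away from ȳ.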

Classical Benders Decomposition
The classical Benders technique applies to problems in which the subproblem is linear.
  min cx + f(y)
  s.t. Ax + g(y) ≥ a
       x ≥ 0, y ∈ D_y,    (10)

where g(y) is a vector of functions g_i(y). The subproblem (7) becomes

  min cx + f(ȳ)
  s.t. Ax ≥ a − g(ȳ)
       x ≥ 0.    (11)

Due to classical strong duality, the subproblem dual (8) can be written

  max u(a − g(ȳ)) + f(ȳ)
  s.t. uA ≤ c
       u ≥ 0,    (12)

provided either (11) or (12) is feasible. If the dual (12) has a finite solution u, then u provides an inference procedure (a linear combination) for obtaining a valid bound on the objective function value z of (10),

  z ≥ u(a − g(ȳ)) + f(ȳ),

that is valid when y = ȳ. The key to obtaining a bound for any y is to observe that this same u remains feasible in the dual for any y. So the same procedure (i.e., the same u) provides a bound z ≥ βȳ(y) = u(a − g(y)) + f(y) for any y. This is the classical Benders cut. It is straightforward to show that when the dual is infeasible or unbounded, one can obtain a cut v(a − g(y)) ≤ 0, where v is a ray satisfying vA ≤ 0, v ≥ 0 with v(a − g(ȳ)) > 0.

Propositional Satisfiability

The propositional satisfiability problem is to find an assignment of truth values to logical variables (atomic propositions) x_j for which a given set of logical clauses are all true. A clause is a disjunction of literals, each of which is an atomic proposition or its negation. For instance, x_1 ∨ ¬x_2 ∨ ¬x_3 is a clause asserting that x_1 is true or x_2 is false or x_3 is false. One clause C implies another D if and only if C absorbs D; i.e., D contains all the literals in C.
A logical clause may be denoted C_i(x), where x = (x_1, . . ., x_n) is a vector of logical variables, and the variables appearing in C_i(x) belong to {x_1, . . ., x_n}. The satisfiability problem can be formulated as an optimization problem by giving it an objective function of zero:

  min 0
  s.t. C_i(x), i ∈ I.    (13)

The optimal value of (13) is zero if the clauses C_i(x) are jointly satisfiable, and otherwise it is infinite (by convention). The inference dual is

  max β
  s.t. (C_i(x), i ∈ I) ⊨_{{T,F}^n} (0 ≥ β),    (14)

where T and F denote true and false, respectively. When the clauses are satisfiable, the maximum value of β is zero. Otherwise the antecedent of the implication is necessarily false and implies 0 ≥ β for arbitrarily large β (because a necessarily false proposition implies everything). In this case the dual problem is unbounded.

Benders decomposition is applied when the propositional variables are partitioned (x, y). The satisfiability problem (13) becomes

  min 0
  s.t. C_i(x) ∨ D_i(y), i ∈ I,    (15)

where C_i(x) represents the part of clause i containing variables in x, and D_i(y) the part containing variables in y. Either part may be empty. In practice the variables would be partitioned so that the subproblem has special structure; this is discussed further below.
For a fixed assignment ȳ of truth values to y, the subproblem is

  min 0
  s.t. C_i(x), i ∈ I(ȳ),    (16)

where I(ȳ) is the set of indices i for which ȳ falsifies D_i(y); any clause i for which D_i(ȳ) is true is already satisfied and need not appear in the subproblem. The subproblem dual is

  max β
  s.t. (C_i(x), i ∈ I(ȳ)) ⊨ (0 ≥ β).    (17)

As an example, consider the satisfiability problem (18), whose clauses (a)-(f) involve the x-variables x_1, x_2, x_3 and the y-variables y_1, y_2, y_3. Initially the master problem (9) contains no constraints, and the solution ȳ may be chosen arbitrarily. If ȳ = (F, F, F), clauses (a)-(f) appear in the subproblem, with the y_j terms removed. A branching method known as the Davis-Putnam-Loveland algorithm appears to be among the most effective complete algorithms for solving a satisfiability problem such as this (e.g., [12]). Figure 2 shows a straightforward branching tree that finds (18) to be unsatisfiable. The Davis-Putnam-Loveland algorithm adds a "unit resolution" step that will be discussed later. The procedure in Figure 2 is simple. At the root node, the search branches on x_1 by setting x_1 to true and false. This generates two subproblems, corresponding to left and right child nodes respectively. Clauses (a) and (b) can be dropped in the left subproblem because x_1 = T satisfies them. Also ¬x_1 is deleted from clause (f) because it cannot be true. The right subproblem is similarly simplified.
The search continues in recursive fashion until a feasible solution is found or the subproblem at every leaf node is shown to be unsatisfiable. A feasible solution is found when the variables fixed so far at some node of the search tree satisfy every clause in the subproblem at that node. (A clause is satisfied when at least one literal in it is fixed to true.) A subproblem is shown to be unsatisfiable at a node when the variables fixed so far violate at least one clause of the subproblem (i.e., make every literal of the clause false).
If a feasible solution is found, the Benders algorithm immediately terminates with a feasible solution (x, y) = (x̄, ȳ), where x̄_j is the truth value to which x_j is fixed at the node where a feasible solution is found. If x_j is not fixed, x̄_j may be chosen arbitrarily.
If no feasible solution is found, a Benders cut z ≥ βȳ(y) must be generated. This requires solution of the dual subproblem (17). Because the subproblem has an infinite optimal value, the dual solution must consist of a proof that 0 ≥ β for arbitrarily large β, using the clauses {C_i(x) | i ∈ I(ȳ)} as premises. In other words, it must show that the clauses C_i(x) for i ∈ I(ȳ) are jointly unsatisfiable. This can be done by applying a complete inference method for logical clauses, such as the resolution method discovered by W. V. Quine over 40 years ago [37,38] (known as consensus when applied to formulas in disjunctive normal form). The name 'resolution' derives from Robinson [39], who developed the method for first-order predicate logic. However, this tends to be a very inefficient approach [18,19].
It is more practical to observe that the implicit enumeration tree of Figure 2 already provides the desired proof of unsatisfiability, because at least one clause is falsified at every leaf node. In fact, if one associates a single falsified clause with each leaf node, the tree proves the unsatisfiability of the subset of clauses associated with leaf nodes. (It is shown in [27] that this proof by enumeration is formally equivalent to a resolution proof, but this fact is not needed here.) In the figure, it happens that only one clause is falsified at each leaf node, except for a node at which both (c) and (f) are falsified; suppose that (c) is associated with this node. Then the tree proves that clauses (a), (b), (c), (d) and (e) are jointly unsatisfiable. Let us denote the index set of these clauses by Ī(ȳ), which is a subset of I(ȳ). Thus the dual solution uses only a subset of {C_i(x) | i ∈ I(ȳ)} as premises. This can result in a stronger Benders cut.
To find a Benders cut, it is necessary to identify all values of y for which this particular proof remains valid. This is easily done by observing which values of y falsify D_i(y) for i ∈ Ī(ȳ). So long as y falsifies D_i(y), the clause C_i(x) remains in the subproblem and is therefore falsified at its associated nodes of the search tree in Figure 2. One can therefore write a Benders cut z ≥ βȳ(y) that states z ≥ ∞ when all clauses D_i(y) for i ∈ Ī(ȳ) are false and states z ≥ 0 otherwise.
It will be useful in the next section to use the following notation. If P is a proposition, define a function v(P) to take the value 1 when P is true and 0 otherwise. Then the desired Benders cut can be written

  z ≥ ∞ · v( ⋀_{i∈Ī(ȳ)} ¬D_i(y) ),    (19)

where ∧ denotes a conjunction. Thus when all D_i(y) for i ∈ Ī(ȳ) are false, z ≥ ∞. Otherwise z ≥ ∞ · 0 = 0. In the example, the cut is

  z ≥ ∞ · v(¬(y_1 ∨ y_2) ∧ ¬y_1 ∧ ¬y_2).

This is equivalent to z ≥ ∞ · v(¬y_1 ∧ ¬y_2), but it is not necessary to identify such reductions. In iteration K the master problem (9) is

  min z
  s.t. z ≥ ∞ · v( ⋀_{i∈Ī(y^k)} ¬D_i(y) ), k = 1, . . ., K.    (20)

In practice the master problem would be solved as a satisfiability problem in which each Benders cut (19) is written as a clause

  ⋁_{i∈Ī(y^k)} D_i(y).    (21)

Clearly (21) is satisfied if and only if (19) does not force z ≥ ∞. Thus (20) becomes the satisfiability problem

  ⋁_{i∈Ī(y^k)} D_i(y), k = 1, . . ., K.    (22)

In the example, this master problem (23) contains the single clause (y_1 ∨ y_2) ∨ y_1 ∨ y_2, which is of course equivalent to y_1 ∨ y_2. One solution of the master problem (23) is (ȳ_1, ȳ_2, ȳ_3) = (F, T, F), and the resulting subproblem can be solved by setting x_1 = x_2 = T. Thus the algorithm terminates with solution (x_1, x_2, x_3, y_1, y_2, y_3) = (T, T, x_3, F, T, F), where x_3 can be true or false.
The Benders algorithm for satisfiability appears in Figure 3. Because D_y is finite, Theorems 1 and 2 imply that the algorithm terminates with the correct answer if the Benders cuts have properties (B1) and (B2). It is easy to verify that they do.
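The whole scheme fits in a short sketch. The clause data below are invented (they are not the paper's example (18)); each clause is stored as an (x-part, y-part) pair, a simple branching search returns the falsified clauses of an unsatisfiability proof, and their y-parts become a master clause of the form (21).

```python
from itertools import product

# Literals are (variable_index, sign); sign True means a positive literal.
clauses = [
    ([(0, True)],  [(0, False)]),            # x1 or not-y1
    ([(1, True)],  [(1, False)]),            # x2 or not-y2
    ([(1, False)], [(1, False)]),            # not-x2 or not-y2
    ([],           [(0, True), (1, True)]),  # y1 or y2
]

def lit_true(lit, assign):
    idx, sign = lit
    return assign.get(idx) == sign

def part_satisfied(part, assign):
    return any(lit_true(l, assign) for l in part)

def unsat_core(cls, assign, var=0, nvars=2):
    """Branch on x-variables; return falsified clause indices proving
    unsatisfiability, or None if some assignment satisfies everything."""
    falsified = [i for i, c in cls
                 if all(l[0] in assign and not lit_true(l, assign) for l in c)]
    if falsified:
        return {falsified[0]}       # one falsified clause per leaf node
    if var == nvars:
        return None                 # all variables set, no clause falsified
    core = set()
    for val in (True, False):
        sub = unsat_core(cls, {**assign, var: val}, var + 1, nvars)
        if sub is None:
            return None             # a satisfying assignment exists
        core |= sub
    return core

def solve(ny=2):
    cuts = []                       # master clauses over y, as in (21)
    for y in product([True, False], repeat=ny):
        ya = dict(enumerate(y))
        if not all(part_satisfied(c, ya) for c in cuts):
            continue                # this y violates a Benders cut
        # Subproblem: x-parts of clauses whose y-part ybar falsifies.
        sub = [(i, cx) for i, (cx, dy) in enumerate(clauses)
               if not part_satisfied(dy, ya)]
        core = unsat_core(sub, {})
        if core is None:
            return y                # subproblem satisfiable: terminate
        cuts.append([l for i in core for l in clauses[i][1]])
    return None

print(solve())  # (True, False): y1 = T, y2 = F admits a feasible x
```

Here the master is "solved" by enumeration over y; in practice it would itself be handed to a satisfiability solver, as the text describes.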
The same approach can be used if the Davis-Putnam-Loveland algorithm is applied to the subproblems. This algorithm is simply the branching algorithm described earlier, except that unit resolution is applied at each node of the search tree. That is, if one of the clauses is a unit clause (contains exactly one literal), that literal is fixed to true. This in turn fixes the variable that occurs in the literal and allows the problem to be simplified. The process continues until no unit clauses remain.
Fixing a variable x_j to true or false in any step of unit resolution is equivalent to branching on x_j. Branching creates one subproblem in which a unit clause is violated, and a second subproblem that simplifies. The Davis-Putnam-Loveland algorithm is therefore formally equivalent to a pure branching algorithm, and Benders cuts can be generated accordingly.
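Unit resolution itself is a short routine. The clause encoding below (signed integers, a common solver convention rather than the paper's notation) makes the fix-and-simplify loop explicit.

```python
def unit_propagate(clauses):
    """Repeatedly fix the literal of any unit clause and simplify.
    Returns (remaining clauses, forced assignment), or (None, assignment)
    if an empty clause is derived (a contradiction)."""
    clauses = [set(c) for c in clauses]   # literal +v / -v for variable v
    assignment = {}
    while True:
        unit = next((c for c in clauses if len(c) == 1), None)
        if unit is None:
            return clauses, assignment    # no unit clauses remain
        lit = next(iter(unit))
        assignment[abs(lit)] = lit > 0
        simplified = []
        for c in clauses:
            if lit in c:
                continue                  # clause satisfied, drop it
            c = c - {-lit}                # opposite literal cannot hold
            if not c:
                return None, assignment   # empty clause: contradiction
            simplified.append(c)
        clauses = simplified

# x1 is forced true, which reduces (-1, 2) to the new unit (2), and so on.
cls, assign = unit_propagate([{1}, {-1, 2}, {-2, 3, 4}])
print(assign)  # {1: True, 2: True}; only {3, 4} survives
```

Each propagation step corresponds to a branch in which one child immediately violates a unit clause, which is exactly the equivalence the paragraph above describes.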
There are various ways to accelerate the Benders algorithm. For example, it is advantageous to choose the same violated clause C_{i(t)}(x) for several leaf nodes t. This makes the set {C_{i(t)}(x) | all t} smaller and results in a stronger Benders cut.
Also, as noted earlier, the variables should be partitioned so that (a) the master problem is small and (b) the subproblem decouples into small problems or has some other special structure. For instance, a subproblem in which the clauses are renamable Horn can be checked for satisfiability in linear time [1,10]. Finding a small subset of variables y for which the subproblem is always renamable Horn is a maximum clique problem [11], which can be solved by various heuristics and exact algorithms.
In practice the master problem as well as the subproblem might be solved by Davis-Putnam-Loveland. Whenever a feasible solution is found, the subproblem is solved, and the Benders cut (if any) is added to the master problem. At this point the master search tree can be updated to reflect the additional clause; an efficient algorithm for doing this is presented in [22]. Note that in the context of a branching algorithm, Benders cuts have a role complementary to that of traditional cutting planes. Whereas the latter contain variables that have not yet been fixed, Benders cuts contain variables that have already been fixed. They are also valid throughout the search tree, as are nogoods in general.
Interestingly, when the subproblem is renamable Horn, it is equivalent to a linear programming problem [33], so that the original problem could be solved by traditional Benders decomposition. But the linear-time methods for solving the subproblem are much faster than methods for solving the linear programming equivalent.
Furthermore, it is shown in [30] that if the variables y_j represent inputs to a logic circuit, and the variables x_j represent the outputs of gates in the circuit, then the problem of checking whether the circuit represents a tautology can be solved by Benders decomposition, where the subproblem is renamable Horn. Here again the subproblem is equivalent to a linear programming problem, and the specialized Benders algorithm developed in [30] yields Benders cuts that are in fact equivalent to those obtained by classical Benders, but again much more rapidly than in the classical case.

0-1 Programming
Benders decomposition has long been applied to 0-1 programming. The difference here is that the subproblem is a 0-1 programming problem, rather than a linear programming problem as in the classical case.
Because branching algorithms for integer programming problems usually solve a continuous relaxation of the problem at each node, this feature must be incorporated into the dual solution. In effect, the classical linear programming dual will be combined with a branch-and-bound search to obtain a dual solution.
A 0-1 programming problem may be stated

  min cx + c_0
  s.t. Ax ≥ a
       x ∈ {0, 1}^n.    (24)

The constant c_0 will be useful shortly. The inference dual is

  max β
  s.t. (Ax ≥ a) ⊨_{{0,1}^n} (cx + c_0 ≥ β).    (25)

If β* is the optimal value of (24), a solution of the dual (25) consists of a proof that cx + c_0 ≥ β*, using Ax ≥ a as premises.
If the variables are partitioned (x, y), a 0-1 problem can be written

  min cx + dy
  s.t. Ax + By ≥ a
       x ∈ {0, 1}^{n_x}, y ∈ {0, 1}^{n_y}.    (26)

For a fixed ȳ the subproblem is

  min cx + dȳ
  s.t. Ax ≥ a − Bȳ
       x ∈ {0, 1}^{n_x}.    (27)

The subproblem dual is

  max β
  s.t. (Ax ≥ a − Bȳ) ⊨_{{0,1}^{n_x}} (cx + dȳ ≥ β).    (28)

The subproblem (27) can be solved by branch and bound to obtain its optimal value β*. Solution of the dual (28), however, is necessary to obtain a Benders cut. Its solution requires that one exhibit a proof of cx + dȳ ≥ β* using Ax ≥ a − Bȳ as premises. One approach is to use a complete inference method for 0-1 linear inequalities, such as the generalized resolution method of [21]. Such a method in fact serves as the basis for a logic-based solution method for 0-1 programming [5].
A more direct approach, however, is to interpret the branch-and-bound tree for the primal problem (27) as a proof of optimality. As in the satisfiability problem, one can associate a violated inequality I_t with each leaf node t of the tree. I_t implies a logical clause C_t that is falsified at that node. The optimality proof remains valid for any y such that I_t continues to imply C_t. This in turn provides the basis for a Benders cut.
This may be made precise as follows. At leaf node t, let J_1 be the set of all j for which branching has set x_j to 1, and similarly for J_0. Then the following clause C_t is violated by the fixed variables:

  ⋁_{j∈J_1} ¬x_j ∨ ⋁_{j∈J_0} x_j.    (29)

The continuous relaxation of the problem at leaf node t may be written

  min cx + dȳ
  s.t. Ax ≥ a − Bȳ    (u)
       Hx ≥ h.    (v)    (30)

The system Hx ≥ h contains upper bounds of the form −x_j ≥ −1 as well as constraints −x_j ≥ 0 that fix x_j to 0 for j ∈ J_0, and constraints x_j ≥ 1 that fix x_j to 1 for j ∈ J_1. Dual variables u, v may be associated with the constraints as shown. We will need the following lemma, whose proof is straightforward. Let α^+ = max{α, 0}.

Lemma 3 An inequality ax ≥ α, with x a 0-1 vector, implies the clause ⋁_{j∈P} x_j ∨ ⋁_{j∈N} ¬x_j if and only if

  ∑_{j∉P∪N} a_j^+ + ∑_{j∈N} a_j < α.
When the relaxation (30) is solved at node t, there are three possible outcomes.
(a) The relaxation is infeasible. In this case the dual solution (u, v) is such that Thus the following system is infeasible.
This means that the fixed variables violate the surrogate inequality

    uAx ≥ ua − uBȳ    (31)

In fact, the fixed variables violate any inequality that implies C_t. They therefore violate uAx ≥ ua − uBy for any y such that this inequality implies C_t. By Lemma 3, uAx ≥ ua − uBy implies C_t if and only if

    Σ_{j∈J_1} uA^j + Σ_{j∉J_0∪J_1} (uA^j)^+ < ua − uBy

where A^j is column j of A. This can be written

    uBy < ua − Σ_{j∈J_1} uA^j − Σ_{j∉J_0∪J_1} (uA^j)^+    (32)

Let this be inequality I_t.
(b) The relaxation is feasible and the solution is integral. Then if the optimal value is β*_t, the dual solution (u, v) satisfies

    u, v ≥ 0, uA + vH ≤ c, u(a − Bȳ) + vh = β*_t − dȳ

Thus the following system is infeasible:

    cx + dȳ < β*_t
    uAx ≥ ua − uBȳ
    Hx ≥ h
This means that the fixed variables violate the surrogate inequality

    (uA − c)x > ua − uBȳ − β*_t + dȳ    (33)

They also violate (uA − c)x > ua − uBy − β*_t + dy for any y such that this inequality implies C_t. Using Lemma 3, C_t is implied if and only if

    (uB − d)y ≤ ua − β*_t − Σ_{j∈J_1} (uA^j − c_j) − Σ_{j∉J_0∪J_1} (uA^j − c_j)^+    (34)

Let this be inequality I_t.
(c) The relaxation is feasible and has a nonintegral solution. Its optimal value β̂_t is greater than or equal to the value of the incumbent solution.
The analysis is the same as in (b) with β̂_t replacing β*_t. Thus the fixed variables again violate the surrogate (33), and the inequality I_t is

    (uB − d)y ≤ ua − β̂_t − Σ_{j∈J_1} (uA^j − c_j) − Σ_{j∉J_0∪J_1} (uA^j − c_j)^+    (35)

The optimal value β* of the subproblem is the minimum over all values β*_t obtained at leaf nodes t. The dual solution exhibits, at every leaf node, a surrogate inequality (31) or (33) that is (a) violated, (b) violated when the value of the relaxation is assumed to be less than β*_t, or (c) violated when the value of the relaxation is assumed to be less than β̂_t. Given any y for which each leaf node's surrogate remains violated, the branch-and-bound tree proves optimality of β*. But such a y is precisely one that satisfies the inequalities I_t for each t. A Benders cut may therefore be stated

    z ≥ β* + d(y − ȳ)    if y satisfies I_t for all t ∈ T
    z ≥ dy + Σ_j c_j^−   otherwise                            (36)

where T is the set of leaf nodes. The term Σ_j c_j^− (with c_j^− = min{c_j, 0}) simply computes the smallest possible value of cx and therefore provides a default bound when y does not satisfy all of the I_t's.

A somewhat sharper analysis is possible when the subproblem (27) decouples into smaller problems. Suppose then that (27) can be written

    min cx + dȳ
    s.t. A_i x_i ≥ a_i − B_i ȳ,  i = 1, ..., p,  x ∈ {0, 1}^n    (37)

where the vectors x_i have no variables in common. The subproblem is solved by solving

    min c^i x_i
    s.t. A_i x_i ≥ a_i − B_i ȳ,  x_i ∈ {0, 1}    (38)

for i = 1, ..., p to obtain optimal solutions x̄_i and optimal values (β*)^i. An optimal solution of (37) can now be written x̄ = (x̄_1, ..., x̄_p), and the optimal value is β* = Σ_{i=1}^p (β*)^i + dȳ. Let I^i_t be the inequality I_t obtained at leaf node t of the branch-and-bound tree that solves (38). Then the Benders cut (36) becomes

    z ≥ dy + Σ_{i=1}^p z_i,   where  z_i = (β*)^i             if y satisfies I^i_t for all t ∈ T^i
                                     z_i = Σ_{j∈J^i} c_j^−    otherwise                              (39)

where T^i is the set of leaf nodes of the search tree that obtained (β*)^i. Also J^i is the set of indices of variables in x_i.

Consider the following example:

    min 4x_1 + 2x_2 + 5x_3 + x_4 + y_1 + 20y_2
    s.t. 2x_1 + x_2 + 2y_1 + y_2 ≥ 5    (a)
         ...

Initially the master problem has no constraints, and one may choose the optimal solution (y_1, y_2) = (0, 0).
Setting ȳ = (0, 0) yields a subproblem of the form (37) that decouples into two smaller problems (41), the first in (x_1, x_2) and the second in (x_3, x_4). The first subproblem has the following relaxation at the root node of a branch-and-bound tree:

    min 4x_1 + 2x_2
    s.t. 2x_1 + x_2 ≥ 5
         0 ≤ x_1, x_2 ≤ 1
The relaxation is infeasible, with dual solution u = 1, v = (2, 1). The root node is therefore the only leaf node, at which the violated surrogate (31) is 2x_1 + x_2 ≥ 5. The inequality I^1_1, given by (32), is 2y_1 + y_2 < 2. The second problem in (41) has a fractional solution (x_3, x_4) = (2/3, 1) at the root node, and the search therefore branches on x_3. In the branch defined by x_3 = 0, the problem is infeasible and generates the inequality I^2_1 given by y_2 < 2. Because this inequality is necessarily satisfied, it can be replaced by the tautologous inequality 0 < 1. The branch defined by x_3 = 1 has the fractional solution (x_3, x_4) = (1, 1/2), which requires a further branch on x_4. This creates an infeasible node and a node at which the relaxation has an integral solution (x_3, x_4) = (1, 1).
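The derivation of I^1_1 from the dual ray can be reproduced mechanically. A sketch (hypothetical function name; A and B as dense lists) that assembles the inequality I_t of (32) from a Farkas ray u and the branching sets J_1, J_0:

```python
def case_a_cut(u, A, B, a, J1, J0):
    """Return (lhs, rhs) encoding the inequality I_t of (32):
        lhs . y < rhs,
    obtained from a dual ray u proving infeasibility at a leaf node
    whose branching fixed x_j = 1 (j in J1) and x_j = 0 (j in J0)."""
    n = len(A[0])
    uA = [sum(u[i] * A[i][j] for i in range(len(A))) for j in range(n)]
    uB = [sum(u[i] * B[i][j] for i in range(len(B))) for j in range(len(B[0]))]
    ua = sum(ui * ai for ui, ai in zip(u, a))
    free = set(range(n)) - set(J1) - set(J0)
    rhs = ua - sum(uA[j] for j in J1) - sum(max(uA[j], 0) for j in free)
    return uB, rhs

# Root node of the first subproblem: 2 x1 + x2 >= 5 - 2 y1 - y2, ray u = 1,
# no variables fixed by branching.
lhs, rhs = case_a_cut([1], [[2, 1]], [[2, 1]], [5], [], [])
print(lhs, rhs)  # [2, 1] 2  -- i.e. I_1^1 is  2 y1 + y2 < 2
```

The returned pair matches the inequality 2y_1 + y_2 < 2 obtained in the text.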
The complete algorithm is summarized in Table 1. After the first Benders cut is added to the master problem, the solution of the latter is ȳ = (1, 1). This generates a second Benders cut, and the master problem that contains the first two Benders cuts has solution ȳ = (1, 0). This generates a third Benders cut, which when added to the master problem yields ȳ = (1, 1) with optimal value 11. The subproblem resulting from ȳ = (1, 1) has already been solved, with optimal value 11. The algorithm therefore terminates with solution (x, y) = (0, 1, 0, 1, 1, 1).
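The iteration just traced (solve the master, solve the subproblem, add a cut, repeat until the master's bound matches the subproblem value) can be sketched generically. All interfaces here are hypothetical stand-ins: solve_subproblem returns the subproblem optimum for fixed ȳ together with a cut derived from its search tree, and solve_master optimizes over the cuts accumulated so far.

```python
def benders_01(solve_master, solve_subproblem, y0):
    """Skeleton of the logic-based Benders loop for the partitioned 0-1
    problem (26).  Hypothetical interfaces:
      solve_subproblem(ybar) -> (beta, cut), where cut is a function of y
        giving the lower bound the search tree certifies for that y;
      solve_master(cuts) -> (ybar, zbar) minimizing z subject to the cuts,
        or None if the master problem is infeasible."""
    ybar, zbar, cuts = y0, float("-inf"), []
    while True:
        beta, cut = solve_subproblem(ybar)
        if beta <= zbar:
            return ybar, zbar        # bound met: zbar is the optimal value
        cuts.append(cut)
        result = solve_master(cuts)
        if result is None:
            return None              # original problem infeasible
        ybar, zbar = result
```

On a toy value function with exact cuts, the loop converges in a few iterations to the minimizing y and its value.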

Computational Testing
The task of testing Benders decomposition computationally is somewhat problematic, since it is not a general-purpose method. It is intended for problems with special structure, and its performance depends on the degree to which decomposition can exploit the structure.
One approach to testing is to exhibit an important application area in which problems are structured so that Benders is more effective than other known techniques. One such application area is reported in the next section.
Another approach is to focus on a particular type of structure that a Benders method can exploit. The simplest case occurs when a subproblem separates into smaller problems. We investigated to what extent the subproblem of a 0-1 programming problem must decouple before a Benders approach becomes advantageous.
We randomly generated instances in which the subproblem decouples into m problems of size s × s, and the master problem has s variables. The subproblem has no other special structure. Thus each A_i in (37) is s × s, each B_i is s × s, and the original problem is ms × (m + 1)s. A larger m implies a more highly decoupled subproblem, whereas m = 1 implies no decoupling at all.

The overall algorithm may be summarized as follows:

    Choose an initial ȳ ∈ {0, 1}^n. Set z̄ = −∞ and k = 0.
    While the subproblem (27) has optimal value β* > z̄:
        For each leaf node t of the search tree used to solve (27):
            If the relaxation (30) is infeasible, let I_t be (32).
            Else if (30) has an integral optimal solution, let I_t be (34).
            Else if (30) has a nonintegral optimal solution, let I_t be (35).
        Add the Benders cut (36) to the master problem (9). Let k = k + 1 and y^k = ȳ.
        If the master problem (9) is infeasible, then stop; (26) is infeasible.
        Else let ȳ be an optimal solution of (9) with optimal value z̄.
    (x, y, z) = (x̄, ȳ, z̄) is an optimal solution of (26), where x = x̄ is an optimal solution of the last subproblem solved.

The Benders approach requires that when the subproblem is solved, the dual solution be available at each leaf node of the search tree. Because commercial software available to us did not provide this information, we solved the subproblems with a straightforward branch-and-bound algorithm written in Java, in which CPLEX solved the linear programming relaxations.
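The random instances can be generated along these lines. The block structure is as described above, while the coefficient distributions are assumptions of this sketch (the text does not specify them).

```python
import random

def gen_instance(m, s, seed=0):
    """Random decoupled instance: the subproblem splits into m blocks with
    s x s constraint matrices A_i, coupled to the s master variables y
    through s x s blocks B_i.  Coefficient ranges are assumptions of this
    sketch; only the block structure is fixed by the text."""
    rng = random.Random(seed)
    mat = lambda: [[rng.randint(-5, 5) for _ in range(s)] for _ in range(s)]
    A = [mat() for _ in range(m)]                        # block constraints
    B = [mat() for _ in range(m)]                        # coupling with y
    a = [[rng.randint(0, 5) for _ in range(s)] for _ in range(m)]
    c = [[rng.randint(1, 10) for _ in range(s)] for _ in range(m)]  # x costs
    d = [rng.randint(1, 10) for _ in range(s)]                      # y costs
    return A, B, a, c, d
```

The overall instance then has ms rows over ms subproblem variables plus s master variables, matching the ms × (m + 1)s shape stated above.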
We applied exactly the same branch-and-bound algorithm to the original problem (without decomposition) and compared the computation time to that of the Benders algorithm. This provided a controlled experiment in which the effect of decomposition could be isolated, a general approach advocated in [24,26].
Because the master problem does not have the traditional inequality constraints, we solved it with a modified branch-and-bound algorithm. We branched on y j 's with fractional values as well as on the possible values of the right-hand side of each Benders cut (39).
The decoupled version of the algorithm is as follows:

    While the subproblem (27) has optimal value β* > z̄:
        For each component problem (38) of the subproblem (27):
            For each leaf node t of the search tree used to solve (38):
                If the relaxation (30) is infeasible, let I^i_t be (32).
                Else if (30) has an integral optimal solution, let I^i_t be (34).
                Else if (30) has a nonintegral optimal solution, let I^i_t be (35).
        Add the Benders cut (39) to the master problem (9). Let k = k + 1 and y^k = ȳ.
        If the master problem (9) is infeasible, then stop; (26) is infeasible.
        Else let ȳ be an optimal solution of (9) with optimal value z̄.
    (x, y, z) = (x̄, ȳ, z̄) is an optimal solution of (26), where x = x̄ is an optimal solution of the last subproblem solved.

We wrote the master problem (9) in the form

    min z
    s.t. z ≥ dy + Σ_{i=1}^p ( Σ_{j∈J^i} c_j^− + z_ik ),  all k
         y_j ∈ {0, 1},  all j    (42)

For each Benders cut k, we enforced the following p disjunctions:

    ( z_ik = (β*)^i − Σ_{j∈J^i} c_j^− )  ∨  ⋁_{t∈T^i} ( (z_ik = 0) ∧ ¬I^i_t ),   i = 1, ..., p    (43)
To keep the problem small, we did not actually use the variables z_ik in (42). Rather, we replaced each z_ik with (β*)^i − Σ_{j∈J^i} c_j^− if the first disjunct of (43) was enforced, and with zero otherwise. Thus the linear relaxation of the master problem minimizes z subject to the following inequality constraints.
(a) The Benders cuts (42), with each z_ik replaced by (β*)^i − Σ_{j∈J^i} c_j^− or zero.
(b) For each Benders cut and each i = 1, ..., p, the inequality ¬I^i_t if the disjunct (z_ik = 0) ∧ ¬I^i_t of (43) is enforced.
(c) Constraints y_j ≤ 0 or y_j ≥ 1 for each y_j that has been fixed to 0 or 1 (respectively) by prior branching.
(d) The bounds 0 ≤ y_j ≤ 1 for each y_j.
The inequality ¬I^i_t is written f ≥ g if I^i_t has the form f < g, and it is written f ≥ g + ε, for a small ε > 0, if I^i_t has the form f ≤ g. At each node of the branch-and-bound tree, we branched according to the following rules.
(i) If the solution of the linear relaxation just described is noninteger, branch on a y_j with value closest to 1/2.
(ii) Otherwise, branch on one of the remaining disjunctions (43) in which the current values of the y_j's satisfy none of the last |T^i| disjuncts, and remove this disjunction from the problem. Enforce one of the |T^i| + 1 disjuncts of (43) in each branch.
(iii) If no such disjunctions remain, the current solution is feasible and no branching is necessary.

The crossover point m* of the two curves is the value of m at which the Benders approach becomes superior to standard branch and bound. (For s = 4 we were unable to solve problems with m > 25.) The crossover point is rather large, with m* equal to 20 or more. However, once the Benders approach becomes superior, its superiority rapidly becomes overwhelming. With s = 3, for example, the Benders approach is on a par with the traditional approach for m = 21 but is already an order of magnitude faster for m = 25.
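Branching rules (i)-(iii) can be sketched as a selection routine. The representation of a disjunction (43) is hypothetical: each item carries a predicate reporting whether the current y satisfies any of its last |T^i| disjuncts.

```python
def choose_branch(y_relax, disjunctions):
    """Branch selection following rules (i)-(iii).  'disjunctions' is a
    hypothetical representation of the remaining disjunctions (43): each
    item has a method last_disjunct_holds(y) reporting whether y satisfies
    any of its last |T_i| disjuncts."""
    # Rule (i): branch on a fractional y_j with value closest to 1/2.
    frac = [(abs(v - 0.5), j) for j, v in enumerate(y_relax)
            if 1e-6 < v < 1 - 1e-6]
    if frac:
        return ("var", min(frac)[1])
    # Rule (ii): pick a disjunction none of whose last disjuncts holds;
    # the caller then enforces one of its |T_i| + 1 disjuncts per branch.
    for k, disj in enumerate(disjunctions):
        if not disj.last_disjunct_holds(y_relax):
            return ("disj", k)
    # Rule (iii): nothing left to branch on -- current solution feasible.
    return None
```

The routine returns which kind of branch to take, or None when the incumbent relaxation solution is feasible.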

An Application to Machine Scheduling
A particularly interesting role for logic-based Benders decomposition is as a framework for combining optimization and constraint programming, as proposed in [29]. Jain and Grossmann [31] illustrate how this might be done in their solution of a machine scheduling problem. Thorsteinsson and Hooker [42] use a similar framework to solve vehicle routing problems with time windows. In the Jain and Grossmann application, the master problem assigns jobs to machines, and the subproblem tries to schedule jobs on their assigned machines. The master problem is given a mixed integer programming model, while the subproblem is attacked with the highly developed scheduling technology of constraint programming (in particular, the ILOG Scheduler as invoked by OPL Studio [46]).
The problem may be stated as follows. Each job j is assigned to one of several machines i that operate at different speeds. Each assignment results in a processing time D_ij and incurs a processing cost C_ij. There is a release date R_j and a due date S_j for each job j. The objective is to minimize processing cost while observing release and due dates. Let t_j be the time at which job j begins processing, and let y_ij be 1 if job j is assigned to machine i and 0 otherwise. For a given machine i let ((t_j, D_ij) | y_ij = 1) be the list of pairs (t_j, D_ij) for which job j is assigned to machine i. The model is

    min Σ_{ij} C_ij y_ij
    s.t. R_j ≤ t_j ≤ S_j − Σ_i D_ij y_ij,  all j
         Σ_i y_ij = 1,  all j
         nonoverlap((t_j, D_ij) | y_ij = 1),  all i
         y_ij ∈ {0, 1}

The nonoverlap constraint requires that t_j + D_ij ≤ t_k or t_k + D_ik ≤ t_j for any two distinct jobs j, k assigned to a given machine i.
The master problem contains variables t_j, y_ij. To create a subproblem, introduce discrete variables t̄_j that have the same meaning as t_j. The subproblem contains the release times and deadlines along with the nonoverlap constraint. So if the assignments are fixed to ȳ, the subproblem becomes the following scheduling problem:

    min 0
    s.t. R_j ≤ t̄_j ≤ S_j − Σ_i D_ij ȳ_ij,  all j
         nonoverlap((t̄_j, D_ij) | ȳ_ij = 1),  all i

The problem separates into a feasibility problem for each machine i:

    R_j ≤ t̄_j ≤ S_j − D_ij,  all j with ȳ_ij = 1
    nonoverlap((t̄_j, D_ij) | ȳ_ij = 1)    (44)

The nonoverlap constraint can be implemented by standard "global" constraints available in constraint programming systems. In the ILOG Scheduler, it is implemented by the cumulative constraint. The variables t̄_j are discrete because the cumulative constraint is designed for variables with finite domains. Let I_k be the set of machines i for which (44) is infeasible in the k-th iteration of the Benders algorithm, and let J_ik = {j | ȳ_ij = 1} be the set of jobs assigned to machine i. For each i ∈ I_k, infeasibility implies that the jobs in J_ik cannot all be scheduled on machine i. This gives rise to a Benders cut

    Σ_{j∈J_ik} (1 − y_ij) ≥ 1

for each i ∈ I_k. Going beyond Jain and Grossmann, one can strengthen the cut by identifying a proper subset of the jobs assigned to machine i that cannot feasibly be scheduled on that machine. The proof of infeasibility obtained by the constraint programming algorithm may reveal such a smaller set.
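The per-machine feasibility check (44), the resulting cut, and the strengthening just described can be sketched as follows. The brute-force sequence test is a stand-in for the constraint-programming check (the actual implementation invokes the ILOG Scheduler); job data are hypothetical (release, due date, processing time) triples, and the returned set encodes the cut Σ_{j∈J_ik}(1 − y_ij) ≥ 1 over a possibly reduced job set.

```python
from itertools import permutations

def feasible_on_machine(jobs):
    """True if the jobs, given as (release, due_date, proc_time) triples,
    can be sequenced without overlap on one machine so that each finishes
    by its due date.  Brute force over sequences; a stand-in for the CP
    feasibility check of subproblem (44)."""
    for order in permutations(jobs):
        t, ok = 0, True
        for r, due, p in order:
            t = max(t, r) + p          # start after release, run p units
            if t > due:
                ok = False
                break
        if ok:
            return True
    return False

def benders_cut(i, assigned):
    """For machine i with assigned jobs [(j, job_data), ...], return None
    if the assignment is schedulable, else an index set encoding the cut
        sum_{j in cut set} (1 - y_ij) >= 1."""
    if feasible_on_machine([job for _, job in assigned]):
        return None
    # Strengthening: greedily drop jobs while infeasibility persists,
    # yielding a cut over a smaller set (a stand-in for extracting a
    # conflict set from the CP infeasibility proof).
    core = list(assigned)
    k = 0
    while k < len(core):
        trial = core[:k] + core[k + 1:]
        if not feasible_on_machine([job for _, job in trial]):
            core = trial               # job k is not needed for infeasibility
        else:
            k += 1
    return {(i, j) for j, _ in core}
```

The greedy deletion filter returns a minimal infeasible subset, so the cut excludes more assignments than the cut over all of J_ik.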
The Benders cuts for all iterations k carried out so far go into the master problem. The algorithm terminates in the first iteration k for which I_k is empty. Using this approach, Jain and Grossmann obtained substantial speedups relative to constraint programming and mixed integer programming.

Conclusion
An elementary theory of logic-based Benders decomposition has been developed and applied to three problems: the propositional satisfiability problem, the 0-1 programming problem, and a machine scheduling problem. The last illustrates how Benders decomposition can provide a principle for combining optimization with constraint programming.
A logic-based Benders approach can in principle be applied to any optimization or feasibility problem, because inference duality is defined for any such problem. Its success depends on the extent to which

• an easily solved subproblem can be obtained by a judicious partitioning of the variables, and
• the solution of the subproblem dual with y fixed to a particular value is a proof whose line of reasoning yields a useful lower bound (Benders cut) for other values of y.
Logic-based Benders decomposition may also have potential in a branch-and-cut context, as mentioned in connection with the satisfiability problem above. Cuts presently used typically involve variables that have not yet been fixed in the enumeration tree and serve primarily to strengthen a linear or some other relaxation. Logic-based Benders cuts, by contrast, would involve variables that are already fixed and would apply throughout the tree. They would prune the tree in a way that is unrelated to any relaxation.

Table 1: Solution of a 0-1 problem by logic-based Benders decomposition.