Duality in Optimization and Constraint Satisfaction

Abstract. We show that various duals that occur in optimization and constraint satisfaction can be classified as inference duals, relaxation duals, or both. We discuss linear programming, surrogate, Lagrangean, superadditive, and constraint duals, as well as duals defined by resolution and filtering algorithms. Inference duals give rise to nogood-based search methods and sensitivity analysis, while relaxation duals provide bounds. This analysis shows that duals may be more closely related than they appear, as are surrogate and Lagrangean duals. It also reveals common structure between solution methods, such as Benders decomposition and Davis-Putnam-Loveland methods with clause learning. It provides a framework for devising new duals and solution methods, such as generalizations of mini-bucket elimination.


Two Kinds of Duals
Duality is a perennial theme in optimization and constraint satisfaction. Well-known optimization duals include the linear programming (LP), Lagrangean, surrogate, and superadditive duals. The constraint satisfaction literature discusses constraint duals as well as search methods that are closely related to duality.
These many duals can be viewed as falling into two classes: inference duals and relaxation duals [12]. The two classes represent quite different concepts of duality. This is perhaps not obvious at first because the traditional optimization duals just mentioned can be interpreted as both inference and relaxation duals.
Classifying duals as inference or relaxation duals reveals relationships that might not otherwise be noticed. For instance, the surrogate and Lagrangean duals do not seem closely related, but by viewing them as inference duals rather than relaxation duals, one sees that they are identical except for a slight alteration in the type of inference on which they are based.
A general analysis of duality can also unify some existing solution methods and suggest new ones. Inference duals underlie a number of nogood-based search methods and techniques for sensitivity analysis. For instance, Benders decomposition and Davis-Putnam-Loveland methods with clause learning, which appear unrelated, are nogood-based search methods that result from two particular inference duals. Since any inference method defines an inference dual, one can in principle devise a great variety of inference duals and investigate the nogood-based search methods that result. For example, filtering algorithms can be seen as inference methods that define duals and give rise to new search methods, such as decomposition methods for planning and scheduling.
Relaxation duals underlie a variety of solution methods that are based on bounding the objective function. A relaxation dual solves a class of problem relaxations that are parameterized by "dual variables," in order to obtain a tight bound on the objective function value. The LP, surrogate, Lagrangean, and superadditive duals familiar in the optimization literature are relaxation duals. A constraint dual is not precisely a relaxation dual but immediately gives rise to one that generalizes mini-bucket elimination methods.
Inference and relaxation duals are precise expressions of two general problem-solving strategies. Problems are often solved by a combination of search and inference; that is, by searching over values of variables, which can yield a certificate of feasibility for the original ("primal") problem, and by simultaneously drawing inferences from constraints, which can yield a certificate of optimality by solving the dual problem. A problem belongs to NP when the primal solution has polynomial size and to co-NP when the dual solution has polynomial size.
Problems can also be solved by a combination of search and relaxation; that is, by enumerating relaxations and solving each. The relaxation dual is one way of doing this, since it searches over values of dual variables and solves the relaxation corresponding to each value.

Inference Duality
An optimization problem can be written

  min_{x∈D} { f(x) | C }   (1)

where f(x) is a real-valued function, C is a constraint set containing variables x = (x_1, ..., x_n), and D is the domain of x. A solution x̄ ∈ D is feasible when it satisfies C and is optimal when f(x̄) ≤ f(x) for all feasible x. If there is no feasible solution, the optimal value of (1) is ∞. If there is no lower bound on f(x) for feasible x, the problem is unbounded and the optimal value is −∞. A value z is a feasible value of (1) if f(x) = z for some feasible x, or if z = ∞ and the problem is infeasible, or if z = −∞ and the problem is unbounded.
A constraint satisfaction problem can be viewed as the problem of determining whether the optimal value of min_{x∈D} { 0 | C } is 0 or ∞.

The Inference Dual
The inference dual of (1) is the problem of finding the greatest lower bound on f(x) that can be inferred from C within a given proof system. The inference dual can be written

  max { v | C ⊢_P (f(x) ≥ v), v ∈ ℝ, P ∈ P }   (2)

where C ⊢_P (f(x) ≥ v) means that proof P deduces f(x) ≥ v from C. The domain of variable P is a family P of proofs. A pair (v, P) is a feasible solution of (2) if P ∈ P and C ⊢_P (f(x) ≥ v), and (v̄, P̄) is optimal if v̄ ≥ v for all feasible (v, P). If f(x) ≥ v cannot be derived from C for any finite v, the problem is infeasible and has optimal value −∞. If for any v there is a feasible (v, P), the problem is unbounded and has optimal value ∞. A value v is a feasible value of (2) if (v, P) is feasible for some P ∈ P.

The original problem (1) is often called the primal problem. Any feasible value of the dual problem is clearly a lower bound on any feasible value of the primal problem, a property known as weak duality. The difference between the optimal value of the primal and the optimal value of the dual is the duality gap.
The constraint set C implies f(x) ≥ v when f(x) ≥ v for all x ∈ D satisfying C. The proof family P is complete if for any v such that C implies f(x) ≥ v, there is a proof P ∈ P that deduces f(x) ≥ v from C. If P is complete, then there is no duality gap. This property is known as strong duality.
Solution of the inference dual for a complete proof family P solves the optimization problem (1), in the sense that a solution (v, P) of the dual provides a proof that v is the optimal value of (1). If the proofs in P always have polynomial size, then the dual belongs to NP and the primal problem belongs to co-NP. Solution of the inference dual for an incomplete proof family may not solve (1) but may be useful nonetheless, for instance by providing nogoods and sensitivity analysis.

Nogood-based Search
Nogoods are often used to exclude portions of the search space that have already been explicitly or implicitly examined. The inference dual can provide a basis for a nogood-based search.
Suppose we are solving problem (1) by searching over values of x in some manner. The search might proceed by splitting domains, fixing variables, or adding constraints of some other sort. Let B be the set of constraints that have been added so far in the search. The constraint set has thus been enlarged to C ∪ B. The inference dual of this restricted problem is

  max { v | C ∪ B ⊢_P (f(x) ≥ v), v ∈ ℝ, P ∈ P }

If (v, P) solves the dual, we identify a subset N of constraints that includes all the constraints actually used as premises in the proof P. That is, P remains a valid proof when C ∪ B is replaced by N. Then by weak duality we can infer the nogood

  f(x) ≥ v ∨ ¬N   (3)

that is, any x satisfying every constraint in N has value at least v. This nogood is a valid constraint and can be added to C, which may accelerate the search. For instance, if N contains only a few variables, then restricting or fixing only a few variables may violate the nogood, allowing us to avoid a dead end earlier in the search process.

An important special case of this idea identifies a subset B̄ ⊆ B of the search constraints that preserves the validity of P. That is, P remains a proof when C ∪ B̄ replaces C ∪ B. Then we can use the nogood as a side constraint that guides the search, rather than adding it to C. Suppose for example that the search proceeds by splitting domains; that is, by adding bounds of the form L_j ≤ x_j ≤ U_j to B. Suppose further that at some point in the search we obtain a solution (v, P) of the inference dual and find that the only bounds used as premises in P are L_j ≤ x_j and x_k ≤ U_k. Then we can write the nogood

  f(x) ≥ v ∨ x_j < L_j ∨ x_k > U_k

To obtain a solution value better than v, we must avoid all future branches in which both x_j ≥ L_j and x_k ≤ U_k.

We can equally well apply this technique when we branch by fixing a variable x_j to each of the values in its domain. Suppose that at some point in the search the variables in x_F have been fixed to values x̄_F, and the variables in x_U remain unfixed, where x = (x_F, x_U).
Thus B = {x_F = x̄_F}. We obtain a solution (v, P) of the inference dual and identify a subset x_J of the variables in x_F such that P is still valid when x_F = x̄_F is replaced by x_J = x̄_J. The resulting nogood

  f(x) ≥ v ∨ x_J ≠ x̄_J

tells us that if we want a solution value better than v, the remainder of the search should exclude solutions x in which x_J = x̄_J.

Sensitivity Analysis
Sensitivity analysis determines the sensitivity of the optimal value of (1) to perturbations in the problem data. Suppose that we have solved (1) and found its optimal value to be z*. A simple form of sensitivity analysis relies on an optimal solution (v, P) of the inference dual [11]. Let C̄ be a subset of C for which P remains a valid proof of f(x) ≥ v. Then changing or removing the premises in C \ C̄ has no effect on the bound and therefore cannot reduce the optimal value of (1) below v. If there is no duality gap, then z* = v, and changing or removing these constraints has no effect on the optimal value of (1).
A sharper analysis can often be obtained by observing how much the individual constraints in C̄ can be altered without invalidating the proof P. One can also observe whether a proof having the same form as P would deduce f(x) ≥ v′ for some v′ < v when the constraints in C̄ are altered in certain ways. Both of these strategies have long been used in linear programming, for example. They can be applied to integer and mixed integer programming as well [6].
From here on we focus on nogood-based search rather than sensitivity analysis.

Linear Programming Dual
A linear programming (LP) problem has the form

  min { cx | Ax ≥ b, x ≥ 0 }   (4)

The inference dual is

  max { v | (Ax ≥ b) ⊢_P (cx ≥ v), v ∈ ℝ, P ∈ P }   (5)

The proofs in family P are based on nonnegative linear combination and domination. Let a surrogate of a system Ax ≥ b be any linear combination uAx ≥ ub, where u ≥ 0. An inequality ax ≥ α dominates bx ≥ β when a ≤ b and α ≥ β.
There is a proof P ∈ P of cx ≥ v whenever some surrogate of Ax ≥ b dominates cx ≥ v; the proof P is encoded by the vector u of dual multipliers. Due to the classical Farkas lemma, the proof family P is complete, which means that strong duality holds. The inference dual (5) of (4) is essentially the same as the classical LP dual of (4). A solution (v, P) is feasible in the dual problem (5) when some surrogate uAx ≥ ub dominates cx ≥ v, which is to say uA ≤ c and ub ≥ v. So when the dual is bounded (i.e., the primal is feasible), it can be seen as maximizing v subject to uA ≤ c, ub ≥ v, and u ≥ 0, or equivalently

  max_{u≥0} { ub | uA ≤ c }   (6)

which is the classical LP dual. Strong duality holds for the classical dual unless both the primal and dual are infeasible.
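As a concrete check of the correspondence between (4) and (6), the primal and dual LPs can be solved side by side. The sketch below uses SciPy on a small invented instance (the data A, b, c are illustrative, not from the text) and verifies that the two optimal values coincide, as strong duality requires.

```python
import numpy as np
from scipy.optimize import linprog

# Primal (4): min cx subject to Ax >= b, x >= 0 (invented toy data).
A = np.array([[2.0, 1.0], [1.0, 3.0]])
b = np.array([4.0, 6.0])
c = np.array([3.0, 2.0])

# linprog expects A_ub x <= b_ub, so negate the >= constraints.
primal = linprog(c, A_ub=-A, b_ub=-b, bounds=[(0, None)] * 2, method="highs")

# Classical dual (6): max ub subject to uA <= c, u >= 0, posed as a minimization.
dual = linprog(-b, A_ub=A.T, b_ub=c, bounds=[(0, None)] * 2, method="highs")

print(primal.fun, -dual.fun)
assert abs(primal.fun - (-dual.fun)) < 1e-8  # strong duality: equal optima
```

The dual solution vector `dual.x` is exactly the multiplier vector u that encodes the proof P in the inference dual (5).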
When the LP dual is used in nogood-based search, the well-known method of Benders decomposition results [2]. It is applied to problems that become linear when certain variables x_F are fixed:

  min { f(x_F) + cx_U | Ax_U ≥ b − g(x_F), x_U ≥ 0 }   (7)

Suppose that when x_F is fixed to x̄_F, (7) has optimal value z̄ and optimal dual solution ū. By strong duality, z̄ is the largest possible lower bound on the optimal value of (7) when x_F = x̄_F. But since ū remains dual feasible when x̄_F in (7) is replaced by any x_F, weak duality implies that

  f(x_F) + ū(b − g(x_F))   (8)

remains a valid lower bound for any x_F. This yields the nogood

  z ≥ f(x_F) + ū(b − g(x_F))   (9)

where z represents the objective function of (7). This nogood is known as a Benders cut. If the dual of (7) is unbounded, there is a direction or ray ū along which its solution value can increase indefinitely. In this case the Benders cut (9) simplifies to ū(b − g(x_F)) ≤ 0.
In the Benders algorithm, the set x_F of fixed variables is static. The algorithm searches over values of x_F by solving a master problem in each iteration of the search. The master problem minimizes z subject to the Benders cuts obtained in previous iterations. The optimal solution of the master problem becomes the next x̄_F. The search terminates when the optimal value of the master problem is equal to the previous z̄. The master problem can be solved by any desired method, such as branch and bound if it is a mixed integer programming problem.
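The cut-generation loop just described can be sketched in a few lines. Everything in the instance below is invented for illustration: the functions `f` and `g`, the data, and the finite candidate set for x_F. The subproblem dual is solved as an explicit LP with SciPy, and the master problem is solved by simple enumeration over the candidates, so this is a sketch of the logic rather than a production Benders code.

```python
import numpy as np
from scipy.optimize import linprog

A = np.array([[2.0, 1.0], [1.0, 3.0]])
b = np.array([4.0, 6.0])
c = np.array([3.0, 2.0])
f = lambda y: 2.0 * y            # cost term carried by the fixed variables (toy)
g = lambda y: np.array([y, y])   # how fixing y shifts the constraint right-hand side

def subproblem_dual(y):
    """Dual of min {cx | Ax >= b - g(y), x >= 0}: max u(b - g(y)), uA <= c, u >= 0."""
    rhs = b - g(y)
    res = linprog(-rhs, A_ub=A.T, b_ub=c, bounds=[(0, None)] * 2, method="highs")
    return res.x, -res.fun

candidates = [0.0, 1.0, 2.0, 3.0]  # finite search space for the fixed variables
cuts = []                          # each cut maps y to a lower bound on z
best = None
for _ in range(20):
    # Master: minimize z subject to all Benders cuts, here by enumeration.
    z_lb = lambda y: max((cut(y) for cut in cuts), default=-np.inf)
    y = min(candidates, key=z_lb)
    master_val = z_lb(y)
    u, dual_val = subproblem_dual(y)
    total = f(y) + dual_val        # true value of the restricted problem at y
    if total <= master_val + 1e-8:
        best = (y, total)          # master bound matches: optimal
        break
    cuts.append(lambda y, u=u: f(y) + u @ (b - g(y)))  # Benders cut (9)

print(best)
```

Because ū stays dual feasible for every y, each appended cut is valid globally, which is exactly the weak-duality argument behind (8).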

Surrogate Dual
The surrogate dual results when one writes the inference dual of an inequality-constrained problem, again using nonnegative linear combination and domination as an inference method. When the inequalities and objective function are linear, the surrogate dual becomes the linear programming dual. When a slightly stronger form of domination is used, we obtain the Lagrangean dual, as is shown in the next section.
The surrogate dual [10] is defined for a problem of the form

  min { f(x) | g(x) ≤ 0, x ∈ D }   (10)

where g(x) is a vector of functions. A surrogate of g(x) ≤ 0 is any linear combination ug(x) ≤ 0 with u ≥ 0. Let P ∈ P deduce f(x) ≥ v from g(x) ≤ 0 when some surrogate ug(x) ≤ 0 dominates f(x) ≥ v. We will use the weakest possible form of domination: ug(x) ≤ 0 dominates f(x) ≥ v whenever the former implies the latter. This family P of proofs is generally incomplete. Under this definition of P, the inference dual of (10) finds the largest v such that ug(x) ≤ 0 implies f(x) ≥ v for some u ≥ 0. The inference dual therefore becomes the surrogate dual

  max_{u≥0} min_{x∈D} { f(x) | ug(x) ≤ 0 }   (11)

A difficulty with the surrogate dual is that it is generally hard to solve. Yet if the problem (10) has special structure that allows easy solution of the dual, the resulting nogoods could be used in a search algorithm.
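For a small finite domain, the surrogate dual (11) can be evaluated directly. The sketch below uses an invented two-constraint integer instance, grid-searches the normalized multipliers u = (t, 1−t) (the surrogate is invariant to scaling u), and checks weak duality against the primal optimum.

```python
import itertools
import numpy as np

D = list(itertools.product(range(4), repeat=2))   # finite domain {0,..,3}^2
f = lambda x: 2 * x[0] + 3 * x[1]
g = lambda x: np.array([3 - x[0] - 2 * x[1],      # g(x) <= 0 means x0 + 2x1 >= 3
                        3 - 2 * x[0] - x[1]])     # and 2x0 + x1 >= 3

primal = min(f(x) for x in D if all(g(x) <= 0))

def theta(u):
    # Optimal value of the surrogate relaxation: min f(x) s.t. u.g(x) <= 0.
    vals = [f(x) for x in D if u @ g(x) <= 0]
    return min(vals) if vals else float("inf")

dual = max(theta(np.array([t, 1 - t])) for t in np.linspace(0, 1, 101))
assert dual <= primal   # weak duality
print(primal, dual)
```

On this instance the gap happens to close; with an incomplete proof family a strict gap is possible in general, and weak duality is all that is guaranteed.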

Lagrangean Dual
Like the surrogate dual, the Lagrangean dual is defined for inequality-constrained problems of the form (10). Again the proofs in P consist of nonnegative linear combination and domination, but this time a stronger form of domination is used.
In the surrogate dual, ug(x) ≤ 0 dominates f(x) ≥ v when the former implies the latter over x ∈ D. Here we say instead that ug(x) ≤ 0 dominates f(x) ≥ v when

  f(x) + ug(x) ≥ v for all x ∈ D

Under this definition of P, the inference dual of (10) finds the largest v such that ug(x) ≥ v − f(x) for all x ∈ D and some u ≥ 0. Since this condition can be written f(x) + ug(x) ≥ v, the inference dual becomes the Lagrangean dual

  max_{u≥0} θ(u), where θ(u) = min_{x∈D} { f(x) + ug(x) }   (12)

The Lagrangean dual has the nice property that the optimal value θ(u) of the minimization problem in (12) is a concave function of u. This means that θ(u) can be maximized by a hill-climbing search. Subgradient optimization techniques are often used for this purpose [1,19].
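A minimal subgradient ascent on θ(u) can be sketched as follows, reusing the invented instance from the surrogate example. A subgradient of θ at u is g(x*(u)) for any minimizer x*(u); iterates are projected back onto u ≥ 0, and the best bound seen so far is retained, since any θ(u) is a valid lower bound.

```python
import itertools
import numpy as np

D = list(itertools.product(range(4), repeat=2))
f = lambda x: 2 * x[0] + 3 * x[1]
g = lambda x: np.array([3 - x[0] - 2 * x[1], 3 - 2 * x[0] - x[1]])

def theta(u):
    """theta(u) = min over x in D of f(x) + u.g(x); also return a minimizer."""
    x_star = min(D, key=lambda x: f(x) + u @ g(x))
    return f(x_star) + u @ g(x_star), x_star

u = np.zeros(2)
best = -np.inf
for k in range(1, 200):
    val, x_star = theta(u)
    best = max(best, val)                            # any theta(u) is a valid bound
    u = np.maximum(0.0, u + (1.0 / k) * g(x_star))   # subgradient step, projected to u >= 0

primal = min(f(x) for x in D if all(g(x) <= 0))
assert best <= primal + 1e-9   # weak duality
print(best, primal)
```

The diminishing step size 1/k is the simplest textbook choice; practical codes use more careful step rules, but the hill-climbing behavior on the concave θ is the same.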
When the inequalities g(x) ≤ 0 and the objective function f(x) are linear, both the surrogate and Lagrangean duals become the linear programming dual, since the two types of domination collapse into implication.
Nogoods can be obtained from the Lagrangean dual much as from the LP dual. If at some point in the search x_F is fixed to x̄_F and ū solves the Lagrangean dual problem

  max_{u≥0} min_{x_U} { f(x̄_F, x_U) + ug(x̄_F, x_U) }   (13)

then we have the nogood

  z ≥ min_{x_U} { f(x_F, x_U) + ūg(x_F, x_U) }   (14)

Nogoods of this sort could be generated in a wide variety of search methods, but they have apparently been used only in the special case of generalized Benders decomposition [9]. This method can be applied when there is a set x_F of variables for which (13) has no duality gap when x_F is fixed. The set x_F is therefore static, and x_F is fixed by solving a master problem that contains the Benders cuts (14). In practice the classical method obtains the multipliers ū by solving the subproblem with x_F = x̄_F as a nonlinear programming problem and letting ū be the Lagrange multipliers that correspond to the optimal solution. However, the multipliers could be obtained by solving the Lagrangean dual directly.

Superadditive/Subadditive Dual
The subadditive dual [17] is defined for an integer programming problem

  min { cx | Ax ≥ b, x ∈ D }   (15)

where D is the set of n-tuples of nonnegative integers. (The superadditive dual is used when one maximizes cx subject to Ax ≤ b.) The subadditive dual can be viewed as an inference dual, using a form of inference that generalizes inference by nonnegative linear combination and domination. Let a real-valued function h(·) be subadditive when h(d + d′) ≤ h(d) + h(d′) for all d, d′. We will say that a proof in P derives cx ≥ v from Ax ≥ b when h(Ax) ≥ h(b) dominates cx ≥ v for some nondecreasing, subadditive function h, and that h(Ax) ≥ h(b) dominates cx ≥ v when h(b) ≥ v and h(Ax) ≤ cx for all x ∈ D. This inference method can be shown to be complete for linear integer inequalities, based on cutting plane theory developed in [3,4,16,20]. When P is defined in this way, the inference dual (2) becomes

  max_{h∈H} { h(b) | h(Ax) ≤ cx for all x ∈ D }

where H is the set of subadditive, nondecreasing functions. This is the subadditive dual of (15). Since P is complete, it is a strong dual.
The subadditive dual has been used primarily for sensitivity analysis in integer programming (e.g., [5]). It has apparently not been used in the context of nogood-based search. Since the form of domination used to define P is that used in the surrogate dual, one could obtain a Lagrangean analog of the subadditive dual by substituting the form of domination used in the Lagrangean dual.

Duals for Propositional Satisfiability
Propositional satisfiability (SAT) problems are often solved by a Davis-Putnam-Loveland (DPL) method with clause learning (e.g., [18]). These methods can be seen as nogood-based search methods derived from an inference dual.
The SAT problem can be written

  min_{x∈D} { 0 | C }   (16)

where C is a set of logical clauses and D is the set of truth assignments to the variables. To formulate an inference dual, let P consist of unit resolution proofs (i.e., repeated elimination of variables that occur in unit clauses until no unit clauses remain). The dual problem (2) has optimal value ∞ when unit resolution proves unsatisfiability by deriving the empty clause. Since unit resolution is not a complete inference method, there may be a duality gap: the dual may have optimal value zero when the primal is unsatisfiable.

Now suppose we solve (16) by branching on the propositional variables x_j. At each node of the search tree, certain variables x_F are fixed to x̄_F. Let U contain the unit clause x_j when x̄_j = 1 and ¬x_j when x̄_j = 0. We now solve the inference dual of (16) with the clause set C ∪ U. If the optimal value is ∞, we generate a nogood and backtrack; otherwise we continue to branch. To generate the nogood, we identify a subset Ū of U for which some portion of the unit resolution proof obtains the empty clause from C ∪ Ū. Then the disjunction of the negations of the literals in Ū is a nogood or "learned" clause that can be added to C before backtracking. This results in a basic DPL algorithm with clause learning. Similar algorithms can be developed for other types of resolution, including full resolution, which could be terminated if it fails to derive the empty clause in a fixed amount of time.
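The nogood extraction just described can be sketched as a unit-propagation routine that tags every forced literal with the assumption (branching) literals it depends on; when the empty clause is derived, the union of those tags is the subset Ū, and its negation is the learned clause. The encoding below (integers as literals, negation as sign) and the tiny instance are invented for illustration.

```python
def unit_propagate(clauses, assumptions):
    """Unit resolution over CNF clauses (sets of int literals), with the branching
    assumptions added as unit clauses. Returns ('conflict', used) where used is
    the set of assumptions feeding the empty clause, or ('unknown', None)."""
    deps = {a: {a} for a in assumptions}   # deps[lit] = assumptions lit relies on
    forced = set(assumptions)
    changed = True
    while changed:
        changed = False
        for clause in clauses:
            unassigned = [l for l in clause if -l not in forced]
            if any(l in forced for l in clause):
                continue                    # clause already satisfied
            if not unassigned:              # every literal falsified: empty clause
                used = set()
                for l in clause:
                    used |= deps[-l]
                return "conflict", used
            if len(unassigned) == 1:        # unit clause: force its literal
                l = unassigned[0]
                if l not in forced:
                    forced.add(l)
                    deps[l] = set().union(*(deps[-o] for o in clause if o != l)) \
                        if len(clause) > 1 else set()
                    changed = True
    return "unknown", None

C = [{1, 2}, {-1, 2}, {-2, 3}, {-2, -3}]          # (x1 v x2)(~x1 v x2)(~x2 v x3)(~x2 v ~x3)
status, nogood = unit_propagate(C, assumptions=[1])  # branch: x1 = true
learned = {-l for l in nogood}                     # learned clause from U-bar
```

Here branching on x1 = true lets unit resolution derive the empty clause, and the learned clause is the negation of the single assumption used, i.e. ¬x1.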

Domain Filtering Duals
A domain filtering algorithm can be viewed as an inference method and can therefore define an inference dual. For concreteness suppose that the domain of each variable is an interval [L_j, U_j], and consider a filtering algorithm that tightens lower and upper bounds. The inference dual is most straightforward when we assume the objective function f(x) is monotone nondecreasing. In this case we can let P contain all possible proofs consisting of an application of the filtering algorithm to obtain reduced bounds [L_j, U_j], followed by an inference that f(x) ≥ f(L_1, ..., L_n). This defines an inference dual (2).
If the filtering method achieves bounds consistency, f(L_1, ..., L_n) is the optimal value of (1), there is no duality gap, and the problem is solved.
If bounds consistency is not achieved, the dual can still be useful in nogood-based search. Suppose we search by domain splitting, and let B contain the bounds currently imposed by the branching process. We can examine the filtering process to identify a subset B̄ ⊆ B of the bounds that are actually used to obtain the lower bounds L_j that affect the value of f(L_1, ..., L_n). The resulting nogood can be used as described earlier.
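This bookkeeping can be sketched for a toy filtering algorithm. The constraint form x_i + x_j ≥ r, the data, and the weighted-sum objective (with nonnegative weights, so it is monotone nondecreasing) are all invented for illustration; the filter tightens lower bounds from upper bounds and records which upper bounds it actually used, giving the subset B̄ for the nogood.

```python
def filter_and_bound(L, U, constraints, w):
    """Bounds filtering for constraints x_i + x_j >= r and objective sum(w_i x_i),
    w_i >= 0. Returns (lower bound f(L), indices whose upper bound was used)."""
    used = set()                 # indices whose upper bound fed some tightening
    changed = True
    while changed:
        changed = False
        for (i, j, r) in constraints:
            for a, bnd in ((i, j), (j, i)):
                new_lb = r - U[bnd]          # x_a >= r - x_bnd >= r - U_bnd
                if new_lb > L[a]:
                    L[a] = new_lb
                    used.add(bnd)
                    changed = True
    bound = sum(wi * li for wi, li in zip(w, L))
    return bound, used

L, U = [0, 0, 0], [4, 2, 3]   # say U[1], U[2] were imposed by branching
bound, used = filter_and_bound(L, U, [(0, 1, 5), (1, 2, 4)], w=[1, 2, 1])
print(bound, used)
```

Here the bound f(L) relies only on the upper bounds in `used`; to beat it, the search must relax at least one of those bounds, which is exactly the nogood.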
A related idea has proved very useful in planning and scheduling [12,15,13]. Let each variable x_j in x_F indicate which facility will process job j. The jobs assigned to a facility i must be scheduled subject to time windows; the variables in x_U indicate the start times of the jobs. The processing time of job j may differ across facilities. For definiteness, suppose the objective f(x) is to minimize the latest completion time over all the jobs (i.e., minimize makespan). We solve each facility's scheduling problem with a constraint programming method that combines branching with edge finding. This can be viewed as a complete inference method that defines an inference dual with no duality gap. Let v̄_i be the minimum makespan obtained on facility i for a given assignment x̄_F. By examining the edge finding and branching process, we identify a subset x_J of the job assignments for which the minimum makespan on each facility i is still v̄_i. Then we have the nogood that any solution with x_J = x̄_J has makespan at least max_i v̄_i. Nogoods of this sort are accumulated in a master problem that is solved to obtain the next x̄_F, thus yielding a generalized form of Benders decomposition [14]. The assignments in x_J are identified by noting which jobs play a role in the edge finding at each node of the search tree; details are provided in [13].

Relaxation Duality
A parameterized relaxation of the optimization problem (1) can be written

  min_{x∈D} { f(x, u) | C(u) }   (17)

where u ∈ U is a vector of dual variables. The constraint set C(u) is a relaxation of C, in the sense that every x ∈ D that satisfies C satisfies C(u). The objective function f(x, u) is a lower bound on f(x) for all x feasible in (1); that is, f(x, u) ≤ f(x) for all x ∈ D satisfying C.
Clearly the optimal value of the relaxation (17) is a lower bound on the optimal value of (1). The relaxation dual of (1) is the problem of finding the parameter u that yields the tightest bound [12]:

  max_{u∈U} min_{x∈D} { f(x, u) | C(u) }   (18)

Let z* be the optimal value of (1), and θ(u) be the optimal value of the minimization problem in (18). Since θ(u) is a lower bound on z* for every u ∈ U, we have weak duality: the optimal value v̄ of the relaxation dual (18) is a lower bound on z*.

The lower bound v̄ can abbreviate the search, as for example in a branch-and-relax (branch-and-bound) scheme. The parameterized relaxation is chosen so that θ(u) is easy to compute. The dual problem of maximizing θ(u) over u ∈ U may be solved by some kind of search procedure, such as subgradient optimization in the case of Lagrangean relaxation. The maximization problem need not be solved to optimality, since any θ(u) is a valid lower bound.

Equivalence to an Inference Dual
Although inference and relaxation duality are very different concepts, a relaxation dual is always formally equivalent to an inference dual, provided there exists a solution algorithm for the parameterized relaxation. There does not seem to be a natural converse for this proposition.
To formulate the relaxation dual (18) as an inference dual, suppose that an algorithm P(u) is available for computing θ(u) for any given u ∈ U. We can regard P(u) as a proof that f(x) ≥ θ(u) and let P = {P(u) | u ∈ U}. The resulting inference dual (2) is max_{u∈U} θ(u), which is identical to the relaxation dual (18).

Linear Programming and Surrogate Duals
A simple parameterized relaxation for the inequality-constrained problem (10) uses a surrogate relaxation of the constraints but leaves the objective function unchanged. The relaxation therefore minimizes f(x) subject to ug(x) ≤ 0, where u ≥ 0. The resulting relaxation dual is the surrogate dual (11) of (10). Since the surrogate dual of an LP problem is the LP dual, the relaxation dual of an LP problem is likewise the LP dual.

Lagrangean Dual
Another parameterized relaxation for (10) removes the constraints entirely but "dualizes" them in the objective function. The parameterized relaxation minimizes f(x, u) = f(x) + ug(x) subject to x ∈ D. The function f(x, u) is a lower bound on f(x) for all feasible x, since ug(x) ≤ 0 when u ≥ 0 and g(x) ≤ 0. The resulting relaxation dual is precisely the Lagrangean dual (12).
The close connection between surrogate and Lagrangean duals, conceived as inference duals, is much less obvious when they are reinterpreted as relaxation duals.

Superadditive/Subadditive Dual
The subadditive dual discussed earlier can be viewed as a relaxation dual that generalizes the surrogate dual. We can give the integer programming problem (15) a relaxation parameterized by subadditive, nondecreasing functions h, in which we minimize cx subject to h(Ax) ≥ h(b) and x ∈ D. (In the surrogate dual, the function h is multiplication by a vector u of nonnegative multipliers.) This yields the relaxation dual

  max_{h∈H} min_{x∈D} { cx | h(Ax) ≥ h(b) }

which is equivalent to the subadditive dual.

Constraint Dual
The constraint dual is related to a relaxation dual. More precisely, the constraint dual can be given a parameterized relaxation that yields a relaxation dual. A special case of the relaxation has been applied in mini-bucket elimination and perhaps elsewhere.
It is convenient to let x_J denote the tuple of variables x_j for j ∈ J. Given a constraint set C, the constraint dual of C is formed by "standardizing apart" variables that occur in different constraints and then equating these variables. So if x_{j_1}, ..., x_{j_{n_i}} are the variables in constraint C_i ∈ C, let y^i = (y^i_1, ..., y^i_{n_i}) be a renaming of these variables. Also let J_{ik} be the index set of variables that occur in both C_i and C_k. The constraint dual associates the dual variable y^i with each constraint C_i, where the domain D_i of y^i is the set of tuples that satisfy C_i. The dual constraint set consists of the binary constraints y^i_{J_{ik}} = y^k_{J_{ik}} for each pair i, k.
The constraint dual can be relaxed by replacing each y^i_{J_{ik}} = y^k_{J_{ik}} with y^i_{J′_{ik}} = y^k_{J′_{ik}}, where J′_{ik} ⊂ J_{ik}. It is helpful to think about the constraint graph G corresponding to the dual, which contains a vertex for each variable y^i_j and an edge between two variables when they occur in the same tuple y^i or in the same equality constraint. Removing equality constraints deletes the corresponding edges from G, resulting in a sparser graph G(E), where E is the set of edges corresponding to the equality constraints that remain. The relaxation is therefore parameterized by the subset E of edges that defines G(E). This relaxation also serves as a parameterized relaxation C(E) of the original constraint set C.

Thus if the constraint satisfaction problem is written

  min_{x∈D} { 0 | C }   (19)

then we can write the relaxation dual

  max_{E∈E} min_{x∈D} { 0 | C(E) }

where E is some (generally incomplete) family of subsets E. To solve the dual, we check the feasibility of C(E) for each E ∈ E. The family E normally would be chosen so that G(E) has small induced width for each E ∈ E, since in this case C(E) is easier to solve by nonserial dynamic programming.

One way to construct E is to define sets of "mini-buckets" [7,8]. We consider various partitions of the constraints in C, where the kth partition defines disjoint subsets or mini-buckets C_{k1}, ..., C_{km_k}. For each k and each t ∈ {1, ..., m_k} we let E_{kt} contain the edges corresponding to equality constraints between variables occurring in C_{kt}, so that C(E_{kt}) = C_{kt}. Now E is the family of all sets E_{kt}. Thus, rather than solve the relaxations C_{k1}, ..., C_{km_k} corresponding to a single set of mini-buckets as in [7], we solve relaxations C_{kt} for all E_{kt}. Other relaxation duals based on reducing the induced width are discussed in [12]. All of these approaches can be applied to problems (19) with a general objective function f(x), as is done in mini-bucket elimination schemes.
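A small sketch of the relaxed constraint dual: each constraint is given as a (scope, allowed tuples) pair, each dual variable y^i ranges over the allowed tuples of its constraint, and feasibility of C(E) is checked by brute force for a chosen edge set E. The instance, a 2-coloring of a triangle, is invented; dropping one agreement edge turns the infeasible dual into a feasible relaxation (relaxation value 0), while the full edge set certifies infeasibility (value ∞).

```python
import itertools

# Each constraint: (scope, allowed tuples). Dual variable y^i ranges over
# the satisfying tuples of constraint i.
constraints = [
    ((0, 1), {(0, 1), (1, 0)}),   # x0 != x1 over {0,1}
    ((1, 2), {(0, 1), (1, 0)}),   # x1 != x2
    ((0, 2), {(0, 1), (1, 0)}),   # x0 != x2  -> 2-coloring a triangle
]

def agrees(i, ti, k, tk):
    """Do tuples ti, tk agree on every variable shared by constraints i and k?"""
    si, sk = constraints[i][0], constraints[k][0]
    return all(ti[si.index(v)] == tk[sk.index(v)] for v in set(si) & set(sk))

def relaxed_feasible(E):
    """Feasibility of C(E): keep only the agreement constraints for pairs in E,
    and brute-force one tuple choice per dual variable."""
    domains = [sorted(c[1]) for c in constraints]
    return any(all(agrees(i, choice[i], k, choice[k]) for (i, k) in E)
               for choice in itertools.product(*domains))

full = {(0, 1), (0, 2), (1, 2)}
assert not relaxed_feasible(full)            # original dual: infeasible
assert relaxed_feasible({(0, 1), (1, 2)})    # dropping an edge relaxes to feasible
```

The relaxation dual then maximizes over a family of edge sets E; the brute-force check stands in for the nonserial dynamic programming that makes small-induced-width choices of E attractive.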