A Systematic Approach to MDD-Based Constraint Programming

,


Introduction
The domain store is a fundamental tool for constraint programming (CP), because it propagates the results of individual constraint processing.It allows the reduced domains obtained for one constraint to be passed to the next constraint for further filtering.A weakness of the domain store, however, is that it transmits a limited amount of information.It accounts for no interaction among the variables, because any solution in the Cartesian product of the current domains is consistent with it.This restricts the ability of the domain store to pool the results of processing individual constraints and provide a global view of the problem.
To address this shortcoming, Andersen, Hadzic, Hooker, and Tiedemann [1] proposed replacing the domain store with a richer data structure, namely a multivalued decision diagram (MDD).In their approach, domain filtering algorithms are replaced or augmented by algorithms that refine and update the MDD to reflect each constraint.It was found that MDD-based propagation can lead to substantial speedups in the solution of multiple AllDifferent constraints.The idea was extended to equality constraints by Hadzic et al. [4].A unified nodesplitting scheme for refining the MDD was proposed by Hadzic et al. [3] and applied to certain configuration problems.For this reason, we will mainly focus on filtering algorithms in this work.
MDDs have been applied before in CP.For example, in [6] and [2] MDDs are applied to perform inferences (domain filtering) based on individual constraints.
In [5], this is taken one step further, by passing structural information from one constraint to the next.The key difference is that in these approaches, an MDD is built and maintained for each individual constraint, whereas in MDD-based constraint programming, the MDD is the information that is passed from one constraint to the next.In other words, multiple constraints process the same MDD, instead of each constraint processing its individual MDD.
The contributions of this work are threefold.First, we introduce a systematic scheme for constraint propagation in MDDs, and we show that all previously proposed filtering algorithms for MDD-based CP can be viewed as instantiations of our scheme.Second, we apply our scheme to introduce new filtering algorithms for other constraints; we present such algorithms for the Among, Element, and unary resource constraints as an illustration of the versatility of the approach.Third, we present computational results for the first pure MDD-based CP solver, showing that i) this approach can scale up to realistic problem sizes, and ii) enormous savings (in terms of time as well as search tree size) can be realized when compared to solvers relying on the traditional domain store.
The remainder of the paper is organized as follows.In Section 2 we provide the necessary background on MDDs and MDD-based constraint programming.In Section 3 we present and discuss a systematic scheme for constraint propagation in MDDs.We apply our scheme to a variety of constraints in Section 4. In Section 5 we provide a brief description of our MDD-based constraint programming system.Experimental results using this system are reported in Section 6.Finally, we conclude in Section 7.

MDDs and MDD-Based Constraint Solving
In this work, an ordered Multivalued Decision Diagram (MDD) is a directed acyclic graph whose nodes are partitioned into n (possibly empty) subsets or layers L 1 , . . ., L n+1 , where the layers L 1 , . . ., L n correspond respectively to variables x 1 , . . ., x n .L 1 contains a single top node T, and L n+1 contains two bottom nodes 0 and 1.The width of the MDD is the maximum number of nodes in a layer, or max n i=1 {|L i |}.In MDD-based CP, the MDDs typically have a given fixed maximum width.
All edges of the MDD are directed from an upper to a lower layer; that is, from a node in some L i to a node in some L j with i < j.For our purposes it is convenient to assume (without loss of generality) that each edge connects two adjacent layers.Let L(s) denote the layer of the node s, and var(s) the variable associated with L(s).Each edge out of layer i is labeled with an element of the domain D(x i ) of x i .The set E(s, t) of edges from node s to node t may contain multiple edges, and we denote each with its label.Let E in (s) denote the set of edges coming into s, and E out (s) the set of edges leaving s.For an edge e, tail(e) is the tail of e and head(e) the head.
An edge with label v leaving a node in layer i represents an assignment x i = v.Each path in the MDD from T to 0 or 1 can be denoted by the edge labels v 1 , . . ., v n on the path and is identified with the assignment (x 1 , . . ., (v 1 , . . ., v n ).For our purposes, it is convenient to generate only the portion of an MDD that contains paths from T to 1.
A constraint C is called MDD consistent on a given MDD if every edge of the MDD lies on some feasible path.Thus MDD consistency is achieved when all redundant edges (i.e., edges on no feasible path) have been removed.Domain consistency for C is equivalent to MDD consistency on an MDD of width one that represents the variable domains.That is, it is equivalent to MDD consistency on an MDD in which each layer L i contains a single node s i , and E(s i , s i+1 ) = D(x i ) for i = 1, . . ., n.
Typically, MDD-based constraint programming starts with simple MDD (of width one) that permits all solutions represented by the Cartesian product of the domains.This MDD is then refined each time a constraint is processed.Refinement is accomplished by adding some nodes and edges to the MDD so as to exclude solutions that violate the constraint.Example 1 below gives an illustration of this (see also Figure 1).
The basic operation of refinement is node-splitting, in which the edges entering a given node are partitioned into equivalence classes, and ideally the node is split into one copy for each equivalence class.The set of outgoing edges for each copy is the same as the set of outgoing edges of the original node.We note that determining the equivalence classes may be costly to compute in practice, in which case an approximation of equivalence is used.We take care that the width of the MDD (maximum number of nodes in a layer) remains within a fixed bound.When splitting a node we merge equivalence classes when necessary in order to respect this restriction.The resulting MDD is a relaxation in the sense that it may fail to exclude all assignments that violate the constraint, but it is a much stronger relaxation than a domain store.A principled approach to node refinement in MDDs is introduced by Hadzic et al. [3].
We also update the MDD by deleting infeasible edges, an operation that generalizes conventional domain filtering.We will refer to this operation as MDD filtering.This can lead to further reduction of the MDD, if after the removal of the edge some other edges no longer have a path to 1 or can no longer be reached by a path from the root.An MDD-based constraint solver is based on propagation and search just as traditional CSP solvers, but the domain filtering process at each node of the search tree is replaced (or supplemented) by an MDD refinement and filtering process.
The MDD-based approach starts with the MDD of width one in Figure 1(a), in which multiple arcs are represented by a set of corresponding domain values for clarity.We refine and filter each constraint separately.Starting with x 1 = x 2 , we refine the MDD by splitting the node at layer 2, resulting in Figure 1(b).This allows us to filter two domain values, based on x 1 = x 2 , as indicated in the figure.In Figure 1(c) and (d) we refine and filter the MDD for the constraints x 2 = x 3 and x 1 = x 3 respectively, until we reach the MDD in Figure 1(e).This MDD represents all three solutions to the problem, and provides a much tighter relaxation than the domain store.

A Systematic Scheme for MDD Propagation
In the literature, MDD propagation algorithms (in the sense of Section 2) have been proposed for the following three constraint types: (one-sided) inequality constraints [1], AllDifferent [1], and equality constraints [3].The reasoning used for designing propagation algorithms for each of these constraints seemed to be ad-hoc.In this section we will present and analyze a systematic scheme for designing MDD propagation algorithms.In the following section we use this procedure to express the existing MDD propagation algorithms and introduce new algorithms for the Among, Element, and unary resource constraints.

The General Scheme
Our scheme is based on the idea of 'local information' I(s) stored for each constraint at each node s.We will show that all MDD propagation schemes so far introduced, and others as well, can be viewed as based on local information.The precise nature of the local information depends on the propagation scheme, which is characterized in part by what kind of local information is required to apply it.
More precisely, the decision as to whether to delete an edge in E(s, t) is based solely on I(s) and I(t)-that is, on local information stored at either end of the edge.Furthermore, the local information can be accumulated by a single top-down pass and a single bottom-up pass through the MDD.
It is convenient to regard I(s) as a pair (I ↓ (s), I ↑ (s)) consisting of the information I ↓ (s) accumulated during the top-down pass, and the information I ↑ (s) accumulated during the bottom-up pass.I ↓ (s) and I ↑ (s) can take several forms, such as a set of domain values or parameters related to the constraint, or a tuple of such sets.What is common to all the schemes is that I ↓ (s) is computed solely on the basis of local information at the opposite end of edges coming into s, and I ↑ (s) on the basis of local information at the opposite end of edges leaving s.
Formally, we introduce an operation ⊗ that processes information when traversing an edge during a top-down or bottom-up pass.When traversing an edge e ∈ E(s, t) during the top-down pass, the information I ↓ (s) at node s is combined with edge e to obtain updated information I ↓ (s) ⊗ e.We view this updated information as an object I having the same form as I ↓ (s).When several edges enter node t, we use an operation ⊕ to combine the information obtained by traversing the incoming edges.The top-down information at t is therefore Similarly, the bottom-up information at s is Several examples of this scheme appear in the following sections.
Since a top-down (bottom-up) pass of the MDD visits each edge exactly once, the passes themselves involve an amount of work that is linear in the size of the MDD (modulo the work required to compute ⊕ and ⊗ at each node).
The operators ⊗ and ⊕ can be implemented as high-level macros that are instantiated differently for each constraint type.Our MDD-based CP solver follows this idea very closely.

MDD Consistency
The scheme above can sometimes achieve MDD consistency in polynomial time.
In particular, if it can determine in polytime whether any particular assignment x j = v is consistent with the MDD, then it achieves MDD consistency in polytime due to the following theorem.
Theorem 1. Suppose that the feasibility of x j = v for a given constraint C on a given MDD M can be determined in O(f (M )) time and space for any variable x j in C and any v ∈ D(x j ).Then we can achieve MDD consistency for C in time and space at most O(poly(M )f (M )).
The proof is a straightforward shaving argument.For each edge e of M we consider the MDD M e that consists of all the T-to-1 paths in M containing e. Then e can be removed from M if and only if x j = e is inconsistent with C and M e , where j = L(tail(e)).This can be determined in time and space at most O(f (M e )) ≤ O(f (M )).By repeating this operation poly(M ) times (i.e., on each edge of M ) we obtain the theorem.
To establish MDD consistency, the goal is to efficiently compute information that is strong enough to apply Theorem 1.In the sequel, we will see that this can be done for inequality constraints and Among constraints in polynomial time, and in pseudo-polynomial time for two-sided inequality constraints.Furthermore, we have the following result.
Corollary 1.For binary constraints that are given in extension, our scheme can be applied to achieve MDD consistency in polynomial time (in the size of the constraint and the MDD).
To see how this is accomplished, suppose C is a binary constraint containing variables x i , x j for i < j (the argument for j < i is similar).We let I ↓ (s) contain all the values assigned to x i in some path from T to s, and I ↑ (t) contain all the values assigned to x j in some path from s to 1. Then we can delete edge e ∈ E(s, t) from M if and only if (a) L(s) = j and (x i , x j ) = (v, e) satisfies C for no v ∈ I ↓ (s), or (b) L(s) = i and (x i , x j ) = (e, v) satisfies C for no v ∈ I ↑ (t).Because |I ↓ (s)| and |I ↑ (t)| are polynomial in the size of M , we can perform this check in time that is polynomial in the size of M and C. Also, we can compute the local information by defining the ⊕ operator with I ↓ (T) = I ↑ (1) = ∅, and the ⊕ operator The top-down and bottom-up passes clearly require time that is polynomial in the size of M .The corollary then follows from Theorem 1.

Specialized Propagators
We now present several MDD propagation algorithms that rely on local information obtained as above.The filtering may not be as strong as for a conventional domain store, in the sense that when specialized to an MDD of width one, it may not remove as many values as a conventional filter would.However, a 'weak' filtering algorithm can be very effective when applied to the richer information content of an MDD.
If one prefers not to design a filter specifically for MDDs, there is also the option of using a conventional domain filter by adapting it to MDDs.This can be done in a generic fashion that turns out to be yet another application of the above scheme.Section 4.8 explains how this is done.

Equality and Not-Equal Constraints
We first illustrate MDD propagation of the constraints x i = x j and x i = x j .Because these are binary constraints, by Corollary 1 the scheme presented in Section 3.2 achieves MDD consistency in polytime.If we compute I ↓ as described in that section, we can achieve MDD consistency for x i = x j by deleting an edge e ∈ E(s, t) whenever (a) L(s) = j and e ∈ I ↓ (s) or (b) L(s) = i and e ∈ I ↑ (t).We can achieve MDD consistency for x i = x j by deleting e whenever (a) L(s) = j and I ↓ (s) = {e} or (b) L(s) = i and I ↑ (t) = {e}.
We note that this scheme generalizes directly to propagating f i (x i ) = f j (x j ) and f i (x i ) = f j (x j ) for functions f i and f j .The scheme can also be applied to constraints x i < x j .However, in this case we only need to maintain bound information instead of sets of domain values, which leads to an even more efficient implementation.

Propagating Linear Inequalities
We next focus on the filtering algorithm for general inequalities, as proposed in [1].That is, we want to propagate an inequality over a separable function of the form: We can propagate such constraint on an MDD by performing shortest-path computations.
Recall that each edge e ∈ E(s, t) is identified with a value assigned to var(s).Supposing that var(s) = x j , we let the length of edge e be f j (e) when j ∈ J and zero otherwise.Thus the length of a T-to-1 path is the left-hand side of (1).
We let I ↓ (s) be the length of a shortest path from T to s, and I ↑ (s) the length of a shortest path from s to 1. Then we delete an edge e ∈ E(s, t) when L(s) ∈ J and every path through e is longer than b; that is, It is easy to compute local information in the form of shortest path lengths, because we can define for e ∈ E(s, t) with I ↓ (T) = I ↑ (1) = 0. We also define This inequality propagator achieves MDD consistency as an edge e is always removed unless there exists a feasible solution to the inequality that supports it [1].

Propagating Two-sided Inequality Constraints
In this section, we present a generalization of the equality propagator described by Hadzic et al. [3].It extends the inequality propagator of Section 4.2, but now we store all path lengths instead of only the shortest and longest paths.
Suppose we are given an inequality constraint l ≤ j∈J f j (x j ) ≤ u, where l and u are numbers such that l ≤ u.Let I ↓ (s) be the set of all path lengths from T to s, and I ↑ (s) the set of all path lengths from s to 1.We delete an edge e ∈ E(s, t) when The local information is computed by defining for e ∈ E(s, t) When we delete an edge, the information stored at all predecessors and successors becomes 'stale', and the information for these nodes must be recomputed to guarantee MDD consistency.However, we will achieve MDD consistency if, every time we delete an edge, we update the node information for all predecessors and successors and repeat this filtering and updating until we reach a fixed point.This follows because the filtering condition above is both necessary and sufficient for an edge to be supported by a feasible solution.Observing that the information can be computed in pseudo-polynomial time, by Theorem 1 this algorithm runs in pseudo-polynomial time (see also [3]).

Propagating the AllDifferent Constraint
The constraint AllDifferent(x i , i ∈ J) requires that the variables x i for i ∈ J take pairwise distinct values.We can frame the AllDifferent propagator presented in Andersen et al. [1] in terms of our scheme.Let I ↓ (s) = (A ↓ (s), S ↓ (s)) and I ↑ (s) = (A ↑ (s), S ↑ (s)).Here A ↓ (s) is the set of values that appear on all paths from T to s-that is, the set of values v such that on all T-s paths, x j = v for some j ∈ J. S ↓ (s) is the set of values v that appear on some T-s path.A ↑ (s) and S ↑ (s) are defined similarly.
We can delete edge e ∈ E(s, t) for L(s) ∈ J when e ∈ A ↓ (s) ∪ A ↑ (t).We can also delete e when the variables above s, or the variables below s, form a Hall set.To make this precise, let X ↓ s = {x j | j ∈ J, j < L(s)} be the set of variables in the AllDifferent constraint above s, and values in S ↓ (s) cannot be assigned to any variable not in X ↓ (s).So we delete e if e ∈ S ↓ (s).Similarly, if X ↑ (t) is a Hall set, we delete e if e ∈ S ↑ (t).Finally, we compute the local information by defining for e ∈ E(s, t) where the unions are taken componentwise and I ↓ (T) = I ↑ (1) = (∅, ∅).Also we define

Propagating the Among Constraint
The Among constraint restricts the number of variables that can be assigned a value from a specific subset of domain values.Formally, if X = (x 1 , . . ., x q ) is a sequence of variables, S a set of domain values, and ℓ, u, q are constants with 0 ≤ ℓ ≤ u ≤ q, then Among(X, S, ℓ, u) requires that We can reduce propagating Among(X, S, ℓ, u) to propagating a two-sided separable inequality constraint, where Because each f i (•) ∈ {0, 1}, we can compute the information in polynomial time, and by Theorem 1 MDD consistency can be achieved in polynomial time for Among constraints.However, this filtering is too slow in practice.Instead, we propose to propagate bounds information instead.That is, we can use the inequality propagator for the pair of inequalities separately, and reason on the shortest and longest path lengths, as in Section 4.2.

Propagating the Element Constraint
We next consider constraints of the form Element(x i , (a 1 , . . ., a m ), x j ), where the a k are constants.This means that the variable x j must take the x th i value in the list (a 1 , . . ., a m ); that is, Because this is a binary constraint, we can achieve MDD consistency in polytime by defining I ↓ (s), I ↑ (t) as in the proof of Corollary 1. Supposing that i < j, we delete an edge e ∈ E(s, t) when (a) L(s) = j and e = a k for no k ∈ I ↓ (s), or (b) L(s) = i and a e ∈ I ↑ (t).

Propagating the Unary Resource Constraint
We consider unary resource constraints of the following form.We wish to schedule a set A of activities on a single resource (non-preemptively).Each activity a ∈ A has a given release time r a , deadline d a , and processing time p a .We model the problem using variables X = (x 1 , . . ., x |A| ), where x i = a implies that activity a is the i th activity to consume the resource.That is, we to find the order in which to process the activities, from which the start times can be immediately derived.
First, observe that in our representation the variables encode a permutation of A, which means that we can immediately apply the AllDifferent propagator from Section 4.4.
To enforce the time windows, let I ↓ (s) be the earliest start time of the activity in position L(s) of the sequence, given the previous activities in the sequence.Let I ↑ (t) be latest completion time of the activity in position L(t) − 1, given the subsequent activities.Then we can delete edge e ∈ E(s, t) if the time window is too small to complete activity e; that is, if The local information is computed

Using Conventional Domain Filters
An existing domain filter can be adapted to MDD propagation on the basis of local information.However, the resulting propagator may not achieve MDD consistency even when it achieves domain consistency.
The adaptation goes as follows.Following Andersen et al. [1], we define the induced domain relaxation D × (M ) of an MDD M to be a tuple of domains (D × 1 (M ), . . ., D × n (M )) where each D × i (M ) contains the values that appear on level i of M .That is, We can perhaps delete edges from a given level i of M by selecting a node s on level i and applying the conventional filter to the domains where M s is the portion of M consisting of all paths through node s.We remove values only from the domain of x i , that is from E out (s), and delete the corresponding edges from M .This can be done for each node on level i and for each level in turn.
Lemma 1.The induced domain relaxation D × (M s ) can be computed for all nodes s of a given MDD M in polynomial time (in the size of the MDD).
Again following Andersen et al. [1], we can strengthen the filtering by noting which values can be deleted from the domains D × j (M s ) for j = i when (2) is filtered.If v can be deleted from D × j (M s ), we place the nogood x j = v on each edge in E out (s).Then we move the nogoods on level i toward level j.If j > i, for example, we filter (2) for each node on level i and then note which nodes on level i + 1 have the property that all incoming edges have the nogood x j = v.These nodes propagate the nogood to all their outgoing edges, and so forth until level j is reached, where all edges with nogood x j = v and label v are deleted.

Implementation Issues
We have implemented a C++ system for MDD-Based Constraint Programming.For a detailed description of the system, we refer to [7].We next highlight some of the most important design decisions we have made.
Even though MDDs can grow exponentially large to represent a given constraint perfectly, the basis of MDD-based constraint programming is to control the size of the MDD by specifying a maximum width k.A MDD of width one is equivalent to the conventional domain store, while increasing values of k allow the MDD to converge to a perfect representation of the solution space.An MDD on n variables therefore contains O(nk) nodes.Furthermore, by aggregating the domain values corresponding to multiple (parallel) edges between two nodes, the MDD contains O(nk 2 ) edges.Therefore, for fixed k, all bottom-up and top-down passes take linear time.
In our system, we do not propagate the constraints until a fixed point.Instead, by default we allocate one bottom-up and top-down pass to each constraint.The bottom-up pass is used to compute the information I ↑ .The topdown pass processes the MDD a layer at a time, in which we first compute I ↓ , then refine the nodes in the layer, and finally apply the filtering conditions based on I ↑ and I ↓ .
Our outer search procedure is currently implemented using a priority queue, in which the search nodes are inserted with a specific weight.This allows to easily encode depth-first search or best-first search procedures.Each search tree node contains a copy of the MDD of its parent, together with the associated branching decision.When applying a depth-first search strategy, the total amount of space required to store all MDDs is polynomial, namely O(n 2 k 2 ).We note that 'recomputation' strategies that are common in conventional CP systems, cannot be easily applied in this context, because the MDD may change from parent to child node, due to the node refinement procedure.

Experimental Results
In this section, we provide detailed experimental evidence to support the claim that MDD-based constraint programming can be a viable alternative to constraint programming based on the domain store.All the experiments are performed using a 2.33GHz Intel Xeon machine with 8GB memory, using our MDDbased CP solver.For comparison reasons, our solver applies a depth-first search, using a static lexicographic-first variable selection heuristic, and a minimumvalue-first value selection heuristic.We vary the maximum width of the MDD, while keeping all other settings the same.
Multiple Among Constraints We first present experiments on problems consisting of multiple Among constraints.Each instance contains 50 (binary) variables, and each Among constraint consists of 5 variables chosen at random, from a normal distribution with a uniform-random mean (from [1..50]) and standard deviation σ = 2.5, modulo 50.As a result, for these Among constraints the variable indices are near-consecutive, a pattern encountered in many practical situations.Each Among has a fixed lower bound of 2 and upper bound of 3, specifying the number of variables that can take value 1.In our experiments we vary the number of Among constraints (from 5 to 200, by steps of 5) in each instance, and we generate 100 instances for each number.We note that these instances exhibit a sharp feasibility phase transition, with a corresponding hardness peak, as the number of constraints increases.We have experimented with several other parameter settings, and we note that the reported results are representative for the other parameter settings; see Hoda [7] for more details.
In Figure 2, we provide a scatter plot of the running times for width 1 versus width 4, 8 , and 16, for all instances.Note that this is a log-log plot.Points on the diagonal represent instances for which the running times, respectively number of backtracks, are equal.For points below the diagonal, width 4, 8, or 16 has a smaller search tree, respectively is faster, than width 1, and the opposite holds for points above the diagonal.
We can observe that width 4 already consistently outperforms the domain store, in some cases up to six orders of magnitude in terms of search tree size (backtracks), and up to four orders of magnitude in terms of computation time.For width 8, this behavior is even more consistent, and for width 16, all instances can be solved in under 10 seconds, while the domain store needs hundreds or thousands of seconds for several of these instances.

Nurse Rostering Instances
We next conduct experiments on a set of instances inspired by nurse rostering problems, taken from [8].The instances are of three different classes, and combine constraints on the minimum and maximum number of working days for sequences of consecutive days of given lengths.and solution time as we increase the width up to 64.Furthermore, the rate of decrease appears to be exponential in many cases, and again higher widths can yield savings of several orders of magnitude.A typical result (the instance C-III on 60 days) shows that where an MDD of width 1 requires 95,509 backtracks and 173.08 seconds of computation time, an MDD of width 32 only requires 6 backtracks and 0.28 seconds of computation time to find a first feasible solution.

Conclusion
We have introduced a generic scheme for propagating constraints in MDDs, and showed that all existing MDD-based constraint propagators are instantiations of this scheme.Furthermore, our scheme can be applied to systematically design propagators for other constraints, and we have illustrated this explicitly for the Among, Element, and unary resource constraints.We further provide experimental results for the first pure MDD-based constraint programming solver, showing that MDD-based constraint programming can yield savings of several orders of magnitude in time and search space as compared to the conventional domain store. Bibliography

Table 1 .
The effect of the MDD width on time in seconds (CPU) and backtracks (BT) when finding one feasible solution on nurse rostering instances.