On the use of mathematical programs with complementarity constraints in combined topological and parametric design of biochemical enzyme networks

ABSTRACT A new method is presented for biochemical enzyme network design based on direct transcription and mathematical programs with complementarity constraints. Topology and continuous parameters are optimized simultaneously. The case study design objective is to optimize adaptability while maintaining sufficient sensitivity to ensure input change detection. A three-node problem is solved using both simultaneous and single-shooting approaches. The simultaneous approach enables solution of four-node problems; this is a new capability not available through existing approaches such as exhaustive enumeration, and is a step toward designing larger systems. A conventional nested solution strategy was also investigated for a four-node problem where an outer loop solves the discrete topology optimization problem, and an inner loop solves the continuous problem for each candidate topology. The simultaneous approach yields robust network topological designs that are superior to those identified through the single-shooting and nested strategies.


Introduction
Synthetic biology aims to invent and design novel devices, systems and organisms to achieve useful biological function (Weber and Fussenegger 2012). Although significant progress has been made in designing and constructing a variety of genetic regulatory networks, these have been limited to small-scale (normally single-function) networks. An important object in synthetic biology is to design and construct complex higher-order biological networks (Lu, Khalil and Collins 2009). In this article, a design strategy using direct transcription (DT) and mathematical programming with complementarity constraints (MPCCs) is presented to solve the combined topology and continuous parameter enzyme network design problem. Direct transcription is employed to solve for optimal continuous parameters. Complementarity constraints (CCs) are used to manage topological decisions. The authors' objective here is to extend existing engineering design optimization techniques in a way that supports the identification of genetic network designs that exhibit desired behaviour with much less computational effort. Furthermore, these efficiency improvements increase the dimension of biological networks that can be designed and implemented successfully. The proposed optimization-based approach offers demonstrated advantages over traditional design methodologies-such as exhaustive enumeration-when solving specific three-node and four-node enzyme network topologies.

Synthetic biology
Synthetic biology is a new research area that involves the design of biological devices and systems. It encompasses biology, chemistry, mathematics and engineering with a wide range of applications such as biochemical treatment (Weber and Fussenegger 2012), bio-energy (Anderson et al. 2010) and bio-sensors (Beal and Bachrach 2006). As a framework for conceptualizing synthetic biology, Andrianantoandro et al. (2006) compares possible hierarchies between synthetic biology and computer engineering. Within the hierarchy, one can think of DNA, mRNA, genes, proteins and metabolites as the physical layers such as transistors, capacitors and resistors. The biochemical reactions in the device layer that manipulate interactions and processes are analogous to the logical gates in a computer. At the module layer, biologists assemble these biological devices into pathways resembling circuits, which can integrate into cells that help perform certain biological functions, such as specific tissue cells. The design process is similar to that in computer engineering where modules (electrical circuits or chips) are integrated together to form a complete system.
The first wave of synthetic biology occurred when basic parts and devices (promoters, ribosome binding sites and transcriptional repressors) were used to construct small modules with specified functions (Purnick and Weiss 2009). Early examples of genetic regulatory networks include a toggle switch and an oscillating genetic regulatory network developed by Gardner, Cantor and Collins (2000) and Elowitz and Leibler (2000). The toggle switch in Escherichia coli was constructed from two repressors and two promoters, where each promoter is inhibited by the repressor in transcription of the other promoter. The toggle exhibits bi-stable behaviour. An oscillating genetic network, also termed a repressilator, consists of three transcriptional repressor promoters, each of which transcribes one of the other promoters, resulting in an oscillating network. Through the combination of mathematical models and experiments, these efforts demonstrate progress towards a more general study of synthetic biology (Judd, Laub and McAdams 2000). In addition to switches and oscillators (Kramer 2004;Isaacs et al. 2004;Ham et al. 2008;Deans, Cantor and Collins 2007;Ajo-Franklin et al. 2007;Friedland et al. 2009;Stricker et al. 2008;Tigges et al. 2009), synthetic biologists have also engineered a wide range of artificial genetic regulatory networks including digital logic evaluators (Rinaudo et al. 2007), transcriptional cascades (Hooshangi, Thiberge and Weiss 2005), filters (Sohka et al. 2009), and cell-cell communicators (Basu et al. 2005;Kobayashi et al. 2004).
Computational tools have also contributed to the advancement of synthetic biology. BioBrick, proposed by Knight (2003), is a standard where units of DNA (BioBrick parts) can be used to design and assemble genetic networks. These parts, which include promoters and ribosome binding sites, are stored at the Registry of Standard Biological Parts (parts.igem.org). Despite its simplicity and popularity, the Biobrick standard has limitations due to its inability to construct protein fusions. Anderson et al. (2010) presented a similar but newer composition standard (BglBriks) that addresses this shortcoming. Eugene is a domain-specific language for creating and improving synthetic biological systems (Densmore 2014) that permits the use of part specifications and rule-based constraints (Bilitchenko et al. 2011) and enables the creation and assembly of biological devices with the aid of a platform-based design environment such as Clotho (Densmore et al. 2009). The bio-oriented language developed by Bachrach (2006, 2008) is particularly suitable for engineered genetic regulatory networks when describing various spatial patterns to construct tissues or organs. The highlevel specification described in the Proto language is converted into an abstract genetic regulatory network, and the genetic network can be optimized using the compiler (Beal, Lu and Weiss 2011). Several other programming languages such as Genetic Engineering of Living Cells (GEC) (Pedersen and Phillips 2009) are summarized by Purnick and Weiss (2009).
Established strategies from other engineering domains have been extended to synthetic biology: for instance, feedback control plays an important role in genetic regulatory networks and has been applied to study perfect adaptation in bacterial chemotaxis (Yi et al. 2000), maintaining calcium homeostatis in mammals (El-Samad, Goff and Khammash 2002), biophasic responses in the chemo attractant in Dictyostelium cells (Yang and Iglesias 2006) and enhanced sensitivity and stability of the genetic networks (Stelling et al. 2004). Mathematical modelling and simulations are also crucial tools. Differential equations are used to characterize dynamic behaviours within biological systems, e.g. applications for Bacillus subtilis chemotaxis (Rao, Kirby and Arkin 2004) and the synthetic transcriptional cascade in Escherichia coli (Batt et al. 2007). Techniques such as parameter fitting and Monte Carlo sampling are also used, e.g. in microbial biofuel production modelling, with simulations implemented in the Matlab environment (Dunlop, Keasling and Mukhopadhyay 2010). Optimization has been used in the area of biochemical reaction networks (Banga 2008). Linear and nonlinear programs have been applied to metabolic flux balance analysis (Varma and Palsson 1994), metabolic networks (Patil et al. 2005), microbial cells (Price, Reed and Palsson 2004), biochemical pathways (Mendes and Kell 1998), and metabolism analysis (Vo and Palsson 2006;Vo, Lee and Palsson 2007). Both discrete and continuous decision variables have been investigated for biochemical networks in terms of mixed integer linear and nonlinear programs (MILP and MINLP), including time-delay inferring (Dasika et al. 2004), the inference of regulatory interactions (Thomas et al. 2007), metabolic networks (Lee et al. 2000;Hatzimanikatis, Floudas and Bailey 1996a;1996b) and gene and enzymes manipulations based on kinetic models (Vital-Lopez et al. 2006). Evolutionary strategies proposed by François and Hakim (2004) were used to create genetic networks with functions of bistate switches and oscillators. AutoBioCAD, a design tool for genetic networks, contains a library of genetic elements and incorporates strategies of enumeration and optimization in the evolutionary scheme to design genetic networks with dynamic behaviours (Rodrigo and Jaramillo 2012).
Despite significant achievements, artificial genetic network design remains a challenging task. Many efforts have concentrated on achieving particular biological functions in genetic networks such as oscillations (Tsai et al. 2008;Wagner 2005) and adaptation (Alon et al. 1999; Barkai and Leibler 1997;Hornung and Barkai 2008). While exhaustive enumeration strategies have been used for robust design of a Drosophila segment polarity network (Ma et al. 2006), biochemical adaptation of enzyme networks (Ma et al. 2009), ultra-sensitivity and bi-stability of switch-like cellular responses (Shah and Sarkar 2011), AutoBioCAD (Rodrigo and Jaramillo 2012) and engineering system architecture (Herber, Guo and Allison 2016), enumeration is an inefficient strategy that is limited to solution of small-scale problems. For instance, Ma et al. (2009) carried out a series of analyses on three-node enzyme networks-a particular class of genetic regulatory network-and successfully identified two primary topologies that produce adaptive behaviour. They used exhaustive enumeration to explore all possible 16,038 three-node network topologies. Each topology was tested for adaptive properties by simulating the network 10,000 times to determine the parameter value combination that produced the best results. This nested exhaustive enumeration approach carries with it extreme computational expense, and the size of networks that can be explored in this manner is therefore very limited.
By 2009 the complexity of successfully implemented genetic regulatory networks, as measured by the number of regulatory regions, had plateaued at six (Purnick and Weiss 2009). While some optimization problems and approaches have been explored (Feng et al. 2004;Festa 2007;Balsa-Canto and Banga 2011;Handl, Kell and Knowles 2007;Banga 2008), to the authors' knowledge very limited research has been focused on dynamic optimization for both discrete and continuous variables. Adiwijaya, Barton and Tidor (2006) proposed a biological network design method that used dynamic optimization, but required manual adjustments to network topology. Dasika and Maranas (2008) developed an optimization-based method for computational design of genetic networks using mixed integer dynamic optimization (MIDO), requiring MINLP solvers.
The work presented here builds on previous efforts to apply new engineering design methods to genetic regulatory network design with the objectives of (1) increasing the scale and complexity of networks that can be explored and designed effectively (a vital step in bringing synthetic biology capabilities toward functionally important systems), and (2) providing important new tools and understanding that support these advancements. This class of design problem is particularly difficult because it involves both topological and continuous design variables, and dynamic behaviour is nonlinear and stochastic. This results in a complex and difficult-to-navigate design space. Studies presented here are based on several simplifying assumptions, but fill important gaps in current design methods and knowledge. The first assumption made here is system class; studies are restricted here to a specific class of genetic regulatory networks-biochemical enzyme networks-and focus on identifying designs that are highly adaptable to network input changes. It is further assumed here that dynamic network behaviour is adequately represented by deterministic nonlinear differential equations. This is admittedly a significant assumption and associated simulation results will certainly deviate in important ways from realizable biochemical behaviour, but the assertion here is that this investigation strategy reveals valuable trends in design results, and, most importantly, insight into new directions for design methodologies and tools for synthetic biology with the aim of significantly increased scale and functionality of synthetic systems.
An additional contribution of the work presented here is a novel approach to combined topological and continuous parameter design optimization for dynamic systems. Instead of relying on enumeration or MINLP solvers, the problem is converted to a continuous formulation that simultaneously solves the topological and continuous elements of the problem. Dynamic optimization is managed using direct transcription (DT), and the DT formulation is augmented with complementarity constraints (CCs) to support topological decisions. DT has been combined in the past with CCs to model hybrid dynamic optimization problems to form mathematical programs with complementarity constraints, but not for the topological design of dynamic systems. MPCCs have been formulated for static topological design problems in other engineering domains, but not with direct consideration of system dynamics.
Investigation results are compared against other studies based on deterministic dynamic models, and are found to be consistent with earlier findings. In addition, the new method requires lower computational effort than existing alternatives. These comparisons can only be made for small three-node networks as earlier methods are not capable of managing larger network design problems. The new DT-MPCC solution method is then demonstrated successfully for the design of a larger network, providing evidence that advancements in new optimization-based methods may lead to successful design of complex biochemical networks. Sensitivity studies are performed and additional solution strategies are investigated. It is concluded here that DT-MPCC is useful for the design of moderately larger networks, but that alternatives should be explored creatively to strive for methods with the potential for navigating much larger network design problems. The work reported here is part of a larger ongoing effort to step back from existing design methods in synthetic biology 1 and evaluate whether fundamentally different strategies may help designers design and implement synthetic biological systems with completely new levels of complexity that are essential for practical applications. The specific formulation and method presented here has produced evidence that a new direction in design strategies is worth further investigation. The specific method and formulation presented here (DT-MPCC) is not the final result of the larger effort. It may be moderately generalizable, but is only a step toward a comprehensive design framework that applies to a wide variety of problems and is supported by experimental validation of not only predicted design behaviour, but also design method validation. Immediate next steps include both new strategies for navigating complex topological design spaces, and methods that account for stochastic behaviour-possibly leveraging theory and methods for design optimization under uncertainty (Aoues and Chateauneuf 2010). Additional problem classes beyond adaptation, as well as transcriptional regulation networks, should also be studied.
The remainder of this section provides an overview of DT and CCs, the two essential components of the new design method. The problem formulation and model is then presented, followed by the results of numerical studies and conclusions.

Direct transcription
Direct transcription (DT) is a method for dynamic system optimization that has been applied widely to trajectory optimization, optimal control and parameter estimation problems (Hargraves and Paris 1987;Betts 2010) in a variety of domains, such as crystallization and batch distillation (Biegler 2007), aircraft trajectory design Betts and Cramer 1995), orbital trajectory design (Ozimek, Grebow and Howell 2008) and active suspension system co-design (Allison, Guo and Han 2014). Here the word 'transcription' in DT is distinct from the specific meaning in biology; it refers to the transcription of an infinite-dimensional optimal control problem into a largescale nonlinear programming (NLP) problem via discretization (Betts 2010). It may also be classified as a special case of the all-at-once (AAO) multidisciplinary design optimization (MDO) formulation (Cramer et al. 1994;Allison and Herber 2014). DT is applied here for the first time to enzyme network design for optimal dynamic performance. Consider the following general continuous dynamic problem: Equation (1a) defines the cost function for open-loop optimal control problems, where ξ (t) is the state vector and u(t) is the control input, both of which are functions of time t ∈ [t 0 , t F ]. Timeindependent parameters are represented by the vector p. Constraints (1b) and (1c) are the differential and algebraic equations that model the governing system dynamics. These two sets of constraints combined form a system of differential algebraic equations (DAEs) (Brenan, Campbell and Petzold 1996). The system may have initial conditions ξ (t 0 ) = ξ 0 , as well as a boundary condition at the final time t F (Equation 1d).
A conventional solution to this problem involves application of first-order optimality conditions-e.g. Pontryagin's maximum principle (PMP) (Pontryagin et al. 1962;Bryson and Ho 1975)-which produce a boundary value problem (BVP). In simple cases a closed-form solution of the BVP may be derived; in more general cases this BVP must be solved numerically. In addition, this solution strategy requires the derivation of system Hamiltonian derivatives, which is difficult in many cases.
A solution to the above problem produces an optimal trajectory u * (t) (open-loop optimal control solution), as well as p * (optimal time-invariant parameter values) and the corresponding state trajectories and final time. Feedback control can be implemented by replacing u(t) with control design variables x c that parameterize a feedback controller; x c can be optimized together with the time-independent parameters p: (DAE and boundary constraints have been omitted for brevity).
A closed-form solution exists for linear dynamic systems with quadratic objective functions. For nonlinear systems, a PMP-based approach may work, but if not other solution methods are required. One often-used solution method is a nested simulation approach, where the DAE (or ODE if no algebraic constraints exist) is solved using numerical simulation for every design alternative tested by the optimization algorithm. This nested simulation method is often referred to as single-shooting. Consider an optimal feedback control problem where t F and p are fixed, and no algebraic or boundary constraints are present: ( 3 ) Constraints are omitted in this formulation because they are satisfied implicitly via simulation. Here denotes discretized state variable trajectories, which are solved for using numerical simulation for every new candidate control design x c . This problem may be solved using standard NLP algorithms (Bertsekas 1999). While straightforward and intuitive, this approach suffers from several drawbacks. Simulation can lead to cost and constraint functions that are non-smooth or arithmetically inconsistent (Betts 2010). Numerical integration of DAEs may become very time-consuming for large-scale problems (Biegler 2007). In addition, single-shooting cannot handle open loop instabilities (Ascher and Petzold 1998;Flores-Tlacuahuac, Biegler and Saldívar-Guerra 2005), and solutions may be numerically unstable for highly nonlinear or stiff systems.
Multiple-shooting is a related method that addresses instabilities by dividing the simulation into n T smaller time segments. The system is simulated within each of these segments independently, and defect constraints are included in the NLP formulation to ensure consistency and state continuity between segments at optimization convergence. If the size of time segments is reduced to the limiting value of the individual time step sizes (i.e. one time step per time segment), the result is the direct transcription method. No forward simulation is performed. Instead, defect constraint satisfaction ensures approximate satisfaction of the underlying DAEs. More specifically, the time domain is divided into n t segments or intervals t 0 < t 1 < · · · < t n t = t F , and both discretized state ( ) and control (U) trajectories are optimization variables in the DT formulation. Time invariant parameters p may also be optimization variables.
The exact form of the defect constraints depends on the collocation method used to discretize the system equations. For example, the first-order Runge-Kutta method (forward Euler method) and higher-order Runge-Kutta methods, such as the implicit trapezoidal method, can be used to generate defect constraints: where h i is the step size at time t i . Higher-order collocation methods enable accurate solution with fewer time steps. This helps reduce problem dimension, but DT still involves large-scale NLPs. DT problems, however, often have well-behaved sparsity structures that support efficient solution through strategies such as sparse finite differences and exploiting parallel computations (Allison, Kokkolaras and Papalambros 2007;Betts 2010;Allison, Guo and Han 2014). In this article, a new DT solution method will be demonstrated for the enzyme network design problem.

Complementarity constraints
Complementarity constraints (CCs) are useful in many applications such as parameter estimation (Kameswaran, Staus and Biegler 2005), metabolic flux networks (Raghunathan 2004), truss topology design (Kočvara and Outrata 2006) and hybrid dynamic systems (Baumrucker and Biegler 2009). CCs are used here to manage enzyme network topology decisions. Complementarity refers to a relationship between variables where either one (or both) are at the boundary. The mathematical program with complementarity constraints takes the following form (Baumrucker, Renfro and Biegler 2007;Baumrucker and Biegler 2009): The complementarity constraints are given in the last line and imply that either x i = 0 or y i = 0, and x ≥ 0, y ≥ 0. While complementarity constraints (CCs) can be used to represent discrete decisions, the resulting nonlinear programming problems do not satisfy regularity requirements such as linear independence constraint qualification (LICQ) or even the weaker Mangasarian-Fromovitz constraint qualification (MFCQ). As a result, multipliers may be non-unique or unbounded (Baumrucker, Renfro and Biegler 2007;Izmailov, Solodov and Uskov 2012). Mathematical programs with complementarity constraints (MPCCs) are therefore difficult to solve using standard NLP solvers.
Several equivalent reformulations have been investigated and analysed using relaxed complementarity constraints (Anitescu, Tseng and Wright 2007;Baumrucker, Renfro and Biegler 2007;Baumrucker and Biegler 2009;Hu and Ralph 2004;Scholtes 2001;Ralph and Wright 2004;Leyffer, López-Calva and Nocedal 2006). One approach is to approximate CCs using a non-negative scalar t (Reg(t): x i y i ≤ t, x ≥ 0, y ≥ 0). Alternatively, the complementarity condition can be approximated by a single constraint (RegComp(t): x T y ≤ t, x ≥ 0, y ≥ 0) or replaced by equalities (RegEq(t): x i y i = t, x ≥ 0, y ≥ 0). In addition, an exact l 1 penalty function may be used; the complementarity constraint is enforced using a penalty term in the objective function (PF(ρ): min f (x,y,z) + ρx T y x ≥ 0, y ≥ 0 for ρ > 0). These MPCC reformulations exhibit different local convergence properties under mild conditions and may be solved using standard optimization algorithms.

Problem formulation
Consider a three-node enzyme network topology (Ma et al. 2009). Figure 1 illustrates one possible network topology where A receives I and node C produces the output. A third node B acts as a regulator. Figure 2 illustrates a step input change used in this study, and a representative output response. A network topology exhibits adaptation if its output returns to original levels after an input disturbance. Here it is desired to seek network designs with maximally adaptive behaviour, i.e. the error E between the steady-state output before and after the step input change is minimized. Equivalently, the precision P-the inverse of E-is to be maximized.
In addition, it is important to maintain an acceptable level of sensitivity (i.e. the largest relative output change compared to the relative input change) (Ma et al. 2009). Here the sensitivity is required to be greater than or equal to one. The resulting optimization formulation, with normalized error and sensitivity metrics, is where x represents a combination of topological and continuous design variables.

Dynamic system model
As discussed by Ma et al. (2009), the dynamic output response O(t) of the enzyme network is obtained via numerical simulation to solve the Michaelis-Menten equations that correspond to the candidate  network design (Michaelis and Menten 1913;Alon 2007). Within each of this model's nodes, basally available enzymes or other active network enzymes can oversee the conversion of a normalized enzyme concentration between active and inactive forms. For example, in Figure 1, T and 1−T represent the active and inactive levels at node T, for T ∈ {A, B, C}, respectively. 2 The arrow connecting node B to node C represents a positive regulation of node C by node B. This means that active level B (at node B) will convert the enzyme levels at node C from its inactive level 1−C to active level C. In other words, the active level B (at node B) will increase the production of the active level C (at node C). A negative regulation is also possible. In Figure 1, the connection between nodes C and B with a circle at node B represents a negative regulation (i.e. C −•B). This means that the active level C (at node C) catalyzes the transition of active form B to inactive form 1−B at node B. Figure 3 illustrates these relations between nodes B and C in greater detail. In Figure 3(a), C and 1−C represent the levels of node C's active and inactive forms, respectively. The arc with the symbol B above it indicates that the active form level B (produced by node B) causes the transition from its inactive form level 1−C to active form level C at node C. The differential equation that models this transition is given at the bottom of Figure 3(a). Here k BC is the catalytic rate constant, and K BC is the Michaelis-Menten constant. Figure 3(b) also outlines the negative regulation of C on B.  Often the basal (minimum) enzyme production level is non-zero. If a node has only positive (negative) incoming regulations, it is assumed that a basal enzyme level would deactivate (activate) this node. The notation E i and F i is used to represent activating and deactivating basal enzymes receptively, where i ∈ {A, B, C}. The basal enzyme in this model is set to be a constant value (0.5), and the corresponding rate equations are illustrated in Figure 4.
An additional regulation type is the self-loop regulation (e.g. the positive self-loop for node B in Figure 1). In this study it is assumed for simplicity that self-loops do not occur. The multiple regulations on a node are additive, resulting in the complete rate equation for node T that can be written as where T ∈ {A, B, C}, and X i is the activating enzyme (positive regulator) of node T, and belongs to the set {A, B, C, E B , E C }. Y i is the deactivating enzyme (negative regulator) of node T, and is a member of the set {A, B, C, F A , F B , F C }. The particular values of the catalytic rate constants (k cat 's) and the Michaelis-Menten constants (K m 's) of the enzymes have the ranges 0 ≤ k ij ≤ 10 and 0 ≤ K ij ≤ 100, where i and j refer to particular nodes and basal activating/deactivating enzymes.

MPCC formulation
MPCCs have been applied to many engineering applications including contact mechanics problems, structural design problems, nonlinear obstacle problems, elastohydrodynamic lubrication problems, and traffic equilibrium problems (Ferris and Pang 1997). For instance, in topology optimization, unilateral and frictional contact problems can be formulated as CCs (Panagiotopoulos and Al-Fahed 1994;Stavroulakis 1995;Rodrigues 1993). Kočvara and Outrata (2006) presented a new formulation for truss topology using MPCC and derived optimality conditions for the problem. CCs are used here to model the existence regulations between different nodes in the enzyme network design problem (i.e. network topology). CCs are a natural problem formulation as the link between two nodes may only have positive, negative or no regulation. To clarify, node C cannot exert both positive and negative regulation on node B in Figure 1. This requirement can be formulated as a complementarity constraint. For example, such a constraint for the three-node network would be k ij k ij = 0, ∀{i, j} ∈ {A, B, C} × {A, B, C}. Figure 5 illustrates how topology decisions may be modelled using CCs. At most one of k BC and k BC can exist between nodes B and C. This is enforced by the constraint k BC k BC = 0. Likewise, the constraints k BC k E C C = 0 and k F C C k BC = 0 ensure that a deactivating or activating enzyme is available when there are only positive or negative linkages coming into node C. Because the network depends on values of parameters k and K, the states and the parameters here are considered as optimization variables in the model. If a parameter is zero, its corresponding regulation does not exist; if it is nonzero, the corresponding regulation does exist. This strategy is essentially a ground structure method where possible nodes and interactions are defined a priori. Ground structure methods are used widely in structural design problems in topology optimization (Bendsøe and Sigmund 2003;Khetan, Lohan and Allison 2015). The MPCC formulation for the dynamic three-node enzyme network design problem is where k 1 and k 2 are defined for convenience for use with the CCs, and k and K are the design parameters to be optimized. The derivative functions f A (A), f B (B) and f C (C) characterize the dynamic interactions using Equation (8) between the nodes. More details regarding the formulation can be found in the supplemental materials for this article. The differential equations were formed into defect constraints ζ (·) using the trapezoidal collocation method. The complementarity constraints are managed here using a penalty term with weight ρ (Baumrucker, Renfro and Biegler 2007;Baumrucker and Biegler 2009), and the resulting DT-MPCC formulation is min ,k,KÊ 0 ≤ k 1 , k 2 , k ≤ 10, 0 ≤ K ≤ 100, for i = 1, 2, . . . , n t − 1, n t .

Numerical results
The simultaneous direct transcription (DT) approach was applied to both three-and four-node enzyme network design problems. The successful solution of the four-node problem is an important development; the four-node problem is intractable via enumerative solution strategies due to the increased dimension of the topological search space. The four-node design space contains more than 4.3 × 10 7 unique design topologies, compared to the three-node design space investigated by Ma et al. (2009) that includes 16,038 possible topologies.
In this section, only the optimal topological designs and corresponding system responses are presented. Detailed formulations and optimal continuous parameters can be found in the supplemental materials for this article.
More results regarding steady-state behaviour and sensitivity analyses are also included in the supplemental materials. Complete Matlab code for this study can be found at the URL listed in the bibliography under Allison (2016). Matlab package 'Tools for Network Analysis', created by the MIT Strategic Engineering Research Group (SERG), was used for network analysis elements of the numerical studies presented here (de Weck 2015).

Three-node enzyme network design problem
Problem (10) was solved in two different ways for a three-node network, both using the optimization algorithm KNITRO (Byrd, Nocedal and Waltz 2006). First, the simultaneous DT approach was used where the optimization problem and simulation problem were solved at the same time. The second approach utilized single-shooting, where the state variables were solved for each candidate design [k, K] using a multistep, implicit method for forward simulation. 3 This second method is a nested simulation strategy, since simulation is nested within the optimization problem.
A multi-start method was used in each case to improve the probability of locating a global optimum. The simultaneous DT approach exhibited significantly better computational efficiency and robustness than the single-shooting method. 20.83% of starting points using the simultaneous approach produced topologies with P * ≥ 1, where P * is the precision, 4 while only 0.83% did so when using the single-shooting method. The best result of the simultaneous DT approach (P * = 1.6072, S = 1.016) had better precision than the best feasible solution obtained using single-shooting (P * = 1.4584, S = 1.7963). Observe that the sensitivity constraint in the single-shooting solution is inactive (i.e. > 1), indicating an opportunity for improvement that the single-shooting method was unable to identify. Previous studies have identified advantages to traversing state and design spaces simultaneously (Allison, Kokkolaras and Papalambros 2007;Allison, Guo and Han 2014). A more direct path to the solution can be traced, and often better solutions can be identified. Figures 6 and 7 illustrate the network topologies and system responses produced by both solution approaches. Observe that the system is simulated long enough (40 s) for the output to achieve steadystate conditions before the input is perturbed. 5 The peak occurs approximately 5 s after the input disturbance. Both responses shown here exhibit adaptation, i.e. outputs return to near their original  values. The relative height of the output peak after t = 40 s corresponds to the sensitivity metric. A higher peak indicates that a change in input is easier to detect.

Four-node enzyme network design problem
In this section the investigation is extended to four-node enzyme networks as a step towards larger systems. Notice that 36 time-independent parameters must be optimized in the three-node case. The number of optimization variables increases to 64 for the four-node case. Here it is assumed that node A receives the input signal and node D produces the network output.
As part of this study the DT-MPCC solution method is compared to nested discrete-continuous optimization, a more conventional solution strategy. This is distinct from the nested simulation, or single-shooting, method described above. Nested optimization is described briefly here before presenting numerical results. Nested optimization involves the nested solution of outer-and inner-loop optimization problems. The outer-loop problem seeks to identify the optimal topology, and for each candidate topology tested by the outer-loop problem an inner-loop problem is solved to determine the best possible performance by optimizing continuous parameter values. Using the best performance as determined by the inner loop enables a fair comparison between candidate topologies.
Several solution strategies for the outer-loop integer program were considered. It is a general nonlinear program. The objective function is a black-box function (i.e. the result of the inner-loop problem) that is evaluated for every candidate topology. Branch and bound is an algorithm for general integer programs, but was discovered to be too computationally expensive for practical solution of this problem. A genetic algorithm (GA) was tested as well for solving the outer-loop problem 6 (Deb 2000;Deep et al. 2009). While optimality cannot be proven when using a GA, it is a practical solution strategy that improves the probability of identifying a global optimal solution in a reasonable time period for difficult problems that are resistant to more efficient integer programming solution methods. Figure 8 illustrates the optimal topology and corresponding system response for the nested optimization method. The optimal topology results in the scaled precision P * = 3.3678 and the active sensitivity constraint S = 1.0158. The nested optimization implementation was difficult to solve and involved significant computational expense compared to the DT-MPCC method (144,000 versus 2,500 simulations). Several factors contribute to this difficulty. The discrete variables in the nested strategy cannot be relaxed to continuous variables and have no special structure, so must be solved using a general nonlinear discrete algorithm. In addition, the inner loop must be solved for each topology candidate, adding to the expense. Figure 9 shows the optimal topology and corresponding system response using the simultaneous DT-MPCC approach. The optimal-scaled precision in this case is P * = 0.9568, and the sensitivity constraint is active (S = 1.0048). Figure 9(b) shows that the steady-states for the system prior to and after the input disturbance have changed less significantly, indicating that a more robust topology is obtained from the simultaneous DT-MPCC approach.

Steady-state concentrations and sensitivity analyses
For dynamic systems, one can explore the steady-state operating points in a variety of ways. Steadystate behaviour of the optimal three-node and four-node topologies was found both through simulation with sufficient settling time, and also by setting the differential equations to zero and solving for equilibrium points. A multi-start solution method was used to test for multiple equilibrium points. The only physically meaningful equilibrium points found, [0.5948, 0.5251, 0.2287] (threenode, Figure 7(a)) and [0.5857, 0.5560, 0.5491, 0.2538] (four-node, Figure 9(a)), are consistent with the steady-state values obtained via simulation. These tests demonstrate that the simultaneous DT approach obtains correct steady-state values at optimization convergence. Detailed results of this equilibrium study are presented in the supplemental materials for this article. Local sensitivity analysis for the parameters k ij and K ij was also performed for the same two enzyme networks used for equilibrium analysis. The goal of local sensitivity analysis is to understand how system output responses vary with small changes to parameters and state variables. The reader is referred to Cacuci (2003) and Cacuci, Ionescu-Bujor and Navon (2005) for relevant theory and local methods such as adjoint modelling. Note that this sensitivity analysis is different from the sensitivity function in the optimization formulation that quantifies the ability to detect a change in input from I = 0.5 to I = 0.6. The R Package FME was used to implement local sensitivity analysis (Soetaert and Petzoldt 2010). As with the equilibrium analysis, the three-node enzyme network of Figure 7(a) and the four-node enzyme network of Figure 9(a) were studied. The sensitivity analysis showed that the incoherent feed-forward loop motif plays a key role in maintaining system stability. In addition, feedback loops were also shown to be important, and the catalytic rate constants k ij had great influence on the system behaviour. Detailed sensitivity analysis findings are presented in the supplemental materials for this article.

Conclusion
In this article a dynamic optimization technique based on direct transcription (DT) and mathematical programming with complementarity constraints (MPCCs) was introduced for the simultaneous solution of the enzyme network design problem. DT and CCs were used in tandem to optimize simultaneously both topology and continuous dynamic network design variables. DT was used to transform the ordinary differential equations into algebraic defect constraints, and the MPCC formulation provided a means for determining optimal network topologies. The case study concentrated on enzyme network topologies that could produce adaptive behaviour. Studies were restricted to deterministic models based on the Michaelis-Menten differential equations. The investigation centred on two case studies: three-and four-node enzyme network design problems. In the three-node enzyme network design problem, two solution strategies were tested: direct transcription (simultaneous analysis and design) and single-shooting (simulation performed for each design candidate). The simultaneous DT approach produced better results than single-shooting for the three-node enzyme regulatory network design problem, and was found to be highly efficient. The simultaneous DT optimization approach was then extended to the design of a four-node design problem, which is a complex enough design problem to be intractable in practice via enumerative solution strategies. A conventional nested optimization strategy was also investigated for the four-node problem for comparison. In the nested optimization strategy, an outer loop solves the discrete topology optimization problem, and a continous inner loop problem is solved to obtain the best dynamic network performance for each topology candidate tested by the outer-loop problem. Both the nested optimization and simultaneous DT-MPCC strategies were demonstrated to be effective solutions methods with potential to scale up to more complicated problems than can be solved via enumeration. The latter strategy was computationally less expensive, whereas the former produced a somewhat better performing network design.
The solutions obtained here have verified the importance of two core motifs: the feedback and the incoherent feed-forward loops that were identified by Ma et al. (2009). This result confirms that optimization-based methods are useful for identifying important topologies in the enzyme networks with reduced computational expense, as well as for supporting the effective design of larger networks. The equilibrium and sensitivity studies provided further insights into resulting optimal network designs. The proposed simultaneous DT approach identifies both the optimal topology and steady-state concentrations. The sensitivity analyses also revealed that the catalytic constants k ij have significant impact on system performance.
While the DT-MPCC method is an important improvement in design methodologies for dynamic network design problems, it so far has resulted only in an incremental increase in the size of networks that can be designed. Realizing the potentially transformative impacts of synthetic biology will require the design of much larger multi-function enzyme regulatory networks with complex interactions. As a step toward the design of larger-scale synthetic biological systems, ongoing work is testing a new design strategy based on generative design algorithms (GDAs). GDAs are a design abstraction that allow the indirect representation of highly complex, variable-dimension designs using a small (fixed) number of parameters by leveraging emergent properties. Another important topic for future work is to investigate design methods that can account for the significant levels of uncertainty in synthetic biology. A large body of work exists in robust design and design under uncertainty, and these existing methods and theory should be studied in the context of advancing synthetic biology.

Notes
1. For example, application of control theory led initially to significant breakthroughs by leveraging its rich heritage, but natural biochemical networks are fundamentally more complex than existing engineering control systems. Continued progress in design strategies based on control theory will certainly lead to important new insights and refinement of synthetic biological systems with current levels of complexity, but is unlikely to support successful creation of systems with new levels of complexity. 2. The notation here follows that used by Ma et al. (2009). 3. Matlab ode15s/ode45 solver. 4. Here the precision reported is scaled as log (E −1 ). 5. An alternative strategy for finding steady-state conditions before perturbation is to solveξ = f d (ξ (t), p) = 0 for ξ . See Section 3.3 for results of this additional equilibrium analysis. 6. Matlab Global Optimization Toolbox, genetic algorithms for mixed integer optimization.

Disclosure statement
No potential conflict of interest was reported by the authors.