Data-driven construction of Convex Region Surrogate models

With the increasing trend towards solving more complex and integrated optimization problems, there is a need for process models that are sufficiently accurate as well as computationally efficient. In this work, we develop an algorithm for the data-driven construction of a type of surrogate model that can be formulated as a set of mixed-integer linear constraints, yet still provides good approximations of nonlinearities and nonconvexities. In such a surrogate model, which we refer to as Convex Region Surrogate (CRS), the feasible region is given by the union of convex regions in the form of polytopes, and for each region, the corresponding cost function is approximated by a linear function. The general problem is as follows: given a set of data points in the parameter space and a scalar cost value associated with each data point, find a CRS model that approximates the feasible region and cost function indicated by the given data points. We present a two-phase algorithm to solve this problem and demonstrate its effectiveness with an extensive computational study as well as a real-world case study.


Introduction
This paper deals with the following problem: given a set of data points in the K-dimensional parameter space and a scalar cost value associated with each data point, find a set of convex regions in the K-dimensional space such that the union of the convex regions describes a tight envelope around all data points, and for the data in each region, find a linear cost correlation within a prespecified error tolerance. This is a fairly generic problem, and in our proposed algorithm, we apply concepts from various areas, such as computational geometry, mixed-integer programming, polyhedral theory, metamodeling (Simpson et al. 2001), clustering (Jain 2010), and pattern recognition (Jain et al. 2000). However, to the best of our knowledge, the problem as such has not been reported before in the literature.
This work is motivated by the high demand for computationally efficient process models in many engineering applications. Typically, when constructing a mathematical model, we have to trade off model accuracy against computational tractability, especially if we want to use the model for optimization purposes. If the computational budget forbids the use of a detailed process model, we have to create a surrogate model that can be used instead. As the trend goes towards integrated optimization involving multiple levels of decision-making, e.g. in dynamic real-time optimization or integrated planning and scheduling, there is an increasing demand for high-quality surrogate models that are computationally efficient and at the same time ensure a sufficient degree of accuracy (Chung et al. 2011).
Surrogate modeling techniques can be divided into two general categories: model order reduction and data-driven modeling (Biegler et al. 2014). In model order reduction, one starts with a high-fidelity model and tries to reduce the complexity of that model while retaining most of the structure of the original equations. Data-driven modeling, on the other hand, relies solely on data and does not make use of the explicit formulation of an existing model. Data-driven modeling approaches have become popular in recent years because of their wide applicability (Queipo et al. 2005; Wang and Shan 2007; Cozad et al. 2014) and are used in one of the following three situations: (1) the existing model is too complex to be reduced to the desired degree by using model order reduction techniques; (2) there exists a "black box" model that can be used for simulation, but the model formulation is not given in explicit form; (3) a mathematical model of the process does not exist, yet we can draw data from the actual process.
A major challenge in surrogate modeling is the approximation of nonlinearities and nonconvexities, for which most existing methods have to apply nonlinear model structures in order to achieve good accuracy. However, in mixed-integer programming, there is typically a huge difference in computational complexity between a linear and a nonlinear formulation. Thus, we want to avoid nonlinearities when constructing surrogate models for mixed-integer programming frameworks, which frequently arise in various engineering applications, such as process design, planning and scheduling, and supply chain optimization. Ierapetritou (2001) applies the concept of flexibility analysis (Swaney and Grossmann 1985) and uses the Quickhull algorithm (Barber et al. 1996) to find a convex hull as an approximation of the feasible region of a given model. The constructed convex hull is guaranteed to be inscribed in the actual feasible region, however, only if the feasible region is convex. To evaluate the feasibility of nonconvex processes, Goyal and Ierapetritou (2003) develop an approach in which the feasible region is approximated by subtracting outer polytopes around the infeasible regions from an expanded convex hull obtained by simplicial approximation (Goyal and Ierapetritou 2002). Sung and Maravelias (2007) propose an attainable region approach that determines the feasible production levels of the underlying state-task production scheduling model and an underestimation of the production cost in an integrated planning and scheduling framework. In a subsequent work (Sung and Maravelias 2009), the same authors extend the attainable region approach to take nonconvexities into account by combining multiple polytopes. It should be noted that these methods become computationally expensive in higher dimensions and in the presence of nonconvexities.
The above approaches all require the full mathematical formulation of the model that we want to approximate. Despite the general applicability of data-driven methods, the number of related works taking an approach similar to ours is very limited. There is a large body of work on data-driven techniques in the machine learning community (Mitchell 1997; Hastie et al. 2009). However, most loosely related existing machine learning methods, such as support vector machines and k-means clustering, are designed for classification or pattern recognition purposes and cannot be directly applied to solve our given problem. Several works (Üney and Türkay 2006; Xu and Papageorgiou 2009; Kone and Karwan 2011) have proposed mixed-integer programming approaches to multi-class data classification in which hyperboxes are used to define the regions for the different classes. Karwan and Keblis (2007) use the convex hull around given data points as an approximation of the feasible region and apply linear regression to find a linear surrogate cost correlation. This method is easy to implement and has been successfully applied in other works such as Mitra et al. (2012). However, the resulting surrogate model is only accurate if the true feasible region is convex and the true cost function is linear.
In this work, we propose to approximate the feasible region with the union of convex regions in the form of polytopes. Furthermore, for each region, the corresponding cost function is approximated by a linear function. We show that such a surrogate model, which we refer to as Convex Region Surrogate (CRS), can be formulated as a set of mixed-integer linear constraints. Thus, CRS models are ideal for problems that are already formulated as mixed-integer linear programs (MILPs) since in that case, including a CRS model in the formulation of the optimization problem will not increase its structural complexity. Such CRS models have already been successfully incorporated into MILP scheduling formulations and found application in industrial settings (Zhang et al. 2015a, b).
The main contribution of this work is the development of an algorithm for the data-driven construction of CRS models. The proposed algorithm takes data drawn from the process (or from simulations) and constructs convex regions such that the union of the convex regions describes a tight envelope around all data points and a sufficiently accurate linear approximation of the cost function can be found for each region. As will be shown, this is a challenging problem that may require significant computational expense. This paper is organized as follows. In Sect. 2, we explain the concept of CRS models and their advantages when embedded in MILP formulations. In Sect. 3, we formally state the problem of constructing CRS models. Section 4 gives a brief illustrative overview of the proposed algorithm, which is divided into two phases; these are described in detail in Sects. 5 and 6. After summarizing the complete algorithm in Sect. 7, we present the results from an extensive computational study in Sect. 8 and apply the proposed method to a real-world industrial case study in Sect. 9. In Sect. 10, we comment on some limiting features of the proposed algorithm and directions for future work. Finally, in Sect. 11, we close with a summary of this work and some concluding remarks.

Convex Region Surrogate
To motivate the construction of CRS models, let us consider the following mixed-integer program, where $u$ are continuous variables and $w$ are integer variables. Each variable vector $x_i$ is bounded by a set $X_i$, and $f_i(x_i)$ is the corresponding cost function. Both $X_i$ and $f_i$ can be nonlinear and nonconvex. $a$ and $b$ are cost coefficients.
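In a form consistent with this description, problem (1) can be sketched as follows, where $g$ collects the linking constraints (1b); the exact form in the original may differ:

$$
\begin{aligned}
\min_{u,\,w,\,x} \quad & a^\top u + b^\top w + \sum_{i \in I} f_i(x_i) \\
\text{s.t.} \quad & g(u, w, x_1, \ldots, x_{|I|}) \le 0, \\
& x_i \in X_i \quad \forall i \in I.
\end{aligned}
$$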
From the structure of the problem in (1), we can see that the major difficulty in terms of computational complexity arises from the nonlinearities in $X_i$ and $f_i$, especially if the set $I$ is large. A typical example is a discrete-time scheduling formulation, in which $I$ is the set of time periods. Here, $x_i$ could represent the amounts of products produced in time period $i$, and $f_i(x_i)$ could be the corresponding cost. $x_i$ is bounded by $X_i$, which contains all constraints related to the production process. Equation (1b) would represent constraints linking the different time periods. Since the structures of $X_i$ and $f_i$ have a large impact on the computational tractability of the problem, it is desirable to replace them with computationally more efficient approximations without losing much accuracy in the model.
The concept of the CRS is inspired by Karwan and Keblis (2007), who propose to approximate the feasible region $X$ with the convex hull around all given data points, and to apply linear regression to the data to find a linear approximation of the cost correlation $f$. The advantage of this approach is that the model remains linear and convex. Also, one can easily reduce the dimension of the feasible region by only considering relevant variables, which is especially useful in multiscale optimization. For example, in production scheduling, we often only need to know how much the plant can produce. Thus, instead of considering all variables, including temperatures, pressures, and intermediate flows, we would only include the production rates in the surrogate model.
The approach by Karwan and Keblis (2007) has two major limitations. First, if the true cost correlation is nonlinear, linear regression over all given data points will be inaccurate. Second, if the true feasible region is nonconvex, the convex hull around all data points will provide a poor approximation. As an example, suppose we want to approximate the shaded nonconvex feasible region shown in Fig. 1a and we are given the data points shown in the same figure. If we simply take the convex hull, we obtain the poor approximation shown in Fig. 1b.
Instead of using the convex hull, we propose to approximate the feasible region with the union of convex regions in the form of polytopes. In this way, as illustrated in Fig. 1c, we can obtain a considerably more accurate representation of the feasible region. In addition, if the true cost function is not linear, we can construct the regions such that a sufficiently accurate linear approximation of the cost correlation can be found for the data of each region. In this way, we obtain a piecewise linear approximation of the cost function.
Since each region is a polytope, any point in a region can be expressed as a convex combination of its vertices. A feasible point has to lie in one of the regions. Furthermore, the form of the cost function depends on the region in which the feasible point lies. As a result, the CRS model can be naturally described with the following disjunction (Grossmann and Trespalacios 2013), where $R$ is the set of convex regions and $V_r$ is the set of vertices of region $r$. In each region $r$, $x$ is described as the convex combination of the vertices $v_{rj}$ with $j \in V_r$. The nonnegative multipliers, which have to sum up to 1 over $j \in V_r$, are denoted by $\lambda_j$. $b_r$ and $c_r$ are the cost constant and coefficients of the cost correlation in region $r$. $Y_r$ is a Boolean variable that takes the value true if the chosen feasible point lies in region $r$.
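In a form consistent with this description, the disjunction (2) can be sketched as follows, where $\hat{f}$ denotes the approximated cost (the symbol is ours) and exactly one $Y_r$ is true:

$$
\bigvee_{r \in R}
\left[
\begin{array}{l}
Y_r \\
x = \sum_{j \in V_r} \lambda_j v_{rj} \\
\sum_{j \in V_r} \lambda_j = 1, \quad \lambda_j \ge 0 \;\; \forall j \in V_r \\
\hat{f} = b_r + c_r^\top x
\end{array}
\right]
$$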
By applying the hull reformulation (Grossmann and Trespalacios 2013), the disjunction can be transformed into a set of mixed-integer linear constraints in which $x$ and $\lambda_j$ are disaggregated into the region-dependent variables $x_r$ and $\lambda_{rj}$, respectively.
$x_r$ and $\lambda_{rj}$ can only be nonzero if region $r$ is chosen. The binary variables $y_r$ have to sum up to 1 over $r \in R$, i.e. only one region can be chosen. In the cost function, the constants and coefficients also only apply if $x$ lies in the corresponding region.
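A sketch of the resulting constraint set, consistent with this description (the equation labels and exact form in the original may differ):

$$
\begin{aligned}
& x = \sum_{r \in R} x_r, \qquad x_r = \sum_{j \in V_r} \lambda_{rj} v_{rj} \quad \forall r \in R, \\
& \sum_{j \in V_r} \lambda_{rj} = y_r, \qquad \lambda_{rj} \ge 0 \quad \forall r \in R, \; j \in V_r, \\
& \sum_{r \in R} y_r = 1, \qquad y_r \in \{0, 1\} \quad \forall r \in R, \\
& \hat{f} = \sum_{r \in R} \left( b_r y_r + c_r^\top x_r \right).
\end{aligned}
$$

Because $\sum_{j} \lambda_{rj} = y_r$, choosing $y_r = 0$ forces $\lambda_{rj} = 0$ and hence $x_r = 0$, which is exactly the disaggregation behavior described above.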
By replacing $X_i$ and $f_i$ in (1) with such CRS models, we arrive at a formulation that is an MILP, since the nonlinearities have been replaced by linear constraints. In summary, the main advantages of CRS models are the following:

• They can approximate nonlinear and discontinuous cost functions.
• They can approximate nonlinear and nonconvex feasible regions.
• By formulating CRS models as a set of mixed-integer linear constraints, efficient MILP solvers such as CPLEX and GUROBI can be used. This is especially useful if the optimization problem in which we want to integrate the CRS models is already an MILP.
• The dimension of the variable space in a CRS model can be kept small by only considering data in a subspace, defined by the needs of the particular application or determined by some feature selection mechanism.
Remark 1 Instead of expressing the feasible region of a CRS model as the union of feasible convex regions, we can also formulate it as the difference of the convex hull and the union of infeasible convex regions. This is shown in Appendix 1. This alternative formulation usually gives rise to a weaker MILP formulation; however, it may be the better choice if the number of infeasible convex regions is significantly smaller than the number of feasible convex regions. In the following sections, it will become clear that we can skip the last steps of the algorithm, namely the ones related to the convex region assignment, if we only intend to use the alternative formulation.
In order to formulate a CRS model, we need to determine the set of regions $R$, the corresponding vertices, as well as the region-dependent cost constants and coefficients. Finding these components is a nontrivial task, for which we present an algorithm in the remainder of this paper.

Formal problem statement
Having demonstrated in the previous section how to formulate a CRS model as a set of mixed-integer linear constraints given a disjunctive set of convex regions, we now take on the task of constructing these convex regions and their cost approximations from data. This problem is formally stated as follows.
Given $n$ data points, where each data point $j$ consists of a $K$-dimensional parameter vector $a_j \in \mathbb{R}^K$ and a scalar cost value $g_j$, find convex regions $r$ in the form of polytopes in the parameter space such that

• the union of the convex regions contains all $a_j$;
• no $a_j$ lies in the interior of two or more convex regions, i.e. the convex regions do not overlap except possibly at their boundaries;
• in each region $r$, there exists a linear correlation (with a maximum error $\epsilon$) between the parameter vectors and the cost values of the data points contained in region $r$, where $J_r$ denotes the set of data points contained in region $r$ (see the sketch after this list);
• the union of the convex regions represents a tight envelope for the feasible region indicated by the given data points.
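The linear-correlation condition in the third bullet can be written as follows, with $c_r$ and $b_r$ the coefficients and constant of the correlation in region $r$ (a reconstruction consistent with the notation used earlier):

$$
\left| c_r^\top a_j + b_r - g_j \right| \le \epsilon \qquad \forall j \in J_r.
$$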

Illustrative overview of algorithm
To obtain the CRS, we propose a two-phase algorithm. In Phase 1, the set of all data points is divided into subsets such that a linear parameter-cost correlation can be obtained within a tolerance for all points in each subset, and such that the convex hulls constructed around the points of the subsets do not overlap. In Phase 2, for each subset, the feasible region indicated by the data points assigned to the subset is approximated by constructing multiple convex regions whose union contains all points of the subset.
In the next two sections, we present the Phase 1 and Phase 2 algorithms in detail. In order to facilitate the understanding of the algorithm, the explanation of each step is accompanied by an illustrative two-dimensional example. The data points for this example are shown in Fig. 2a, in which the different markers indicate different parameter-cost correlations. The complete set of data for this illustrative example can be found in Appendix 2.
Figures 2b and 2c show the results at the end of Phase 1 and Phase 2, respectively. In Phase 1, the data points are assigned to three disjoint subsets. Note that we obtain three subsets although there are only two different parameter-cost correlations because with two subsets, the resulting two convex hulls would overlap. In Phase 2, we check for each subset whether it needs to be further partitioned such that multiple convex regions can be obtained to construct a tighter envelope of the feasible region. Here, this is only necessary for the second subset, for which three convex regions are constructed (cf. Fig. 2c).
Phase 1: subset assignment

In Phase 1, we assign the given data points to subsets such that linear parameter-cost correlations can be obtained for each subset and the convex hulls of the subsets do not overlap. A flowchart for the Phase 1 algorithm is shown in Fig. 3. First, we set the initial number of subsets $m$, typically to 1. We then try to assign the data points to $m$ subsets such that linear parameter-cost correlations can be obtained within the tolerance for each subset. We increase $m$ until the assignment problem is feasible. In the next step, the vertices of the convex hull of each subset are obtained. Using these, we can then check whether the convex hulls overlap each other. If they do, we add cuts to the subset assignment problem in order to avoid obtaining the same solution in the next iteration. The algorithm terminates when the assignment solution provides $m$ subsets such that the corresponding convex hulls are disjoint.

Subset assignment formulation
At the heart of the Phase 1 algorithm is the assignment problem, which assigns the $n$ data points to $m$ subsets. For the set of subsets at iteration $t$, $I_t = \{1, 2, \ldots, m\}$, we find a feasible solution to the set of mixed-integer linear constraints (6), in which the binary variable $y_{ij} = 1$ if point $j$ is assigned to subset $i$. In this formulation, as well as throughout the rest of this paper, $M$ denotes a positive big-M parameter. Constraints (6a) and (6b) ensure that every point is assigned to a subset and that every subset contains at least one point. Equation (6c) is a symmetry-breaking constraint that enforces point-to-subset assignment in lexicographic order. Equation (6d) describes the linear cost correlation for each subset $i$ applied to all points $j$, from which the fitting error $\epsilon_{ij}$ is calculated in Eq. (6e). $\epsilon$ is a prespecified error tolerance, and constraint (6f) forces $\epsilon_{ij}$ to be bounded by $-\epsilon$ and $\epsilon$ if point $j$ is assigned to subset $i$; otherwise, the constraint is relaxed. A sketch of these constraints is given below.
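The following is a plausible reconstruction of (6) from the description above; in particular, the exact form of the symmetry-breaking constraint (6c) is an assumption:

$$
\begin{aligned}
& \sum_{i \in I_t} y_{ij} = 1 && \forall j && \text{(6a)} \\
& \sum_{j} y_{ij} \ge 1 && \forall i \in I_t && \text{(6b)} \\
& y_{ij} \le \sum_{j' < j} y_{i-1,j'} && \forall i \in I_t \setminus \{1\},\; j && \text{(6c)} \\
& c_i^\top a_j + b_i = g_j + \epsilon_{ij} && \forall i \in I_t,\; j && \text{(6d, 6e)} \\
& -\epsilon - M(1 - y_{ij}) \le \epsilon_{ij} \le \epsilon + M(1 - y_{ij}) && \forall i \in I_t,\; j && \text{(6f)} \\
& y_{ij} \in \{0, 1\}. &&&&
\end{aligned}
$$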
In general, there are multiple feasible solutions to (6), which may lead to the algorithm requiring a large number of iterations if many of the possible solutions do not result in non-overlapping convex hulls. For our illustrative example, Fig. 4 shows two possible solutions for $m = 3$ in which the resulting convex hulls overlap. Notice that such solutions would be discouraged if we minimized the distances between the points in the same subset. To achieve this clustering effect, we propose an alternative MILP formulation (7), where $d_{ik}$ denotes the maximum distance in the $k$th dimension between two points in subset $i$, and $z_{ijj'} = 1$ if both points $j$ and $j'$ are assigned to subset $i$. Constraints (7c)-(7e) ensure that $z_{ijj'} = 1$ if and only if $y_{ij} = y_{ij'} = 1$. The maximum distances between two points in each subset are determined by including constraint (7f) and minimizing $\sum_{i \in I_t} \sum_k d_{ik}$. This formulation facilitates, but does not guarantee, subset assignments that result in non-overlapping convex hulls. In fact, in the case of the illustrative example, it provides the desired solution shown in Fig. 2b in one iteration. However, (7) is a significantly larger problem than (6). Therefore, with a large number of data points, it may be computationally more efficient to use (6).

Identifying overlapping convex hulls
After constructing the convex hulls, for which we apply the Quickhull algorithm (Barber et al. 1996), we have to check whether they overlap. If two convex hulls overlap, we can find a point that belongs to both convex hulls; otherwise, we cannot. We can determine the existence of such a point for each pair of convex hulls $i$ and $i'$ by checking the feasibility of the set of equations (8), where $V_i$ and $V_{i'}$ are the sets of vertices of convex hulls $i$ and $i'$, respectively. In (8), point $p$ is constrained to be inside both convex hulls $i$ and $i'$; thus, (8) is only feasible if such a point exists, i.e. if convex hulls $i$ and $i'$ overlap. Instead of solving (8) for each pair of convex hulls, we can also detect all overlapping convex hulls by solving one single linear program (LP) (9). Here we introduce slack variables $s^+_{ii'k}$ and $s^-_{ii'k}$ for each pair of convex hulls $i$ and $i'$ and each dimension $k$. Similarly to point $p$ in (8), point $p_{ii'}$ is constrained to be inside convex hull $i$. However, according to Eq. (9c), point $p_{ii'}$ is only also inside convex hull $i'$ if the corresponding slack variables are zero. LP (9) minimizes the sum of all slack variables over all pairs of convex hulls. If a common point can be found, the slacks become zero at the optimal solution and we know that convex hulls $i$ and $i'$ overlap. If they do not overlap, the corresponding slack sum is strictly positive at the LP solution, since the point $p_{ii'}$ cannot be obtained as a convex combination of the vertices of both convex hulls $i$ and $i'$. The choice between solving (8) for all $\binom{|I_t|}{2}$ pairs of convex hulls and solving the single LP (9) depends on the overhead of solving small LPs in sequence or in parallel and then collecting the results. Often, the overhead time is larger than the benefit we would achieve from solving several small LPs instead of the one large LP. A code sketch of the pairwise feasibility check (8) is given below.
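As an illustration of the feasibility check in (8), the following sketch sets it up as an LP with a zero objective: a common point exists if and only if convex multipliers over the two vertex sets produce the same point. The function name and the use of scipy are our own choices, not the authors' MATLAB/GAMS implementation.

```python
import numpy as np
from scipy.optimize import linprog

def hulls_overlap(V1, V2):
    """Feasibility check (8): do conv(V1) and conv(V2) share a point?
    V1, V2 hold one vertex per row. We look for lambda, mu >= 0 with
    sum(lambda) = sum(mu) = 1 and V1.T @ lambda = V2.T @ mu."""
    n1, K = V1.shape
    n2, _ = V2.shape
    # Variable vector: [lambda (n1 entries), mu (n2 entries)]
    A_eq = np.zeros((K + 2, n1 + n2))
    A_eq[:K, :n1] = V1.T            # V1.T @ lambda ...
    A_eq[:K, n1:] = -V2.T           # ... - V2.T @ mu = 0
    A_eq[K, :n1] = 1.0              # sum(lambda) = 1
    A_eq[K + 1, n1:] = 1.0          # sum(mu) = 1
    b_eq = np.r_[np.zeros(K), 1.0, 1.0]
    res = linprog(np.zeros(n1 + n2), A_eq=A_eq, b_eq=b_eq,
                  bounds=[(0, None)] * (n1 + n2), method="highs")
    return res.success             # feasible iff the hulls overlap

# Two triangles in 2D that share the point (1, 1):
V1 = np.array([[0.0, 0.0], [2.0, 0.0], [1.0, 2.0]])
V2 = np.array([[1.0, 1.0], [3.0, 1.0], [2.0, 3.0]])
print(hulls_overlap(V1, V2))       # True
```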

Overlap elimination cuts
The cuts are constraints that are added to the subset assignment problem (6) in order to eliminate previous solutions that have led to overlapping convex hulls. In fact, we add cuts that cut off more than just one assignment solution at each iteration. The idea is that if we know from solving (9) that convex hulls $i$ and $i'$ overlap, we construct cuts that eliminate all assignment solutions involving two subsets such that one of the two subsets contains the vertices of convex hull $i$ and the other subset contains the vertices of convex hull $i'$. We illustrate the proposed cuts with an example in Fig. 5. Suppose we detected the two overlapping convex hulls, say $i$ and $i'$, in Fig. 5a. In any assignment solution in which the vertices of convex hull $i$ are assigned to one subset and the vertices of convex hull $i'$ are assigned to another subset, the convex hulls resulting from these subsets will contain convex hulls $i$ and $i'$, and will therefore also overlap each other. This is true regardless of which other points are assigned to the subsets. An example is shown in Fig. 5b, where the convex hulls $i$ and $i'$ (dashed lines) are inscribed in the new convex hulls.
The cuts can be formulated as follows. The cuts are accumulated in the set $S$, which is set to $\emptyset$ when the number of subsets $m$ is increased, i.e. $S$ is empty at the beginning of each iteration $t$. For each cut $s \in S$, we define a set $I_s = \{i, i'\}$ with $i$ and $i'$ being two convex hulls overlapping each other. The cuts in terms of the assignment variables $y_{ij}$ are given by (10), where $V_{is}$ is the set of vertices of convex hull $i$ for cut $s$ and $|V_{is}|$ is the cardinality of set $V_{is}$. In integer programming, (10) is generally known as a cover cut (Crowder et al. 1983). Note that for each pair of overlapping convex hulls, we have to add $\binom{m}{2}$ cuts in order to account for all possible combinations of subset indices. For example, if we have three subsets, we have to add $\binom{3}{2} = 3$ cuts for each pair of overlapping convex hulls, i.e. construct one cut for each of the three subset index pairs {1, 2}, {1, 3}, and {2, 3}. A sketch of one such cut is given below.
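One plausible form of the cover cut (10), written for the subset index pair $\{i_1, i_2\}$ and cut $s$ with $I_s = \{i, i'\}$ (the exact form in the original may differ):

$$
\sum_{j \in V_{is}} y_{i_1 j} + \sum_{j \in V_{i's}} y_{i_2 j} \;\le\; |V_{is}| + |V_{i's}| - 1.
$$

This inequality is violated only when subset $i_1$ contains all vertices of hull $i$ and subset $i_2$ simultaneously contains all vertices of hull $i'$, which is exactly the class of assignments identified above as guaranteed to produce overlapping hulls.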
Phase 2: construction of convex regions

In Phase 2, we construct convex regions for each subset $i$ such that the union of the convex regions results in an accurate approximation of the generally nonconvex feasible region. We assume that the feasible region associated with each subset forms a simply connected space, i.e. there are no "holes" in the feasible region.
In our illustrative example, the feasible regions for the first and third subsets are convex, so the convex hulls resulting from Phase 1 are already the tightest approximations (cf. Fig. 2b). Therefore, we will demonstrate the Phase 2 algorithm on the second subset, whose feasible region is clearly nonconvex.
Remark 2 In the following, we allow ourselves a slight misuse of the term "facet". Strictly speaking, a facet is a feature of a polyhedron; a facet of a polyhedron of dimension $K$ is a $(K-1)$-dimensional face. Here, we apply the term "facet" in the context of general polygons, where it is used as a synonym for "$(K-1)$-dimensional boundary hypersurface". The main difference is that for a polyhedron, a facet defines a supporting hyperplane, which does not necessarily hold true in the case of a general polygon.
Phase 2 consists of two main steps, as illustrated in Fig. 6. First, we detect the contour of the feasible region, i.e. we find its vertices and facets. In the second step, the resulting envelope of the feasible region is partitioned into polytopes such that the union of these polytopes forms the desired surrogate feasible region.
In the following, we first give an overview of the Phase 2 algorithm and then describe each part in detail. Since Phase 2 can be performed on each subset independently, we omit the index $i$ for the sake of readability.
The high-level flowchart for the Phase 2 algorithm is shown in Fig. 7. The algorithm is initialized with results obtained in Phase 1, namely the set of vertices $V$ and the set of facets $F$ of the convex hull. The initial $\tilde{F}$, which is the set of facets to be examined in the current iteration, is set to $F$. Furthermore, for each facet $f$, we define the set $H_f$, which contains the vertices of facet $f$.
As illustrated in Fig. 6, the contour of the feasible region can be approximated by facets, which are defined by their corresponding vertices. At each iteration, we move as far as possible into the current outer approximation of the feasible region in order to find new vertices and construct new facets from those vertices. The idea is to take each facet $f \in \tilde{F}$ and try to find a data point with which new facets can be formed without cutting off any data points. A point that fulfills this condition and is sufficiently far away from the facet is declared a new vertex, and new facets are formed using this vertex. In this way, a tighter outer approximation of the feasible region is formed. At the end of each iteration, if new facets have been created, the sets $V$, $F$, $\tilde{F}$, and $H^t_f$ are updated before the algorithm moves to the next iteration. This process is repeated until no new vertices and facets can be found. Note that after new vertices are added to the initial set $V$, the set of vertices will no longer define a convex polytope but rather a nonconvex polygon. The algorithm for finding new vertices and creating new facets is described in detail in Sect. 6.1.
After the contour of the feasible region has been obtained, the algorithm uses the information about the vertices and facets to construct the desired convex regions. At the heart of this process is an assignment problem, which assigns facets, vertices, and data points to convex regions. We elaborate on this part of Phase 2 in Sect. 6.2.

Detecting the contour of the feasible region
At each iteration, for each facet $f \in \tilde{F}$, the algorithm shown in Fig. 8 is applied to find one new vertex and create new facets by connecting the new vertex with vertices of facet $f$. We use two criteria to identify the new vertex. First, to obtain an outer approximation of the feasible region, the new facets created using the new vertex must not cut off any data points. Second, to move as far as possible into the current approximation of the feasible region, we look for the data point that is furthest away from facet $f$ while satisfying the first condition.
In the proposed algorithm, we first obtain a unit-length vector that is perpendicular to the facet and points towards the interior of the feasible region. We then find the set of all candidate points $C_f$, where a candidate point is a data point that fulfills the first condition stated above. The normal vector is used in the next step to measure the distance to the facet so that we can determine the point in $C_f$ that is furthest away from the facet. If this distance $d$ is greater than or equal to a specified tolerance $\delta$, we declare the point a new vertex, create the corresponding new facets, and then move on to the next facet. Otherwise, we directly go to the next facet $f \in \tilde{F}$.

Remark 3 In practice, we want to avoid creating new facets from facets that are already very small in size. Therefore, in our implementation, we determine the maximum Euclidean distance between two vertices of a facet and only further examine the facet if this distance is greater than a prespecified value.

Obtaining normal vectors
To find a normal vector for facet $f$, we first obtain an expression for the hyperplane that contains the facet. For this, we need $K$ points that lie on the hyperplane, for which we simply take the first $K$ vertices of the facet. Suppose these points are $a_1, a_2, \ldots, a_K$; then any point $p$ on the hyperplane can be expressed as

$$
p = b^0_f + \sum_{k'=1}^{K-1} \alpha_{k'} b_{fk'},
$$

where $b^0_f = a_1$ and $b_{fk'} = a_{k'+1} - a_1$ are the difference vectors. Starting at point $b^0_f$, any point on the facet-containing hyperplane can be reached by varying the coefficients $\alpha_{k'}$, since the hyperplane is spanned by the vectors $b_{fk'}$; the $\alpha_{k'}$ are unrestricted in sign and magnitude.
To obtain a vector that is perpendicular to the facet and has unit length, we simply need to find an $n_f \in \mathbb{R}^K$ that is normal to all difference vectors, i.e. $b_{fk'}^\top n_f = 0$ for all $k'$, and satisfies $\|n_f\|_2 = 1$. However, we need the normal vector to point towards the interior of the feasible region. Since the solution for $n_f$ is not unique, we need to check in which direction $n_f$ points and change its sign if it points away from the feasible region.
The procedure for determining the direction of $n_f$ is slightly different for $t = 1$ and $t > 1$. At the first iteration ($t = 1$), the initial facets describe the convex hull around all data points of the subset. Hence, the facet-containing hyperplanes are supporting hyperplanes of the convex hull, i.e. all data points lie on one side of each hyperplane, as illustrated in Fig. 9. We choose any point $p$ on the facet-containing hyperplane, e.g. $b^0_f$, and a data point $\bar{a}$ that does not lie on the facet. If $n_f$ points towards the interior of the feasible region, the angle formed by $n_f$ and the difference vector $(\bar{a} - p)$ will be less than 90°, which we can check by simply determining the sign of the inner product $(\bar{a} - p)^\top n_f$. If $(\bar{a} - p)^\top n_f > 0$, $n_f$ points towards the interior of the feasible region and we keep $n_f$; otherwise, $n_f$ points outwards and we set $n_f = -n_f$.

The procedure described above only applies in the first iteration because the facet-containing hyperplanes in later iterations are not necessarily supporting hyperplanes of the feasible region. However, for $t > 1$, we only have to consider the newly created facets, and we can exploit the fact that we know for each new facet from which "old" facet it originates. As illustrated in Fig. 10, the entire old facet $f'$ lies on one side of the hyperplane containing the corresponding new facet $f$, and the normal vector of the new facet is supposed to point towards the other side. In order to check the direction in which $n_f$ points, we find a point $p_f$ on the hyperplane containing facet $f$ and a point $p_{f'}$ that lies on facet $f'$ but not on facet $f$. If $n_f$ points into the feasible region, the angle between $n_f$ and $(p_{f'} - p_f)$ will be greater than 90°, i.e. $(p_{f'} - p_f)^\top n_f < 0$. In this case, we keep $n_f$; otherwise, we set $n_f = -n_f$.
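The computation of $n_f$ amounts to a null-space calculation plus a sign check. The following sketch is our own illustration (the function name and the use of scipy are assumptions, not the authors' implementation); it computes the oriented unit normal from the facet vertices and a reference point known to lie on the feasible side:

```python
import numpy as np
from scipy.linalg import null_space

def oriented_facet_normal(facet_pts, reference_pt):
    """Unit normal n_f of the hyperplane through the K points in
    `facet_pts` (one point per row), flipped so that it points from
    the facet towards `reference_pt`."""
    a1 = facet_pts[0]
    B = facet_pts[1:] - a1            # difference vectors b_fk'
    n = null_space(B)[:, 0]           # solves B @ n = 0, unit length
    if (reference_pt - a1) @ n < 0:   # sign check via the inner product
        n = -n
    return n

# 2D example: facet from (0,0) to (2,0); the point (1,1) lies on the
# feasible side, so the oriented normal is (0, 1).
facet = np.array([[0.0, 0.0], [2.0, 0.0]])
print(oriented_facet_normal(facet, np.array([1.0, 1.0])))
```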

Identifying candidate points
For each facet $f$, we identify candidate points that can potentially become a new vertex. A candidate point for facet $f$ is defined as follows.
Definition 1 Consider a data point $j$ that is not a vertex. Form a polytope, denoted $PT_{fj}$, with point $j$ as well as the vertices of facet $f$ as its vertices. Point $j$ is a candidate point for facet $f$ if polytope $PT_{fj}$ is completely contained in the current approximation of the feasible region and does not contain any data points in its interior.

In Fig. 11, a candidate point is illustrated for the top facet of our example. The polytope formed in the 2-dimensional case is a triangle. In Fig. 11a, the triangle formed by the facet and the chosen data point does not contain any points except for its vertices; the chosen data point is therefore a candidate point. In Fig. 11b, the triangle contains additional data points, but only on its boundary; the corresponding chosen point is therefore also a candidate point. In contrast, the chosen point in Fig. 11c is not a candidate point because the triangle contains a data point in its interior.
For each facet, we have to check for every data point $j$ that is not a vertex, i.e. $j \notin V$, whether this point is a candidate point. For this, we first solve the LP given in (12) for each point $j \notin V$. The LP can be feasible or infeasible. It is feasible if point $j$ is on the correct side of the facet, i.e. it can be reached from the facet by moving in the direction of $n_f$; otherwise, the LP is infeasible. Furthermore, when it is feasible, the optimal solution of the LP indicates whether any data points lie in the interior of polytope $PT_{fj}$. Equations (12b)-(12f) ensure that the point $j$ being checked is on the side of the facet towards which the normal vector $n_f$ points. In fact, this formulation is slightly more restrictive, as it states that point $j$ has to be reachable by moving from a point $p$ in the direction of $n_f$, where $p$ is a point on the facet rather than just a point on the facet-containing hyperplane.
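A plausible reconstruction of LP (12), combining the reachability conditions just described with the interior check explained next (the indexing and slack notation are our assumptions):

$$
\begin{aligned}
\min \quad & \sum_{j'} \sum_k \left( s^+_{j'k} + s^-_{j'k} \right) \\
\text{s.t.} \quad & a_j = p + d\, n_f, \qquad d \ge 0, \\
& p = \sum_{j'' \in H^t_f} \lambda_{j''}\, a_{j''}, \qquad \sum_{j'' \in H^t_f} \lambda_{j''} = 1, \qquad \lambda_{j''} \ge 0, \\
& a_{j'} = \mu_{j'j}\, a_j + \sum_{j'' \in H^t_f} \mu_{j'j''}\, a_{j''} + s^+_{j'} - s^-_{j'} \qquad \forall j' \notin H^t_f \cup \{j\}, \\
& \mu_{j'j} + \sum_{j'' \in H^t_f} \mu_{j'j''} = 1, \qquad \mu \ge \varepsilon_\mu, \qquad s^+_{j'}, s^-_{j'} \ge 0,
\end{aligned}
$$

where $\varepsilon_\mu$ is the small positive lower bound on the multipliers mentioned in the discussion of Eq. (12i).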
Through Eqs. (12g)-(12j), we check every data point $j'$ that is neither point $j$ nor a vertex of facet $f$, i.e. $j' \notin H^t_f$, $j' \ne j$, to see whether it lies in the interior of polytope $PT_{fj}$. Equations (12g) and (12h) ensure that $a_{j'}$ is a convex combination of $a_j$ and the $a_{j''}$ with $j'' \in H^t_f$ if the corresponding slack variables are zero. In that case, $a_{j'}$ is guaranteed to be in the interior of polytope $PT_{fj}$ because the multipliers are constrained to be greater than or equal to a small value by Eq. (12i). Hence, if the slack variables for some point $j'$ are zero at the optimal solution of (12), a data point lies in the interior of polytope $PT_{fj}$, in which case we know that point $j$ is not a candidate point for facet $f$.

However, in order to detect the candidate points, solving (12) alone is not sufficient because we also have to consider the situation illustrated in the example shown in Fig. 12. Here, point F is the new vertex originating from facet A-B, and one can see that the polytope formed by points A, B, and F does not contain any data points in its interior. When we now move on and examine facet B-C, we see that one can form a polytope with points B, C, and E such that the polytope does not contain any data points in its interior. However, one can clearly see that the formed polytope B-C-E is not fully contained in the current approximation of the feasible region. Therefore, E cannot be a candidate point for facet B-C.
Essentially, we have to check whether polytope $PT_{fj}$ intersects any of the polytopes that have been "cut off" in previous iterations. For instance, in the example in Fig. 12, we have to check whether the polytopes B-C-E and A-B-F overlap each other. This can be achieved by solving the LP (13), where $Q$ is the set of cut-off polytopes and $H_q$ is the set of vertices of cut-off polytope $q$. By solving (13), we try to find a point $p_q$ for each $q \in Q$ such that $p_q$ is in the interior of polytope $PT_{fj}$ and also in the interior of polytope $q$. If we can find such a point for at least one of the cut-off polytopes, point $j$ cannot be a candidate point.
Constraints (13b)-(13d) force $p_q$ to be a point in the interior of polytope $PT_{fj}$. Through Eqs. (13e)-(13h), $p_q$ is also a point in the interior of polytope $q$ if the corresponding slack variables are zero. Since we are minimizing the sum of all slack variables, $\sum_k (\hat{s}^+_{qk} + \hat{s}^-_{qk})$ will be driven to zero if polytopes $PT_{fj}$ and $q$ overlap. Consequently, if point $j$ has not been ruled out as a potential candidate point after solving (12), and if $\sum_k (\hat{s}^+_{qk} + \hat{s}^-_{qk}) > 0$ for all $q \in Q$ at the optimal solution of (13), we declare point $j$ a candidate point for facet $f$.

Identifying new vertices and creating new facets
With the set of candidate points $C_f$ and the unit-length normal vector $n_f$, we can easily find the candidate point that is furthest away from facet $f$ by solving the MILP (14), where $p$ is a point on facet $f$, given by the convex combination (14b) of the vertices of facet $f$. The binary variable $w_j$ is 1 if candidate point $a_j$ is chosen. At the optimal solution, $\bar{a}$ is the candidate point that is furthest away (largest $d$) from facet $f$. Figure 13 illustrates the process for one facet: the candidate points are marked by diamonds, and the hollow diamond marks the candidate furthest from the facet, which is declared a new vertex. A sketch of (14) is given below.
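A plausible form of MILP (14), reconstructed from the description above (the exact constraint labeling may differ):

$$
\begin{aligned}
\max \quad & d \\
\text{s.t.} \quad & d = (\bar{a} - p)^\top n_f, \\
& \bar{a} = \sum_{j \in C_f} w_j\, a_j, \qquad \sum_{j \in C_f} w_j = 1, \qquad w_j \in \{0, 1\}, \\
& p = \sum_{j'' \in H^t_f} \lambda_{j''}\, a_{j''}, \qquad \sum_{j'' \in H^t_f} \lambda_{j''} = 1, \qquad \lambda_{j''} \ge 0. \qquad \text{(14b)}
\end{aligned}
$$

Since $n_f$ has unit length and every point $p$ on the facet lies on the same hyperplane, $d$ measures the perpendicular distance from the facet to the chosen candidate.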
If $d$ is sufficiently large, i.e. $d \ge \delta$, we add the corresponding point to the set of vertices $V$. We construct the new facets by connecting the new vertex with the vertices of each $(K-2)$-dimensional facet of facet $f$. In the 2-dimensional case, a facet of the feasible region is a line segment, and the facets of this facet are its two end points; thus, one old facet always leads to two new facets. Note that in higher dimensions, the situation is more complex: the number of new facets created from one facet and one new vertex is at least $K$, but can also be significantly larger than $K$. Finally, with the information on the new facets, we update the sets $F$, $\tilde{F}$, and $H^{t+1}_f$. Old facets that have been cut off are removed from $F$, while the new facets originating from these old facets are added to $F$. $H^{t+1}_f$, the set of vertices of each $f \in F$ in the next iteration $t+1$, is set accordingly. The set of facets to be examined next, $\tilde{F}$, consists only of the new facets. Figure 14 shows the change in the outer approximation of the feasible region at every iteration. As one can see, it requires three major iterations to obtain the final contour of the feasible region.
Remark 4 At this point, we can already write out the CRS model by using the alternative formulation given in Appendix 1, which expresses the feasible region of the CRS model as the difference of the convex hull and the union of infeasible convex regions. The infeasible convex regions correspond to the set of cut-off polytopes $Q$. However, in order to obtain the convex regions whose union forms the feasible region of the CRS model, we require the next step of the algorithm.

Constructing convex regions
With the information on the contour of the feasible region, we can now construct the desired convex regions. Figure 15 shows the flowchart for this part of the Phase 2 algorithm, in which we first set an initial number of regions $R$, typically to 1. With $R$ fixed, we can then solve the convex region assignment problem, which assigns vertices $j \in V$, non-vertex points $j' \notin V$, and facets $f \in F$ to $R$ convex regions $r$. If the assignment problem is infeasible, we increase $R$ by 1 and try to solve the problem again.
When the assignment problem is feasible and solved, the algorithm checks whether the obtained convex regions overlap in their interiors. If there exist overlapping regions, we add cuts to the assignment problem and solve it again. This process is repeated until we reach a feasible solution that yields convex regions that do not overlap. Finally, after removing redundant vertices, we obtain the desired convex regions described by their vertices.
Remark 5 The motivation for forbidding overlapping convex regions is the reduction of redundant feasible space, in our case space belonging to multiple regions. A point belonging to multiple regions corresponds to multiple equivalent solutions and may slow down the optimization algorithm. However, a CRS model with overlapping convex regions is still a valid surrogate model. Therefore, in some cases, eliminating overlapping convex regions may not be necessary.

Convex region assignment formulation
The convex region assignment problem is the centerpiece of the Phase 2 algorithm. The main idea is to partition the approximation of the feasible region defined by the obtained contour into convex regions in the form of polytopes. These polytopes are defined by their vertices, which are chosen from the set of vertices $V$. The union of the convex regions resulting from this partition has to contain all data points.
In order to find these convex regions, we solve an MILP that assigns vertices $j \in V$, non-vertex points $j' \notin V$, and facets $f \in F$ to $R$ convex regions $r$. The vertices define the polytopes, while the assignment of non-vertex points ensures that all data points are contained in the union of the polytopes. The assignment of facets plays a crucial role in the MILP formulation: it enables us to formulate constraints that guarantee that the regions obtained from the assignment problem are actually polytopes. We elaborate on this idea later in this section.

Assignment of vertices and facets
The constraints (15) assign vertices and facets to regions, where the binary variable $x_{rf}$ is 1 if facet $f$ is assigned to region $r$, $y_{rj}$ is 1 if vertex $j$ is assigned to region $r$, and $w_j$ is 1 if vertex $j$ is assigned to more than one region. Equation (15a) states that a facet can only be assigned to one region, while constraint (15b) ensures that at least one facet is assigned to each region. Constraint (15c) is a symmetry-breaking constraint that enforces facet-to-region assignment in lexicographic order. Constraint (15d) states that if facet $f$ is assigned to region $r$, all vertices of facet $f$, i.e. all $j \in H_f$, are also assigned to region $r$. To encourage the creation of regions that fill $K$-dimensional volumes, the minimum number of vertices assigned to a region is set to $K+1$ by constraint (15e). Furthermore, constraint (15f) states that a vertex $j$ can only be assigned to more than one region if $w_j = 1$. A sketch of these constraints is given below.
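A plausible reconstruction of (15) from this description; the form of the symmetry-breaking constraint (15c) and of (15f) are our assumptions:

$$
\begin{aligned}
& \sum_{r} x_{rf} \le 1 && \forall f && \text{(15a)} \\
& \sum_{f} x_{rf} \ge 1 && \forall r && \text{(15b)} \\
& x_{rf} \le \sum_{f' < f} x_{r-1,f'} && \forall r > 1,\; f && \text{(15c)} \\
& y_{rj} \ge x_{rf} && \forall r,\; f,\; j \in H_f && \text{(15d)} \\
& \sum_{j \in V} y_{rj} \ge K + 1 && \forall r && \text{(15e)} \\
& \sum_{r} y_{rj} \le 1 + (R - 1)\, w_j && \forall j \in V && \text{(15f)}
\end{aligned}
$$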

Assignment of non-vertex points
The constraints (16) assign non-vertex points to regions such that the points are inside the regions to which they are assigned, where the binary variable $z_{rj}$ is 1 if non-vertex point $j$ is assigned to region $r$. Equation (16a) enforces that a non-vertex point $j$ can only be assigned to one region. Constraints (16b)-(16g) ensure that point $j$ is inside convex region $r$ if $j$ is assigned to $r$. The idea is that the corresponding slack variables become zero if $z_{rj}$ is 1, so that $a_j$ is constrained to be a convex combination of the vertices of region $r$.
Constraining regions to be polytopes

In the convex region assignment, we have to ensure that the obtained regions are in fact polytopes. In order to formulate such constraints, we make full use of the information from the contour of the feasible region obtained in the previous part of the Phase 2 algorithm. The idea is to force all vertices assigned to region $r$ to be on the same side of the hyperplane containing facet $f$ for every facet $f$ that is assigned to region $r$. To illustrate this idea, we label the vertices in our illustrative example as shown in Fig. 16.
Suppose we assign the facets C-D, D-E, E-F, F-G and the corresponding vertices C, D, E, F, G to the same region. Clearly, they form a convex region, as shown in Fig. 17a. One can see that each facet-containing hyperplane is a supporting hyperplane of the formed convex region, i.e. all points in the region lie on the same side of the hyperplane. If we then also include facet B-C and vertex B in the region, the region is no longer convex, as shown in Fig. 17b. Here, both hyperplanes corresponding to facets B-C and C-D cut through the region. Therefore, this solution is infeasible for the convex region assignment problem.
Since each convex region is defined by its vertices, we only have to constrain all vertices assigned to the region to be on the same side of the corresponding facet-containing hyperplanes. For a point $a$ to be on the side of the hyperplane containing facet $f$ towards which the normal vector $n_f$ points, the inner product between the difference vector $(a - p)$, where $p$ is a point on the hyperplane, and $n_f$ has to be greater than or equal to zero. This results in constraint (17), which states that $(a_j - b^0_f)^\top n_f \ge 0$ has to hold for all pairs of vertex $j$ and facet $f$ that are assigned to the same region $r$; a sketch is given below.
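One plausible big-M realization of (17), with $M$ the positive big-M parameter introduced earlier:

$$
(a_j - b^0_f)^\top n_f \;\ge\; -M \left( 2 - y_{rj} - x_{rf} \right) \qquad \forall r,\; f,\; j \in V.
$$

The right-hand side is 0 only when both $y_{rj} = 1$ and $x_{rf} = 1$, i.e. when vertex $j$ and facet $f$ are assigned to the same region; otherwise the constraint is relaxed.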
The objective function (18) minimizes the total number of vertex-to-region assignments plus the total number of vertices that are assigned to more than one region, i.e. $\sum_r \sum_{j \in V} y_{rj} + \sum_{j \in V} w_j$. This is a heuristic used to discourage the construction of overlapping convex regions. The full convex region assignment (CRA) problem (19) then minimizes (18) subject to Eqs. (15), (16), and (17).

Identifying and eliminating overlapping convex regions
The optimal solution of the convex region assignment problem may yield convex regions that overlap each other in their interiors, as illustrated in the first sketch in Fig. 18. To identify overlapping regions, we solve an LP that is almost identical to (9) used in Phase 1. The only difference is that here we restrict the multipliers to be greater than a small value, because we want to detect regions that overlap in their interiors; regions that only overlap at the boundary are not considered overlapping regions.
If overlapping regions exist, we can use the solution of the LP to generate cuts equivalent to the ones in (10) described in Sect. 5.3 and add them to the convex region assignment problem. By repeatedly resolving (19) with added cuts whenever overlapping regions are detected, we converge to a solution with no overlap, such as the one illustrated in the second sketch of Fig. 18.

Removing redundant vertices
The algorithm may declare data points to be vertices even though they are later assigned to convex regions in which they are not vertices of the formed polytopes. In the example shown in Fig. 19, point B has been declared a vertex in the first part of Phase 2. However, by solving the convex region assignment problem, we find the two polytopes shown in Fig. 19b, which provide an accurate approximation of the feasible region. Point B is on a facet of one of the two polytopes but is not a vertex. Hence, point B is redundant because it is not needed for describing the polytope.
We can eliminate these redundant vertices by finding the vertices of each polytope and removing the ones that are not true vertices (see the sketch below). The algorithm terminates after this step, since we have now obtained all convex regions with the corresponding vertices required to build the CRS model.
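Since each region is stored as a set of points, this cleanup can be done with one convex hull call per region. The following sketch is our own illustration using scipy's Quickhull wrapper (not the authors' MATLAB implementation); it keeps only the extreme points:

```python
import numpy as np
from scipy.spatial import ConvexHull

def true_vertices(region_pts):
    """Return only the actual vertices of the polytope spanned by
    `region_pts` (one point per row): Quickhull reports the indices
    of the extreme points, so redundant points are dropped."""
    hull = ConvexHull(region_pts)
    return region_pts[hull.vertices]

# The point (1, 1) lies strictly inside the square and the point
# (1, 0) lies on an edge; neither is a true vertex.
pts = np.array([[0.0, 0.0], [1.0, 0.0], [2.0, 0.0],
                [2.0, 2.0], [0.0, 2.0], [1.0, 1.0]])
print(true_vertices(pts))  # the four corners of the square
```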

Summary of algorithm
To summarize, we now state the complete CRS algorithm. Note that for the description of the Phase 2 algorithm, we reintroduce the subset index $i$, which was omitted for the sake of readability in Sect. 6.

Phase 1 (subset assignment with cost correlation constraints)
Step 1.1 (Initialization) Set the iteration counter $t = 1$ and the initial number of subsets $m = 1$, and specify the error tolerance $\epsilon$.

Step 1.2 (Subset assignment) Construct the set of subsets $I_t = \{1, \ldots, m\}$, and solve (6) or (7) to assign data points to subsets. If feasible, construct the sets of assigned data points $J_i$ and go to Step 1.3. Otherwise, set $t = t + 1$ and $m = m + 1$, reset the set of overlap elimination cuts $S = \emptyset$, and repeat Step 1.2.
Step 1.3 (Constructing convex hulls) Apply the Quickhull algorithm to each subset $i \in I_t$ to obtain the sets of vertices $V_i$ of the convex hulls around the data points assigned to each subset.

Step 1.4 (Identifying overlapping convex hulls and generating cuts) Check for overlapping convex hulls by solving (9). If there is no overlap, construct the sets of facets $F_i$ and the sets of facet vertices $H_{if}$, and go to Step 2.1. Otherwise, generate cuts as given by (10), add the cuts to (6) or (7), and return to Step 1.2.

Phase 2 (convex region assignment)
Step 2.1 (Starting loop over set of subsets) Set the subset index $i = 1$ and specify the minimum distance $\delta$.

Step 2.2 (Initializing contour construction) Set the iteration counter $t = 1$ and the set of facets to be examined $\tilde{F} = F_i$.

Step 2.3.1 (Starting loop over set of facets) Set $f$ to be the first element in $\tilde{F}$.

Step 2.3.2 (Obtaining normal vector) Obtain a vector $\bar{n}_f$ normal to facet $f$ as described in Sect. 6.1.1. Set $n_f = \bar{n}_f$ if $\bar{n}_f$ points towards the interior of the feasible region; set $n_f = -\bar{n}_f$ otherwise.
Step 2.3.3 (Finding candidate points for new vertex) Initialize the set of candidate points $C = \emptyset$ and proceed through Steps 2.3.4a-2.3.4d.

Step 2.3.4a Set $j$ to be the first element of $P = \{j : j \in J_i \wedge j \notin V_i\}$.

Step 2.3.4b Solve (12). If it is feasible and no point is detected in the interior of polytope $PT_{fj}$, go to Step 2.3.4c. Otherwise, go to Step 2.3.4d.
Step 2.3.4c Solve (13). If there is no overlap between polytope $PT_{fj}$ and any cut-off polytope $q \in Q$, add $j$ to $C$.
Step 2.3.4d If $j$ is the last element in $P$, go to Step 2.3.5. Otherwise, set $j$ to the next element in $P$ and return to Step 2.3.4b.
Step 2.3.5 (Identifying new vertex and creating new facets) If $C \ne \emptyset$, solve (14) to find the candidate point furthest away from facet $f$. If $d \ge \delta$, declare this point a new vertex and create the corresponding new facets as described in Sect. 6.1.3. If $f$ is not the last element in $\tilde{F}$, set $f$ to the next element in $\tilde{F}$ and return to Step 2.3.2.

Step 2.4 (Updating sets) If new facets have been created, update the sets $V_i$, $F_i$, $\tilde{F}$, and $H^{t+1}_f$, set $t = t + 1$, and return to Step 2.3.1. Otherwise, go to Step 2.5.

Step 2.5 (Convex region assignment) Set the initial number of regions $R = 1$. Solve the convex region assignment problem (19). If it is infeasible, set $R = R + 1$ and repeat Step 2.5.

Step 2.6 (Identifying overlapping regions and generating cuts) Check for overlapping convex regions. If there is no overlap, go to Step 2.7. Otherwise, generate cuts, add the cuts to (19), and return to Step 2.5.
Step 2.7 (Removing redundant vertices) Remove redundant vertices as described in Sect. 6.2.3. If $i = m$, stop and report the solution in the form of the parameter values $VT_{irjk}$ corresponding to subset $i$, convex region $r$, vertex $j$, and dimension $k$. Otherwise, set $i = i + 1$ and return to Step 2.2.
The algorithm is implemented in MATLAB and GAMS in order to make use of the Quickhull algorithm in MATLAB and the optimization solvers available in GAMS, such as CPLEX and BARON.

Computational study
We perform a computational study on the two parts of the algorithm, Phase 1 and Phase 2. To assess the algorithm's computational performance, we apply it to various instances and demonstrate the impact of the number of data points, the dimensionality, and the level of nonlinearity and nonconvexity on the performance of the algorithm. All instances were run on an Intel i7 machine at 3.4 GHz with 8 GB RAM.
It should be emphasized that, like most surrogate models, the CRS model is constructed "offline". Once it is created, it can be used "online" (e.g. in a scheduling model) and only has to be updated if the process characteristics have changed significantly. Hence, it is typically acceptable for the computational time required for the construction of a surrogate model to be relatively large, since the construction usually only has to be done once.

Phase 1
The performance of Phase 1 strongly depends on the level of nonlinearity in the cost function that we want to approximate. In practice, it is only suited for cases in which a small number of linear functions suffices to approximate the true cost function within the specified maximum fitting error. Obviously, the more nonlinear the true cost function is, the more subsets the algorithm will create. In the worst case, if the true cost function is extremely nonlinear, we may obtain one subset for each individual data point, which makes the construction of a CRS model meaningless.
In the following, we consider constructed instances in 2D and 3D, in which we can clearly see the structure of the cost function in relation to the feasible region, and observe how the algorithm performs on these different instances. Each case is denoted by a four-part label, e.g. "2-221-3-A", where the first number states the dimensionality, the second number is the number of data points, the third number indicates the number of different linear cost functions in the given data set, and the last part of the name distinguishes different variants.
For each dimensionality, we generate four pairs of instances. Each pair consists of two instances that have the same geometric structure but differ in the number of data points. For 2D, we have four instances with 121 data points each and the corresponding four instances with 221 data points each. Figure 20 shows the final subset assignment solutions for the four 2D cases with 221 data points. The different colors of the regions indicate different cost correlations.
The computational results for the eight 2D cases are shown in Table 1, which lists the number of subset assignment problems solved, the number of subsets, and the wall-clock time in seconds. The results clearly show that the computational time strongly depends on the number of subset assignment problems that have to be solved to obtain the final result. Naturally, larger data sets increase the size of the subset assignment problem and therefore the required computing time. However, the relationship between the cost correlation and the location of the data points seems to have an even greater impact on the algorithm, more specifically on the number of assignment problems that need to be solved. Essentially, the more feasible solutions there are that lead to overlapping convex hulls, the more iterations are required, which significantly increases the computational time. A good example is Case 2-221-3-B, which only involves three different linear cost correlations but results in five subsets due to the locations of the data points, which do not allow three subsets with non-overlapping convex hulls.
Figure 21 shows the 3D cases with 214 data points, and the computational results for all eight 3D cases are presented in Table 2. Here, we make similar observations as in the 2D cases. Among the considered instances, Cases 3-120-3-B and 3-214-3-B stand out. The data in these two instances represent the same geometric structure; however, although we have more data points in Case 3-214-3-B, the computational time is shorter due to the significantly smaller number of subset assignment problems that have to be solved. In this particular case, we observe the phenomenon that the larger data set "guides" the algorithm more quickly towards a solution with no overlapping convex hulls.

Phase 2

Independently of Phase 1, we analyze the performance of the Phase 2 algorithm by applying it to various data sets in 2D, 3D, and 4D. Since we want to investigate the impact of different factors, such as the number of data points, the prespecified tolerance, and the geometry of the feasible space, some common structure in the cases has to be maintained in order to make proper comparisons. Therefore, the data are partially randomized but not chosen completely randomly. Each case is named using two or three parts, e.g. "2-167-A1", where the first number indicates the dimensionality of the given data, the second number is the number of data points, and the third part of the name further distinguishes different variants.
Figures 22 and 23 show the cases in 2D. We consider the two base cases shown in Figs. 22a and 23a, in which the data points are aligned on a regular grid. As one can see, these two cases differ in the number of data points but have the same geometric structure. All the other cases shown in Figs. 22 and 23 are created by randomly perturbing the data points in the corresponding base cases. The random cases shown in the first rows and the ones in the second rows use the same data, but the algorithm has been applied to the latter with a larger tolerance δ. It is clear from the shown results that the convex regions obtained from the algorithm can vary significantly when different δ values are chosen (e.g. compare Cases 2-91-C1 and 2-91-C2). In general, the number of regions decreases with a greater δ. Table 3 summarizes the computational results for the 2D cases. For each case, we report the following indicators:
• prespecified tolerance δ
• initial number of facets, which are the facets of the initial convex hull around all data points
• final number of facets, which are the facets created to describe the contour
• number of t-iterations, which are the iterations in which new facets are created
• number of convex region assignment (CRA) problems solved
• wall-clock time in seconds
From the results, we can see that larger δ values lead to smaller numbers of facets created by the algorithm, which typically result in smaller numbers of convex regions. The computational time increases significantly with the number of data points since more points increase the time required for each t-iteration and make the CRA problem considerably more difficult to solve. Equally important is the impact of the level of nonconvexity of the feasible region, which is reflected in the number of new facets that have to be created. A larger number of facets increases the complexity of the CRA problem and the likelihood that the CRA problem has to be re-solved several times because of overlapping convex regions.
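To illustrate how δ controls the number of facets, the following simplified sketch (built on scipy's qhull wrapper) flags, for each facet of the current hull, the interior points that lie deep enough below the facet's hyperplane to pass the δ-test. The full algorithm additionally enforces the candidate-point conditions of Figs. 11 and 12, which are omitted here.

```python
import numpy as np
from scipy.spatial import ConvexHull

def delta_test(points, delta):
    """Simplified illustration of the delta-test: for each facet of the
    current convex hull, list the interior points whose distance to the
    facet's hyperplane is at least delta and which could therefore be
    declared new vertices."""
    hull = ConvexHull(points)
    accepted = {}
    for f, eq in enumerate(hull.equations):   # eq = [unit normal | offset]
        dist = -(points @ eq[:-1] + eq[-1])   # distance below the facet, >= 0 inside
        candidates = np.where(dist >= delta)[0]
        if candidates.size > 0:
            accepted[f] = candidates.tolist()
    return accepted

# Smaller delta admits candidates on more facets, which mirrors the trend
# in Table 3: more new facets, and typically more convex regions.
pts = np.random.default_rng(0).random((40, 2))
print(len(delta_test(pts, 0.05)), len(delta_test(pts, 0.20)))
```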
Figures 24 and 25 show the 3D cases considered in this computational study. Similar to the 2D instances, we have two base cases shown in Figs. 24a and 25a that differ in the number of data points but not in the geometric structure. The remaining cases are generated by randomly perturbing subsets of the data points from the base cases. In order to avoid creating excessively large numbers of facets, the number of t-iterations has been limited to 3. Also, we allow overlapping convex regions, i.e. the CRA problem does not have to be re-solved when convex regions overlap each other. Hence, the number of CRA problems solved is equal to the number of convex regions obtained.
Table 4 summarizes the computational results for the 3D cases. In general, the same observations can be made as in the 2D cases. However, when comparing the 2D with the 3D instances, we can see a stark increase in computational complexity resulting from the increase in dimensionality. The computational time has grown dramatically despite limiting the number of t-iterations and allowing overlapping convex regions. This is not only a result of the additional dimension that increases the size of the various optimization problems, but it also stems from the fact that the approximation of nonconvex higher-dimensional spaces generally requires more convex regions.
Unlike Phase 1, which clearly also applies to dimensions higher than three, Phase 2 is more involved, and its applicability to higher-dimensional data may not be obvious. Therefore, although the results cannot be easily visualized, we apply the algorithm to some pseudo-randomly generated instances with data in 4D. Table 5 shows the results for the six instances that have been considered. Again, we allow overlapping convex regions, and the number of t-iterations has been limited to 2. Among the chosen instances, Case 4-81-A stands out because of its exceptionally low computational time. This is due to the fact that the space represented by the data is already fairly convex, such that only four new facets need to be created, resulting in eight convex regions.
The main insight drawn from this computational study is that besides the more obvious factors such as dimensionality and number of data points, the performance of the algorithm (Phase 1 as well as Phase 2) strongly depends on the level of nonlinearity and nonconvexity implied by the data set. The computational effort significantly increases with the geometric complexity since the approximation of a more complex geometric structure generally requires a larger number of convex regions.

Industrial case study
We apply the proposed algorithm to a real-world case study provided by Praxair. Note that this set of data and the resulting CRS model are not the ones used in the actual industrial application, which cannot be shown here for confidentiality reasons. With this rather unfavorable set of data, however, we mainly want to demonstrate the effect of overfitting and show the importance of effective data collection.
The objective is to construct a CRS model of a production process that can be integrated into a scheduling optimization problem, which is formulated as an MILP. The variable space of the CRS model has two dimensions since we are only interested in the production rates of Products A and B. The cost function is the power consumption. The data were drawn from the actual process for the purpose of analyzing the extent of the feasible operational region (not with the construction of a CRS model in mind). Since sampling from such a complex production process is expensive, only 55 data points were taken. An attempt was made to sample at the boundaries of the feasible region. Each data point carries the production rates for Products A and B as well as the corresponding power consumption value. The data are plotted in Fig. 26a. Note that for this case study, we use normalized values.
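To give a sense of how such a model enters an MILP, a CRS model with regions P_i = {x : A_i x ≤ b_i} and per-region linear costs α_i^T x + β_i can be embedded via the standard hull reformulation of a disjunction; the notation below is generic and not necessarily that of the actual scheduling model:

\[
x = \sum_{i} x^{i}, \qquad
\sum_{i} y_{i} = 1, \qquad
A_{i}\, x^{i} \le b_{i}\, y_{i} \quad \forall i, \qquad
\widehat{c}(x) = \sum_{i} \bigl( \alpha_{i}^{\top} x^{i} + \beta_{i}\, y_{i} \bigr), \qquad
y_{i} \in \{0,1\}.
\]

The binary y_i selects the active region, and the disaggregated variables x^i are forced to zero for the inactive regions, which is valid as long as every region is a bounded polytope.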
For the subset assignment in Phase 1, we set the maximum error tolerance to 3%. This results in the partitioning of the given data points into the two disjoint convex hulls shown in Fig. 26b. The cost constants and coefficients for the linear power consumption correlations are shown in Table 6.
As observed in the computational study, the result of Phase 2 strongly depends on the specified δ, the minimum distance between a new vertex and the facet from which it originates. For comparison, we apply the Phase 2 algorithm for four different values of δ; the resulting convex regions are shown in Fig. 27. One can see that the smaller δ is, the more points tend to be declared vertices, which leads to the construction of more convex regions. While four convex regions are constructed for δ = 0.10, we obtain 11 convex regions for δ = 0.04. For δ = 0.08 and δ = 0.06, we obtain the same number of convex regions; however, the convex regions for δ = 0.06 describe a tighter envelope around the data points.

This example illustrates a key principle in data-driven approaches: a good method does not necessarily lead to good results if it is not given a sufficiently large set of useful data. Clearly, among the cases shown in Fig. 27, δ = 0.04 results in the tightest envelope around all data points. However, it may not provide the most accurate approximation of the feasible region. Especially when observing the shape of the union of the convex regions for Subset 2, there is reason to suspect that it does not resemble the actual feasible region of the process. If we had more data points filling a larger portion of the feasible space, we might obtain a different result or be more confident about the one obtained. Due to the limited number of data points, one would probably rather apply the CRS model resulting from δ > 0.04, e.g. δ = 0.06. In practice, one should try to validate the surrogate model, e.g. by drawing a set of additional data points from the process and testing them for feasibility in the surrogate model, and vice versa; a minimal version of this check is sketched after the remarks below.

Remarks

In the following, we provide some remarks on the proposed methodology and limiting features of the algorithm, which motivate future work:
• The curse of dimensionality is a major concern when applying the proposed approach. The higher the dimensionality of the data, the more data points are required to accurately represent the feasible space. The more data points are involved, the higher the computational complexity, up to the point where the problem becomes intractable. Also, approximating nonconvexities in higher dimensions generally requires more convex regions; at some point, so many convex regions are required that the advantage of using a CRS model over a nonlinear model diminishes. For practical purposes, this approach may be applicable to data with up to five or six dimensions. We should note, however, that in practice the computational expense for constructing the surrogate model is of minor concern since this computation is performed only once. What matters more is the surrogate model itself, as it can then be applied extensively at much lower cost than, for instance, a complex nonlinear model.
• In Phase 1, we obtain subsets of data points that are disjoint, which leads to regions that are disconnected. The algorithm is designed this way because continuity of the piecewise linear approximation of the cost function at the boundaries of the regions cannot be guaranteed. However, especially when the number of data points is small, this may lead to overly large gaps between the regions.
• In Phase 2, we assume that the feasible region to be approximated is simply connected. However, it is possible that the union of the convex regions resulting from the algorithm is not simply connected, i.e. it contains empty spaces. In that case, we could extend the algorithm with a mechanism to detect and fill such empty spaces. Conversely, the simple connectedness assumption may not always be valid, which the proposed algorithm does not take into account.
• In Phase 2, a facet is not further examined if it contains a point in its relative interior, since this point would be cut off if the vertices of the facet were used to create new facets. However, this may terminate the facet generation procedure prematurely: the point in the facet's interior could actually be a vertex of the contour of the feasible region, which could be used to further refine the approximation. Detecting such points and integrating them into the procedure, however, is a challenge, especially in high-dimensional cases.
• In this work, we focus on the general problem stated in Sect. 3, i.e. the generation of CRS models from given fixed sets of data points. We have not considered the sampling of new data to update the model. Also, we do not apply model validation procedures, which would likewise involve the sampling of new data.
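As an illustration of the feasibility check mentioned above, the sketch below tests whether additional sample points are feasible in a CRS model given as a union of polytopes in halfspace form; the data structures and names are hypothetical and serve only to outline the idea.

```python
import numpy as np

def in_crs_region(x, regions, tol=1e-9):
    """Test whether a sampled point x is feasible in the CRS model, i.e.
    lies in at least one polytope A @ x <= b of the union. `regions` is a
    list of (A, b) pairs in halfspace form."""
    return any(np.all(A @ x <= b + tol) for A, b in regions)

def validate(samples, regions):
    """Fraction of additional sample points the surrogate deems feasible."""
    hits = sum(in_crs_region(x, regions) for x in samples)
    return hits / len(samples)

# Hypothetical usage with a single unit-box region: all samples should pass.
A = np.vstack([np.eye(2), -np.eye(2)])
b = np.array([1.0, 1.0, 0.0, 0.0])
samples = np.random.default_rng(1).random((100, 2))
print(validate(samples, [(A, b)]))   # expected: 1.0
```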

Conclusions
This work has addressed the challenge of generating accurate and computationally efficient surrogate models for mixed-integer programming frameworks. In this paper, we have introduced the idea of CRS models, which can be formulated as sets of mixed-integer linear constraints that provide good approximations of nonlinearities and nonconvexities. In a CRS model, the feasible region is approximated by the union of convex regions in the form of polytopes. Furthermore, for each region, the cost function is approximated by a linear function.
We have presented a two-phase algorithm for the construction of CRS models based on given data. In Phase 1, the algorithm partitions the data points into disjoint subsets such that a linear cost correlation can be obtained within an error tolerance for all points in each subset, and that the convex hulls around the points of each subset do not overlap. In Phase 2, convex regions are constructed for each subset such that the union of the convex regions describes an accurate approximation of the feasible region. The algorithm is presented in detail in the main part of this paper.
An extensive computational study has been conducted in order to assess the performance of the proposed algorithm on different data sets. We have further applied the methodology to an industrial test case drawn from a Praxair plant. Specifically, data from a real production process have been used to generate a CRS model with a feasible region in the product space and power consumption as the cost function. The case studies have demonstrated the applicability of the algorithm and provided insights into the impact of the given data on its performance. Finally, it is clear that the computational expense may be high; a future challenge is therefore to improve the computational efficiency of solving this problem. We have provided the data for all our test problems in the supplementary material so that other researchers may try their own methods on this problem.

Figure and table captions

Fig. 1 Data points are sampled from the feasible region. The nonconvex feasible region can be approximated more accurately by the union of multiple convex regions. a Feasible data points, b convex hull around all data points, c union of four convex regions

Fig. 2 a Points marked by circles (center area) have the same linear cost correlation. The remaining points, marked by diamonds, have a different linear cost correlation. b, c Assignment of data points and resulting polytopes from Phases 1 and 2. a Given data points, b at the end of Phase 1, c at the end of Phase 2

Fig. 3 Flowchart for the Phase 1 algorithm

Fig. 6 Starting from the convex hull of the given data points, the Phase 2 algorithm first finds the contour of the tight envelope around the feasible region and subsequently constructs the polytopes representing the envelope

Fig. 9 All data points lie on one side of a supporting hyperplane

Fig. 11 A candidate point is found if the polytope formed with the facet does not contain any data points in its interior. a Candidate point, b candidate point, c not a candidate point

Fig. 12 F is the new vertex originating from facet A-B. E is not a candidate point for facet B-C because the resulting polytope B-C-E and the already cut-off polytope A-B-F overlap. a Initial convex hull, b new vertex F from A-B, c E not a candidate point for B-C

Fig. 14 New facets created at each iteration. The thick red lines show the newly added facets, whereas the dashed red lines indicate the facets in the previous iteration from which the new ones originate. (Color figure online)

Fig. 15 Flowchart of the algorithm applied to construct the desired convex regions

Fig. 19 In this example, point B is a redundant vertex because it has been declared a vertex but is not a vertex of any of the constructed convex regions. a Vertices found while determining the contour of the feasible region, b convex regions obtained after the convex region assignment

Fig. 26 In Phase 1, the given data points are partitioned into two disjoint subsets, denoted Subsets 1 and 2. a Given data points, b subset assignment result

Fig. 27 Depending on δ, the Phase 2 algorithm constructs different convex regions. a δ = 0.10, b δ = 0.08, c δ = 0.06, d δ = 0.04

Table 3 Computational results from Phase 2 for the 2D cases

Table 4 Computational results from Phase 2 for the 3D cases

Table 5 Computational results from Phase 2 for the 4D cases

Table 6 Constants and coefficients for the power consumption correlations

Table 8 Cost constants and coefficients used in the illustrative example