A tractable approximation of non-convex chance constrained optimization with non-Gaussian uncertainties

Chance constrained optimization problems in engineering applications possess highly nonlinear process models and non-convex structures. As a result, solving a nonlinear non-convex chance constrained optimization (CCOPT) problem remains a challenging task. The major difficulty lies in the evaluation of probability values and gradients of inequality constraints which are nonlinear functions of stochastic variables. This article proposes a novel analytic approximation to improve the tractability of smooth non-convex chance constraints. The approximation uses a smooth parametric function to define a sequence of smooth nonlinear programs (NLPs). The sequence of optimal solutions of these NLPs remains feasible throughout and converges to the solution set of the CCOPT problem. Furthermore, Karush–Kuhn–Tucker (KKT) points of the approximating problems converge to a subset of KKT points of the CCOPT problem. Another feature of this approach is that it can handle both Gaussian and non-Gaussian uncertainties.


Introduction
In recent years, there has been growing interest in chance constrained optimization (CCOPT) methods in engineering and financial applications. Applied optimization problems subject to constraints involving uncertain variables may become infeasible if these constraints are required to be satisfied deterministically. Therefore, chance constrained optimization provides decision and planning strategies that guarantee optimal system performance as well as reliability in product (output) quality. In addition, CCOPT methods have become indispensable tools of risk management and risk metrics in financial applications in the context of value-at-risk (VaR) and conditional value-at-risk (cVaR).
This study on CCOPT is motivated by engineering applications where model equations are large scale and highly nonlinear. As a result, the study here considers a more general nonlinear non-convex chance constrained optimization problem of the form

min_{u∈U} E[f(x, u, ξ)]
subject to Pr{g_j(x, u, ξ) ≤ 0} ≥ α_j, j = 1, …, q,
G(x, u, ξ) = 0,

where x ∈ R^n is a vector of state variables, u ∈ U ⊂ R^m is a vector of inputs, U is a compact set and ξ is a vector of Borel-measurable random (uncertain) input variables from a probability space into R^p. The random vector ξ is assumed to have a known joint probability density function φ(ξ) with support on a Borel-measurable set Ξ ⊂ R^p. The density φ(ξ) can be either a Gaussian or a non-Gaussian probability density function. The functions f, g_j, G : R^n × R^m × Ξ → R are at least once continuously differentiable with respect to x, u and ξ. The operators E(·) and Pr(·) represent expected value and probability, respectively. The expression Pr{g_j(x, u, ξ) ≤ 0} ≥ α_j is a chance constraint. It specifies the probability of holding the constraint g_j(x, u, ξ) ≤ 0 with a reliability level α_j, where 1/2 ≤ α_j ≤ 1. The nonlinear equality constraints G(x, u, ξ) = 0, which usually arise in practical engineering applications, are required to be satisfied for all possible realizations of the uncertain variables ξ. In general, the decision (control) variables u are deterministic, while the state variables x are influenced by both the controls u and the uncertainties ξ, as indicated by x(u, ξ). Hence, as a result of the nonlinearity of G(x, u, ξ) and the nature of the distribution of the uncertainties ξ, the functions p_j(u) := Pr{g_j(x, u, ξ) ≤ 0}, j = 1, …, q, may not be convex. Furthermore, random input variables ξ are traditionally assumed to have Gaussian distributions; however, this is not the case in many practical applications.
For instance, the wind speed in wind power generation (Celik, Makkawi, and Muneer 2010) and the clearness index in solar energy (Ibanez, Rosell, and Beckman 2003) are found to be Beta and Gamma distributed, respectively. Therefore, there is a strong practical need to consider nonlinear non-convex chance constrained optimization problems in the presence of non-Gaussian distributed random variables. Despite their prevalence in engineering applications, to the best knowledge of the authors there is no work that studies CCOPT with non-convex g(x, u, ξ) and non-Gaussian uncertainties. This work proposes a smooth analytic approximation strategy in order to improve the tractability of nonlinear non-convex CCOPT models with non-Gaussian uncertainties. If, however, the uncertain input variables cannot be modelled as random variables with known probability distributions, it is appropriate to use the robust optimization approach instead of CCOPT, in which the input uncertainties are commonly assumed to belong to a bounded set. Since the robust optimization approach (Ben-Tal and Nemirovski 1998, 1999) attempts to satisfy constraints under almost all possible realizations of the uncertainties, it generally leads to a semi-infinite optimization problem (López and Still 2007). Some classes of chance constrained or robust optimization problems can also be modelled as semi-definite optimization problems, e.g. Bertsimas, David, and Caramanis (2011), Feng, Dabbene, and Lagoa (2011), Jasour and Lagoa (2012) and Zymler, Kuhn, and Rustem (2013). In particular, Özmen et al. (2011), Özmen, Weber, and Karimov (2013), Weber, Çavuşoglu, and Özmen (2012), Werner (2008) and Schöttle and Werner (2013) consider robust optimization approaches in financial risk analysis and portfolio optimization problems that lead to conic or semi-definite optimization problems.
The remainder of the article is organized as follows. Section 2 presents a brief review of recent developments in methods of chance constrained optimization. Section 3 puts the analytic approximation schemes available to date into a general context, suggests a novel smooth and monotonic function for analytic approximation and studies its properties. Section 4 presents a sequence of nonlinear programs (NLPs) as an approximation to CCOPT. This section investigates Hausdorff convergence of the sequence of feasible sets of these NLPs to the feasible set of CCOPT problems, approximability of optimal solutions of CCOPT problems by optimal solutions of the NLPs and convergence of the set of Karush-Kuhn-Tucker (KKT) points of the NLPs to a subset of KKT points of CCOPT problems. Numerical computation results and a case study in Section 5 demonstrate the viability and effectiveness of the proposed approximation scheme. The article concludes in Section 6 with a summary and some future research directions. Furthermore, details of mathematical proofs and additional references are given in an online supplement to this article (see the section 'Supplemental data' before the references list).

State-of-the-art solution of nonlinear chance constrained optimization problems
For the sake of brevity and clarity of presentation, the discussion here considers CCOPT with one chance constraint Pr{g(x, u, ξ) ≤ 0} ≥ α. However, the results presented easily carry over to the case of multiple chance constraints as in inequality (3) (see, for instance, Section 4.3). Since the state (output) variables x depend on the input control variables u and the random variables ξ, the CCOPT problem can be projected onto the solution set of the equation G(x, u, ξ) = 0 for given u and a realization of ξ. Subsequently, CCOPT can be written equivalently as

min_{u∈U} E[f(x(u, ξ), u, ξ)]
subject to Pr{g(x(u, ξ), u, ξ) ≤ 0} ≥ α,

where x(u, ξ) solves G(x, u, ξ) = 0. To make the ensuing presentation more transparent, in the following the explicit dependence on x is dropped, so that f(u, ξ) := f(x(u, ξ), u, ξ) and g(u, ξ) := g(x(u, ξ), u, ξ).

Remark 1 Assuming the applicability of the implicit function theorem for Equation (2), the smoothness properties of G can be translated to corresponding smoothness properties of the solution function x(u, ξ). Hence, f(u, ξ) and g(u, ξ) are smooth, given the smoothness of G(x, u, ξ), f(x, u, ξ) and g(x, u, ξ).
In CCOPT problems, the main object of investigation is the probability function p(u) := Pr{g(u, ξ) ≤ 0}. Currently, the structural properties of p(u) such as continuity (Theorem 10.1.1, p. 302 in Prékopa 1995), convexity (Prékopa, Yoda, and Subasi 2011; Henrion and Strugarek 2008; Lagoa, Li, and Sznaier 2005, and references therein) and differentiability (Marti 1996; Uryasev 1995; van Ackooij et al. 2010; Henrion and Möller 2012) are well studied. The major challenge in solving nonlinear CCOPT problems stems from the difficulty of computing the probability value p(u) for a given u. Specifically, for a general nonlinear function g(u, ξ), it is difficult to determine a priori the probability distribution of Z = g(u, ξ). This challenge becomes more formidable if derivatives of the chance constraint need to be numerically computed. The derivative formulas given by Marti (1996) and Uryasev (1995) are more of a theoretical nature. Recent derivative formulas given by Henrion (2012) and Henrion and Möller (2012) are limited to linear chance constraints with Gaussian uncertainties.
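Since p(u) rarely has a closed form for nonlinear g, in practice it is estimated by sampling. The sketch below uses a hypothetical scalar constraint g(u, ξ) = ξ − u with standard Gaussian ξ (an illustration chosen so the exact value is known, not an example from this article) to show the basic sample-based estimator of p(u):

```python
import numpy as np

def estimate_p(g, u, xi_samples):
    # Monte Carlo estimate of p(u) = Pr{g(u, xi) <= 0} from a fixed sample.
    return np.mean(g(u, xi_samples) <= 0.0)

rng = np.random.default_rng(0)
xi = rng.normal(0.0, 1.0, size=100_000)     # xi ~ N(0, 1)

# Hypothetical constraint g(u, xi) = xi - u: here p(u) equals the standard
# normal CDF at u, so p(0) = 0.5 exactly.
g = lambda u, xi: xi - u
p_hat = estimate_p(g, 0.0, xi)
```

The estimator is unbiased but its accuracy improves only with the square root of the sample size, which is exactly the computational burden discussed in the text.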
Some special instances of chance constraints can have exact analytic (deterministic) representations explicitly as functions of the variable u (e.g. see Birge and Louveaux 1997). In general, such representations are guaranteed only for linear chance constraints with Gaussian distributed uncertainties. Simplifications of these types are frequently used in several applications (see Schwarm and Nikolaou 1999, for instance). Except for such special cases, analytic representations are difficult to obtain even for linear forms such as:
• chance constraints with additive uncertainties, p(u) = Pr{Au + Bξ + d ≤ 0} ≥ α (such constraints are prevalent in chance constrained linear model predictive control problems, e.g. Blackmore and Ono 2009; Cannon, Kouvaritakis, and Wu 2009; Oldewurtel, Jones, and Morari 2008);
• chance constraints with linear matrix inequalities (LMIs): Nemirovski (1998), Nemirovski (2012), Nemirovski and Shapiro (2006b);
• chance constraints with multiplicative relations (Feng, Dabbene, and Lagoa 2011; Prékopa, Yoda, and Subasi 2011).
In fact, an exact (deterministic) analytic representation of nonlinear chance constraints is generally elusive. As a result, there has recently been a concentrated research effort to improve the tractability of nonlinear CCOPT problems through approximation methods. Among such methods are scenario generation, sample average approximation (SAA), linearization and analytic approximation. Scenario generation methods (Campi 2005, 2010) are applicable irrespective of the type of distribution function of the uncertain variables (Gaussian or non-Gaussian). However, they require a very large number of scenarios (see Nemirovski and Shapiro 2006a, 2006b) in order to satisfy chance constraints with given reliability levels, so that very large deterministic optimization problems have to be solved, which is computationally expensive. Recent scenario reduction techniques (Garetti 2011; Henrion, Küchler, and Römisch 2009) could alleviate some of this computational burden.
The potential difficulty associated with SAA is that relative-frequency approximation of a probability requires a very large sample size. Thus, a solution obtained through the SAA approach may fail to be feasible for CCOPT. The work of Garnier, Omrane, and Rouchdy (2009) considers a first-order linearization of g(u, ξ) with respect to the random variables. This provides a decoupling of the random variables ξ from the (deterministic) problem variables u. Nevertheless, such a linearization scheme is valid only when the random variables ξ have small variance. In fact, Nemirovski (2012) recently considered tractability issues of chance constraints of the form Pr{w_0(u) + Σ_{k=1}^{p} ξ_k w_k(u) ≤ 0} ≥ α. Possible infeasibility of solutions obtained from scenario generation and SAA methods is a major drawback. In fact, these methods require enormous computational effort to achieve only approximate feasibility of solutions. Therefore, analytic methods replace chance constraints with some approximations and attempt to provide an a priori guarantee of feasibility along with tractability. Pinter (1989) was the first to propose such analytic approximation schemes. Nemirovski and Shapiro (2006a) and Nemirovski (2012) have suggested analytic approximations for convex chance constraints that are affine with respect to the uncertain variables. Recently, Feng, Dabbene, and Lagoa (2011) proposed an analytic approximation approach based on the concept of a kinship function, applicable when the chance constrained function is convex w.r.t. the decision variable u and polynomial w.r.t. the uncertain variables ξ. This approach enjoys enormous computational advantages by providing efficient evaluation of chance constraints. Furthermore, based on the strategy suggested by Rockafellar and Uryasev (2000), Hong, Yang, and Zhang (2011) (see also Xiao et al.
2012, Wozabal 2012, and Wozabal, Hochreiter, and Pflug 2010) have successfully shown that approximate solutions of convex chance constrained optimization problems can be obtained by solving a sequence of difference-of-convex (DC) deterministic optimization problems. This is a breakthrough in the numerical solution of nonlinear convex chance constrained optimization problems. Nevertheless, difference-of-convex algorithms are inherently limited to problems of smaller dimensions. At the same time, the DC approach offers little computational advantage for general non-convex CCOPT. Furthermore, the work of Norkin (1993) suggests a method similar to non-parametric density estimation (for Z = g(u, ξ)) to set up an analytic approximation, but it needs a modification (as indicated in Section 3.3) to make it applicable to problems with non-symmetric and non-Gaussian distributions. The works of Feng, Dabbene, and Lagoa (2011) and Jasour and Lagoa (2012) also suggest relaxation strategies.
This work proposes an analytic approximation strategy for non-convex chance constraints that guarantees a priori feasibility of solutions to the chance constraints, a smooth approximation, simplicity of implementation and applicability to large-scale non-convex engineering optimization problems.

Further properties of CCOPT
The probability function p(u) = Pr{g(u, ξ) ≤ 0} can alternatively be expressed as

p(u) = 1 − E[h(u, ξ)], where h(u, ξ) := 1 if g(u, ξ) > 0 and h(u, ξ) := 0 otherwise.

This immediately yields the equivalence

p(u) ≥ α ⟺ E[h(u, ξ)] ≤ 1 − α.

It is assumed that p(u) is a continuous function of u (see Prékopa 1995, p. 302). By Remark 1, g(u, ξ) is differentiable with respect to u, and E[h(·, ξ)] is a lower semi-continuous function of u (according to Lebesgue's integration theorem). The feasible set of CCOPT, K := {u ∈ U | p(u) ≥ α}, and the set M := {u ∈ U | E[h(u, ξ)] ≤ 1 − α} are compact, by the compactness of U. Furthermore, from Equation (8), it follows that K = M. Therefore, the problem min_{u∈K} E[f(u, ξ)] and the problem

min_{u∈U} E[f(u, ξ)] subject to E[h(u, ξ)] ≤ 1 − α   (NLP2)

possess the same optimal solution set. Despite the fact that the constraint E[h(u, ξ)] ≤ 1 − α provides an exact representation of the chance constraint, the discontinuity of h causes difficulties in numerical computations. Even so, the function E[h(u, ξ)] provides some insight for the construction of a smooth, tight and tractable approximation of CCOPT.
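The equivalence between the chance constraint p(u) ≥ α and the expectation constraint E[h(u, ξ)] ≤ 1 − α can be checked numerically. The snippet below again uses a hypothetical scalar constraint g(u, ξ) = ξ − u with standard Gaussian ξ (an illustration, not taken from the article):

```python
import numpy as np

rng = np.random.default_rng(1)
xi = rng.normal(size=200_000)
alpha = 0.9

g = lambda u, xi: xi - u                            # illustrative constraint
h = lambda u, xi: (g(u, xi) > 0.0).astype(float)    # violation indicator

checks = []
for u in (0.5, 2.0):
    p_hat = np.mean(g(u, xi) <= 0.0)    # sample estimate of p(u)
    Eh_hat = np.mean(h(u, xi))          # sample estimate of E[h(u, xi)]
    # p(u) >= alpha holds exactly when E[h(u, xi)] <= 1 - alpha
    checks.append((p_hat >= alpha) == (Eh_hat <= 1.0 - alpha))
# u = 0.5 violates the chance constraint, u = 2.0 satisfies it, and the
# two formulations agree in both cases.
```

On any fixed sample the two estimates satisfy p_hat + Eh_hat = 1 identically, which is the sample analogue of Equation (8).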

A generalized analytic approximation of chance constraints
A general setup for the analytic approximation of chance constraints with an a priori feasibility guarantee is to determine a continuous function ψ : R++ × R^m → R+ with the following properties:

P1: E[h(u, ξ)] ≤ ψ(τ, u) for each τ > 0 and all u;
P2: ψ(·, u) is monotonically decreasing in τ;
P3: lim_{τ→0+} ψ(τ, u) = E[h(u, ξ)],

where R+ := [0, +∞) and τ ∈ R++ := (0, +∞) is a parameter. From properties P2, P3 and the continuity of ψ(·, u), it follows that inf_{τ>0} ψ(τ, u) = E[h(u, ξ)]. Once such a function is available, the constraint E[h(u, ξ)] ≤ 1 − α in (NLP2) can be replaced by the constraint inf_{τ>0} ψ(τ, u) ≤ 1 − α. As a result of property P1, for each fixed τ > 0, the optimal solution of the problem

min_{u∈U} E[f(u, ξ)] subject to ψ(τ, u) ≤ 1 − α   (NLP_τ)

is feasible for (NLP2), and consequently feasible for CCOPT. Moreover, the feasible set M(τ) = {u ∈ U | ψ(τ, u) ≤ 1 − α} of NLP_τ is always a subset of the feasible set of NLP2, and hence of CCOPT. Most importantly, properties P2 and P3 imply that lim_{τ→0+} M(τ) = K in the sense of convergence of sets (see the online supplement via the link in the section 'Supplemental data' before the references list). In fact, equality with K in Equation (5) holds true owing to the compactness of each M(τ) and the monotonicity property. Basically, the so-called concentration-of-measure inequalities from probability theory can be used as a background to construct a function ψ with properties P1-P3 (Pinter 1989; Nemirovski and Shapiro 2006a). For example, the function ψ can be defined as ψ(τ, u) := E[Φ(τ, g(u, ξ))] for some given real-valued function Φ : R++ × R → R+. In the literature, there are several suggestions for Φ for which the associated function ψ satisfies some of the properties P1-P3: Φ1(τ, s) (Pinter 1989; Nemirovski and Shapiro 2006a), Φ2(τ, s) based on a kernel function k(s) (Norkin 1993), Φ3(τ, s) used in VaR and cVaR approximations (Hong, Yang, and Zhang 2011) and Φ4(τ, s) based on kinship functions (Feng, Dabbene, and Lagoa 2011). The strategy of using the function Φ1(τ, s) provides a convex approximation of p(u) when g(u, ξ) is an affine function of ξ. Nevertheless, this approximation scheme is not directly applicable to the computation of general non-convex chance constraints. The approximation based on Φ2(τ, s) can be generalized by lifting the symmetry assumption on k(s).
While a concrete non-symmetric function k(s) is easy to construct, the work of Norkin (1993) attempts to solve CCOPT problems through a sequence of non-smooth optimization problems, which limits its wider applicability. The function Φ4(τ, s) preserves convexity and polynomial structures in the chance constrained function. However, when problems have complex nonlinearities, these structures can be hard to verify unless they are readily available (see Section 5 for numerical examples). Recent investigations related to value-at-risk (VaR) and conditional VaR (cVaR) make use of the continuous piecewise-differentiable function Φ3(τ, s). This function can help approximate CCOPT through DC optimization problems (Hong, Yang, and Zhang 2011; Wozabal 2012; Wozabal, Hochreiter, and Pflug 2010; Xiao et al. 2012). The following section introduces a novel smooth function Φ(τ, s) satisfying the properties P1-P3, which can be used for the approximation of general smooth non-convex chance constraints.

A new analytic approximation and its properties
The method proposed here uses an analytic approximation based on the parametric function

Φ(τ, u, s) = (1 + m1(u)τ) / (1 + m2(u)τ e^{−s/τ}),   (16)

where m1(u) and m2(u) are given positive functions (or constants), m2(u) ≤ m1(u) and 0 < τ < 1.
The function Φ(τ, u, s) possesses several salient properties that make it amenable to the intended approximation.
Proof For the proof, see the online supplement available via the link given in the section 'Supplemental data' before the references list.

For m1 = 1 and m2 = 0.5, it can be seen that the approximation profile of Φ(τ, u, s) in Equation (16) becomes much tighter as τ decreases towards 0+. In addition, instead of designing a tighter approximation for E[h(u, ξ)] by taking the infimum with respect to τ (cf. Nemirovski and Shapiro 2006a), a monotonic reduction of Φ(τ, u, s) is computationally much more preferable.
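This behaviour can be reproduced in a few lines of code. The function used below, Φ(τ, s) = (1 + m1·τ)/(1 + m2·τ·e^{−s/τ}), is one concrete smooth parametric function with the stated properties (its exact agreement with Equation (16) is an assumption of this sketch): it upper-bounds the violation indicator 1{s > 0} for every τ, and tightens towards the indicator pointwise away from s = 0 as τ → 0+.

```python
import numpy as np

def Phi(tau, s, m1=1.0, m2=0.5):
    # Smooth conservative approximation of the violation indicator 1{s > 0};
    # assumes 0 < m2 <= m1 and 0 < tau < 1.
    return (1.0 + m1 * tau) / (1.0 + m2 * tau * np.exp(-s / tau))

s = np.linspace(-2.0, 2.0, 401)
indicator = (s > 0.0).astype(float)

# Upper-bound property: Phi dominates the indicator for every tau.
for tau in (0.5, 0.1, 0.01):
    assert np.all(Phi(tau, s) >= indicator)

# The worst-case gap over this grid shrinks monotonically with tau ...
gaps = [np.max(Phi(tau, s) - indicator) for tau in (0.5, 0.1, 0.01)]
assert gaps[0] > gaps[1] > gaps[2]

# ... and Phi converges pointwise to the indicator away from s = 0.
assert Phi(0.01, -0.5) < 1e-3
assert abs(Phi(0.01, 0.5) - 1.0) < 0.02
```

Note that the approximation stays conservative at s = 0 (the limit there is 1, not 0), which is exactly what makes every M(τ) an inner approximation of K.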

A sequential approximation of CCOPT and convergence properties
For a fixed parameter value τ_k ∈ (0, 1), consider the nonlinear optimization problem NLP_k with the feasible set M(τ_k) = {u ∈ U | ψ(τ_k, u) ≤ 1 − α} and the same objective function E[f(u, ξ)] as in CCOPT. Let {τ_k}, k = 1, 2, …, be a decreasing sequence of parameters in (0, 1) (i.e. τ_{k+1} < τ_k) with τ_k → 0+. This leads to a sequence of nonlinear optimization problems NLP_k, k = 1, 2, …. Since f(·, ξ) and ψ(τ_k, ·) are continuous functions on the compact set U, M(τ_k) is a compact set and NLP_k has an optimal solution u_k ∈ M(τ_k). Furthermore, assuming that the functions f(·, ξ) and ψ(τ_k, ·) are at least once continuously differentiable (see Section 4.2), these optimal solutions of NLP_k can be determined using a gradient-based nonlinear optimization algorithm. The main interest now lies in verifying how the sequence of solutions {u_k} of NLP_k is related to the original CCOPT problem.
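The sequential scheme can be sketched end to end on a toy problem: minimize u subject to Pr{ξ ≤ u} ≥ 0.9 with ξ ~ N(0, 1), whose exact solution is the 0.9-quantile of N(0, 1), about 1.2816. All problem data here (the constraint g(u, ξ) = ξ − u, the sample size, the concrete form of Φ and the τ-schedule) are illustrative assumptions, not taken from the article:

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(2)
xi = rng.normal(size=20_000)               # fixed sample of the uncertainty
alpha, m1, m2 = 0.9, 1.0, 0.5

def psi(u, tau):
    # Sample average of psi(tau, u) = E[Phi(tau, g(u, xi))] with
    # g(u, xi) = xi - u and the smooth upper bound Phi assumed here.
    s = xi - u[0]
    return np.mean((1.0 + m1 * tau) / (1.0 + m2 * tau * np.exp(-s / tau)))

u = np.array([4.0])                        # conservative feasible start
for tau in (0.5, 0.2, 0.1, 0.05):          # decreasing sequence tau_k
    res = minimize(lambda v: v[0], u, method="SLSQP",
                   constraints=[{"type": "ineq",
                                 "fun": lambda v, t=tau: (1 - alpha) - psi(v, t)}])
    u = res.x                              # warm start the next NLP_k

# Each iterate stays feasible for the sampled chance constraint, and u
# approaches the analytic solution ~1.2816 from above as tau_k decreases.
```

Warm-starting each NLP_k from the solution of NLP_{k−1} is what makes the scheme practical: each subproblem only has to track the slowly moving boundary of M(τ_k).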

Convergence of feasible sets
This section makes use of definitions and properties of the convergence of a sequence of sets. Details and relevant references are given in the online supplement. Based on properties P1 and P2 (Section 3), for any τ ∈ (0, 1), the set M(τ) ⊂ K. In particular, the sequence of sets {M(τ_k)}_{k∈N} has the convergence property lim sup_{k→∞} M(τ_k) ⊂ K for any sequence {τ_k}_{k∈N} decreasing to zero. Furthermore, the equality lim_{k→∞} M(τ_k) = K can be guaranteed under the following assumption.
The compactness of U, the monotonicity and continuity of ψ(·, u) with respect to τ, as well as properties P1-P3, imply that, for each sequence {τ_k}_{k∈N} decreasing to zero and for a regular chance constraint, the relation lim_{k→∞} M(τ_k) = K holds true.
Proof For the proof, see the online supplement available via the link given in the section 'Supplemental data' before the references list.
The following statement strengthens the result of Theorem 4.2 by guaranteeing the uniform convergence of the feasible sets M(τ_k) of NLP_k to the feasible set K of CCOPT; i.e. convergence w.r.t. the Hausdorff metric. The Hausdorff distance between two closed and bounded sets A, B ⊂ R^n is given by

H(A, B) = max{ sup_{a∈A} inf_{b∈B} ||a − b||, sup_{b∈B} inf_{a∈A} ||a − b|| }.

Proof For the proof, see the online supplement available via the link given in the section 'Supplemental data' before the references list.

Remark 2
Observe that the results of Theorem 4.2 and Corollary 4.3 hold true for any sequence {τ_k} such that τ_{k+1} < τ_k and τ_k → 0+, and they can be generalized further. The discussions in this section reveal that M(τ) can be made asymptotically as close as desired to K, becoming equal to K in the limit. This equality holds true irrespective of how the parameter τ is driven to 0 from above. Consequently, instead of working with the computationally difficult set K for the numerical solution of the CCOPT problem, it is convenient to work with M(τ_k) for an arbitrary monotonically decreasing sequence τ_k → 0+.
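Hausdorff convergence of discretized feasible sets can be monitored numerically. The helper below computes H(A, B) for finite point sets (one point per row); the interval grids are illustrative stand-ins for K and two inner approximations M(τ_k), not data from the article:

```python
import numpy as np

def hausdorff(A, B):
    # Hausdorff distance between finite point sets (one point per row).
    D = np.linalg.norm(A[:, None, :] - B[None, :, :], axis=2)
    return max(D.min(axis=1).max(), D.min(axis=0).max())

# Discretized intervals standing in for K and two inner sets M(tau_k):
K_grid = np.linspace(1.0, 3.0, 201)[:, None]
M_tau1 = np.linspace(1.5, 3.0, 151)[:, None]   # larger tau: smaller set
M_tau2 = np.linspace(1.1, 3.0, 191)[:, None]   # smaller tau: closer to K

d1, d2 = hausdorff(K_grid, M_tau1), hausdorff(K_grid, M_tau2)
# d1 ~ 0.5 and d2 ~ 0.1: H(K, M(tau_k)) shrinks as tau_k decreases.
```

For inner approximations M(τ) ⊂ K the second supremum in H vanishes, so the Hausdorff distance reduces to the worst-case distance from points of K to M(τ).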

Convergence of approximate solutions
According to the definitions of NLP_k and the properties of ψ(τ_k, u), it is obvious that each solution u_k of NLP_k is a feasible point of NLP2 and hence also of CCOPT. Furthermore, for each τ_k, the optimal value of NLP_k is an upper bound for the optimal value of CCOPT, i.e. E[f(u_k, ξ)] ≥ min_{u∈K} E[f(u, ξ)]. In the following, the compactness of the set U will be used to show that any limit point of a sequence {u_k}_{k∈N} of local optimal solutions of NLP_k is not only a feasible point, but also a local optimal solution of the original CCOPT problem.

Theorem 4.4
(1) Let {u_k}_{k∈N} be a sequence such that u_k is a local optimal solution of NLP_k, k = 1, 2, …. Then any limit point u* of this sequence is a local optimal solution of (CCOPT).
(2) Conversely, if u_0 is a strict local minimizer of (CCOPT), then there is a sequence of local minimizers u_k of NLP_k which converges to u_0.
Proof For the proof, see the online supplement available via the link given in the section 'Supplemental data' before the references list.
The most important result in Theorem 4.4 is that any sequence {u_k}_{k∈N} of local (or global) optimal solutions of the problems NLP_k has a convergent subsequence whose limit is a local optimal solution of the original CCOPT problem. Therefore, the solution of a sequence of smooth nonlinear (non-convex) optimization problems NLP_k, for any zero-sequence {τ_k}_{k∈N}, provides a working approximation of the optimal solution of the CCOPT problem, while at the same time all the obtained solutions of NLP_k remain feasible for CCOPT.
Since the CCOPT problem and NLP_τ are non-convex, their respective sets of KKT points may not be singletons. Hence, it is necessary to show that the set of KKT points of NLP_k converges to a subset of the KKT set of the CCOPT problem. To obtain this result, a relation between the derivatives ∇p(u) and ∇_u ψ(τ, u) is needed.

Convergence of derivatives
Given u_0 ∈ U, this section verifies the convergence ∇ψ(τ_k, u) → −∇p(u) for any sequence {τ_k}_{k∈N} with τ_k → 0+, uniformly for u in some ball around u_0.
Proposition 4.6 Suppose Assumption 4.5 is satisfied. If the function g(u, ξ) has continuous partial derivatives w.r.t. u and the density φ is continuous in some small neighbourhood of the union of the corresponding sets over ũ ∈ B(u_0), with some small ball B(u_0) around u_0, then p(u), and hence 1 − p(u), is continuously differentiable.
Proof The proof of this proposition can be found in the standard literature.
In order to obtain the required convergence in Equation (21), referring to the differentiability of p(u) (Proposition 4.6) and the uniform convergence in Lemma 4.7, it suffices to show that the sequence {∇ψ_k(u)}_{k∈N} is uniformly convergent in some bounded neighbourhood W of u_0, where W ⊂ U. Accordingly, Proposition 4.8 provides an explicit expression for ∇ψ(τ_k, u). If m1 and m2 are chosen as constants, then the first two terms of this expression vanish and the gradient ∇ψ(τ_k, u) takes a simpler form.
Proof For a sketch of the proof, see the online supplement available via the link given in the section 'Supplemental data' before the references list.
In summary, for a differentiable probability function p(u), the function ψ(τ, u) not only approximates the function 1 − p(u), but the gradients ∇ψ(τ_k, u) also approximate the gradient −∇p(u) for any sequence of parameters {τ_k} with τ_k → 0+. This justifies the use of ∇ψ(τ_k, u) in optimization algorithms instead of −∇p(u), since the latter can be harder to compute.
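This gradient consistency can be checked on the same toy setting used earlier (g(u, ξ) = ξ − u, ξ ~ N(0, 1), and the concrete Φ assumed in these sketches), where −∇p(u) = −φ_N(u) is known in closed form. The derivative of Φ is evaluated in a numerically stable form to avoid overflow of e^{−s/τ} for small τ:

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(3)
xi = rng.normal(size=400_000)
m1, m2, u = 1.0, 0.5, 1.0

def dpsi_du(tau):
    # d/du of psi(tau, u) = E[Phi(tau, xi - u)] for the assumed
    # Phi(tau, s) = (1 + m1*tau) / (1 + m2*tau*exp(-s/tau)).
    t = (xi - u) / tau
    # dPhi/ds written stably as (1+m1*tau)*m2 / (e^{t/2} + m2*tau*e^{-t/2})^2
    dPhi_ds = (1 + m1 * tau) * m2 / (np.exp(t / 2) + m2 * tau * np.exp(-t / 2)) ** 2
    return -np.mean(dPhi_ds)           # chain rule with ds/du = -1

target = -norm.pdf(u)                  # -grad p(u) for this toy problem
errors = [abs(dpsi_du(tau) - target) for tau in (0.2, 0.05, 0.01)]
# The gradient approximation error decreases as tau -> 0+.
```

The stabilized denominator matters in practice: evaluating e^{−s/τ} directly produces inf/inf (hence NaN) once |s|/τ exceeds the floating-point exponent range.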

Convergence properties of the KKT sets
The results of the previous sections carry over easily to a CCOPT problem with multiple single chance constraints p_j(u) ≥ α_j, j = 1, …, q. Define the sets

Λ0 := {(u, η) | (u, η) is a local optimal pair for CCOPT},
Λ̂0 := {(u, η) | (u, η) is a strict local optimal pair for CCOPT},
Λ0_k := {(u, η) | (u, η) is a local optimal pair for NLP_k}.

Since U is a compact set, the sets Λ0 and Λ0_k, for each k, are non-empty.

Proof For the proof, see the online supplement available via the link given in the section 'Supplemental data' before the references list.
In contrast to the result given in Theorem 2.9 of Xiao et al. (2012), the result provided in Theorem 4.13 is much stronger. In any case, the MFCQ Assumption 4.11 is indispensable, and it also implies the regularity Assumption 4.1. From a computational point of view, Theorems 4.4 and 4.13 guarantee that CCOPT problems can be solved approximately using any state-of-the-art gradient-based optimization solver.

Computation of probability integrals
The values and gradients of the objective function and constraints are obtained by computing, at each iterate u_k of the optimization algorithm, multidimensional probability integrals of the form E[v(u, ξ)] = ∫_Ξ v(u, ξ) φ(ξ) dξ, where v stands for the respective integrand. The numerical evaluation of these integrals is computationally expensive. Generally, there are two major approaches for the evaluation of multidimensional integrals: Monte Carlo (MC)/quasi-Monte Carlo (QMC) methods and cubature rules.
Traditionally, multidimensional probability integrals have been computed using MC (see Deák 2011) or QMC methods (Caflisch 1998; Niederreiter 1992; Sloan 1994). The QMC methods rely on the generation of low-discrepancy integration points that are uniformly distributed in a bounded domain of integration. Despite the fact that QMC methods require a very large number of integration nodes, resulting in a very large number of function evaluations in higher dimensions (Heiss and Winschel 2006; Gerstner and Griebel 1998), they provide quite accurate approximations, even for non-smooth integrands.
More recently, sparse-grid integration techniques (Smolyak 1963) have been widely used to reduce the computational expense of evaluating high-dimensional integrals. The work of Klöppel et al. (2011) demonstrates the efficiency of sparse grid integration methods for the evaluation of chance constraints (see also Geletu et al. 2011). Integration nodes of the sparse grid integration technique (Bungartz and Griebel 2004; Gerstner and Griebel 1998) are constructed through a skillful combination of one-dimensional quadrature nodes. As a result, sparse grid methods can be constructed so as to compute integrals of polynomial functions exactly and efficiently. However, for general functions, the possible existence of negative weights in the weighted-sum approximation of integrals is a major drawback of sparse grid methods. In fact, negative integration weights are known to cause numerical cancellations and inaccuracies in the approximation. Furthermore, as τ decreases, the smoothness of the integrand deteriorates. Since sparse grid methods depend on the smoothness of the integrand, the quasi-Monte Carlo integration method is found to provide more stable results than the sparse grid method. Hence, in this work:
• integrals are computed through the quasi-Monte Carlo method;
• at each step of the optimization algorithm, a set of multidimensional integrals needs to be computed corresponding to the values and gradients of the objective function and constraints; since these integrations are highly parallelizable, code parallelization techniques are used in order to ameliorate the computational burden;
• the experiments reported here show that the overall CPU time decreases linearly with the number of processors used.
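A minimal QMC evaluation of a probability integral can be written with SciPy's quasi-Monte Carlo module. The constraint below (g(u, ξ) = ξ1 + ξ2 − u with independent standard Gaussian components) is an illustrative stand-in, not one of the article's examples; Sobol' points in [0, 1)² are mapped to Gaussian samples through the inverse CDF:

```python
import numpy as np
from scipy.stats import qmc, norm

# Scrambled Sobol' points in [0,1)^2, mapped to independent N(0,1)
# variables by the inverse CDF.
sampler = qmc.Sobol(d=2, scramble=True, seed=4)
v = sampler.random_base2(m=12)            # 2^12 low-discrepancy points
xi = norm.ppf(v)                          # shape (4096, 2)

# QMC estimate of p(u) = Pr{xi1 + xi2 - u <= 0} for an illustrative u.
u = 2.0
p_hat = np.mean(xi.sum(axis=1) <= u)
# Exact value for comparison: xi1 + xi2 ~ N(0, 2), so p(u) = CDF(u/sqrt(2)).
p_exact = norm.cdf(u / np.sqrt(2))
```

The low-discrepancy structure of the Sobol' sequence typically gives a noticeably smaller integration error than plain MC at the same number of nodes, which is why QMC is adopted here despite the non-smooth indicator integrand.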
In order to demonstrate the viability and efficiency of the proposed approach, two numerical examples are presented in Section 5.2 and a case study is discussed in Section 5.3. The optimization problems in Section 5.2 are solved using MATLAB's fmincon solver, while for the case study in Section 5.3 the optimization problems are solved with the interior point optimization solver IpOpt (Wächter and Biegler 2006). IpOpt is capable of working with first-order information and is highly efficient for the solution of NLP_k with smaller values of the parameter τ_k. Furthermore, the theory developed above holds true when m1 and m2 are chosen as positive constants with m2 ≤ m1. All numerical computations are done on a desktop PC with an Intel 980X hexacore processor with 3.33 GHz per core and 6 GB RAM.

Model predictive control of a batch reactor
This section presents some numerical experiments in order to demonstrate the efficiency of the proposed approximation strategy. For this purpose, the chance constrained model predictive control of a batch reactor is studied here (see also Geletu et al. 2012). The dynamic optimization problem is formulated subject to the reactor model equations, where k1(t) = k10 exp{−E1/RT(t − 1)}, k2(t) = k20 exp{−E2/RT(t − 1)}, x1 and x2 are the concentrations of species A and B, and T, TM and TJ are the temperatures of the reaction mass, wall and jacket, respectively. The quantities k1, k2 describe the reaction rates, E1, E2 the activation energies, k10, k20 the pre-exponential Arrhenius constants, and FJ is the amount of cooling water per hour. The variables E1, E2, k10 and k20 are assumed to be random with a joint normal distribution (see Table 1). Physical parameters of the batch reactor are shown in Table 2. The objective is to maximize the expected concentration of product B (i.e. E[x2(t)]) while decreasing its variation (Var[x2(t)]) and, at the same time, to minimize the fluctuation in the cooling water stream; the corresponding terms of the objective function are weighted by factors ω1, ω2 ∈ R. Further details on the solution of the problem can be found in Geletu et al. (2012).
Here, this example is used to study the influence of the number of chance constraints on the computation time. On each prediction horizon, the problem represents a chance constrained nonlinear optimization problem, and the number of chance constraints in Equation (27) equals the length of the prediction horizon h. For the sake of comparison, the same problem is solved with the previously proposed back-mapping approach of Wendt, Li, and Wozny (2002) and with the analytic approximation strategy, for various lengths of the prediction horizon h. The back-mapping approach uses sparse grid integration with 441 grid points. The analytic approximation approach uses m1 = 1.1, m2 = 1, τ = 0.001 and quasi-Monte Carlo integration with 500 grid points. The computation here considers a one-hour operation of the process over a total of 30 prediction horizons. Considering the dynamic nature of the problem, the computation time per iteration is examined; the result of the experiment is shown in Figure 2.
It is apparent that the back-mapping approach needs significantly more computation time per iteration. Moreover, the computation time per iteration of the analytic approximation approach appears to depend only linearly on the length of the prediction horizon, whereas with the back-mapping approach it appears to increase quadratically. This can be explained by the fact that the analytic approximation requires the solution of the model equations only once at every grid point of the underlying integration routine of the chance constraints. In contrast, the back-mapping approach requires the solution of h different modified versions of the original model equations at every grid point, resulting in a quadratic dependence of O(h^2). In general, if N_c is the number of chance constraints, T_sol is the computation time necessary to solve the model equations, and N_i is the number of grid points in the integration routine, then one would expect a computation time of order O(N_i T_sol) for the analytic approximation method and O(N_c N_i T_sol) for the back-mapping approach. These are rough estimates based on problem data and customized implementations; a rigorous theoretical analysis remains to be carried out. In general, the computational effort increases with decreasing value of τ.
It should be remarked that analytic approximation methods based on convex polynomial kinship functions (Feng, Dabbene, and Lagoa 2011) would require comparatively fewer integration nodes N_i, since these methods allow the efficient use of sparse-grid integration for the evaluation of chance constraints (see also Klöppel et al. 2011). Hence, computational complexities are of the order of O(N_i T_sol). This indicates that a kinship-function approach seems to be the best choice when the chance constrained function g(x, ξ) is convex w.r.t. x and has a polynomial structure w.r.t. ξ. Even then, the next example indicates that for specific problems the analytic approximation approach results in tighter bounds of the chance constraints and, therefore, improved optimal values.

A robust LP
This section studies the robust operation of a plant under uncertainty. The problem formulation as presented in Feng, Dabbene, and Lagoa (2011) is subject to x^T x ≤ 1000, where x ∈ R^3 are the control variables, A_i and B_i are the rows of the matrices A(ξ) and B(ξ), respectively; ξ ∈ {ξ ∈ R^4 | |ξ1| ≤ 0.05, |ξ2| ≤ 0.05, |ξ3| ≤ 0.1, |ξ4| ≤ 0.05} are the uniformly distributed uncertain disturbances, and the remaining quantities are defined as follows: The optimization problem (30)-(34) has a joint chance constraint. In order to apply the proposed analytic approximation method, the relation max_{i=1,...,8} {A_i x + B_i} ≤ ln(Σ_{i=1}^{8} exp(A_i x + B_i)) will be helpful; it follows from the generic inequality max{x, y} ≤ ln(e^x + e^y). Consequently, the joint chance constraint (33) can be treated through the analytic approximation method. The solution obtained by Feng, Dabbene, and Lagoa (2011), using the kinship-function approach, is x = (6.2922, 20.4695, 9.7037)^T. Using the analytic approximation of the constraint Pr{ln(Σ_{i=1}^{8} exp(A_i x + B_i)) ≤ 0} ≥ α with α = 0.99, the solution found is x = (8.4598, 18.1449, 9.923)^T. This indicates a significant decrease in the objective function value in comparison to that given by Feng, Dabbene, and Lagoa (2011). Moreover, a Monte Carlo simulation, using 1,000,000 samples from the uncertainty set, reveals that violations of the constraints A_i x + B_i ≤ 0, i = 1, . . . , 8, actually occur, but Pr{A_i x + B_i ≤ 0, i = 1, . . . , 8} is still greater than the desired 0.99. This indicates that, at least for this example, the analytic approximation method provides a better approximation of the probability values and results in a significantly lower objective function value. A detailed study of the tractable analytic approximation of joint chance constraints remains for future research.
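The inequality max{x, y} ≤ ln(e^x + e^y) generalizes to the log-sum-exp (softmax) bound max_i z_i ≤ ln(Σ_i e^{z_i}) ≤ max_i z_i + ln n, so enforcing the single smooth constraint ln(Σ_i exp(A_i x + B_i)) ≤ 0 conservatively enforces all eight affine constraints at once. A minimal numerical check of this bound, with random stand-ins for the eight values A_i x + B_i:

```python
import numpy as np

def logsumexp(z):
    """Numerically stable ln(sum_i exp(z_i))."""
    m = np.max(z)
    return m + np.log(np.sum(np.exp(z - m)))

rng = np.random.default_rng(1)
for _ in range(1000):
    z = rng.normal(size=8)   # stand-in for the 8 values A_i x + B_i
    lse = logsumexp(z)
    # max{z} <= ln(sum exp(z)) <= max{z} + ln(8)
    assert z.max() <= lse <= z.max() + np.log(8.0)

# Hence logsumexp(z) <= 0 implies max(z) <= 0, so requiring
# Pr{logsumexp <= 0} >= alpha is a conservative surrogate for the joint constraint.
```

The ln n gap also explains why the surrogate is conservative: some points satisfying the joint constraint are cut off, which is consistent with the Monte Carlo finding that the joint probability still exceeds 0.99 at the computed solution.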
Here, the optimization problems are solved using the MATLAB™ fmincon solver coupled with a MATLAB™ implementation of a QMC integration routine. The overall required computation time is less than 4.2 minutes. In fact, this CPU time can be reduced 100- to 500-fold if the implementation is done in C/C++ along with parallelization strategies.

A case-study
The application of CCOPT in the area of power systems has received increasing interest in recent years. The first application was by Ozturk, Mazumdar, and Norman (2004) for the solution of the unit commitment problem (i.e. the optimal scheduling of the operation of generators in an electrical network). Recent applications include transmission network expansion planning (Yu et al. 2009) and optimal power flow under Gaussian distributed uncertainty (Zhang and Li 2011). In this case-study, a problem comparable to that presented in Zhang and Li (2011) is solved. In contrast to the solution presented there, there is no need here for monotonic relations between constrained outputs and uncertain inputs, since such relations may not exist (see Geletu et al. 2011).
In this case-study, an electrical network under uncertain, non-Gaussian distributed influences from wind generators is optimized. The case study mimics problems that occur in energy networks with a high penetration of wind generators. These generators may have a high total maximum generation capacity, and it is quite possible that a high generation of wind power meets a low demand, especially during night hours. In order to keep the electrical network stable in such a situation, the injection of power from the wind generators has to be reduced (curtailed), leading to a loss of potentially available energy. Here, the objective is defined as the minimization of this energy curtailment. As a test case, the 41-bus system studied in Gabash and Li (2012), shown in Figure 3, was chosen. This network is a so-called medium-voltage distribution network. A base power of 10 MVA is assumed, and a total of 15 wind generators (randomly allocated) are installed in the system, where each generator has a maximum generation capacity of P_r,i(max) = 0.15. All quantities are given in the per-unit system, unless otherwise specified. According to Fabbri et al. (2005), the generation of a wind generator can be described by a Beta distribution, and the distribution parameters can be calculated from weather forecasts. In the following, optimization results for different weather scenarios are discussed.
The aforementioned operation task can be formulated as a chance constrained optimization problem for one time horizon (e.g. 1 hour).
subject to: where β_i, i ∈ I_g, is a curtailment factor for the active power generation P_r,i of a renewable energy source at bus i, I_g = {17, 18, 19, 22, 26, 27, 28, 33, 34, 36, 37, 39, 40, 41} is the set of indices of buses with installed wind generators, V_i is the voltage magnitude at bus i, θ_ij is the difference of the phase angles between buses i and j, i, j ∈ {1, . . . , 41}, the matrices G and B are the bus conductance and susceptance matrices of the system, P_S and Q_S are the active and reactive injections at the slack bus (in this study, bus 1), P_d,i and Q_d,i are the active and reactive loads at bus i, and P_i and Q_i are the active and reactive power injections into bus i. Equations (38)-(41) are the so-called power flow equations, which describe the distribution of power throughout the network. The random input variables P_r,i, i ∈ I_g, depend on Beta-distributed random variables; i.e. P_r,i = P_r,i(max) × Beta(a_i, b_i), i ∈ I_g, where Beta(a, b) describes a Beta-distributed uncertain variable with the parameters a and b. The expected value of the generation at bus i can then be calculated from the mean of the Beta distribution as E[P_r,i] = P_r,i(max) · a_i/(a_i + b_i). The curtailment factors β_i, i ∈ I_g, are control variables, where a value of one means that no curtailment occurs and a value of zero means that no power is injected into the network. The chance constraints (42) and (43) guarantee a reliable operation at the slack bus, which connects the network to a higher-voltage grid, whereas the chance constraints (44) guarantee a certain voltage magnitude at the PQ-buses (buses 2-41). The optimization task was carried out using IpOpt (Wächter and Biegler 2006), and the chance constraints were evaluated using the method introduced in this article along with quasi-Monte Carlo integration. The two-sided constraints are approximated with the proposed scheme using 0 < τ < 0.0015, m1 = 1.002 and m2 = 1. Results for different wind scenarios (i.e.
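Since E[Beta(a, b)] = a/(a + b), the expected available generation follows directly from the forecast parameters. A small sketch checking this against Monte Carlo sampling (the values of a, b are hypothetical placeholders, not forecast parameters from Table 3):

```python
import numpy as np

P_max = 0.15            # maximum generation capacity (per unit)
a, b = 2.0, 5.0         # hypothetical Beta parameters from a weather forecast

rng = np.random.default_rng(2)
samples = P_max * rng.beta(a, b, size=200_000)   # P_r = P_max * Beta(a, b)

analytic_mean = P_max * a / (a + b)              # E[Beta(a, b)] = a / (a + b)
print(samples.mean(), analytic_mean)
# With a curtailment factor beta_i, the injected power is beta_i * P_r and
# its expectation scales to beta_i * P_max * a / (a + b).
```

The chance constraints themselves still require numerical integration over the Beta density, but the expected-generation terms in the objective are available in closed form.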
different parameters of the underlying Beta distribution) and for different probability levels α are presented in Table 3. Solutions corresponding to a probability level of α = 0.99 are marked with a subscript H; all other solutions correspond to α = 0.95. The first scenario is a low wind-speed scenario; i.e. the expected generation is near the minimum possible value. It can be seen that under such conditions nearly all of the available power can be used. In addition, the solutions for α = 0.95 and α = 0.99 are similar, although, in general, a higher desired reliability leads to increased costs. This can be attributed to the NLP solver converging toward a local optimum in the interior of the feasible set. The second scenario is a medium wind-speed scenario; i.e. the expected generation is about 50% of the maximum possible generation. In comparison to scenario 1, there is more curtailment and more than 50% of the available wind power is lost. In this case, a higher reliability level leads to increased costs, i.e. more curtailment. The last scenario describes a high wind scenario. Here, only 33% of the available wind energy is utilized for α = 0.95 and only 20% for α = 0.99. In general, the given setting leads to a loss of potentially available energy in most scenarios due to the constraints on the network. The computation time here is less than 10 minutes.
To give further insight into the proposed method, the problem of scenario 2 was solved for different levels of τ. As can be seen in Figure 4, a decrease of τ leads to a tighter approximation of the chance constraints and thereby to a decrease in the objective function value, since the feasible set grows monotonically with decreasing τ. Furthermore, the problem could also be solved for τ < 0.0009, but this requires a more accurate cubature rule, which in turn would increase the computational burden due to a higher number of grid points in the computation of the integrals.
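The role of τ, m1 and m2 can be illustrated with a smooth parametric upper bound of the indicator of constraint violation. The article's exact parametric function is not reproduced here; the sketch below uses one concrete function with the required properties (smooth, bounds the step function 1{z > 0} from above whenever m1 > m2 > 0, and converges to it pointwise as τ → 0) to show why decreasing τ tightens the conservative estimate of the violation probability:

```python
import numpy as np

def theta(tau, z, m1=1.002, m2=1.0):
    """A smooth upper bound of the step function 1{z > 0} (valid for m1 > m2 > 0)."""
    return (1.0 + m1 * tau) / (1.0 + m2 * tau * np.exp(-z / tau))

rng = np.random.default_rng(3)
g = rng.normal(-1.0, 0.5, size=100_000)  # samples of the constraint value g(x, u, xi)

true_violation = np.mean(g > 0.0)        # Monte Carlo estimate of Pr{g > 0}
# E[theta(tau, g)] >= Pr{g > 0}; the bound tightens as tau decreases
bounds = [np.mean(theta(tau, g)) for tau in (1.0, 0.5, 0.1, 0.05)]
print(true_violation, bounds)
```

This also makes the trade-off discussed above visible: for small τ the integrand theta(tau, g) is nearly a step function, so accurate numerical integration needs more grid points.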

Conclusions
Optimization under uncertainty has become a necessity in many engineering applications, such as chemical process design and operation, energy distribution networks, water resources management and unmanned vehicle control, to name but a few. Chance constrained optimization provides an appropriate way to tackle such stochastic optimization tasks, since the solution achieves maximum yield while ensuring the desired operational reliability. However, the high complexity of the chance constrained optimization problems confronted in engineering applications (nonlinear, non-convex, high dimensional and with non-Gaussian distributed uncertain parameters) makes them extremely difficult to solve. The approach developed in this article is a solid step toward the goal of eventually solving complex engineering optimization problems under uncertainty.
Nonlinear non-convex chance constrained optimization problems are hard problems in computational optimization; in general, their numerical solution is possible only through approximation strategies. To date, available analytic approximation strategies either fail to provide tight approximations or lead to non-smooth optimization problems. This work presents a general analytic approximation of nonlinear chance constraints and shows how to set up conservative but tight approximations with a priori guaranteed feasibility: solutions of the approximating problems always remain feasible for the chance constraints and converge to a solution of the chance constrained optimization problem in the limit. The proposed approach can be used irrespective of whether the uncertainties follow Gaussian or non-Gaussian continuous distributions. In addition, it can address CCOPT problems in which a monotonic relation between the constrained output and the uncertain input does not exist. Furthermore, a concrete function has been presented which satisfies the desired properties and facilitates the solution of general CCOPT problems.
The case studies demonstrate the viability of the proposed approach for practical, large-scale problems with complex model equations.
The consideration here is restricted to single (separate) chance constraints; an investigation of joint chance constraints remains for future research. Furthermore, in the computations, the choice of the parameter τ is a compromise between the accuracy of the approximation scheme and the efficiency of the numerical integrations. A smaller τ provides a good approximation of the chance constraints, while requiring more integration nodes; a larger τ leads to smoother integrands at the expense of the tightness of the approximation. In the case studies, convenient values of τ were selected based on experimentation. Consequently, the choice of an 'optimal' τ, balancing a tighter analytic approximation against the computational accuracy of the integrals, is also an important issue for future research.