Robust reformulations of ambiguous chance constraints with discrete probability distributions

ABSTRACT


Introduction
Chance constrained optimization was introduced by [1] (see also [2][3][4]); it ensures that the probability of satisfying an uncertain constraint is greater than or equal to a certain threshold while minimizing or maximizing a given objective function. A chance constraint is given by

Pr ζ∼P {ζ : f (x, ζ) ≤ 0} ≥ 1 − ǫ, (1)

where f (x, ζ) denotes a function of a decision vector x ∈ X ⊆ R^n (X is the set of feasible decisions) and a vector of uncertainty parameters ζ ∈ R^L, ǫ ∈ [0, 1] is the predetermined probability threshold, and P is the known probability distribution of ζ. It can be shown that the feasible set of (1) is convex when ζ follows a Gaussian distribution and f (x, ζ) is linear in x and ζ; see [5][6][7]. Additional convexity results for (1) can be shown when f (x, ζ) is additively separable and ζ follows a log-concave distribution ([8,9]). Even though the chance constraint is tractable for the above-mentioned problem classes, it is generally computationally intractable: either the feasible set of the chance constraint is non-convex, or it is computationally intractable to compute the left-hand side (LHS) of the constraint even when the feasible set is convex. In the latter case, one may use a Monte Carlo simulation to check the feasibility of the chance constraint; nevertheless, the simulation approach can also be too costly at high accuracies. The chance constraint approach can be extended to multiple constraints, which is referred to as the joint chance constraint:

Pr ζ∼P {ζ : f_k (x, ζ) ≤ 0, ∀k ∈ K} ≥ 1 − ǫ, (2)

where K denotes the set of constraint indexes. Notice that (2) is at least as computationally challenging as (1). Computationally tractable safe approximations of the chance constraint have been proposed to overcome the difficulties mentioned above. A safe approximation method replaces the chance constraint with a set of constraints whose solution set is a subset of the feasible set of the chance constraint. Nemirovski and Shapiro [10] propose a computationally tractable approximation of a chance constrained problem where constraints are affine in the
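The Monte Carlo feasibility check mentioned above can be sketched as follows; the linear form f (x, ζ) = ζ⊤x − b and the Gaussian sample are illustrative assumptions, not part of the model in this paper.

```python
import numpy as np

def chance_feasible(x, b, sample, eps):
    """Estimate Pr{zeta^T x <= b} by Monte Carlo and compare it to 1 - eps."""
    lhs = sample @ x                 # f(x, zeta) = zeta^T x - b for each draw
    p_hat = float(np.mean(lhs <= b)) # empirical probability of satisfaction
    return p_hat, p_hat >= 1.0 - eps

# Hypothetical data: zeta ~ N(0, I_2) and a candidate decision x.
rng = np.random.default_rng(0)
sample = rng.standard_normal((100_000, 2))
p_hat, ok = chance_feasible(np.array([1.0, 1.0]), b=3.0, sample=sample, eps=0.05)
```

As the text notes, the accuracy of such a check is limited by the sample size, which is exactly why it becomes costly at high accuracies (small ǫ).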
uncertainty parameters that are independent with known support. Ben-Tal and Nemirovski [11] translate the existing stochastic uncertainties to 'uncertain-but-bounded' sets under mild assumptions. Namely, a feasible solution x ∈ X for the tractable reformulation of the safe approximation satisfies the chance constraint (1) with probability at least 1 − ǫ, where Z denotes the 'bounded' uncertainty set.
The accuracies of the obtained approximations in [10,11] are good when the number of uncertainty parameters is high. The associated approximations are popular because they yield tractable robust reformulations for (1) using modern robust optimization (RO) techniques. We shall delve into the details of these RO methods in the next section. Chen et al. [12] propose a conservative approximation of a joint chance constraint in terms of a worst-case conditional value-at-risk (CVaR) constraint. The resulting approximation outperforms the Bonferroni approximation, which is known to be pessimistic ([10,[12][13][14]). Zymler et al. [14] propose a method for approximating joint chance constraints when the first- and second-order moments together with the support of the uncertainty parameter are known. Similar to the safe approximation methods mentioned above, the extension of our approach to the joint chance constraint is also straightforward; the associated extension is presented later in §3. Calafiore and Campi [15] and Campi and Garatti [16] substitute the chance constraint with a finite number of constraints that are randomly sampled from the original constraint according to the known distribution of the uncertainty parameter. The authors show that, provided that a sufficient number of samples is drawn, the solution of the randomized approach satisfies the chance constraint with the given confidence level. In practice, we usually have partial or no information on the probability distribution P, since it needs to be estimated from historical data. This is why it makes sense to pass to ambiguous chance constraints. The term 'ambiguous' stands for the uncertainty in the probability distribution. In other words, the distribution of the uncertainty parameters is itself uncertain ([25]). The ambiguous chance constraint can be formulated as follows:

Pr ζ∼P {ζ : f (x, ζ) ≤ 0} ≥ 1 − ǫ ∀P ∈ P, (4)

where P belongs to a family P of distributions and the chance
constraint is satisfied for all (∀) probability distributions in P. This introduces additional computational complexity in solving the problem aside from the existing difficulties mentioned above. To the best of our knowledge, there is no systematic and exact way of solving ambiguous chance constraint problems for general family classes with continuous distributions. Formulating ambiguity in the probability distribution has attracted the attention of scholars from different fields. In the absence of full information on the probability distribution, or when only a set of possible distributions P is known, it is natural to optimize the expectation function corresponding to the worst-case probability distribution in P. This leads to the following formulation:

min_{x∈X} max_{P∈P} E_{ζ∼P} [f (x, ζ)].

For more details, we refer to [17][18][19][20]. Moreover, ambiguity in the probability distribution is also addressed by [21][22][23] in economics. Ambiguity in the context of chance constrained optimization has been studied by [10,24,25]. It is important to point out that the ambiguous chance constraint is 'severely' intractable compared to the regular chance constraint. The good news is that, as pointed out by [26,27], robust reformulation methods for chance constraints can be straightforwardly extended to ambiguous chance constraints, and this is why the adjective 'ambiguous' is generally skipped. Erdogan and Iyengar [24] define the distribution family P in (4) using the Prohorov metric. The authors propose a robust sampled problem that is a good approximation for the associated ambiguous chance constrained problem with high probability. Yanıkoglu et al.
[25] propose an iterative algorithm that constructs an uncertainty set yielding a tractable robust counterpart that safely approximates the ambiguous chance constraint; the authors use the φ-divergence metric and historical data on the uncertainty parameters to define the distribution family P. For further details on such approximations, we refer the reader to [25][26][27].
In this paper, we focus on robust reformulations of ambiguous chance constraints with discrete probability distributions. More precisely, P corresponds to a discrete probability distribution, i.e., we have a finite set of scenarios for the uncertainty realizations {ζ^1, ζ^2, . . ., ζ^|S|}. Finite supports are often encountered in practice when the data at hand are discrete; some examples are demand, the number of customers in a queue, the amount of inventory in a production facility, the time and quantity of returning products in a reverse logistics network, the number of quality grades in remanufacturing, and so on. On the other hand, empirical distributions are often used when the data at hand are so scarce that no continuous distribution can be fitted, or when the information is based on an expert opinion. Last but not least, continuous supports may be reduced to finite ones in order to yield computational tractability for the problem at hand (e.g., see [28]). When ζ follows a discrete probability distribution, the chance constraint (1) can be equivalently reformulated as

Σ_{s∈S} p_s y_s ≥ 1 − ǫ, f (x, ζ^s) ≤ M(1 − y_s) ∀s ∈ S, y ∈ {0, 1}^|S|,

where y_s is a binary variable that takes value 1 if the solution x satisfies the constraint for realization ζ^s or 0 otherwise, and M denotes a large constant; see also [29,30], which adopt the associated big-M reformulation technique and propose branch-and-bound/cut solution approaches to chance constrained problems under finite support. As pointed out above, the probability distribution P that has to be known is often not (exactly) known in practice; the probability vector p ∈ [0, 1]^|S| of the discrete uncertainty parameter ζ ∈ R^L often comes from an expert opinion or a forecast, i.e., the probability vector that determines the structure of the uncertainty is itself uncertain. This is why working with a so-called nominal probability vector may yield poor solutions if the associated ambiguity in the discrete probability distribution is not taken into account. The ambiguous chance constraint (4) can be equivalently reformulated as the following semi-infinite mixed-integer problem when P follows a
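A minimal sketch of the big-M reformulation under finite support, for a fixed decision x; the scenario data, the probabilities, and the linear form f (x, ζ) = ζ⊤x − 1 are hypothetical:

```python
import numpy as np

def bigM_feasible(x, scenarios, p, eps, M=1e4):
    """Check the big-M reformulation for a fixed x with f(x, zeta) = zeta^T x - 1.
    y_s = 1 marks the scenarios whose constraint x satisfies; the inequality
    f(x, zeta^s) <= M * (1 - y_s) is non-restrictive when y_s = 0, and the
    threshold sum(p_s * y_s) >= 1 - eps recovers the chance constraint."""
    f = scenarios @ x - 1.0
    y = (f <= 0).astype(float)        # best choice of y: mark satisfied scenarios
    assert np.all(f <= M * (1 - y))   # the big-M constraints hold for this y
    return float(p @ y) >= 1.0 - eps

scenarios = np.array([[0.5], [0.9], [1.5]])   # three realizations of a scalar zeta
p = np.array([0.5, 0.45, 0.05])               # nominal scenario probabilities
feasible = bigM_feasible(np.array([1.0]), scenarios, p, eps=0.1)
```

Here x = 1 violates only the 0.05-probability scenario, so the constraint holds for ǫ = 0.1 but fails for a tighter threshold such as ǫ = 0.01.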
discrete probability distribution:

Σ_{s∈S} p_s y_s ≥ 1 − ǫ ∀p ∈ U, f (x, ζ^s) ≤ M(1 − y_s) ∀s ∈ S, y ∈ {0, 1}^|S|, (5)
where U denotes the ambiguity (or uncertainty) set that supports the family of distributions P in the ambiguous chance constraint. As pointed out above, (5) is a semi-infinite optimization problem, i.e., it has finitely many decision variables and infinitely many constraints (see the quantifier ∀p ∈ U), and it is intractable in its current form. Using the robust optimization paradigm, tractable reformulations of (5) are proposed in this paper. The associated approach is exact, i.e., it equivalently reformulates the ambiguous chance constraint by exploiting the deterministic structure of the distribution, when the random perturbation in p is independent; and we propose safe approximations when p is dependent. Hanasusanto et al. [31] derive explicit conic representations of ambiguous chance constraints for tractable classes and efficiently solvable conservative approximations for the intractable ones using tools from distributionally robust optimization (DRO). The authors derive tractable reformulations of the ambiguous individual chance constraint when the ambiguity set is Markov. For the joint case, tractability is obtained only for conic moment ambiguity sets. The authors propose a conservative approximation algorithm for the intractable cases which is based on improving the fixed decision at each stage of the algorithm. Hanasusanto et al. [32] study conic representable reformulations of ambiguous joint chance constraints when the mean, the support, and an upper bound on the dispersion of the uncertainty parameters are known. The authors also provide the conic representable reformulation of the optimistic chance constraint for specific classes. Jiang and Guan [33] study distributionally robust chance constraints when the family of distributions is based on a phi-divergence measure. The problems are efficiently solvable by using strong cutting planes and hence a branch-and-cut algorithm. Chen et al.
[34] propose distributionally robust reformulations of data-driven chance constraints using the Wasserstein ball uncertainty set that is often used in the DRO framework. The resulting RC is a mixed-integer linear program that can be solved for moderate-sized instances. The method can also be extended to joint chance constraints when the uncertainty is in the right-hand side (RHS). Similarly, Ji and Lejeune [35] study distributionally robust chance-constrained programming with data-driven Wasserstein ambiguity sets. They strengthen the formulations by adopting valid inequalities. The resulting RC is a mixed-integer second-order cone programming (MISOCP) reformulation for the exact model with RHS uncertainty. The authors also propose a MISOCP relaxation for models with a random technology vector. Zhang et al. [36] study a distributionally robust reformulation of the chance-constrained bin packing problem. Using two moments of the uncertainty, they construct the ambiguity set and the resulting robust reformulation of the problem. To strengthen the second-order cone RC formulation, the authors adopt valid polymatroid inequalities that improve the computational performance of off-the-shelf solvers such as GUROBI and CPLEX for the given test instances. Cheng et al.
[37] consider a distributionally robust version of the quadratic knapsack problem where the knapsack constraint coincides with an ambiguous chance constraint such that the two moments of the distributions in the ambiguity sets are partly known. The resulting RC is a semidefinite programming (SDP) relaxation reformulation. The joint case is more challenging, and the authors propose two tractable methods to find upper and lower bounds for the SDP relaxation. Hu and Hong [38] propose a DRO framework that adopts the Kullback-Leibler phi-divergence measure to model the ambiguity set of the unknown probability distribution. The authors show that the associated approach can also be used to reformulate ambiguous chance constraints when the confidence level of the phi-divergence measure is set to the probability threshold of the chance constraint. Xie and Ahmed [39] propose robust reformulations of ambiguous chance constrained problems when the ambiguity set is specified by convex moment constraints. They show that the distributionally robust chance constrained problem can be modeled as a convex optimization problem under certain conditions. Xie et al. [40] propose a Bonferroni approximation approach to solve distributionally robust joint chance constrained programs. The authors show that the associated approximation is convex and tractable when the family of distributions is specified by moments or by marginal distributions. Xie [41] studies exact and approximate reformulations of distributionally robust chance constrained problems where the ambiguity set is determined by the Wasserstein metric. The author adopts a branch-and-cut algorithm to solve the resulting reformulations. Finite dimensional uncertainty parameters are often faced in regression analysis, and worst-case expectation problems with Wasserstein ambiguity sets often yield efficient second-order conic formulations; we refer the reader to [42] for distributionally robust regression. Similarly, Özmen et al.
[43] robustify an extension of multivariate adaptive regression splines (MARS) under polyhedral uncertainty (i.e., so-called RCMARS) by adopting RO; we refer the reader to [44] for MARS and to [45] for its deterministic extension, i.e., so-called CMARS.
For robust reformulations of various risk measures that are often used in financial optimization, we refer to [12,[46][47][48], and references therein. The remainder of the paper is organized as follows. In §2, we first give a brief introduction to the RO paradigm (§2.1); then we propose the robust reformulations of the ambiguous chance constraints with discrete probability distributions (§2.2). In §2.3, we propose a safe approximation algorithm for the robust reformulations. In §3, we present extensions of our approach. In §4, we present numerical experiments. Finally, we give our concluding remarks and future research directions in §5.
Notation. Bold-face, lower-case letters and numbers represent vectors, e.g., 0 denotes a vector of zeros and e is the all-one vector. Bold-face, upper-case letters represent matrices. The dimensions of the vectors and matrices will usually be clear from the context. Lower-case letters refer to vector elements, e.g., p_s denotes the s-th element of vector p. A vector or matrix superscript indicates either the transpose (⊤) or the element-wise power of a given vector or matrix, e.g., q^{-2} and Q^2; the only exceptions are indexes of probability vectors (p^0, p^{(j)}) and uncertainty parameter realizations (ζ^s). P denotes a probability distribution and P denotes a family of distributions. The uncertainty set is given by Z, where ζ ∈ Z ⊆ R^L denotes the uncertainty parameter and L is the number of uncertain parameters. The ambiguity set is given by U, where ξ ∈ U ⊆ R^|S| denotes the perturbation vector of the probability vector p ∈ [0, 1]^|S| and S denotes the set of indexes for the uncertainty parameter realizations, i.e., the components of the probability vector. The subscripts B and E for the ambiguity set denote the specific properties of the set, namely, p-box (U_B) and p-ellipsoidal (U_E). Finally, ". . ." denotes that the remainder of an expression continues in the next line.

Methodology
In this section, we first present the three core steps of the RO paradigm. Later, we adopt RO for ambiguous chance constraints with discrete probability distributions to derive the tractable robust reformulations of the ambiguous chance constraints.

Introduction to robust optimization
For the sake of simplicity, we assume that the LHS of the constraint is linear in the uncertainty parameter, i.e., we consider the semi-infinite constraint

(a + Pζ)⊤x ≤ b ∀ζ ∈ Z, (6)

where a is the nominal coefficient vector, the matrix P determines the perturbation of the coefficients, and Z denotes the user-specified uncertainty set. Next, we present a three-step procedure to remove the universal quantifier (∀) in (6); the resulting deterministic equivalent reformulation is often referred to as the 'tractable' robust counterpart (RC).
Step 1: The worst-case formulation of (6) can be equivalently reformulated as

a⊤x + max_{ζ∈Z} (P⊤x)⊤ζ ≤ b. (7)

We now have a nested optimization problem because we have a maximization inside the constraint. Next, we take the dual of the inner maximization problem.
Step 2: Note that, by strong duality, for a polyhedral uncertainty set Z = {ζ : Dζ ≤ d} we have

max_{ζ∈Z} (P⊤x)⊤ζ = min_{λ≥0} {d⊤λ : D⊤λ = P⊤x}.

Hence (7) is equivalent to

a⊤x + min_{λ≥0} {d⊤λ : D⊤λ = P⊤x} ≤ b. (8)

Step 3: We can remove the minimization in (8) because any feasible solution of (8) yields an upper bound for the inner maximization problem by weak duality (and the bound is tight at optimality by strong duality because the problem is convex).
Finally, we obtain the tractable RC of the initial semi-infinite problem (6). Table 1 presents the RCs of (6) with respect to a class of uncertainty sets. The only difference is that the type of duality used in Steps 2-3 changes according to the uncertainty set at hand, e.g., when the uncertainty set is a cone, we use conic duality to obtain the RC; see the third row of Table 1.
Table 1. The tractable robust counterparts of the uncertain linear constraint (a + Pζ)⊤x ≤ b ∀ζ ∈ Z

Uncertainty set Z                  Robust counterpart
Box: {ζ : ||ζ||_∞ ≤ 1}             a⊤x + ||P⊤x||_1 ≤ b
Ellipsoidal: {ζ : ||ζ||_2 ≤ 1}     a⊤x + ||P⊤x||_2 ≤ b
Conic: {ζ : d − Dζ ∈ K}            a⊤x + d⊤λ ≤ b, D⊤λ = P⊤x, λ ∈ K*
Notice that the three-step procedure is applied similarly in our setting to derive the tractable RC reformulations of the semi-infinite probability threshold constraint

(SI) Σ_{s∈S} p_s y_s ≥ 1 − ǫ, p = p^0 + ξ, ∀ξ ∈ U,

for the two different classes of the ambiguity set U. We refer the reader to [26,49] and the references therein for further details on the RO paradigm.
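The duality argument behind Steps 2-3 can be checked numerically. The sketch below is illustrative only: the polyhedral set (a box written as Dζ ≤ d), the vector playing the role of P⊤x, and the use of `scipy.optimize.linprog` are all assumptions made for this example.

```python
import numpy as np
from scipy.optimize import linprog

# For Z = {zeta : D zeta <= d}, strong LP duality gives
#   max_{zeta in Z} c^T zeta = min {d^T lam : D^T lam = c, lam >= 0}.
# Toy box Z = [-1, 1]^2 encoded as D zeta <= d:
D = np.vstack([np.eye(2), -np.eye(2)])
d = np.ones(4)
c = np.array([2.0, -3.0])                       # plays the role of P^T x

primal = linprog(-c, A_ub=D, b_ub=d, bounds=[(None, None)] * 2)
dual = linprog(d, A_eq=D.T, b_eq=c, bounds=[(0, None)] * 4)
worst_case = -primal.fun                        # |2| + |-3| = 5 on this box
```

Step 3 then replaces the inner minimum with any feasible dual solution, which by weak duality can only tighten the constraint.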

Robust reformulation of ambiguous chance constraint
In this section, we focus on the robust reformulations of the ambiguous chance constraints for two classes of uncertainty sets, namely, the 'p-box' and 'p-ellipsoidal' uncertainty sets that are specifically created for the uncertain probability vector p ∈ [0, 1]^|S| that resides in hyperplane sections of box and ellipsoidal regions. To this end, we shall develop the tractable RC reformulations of the semi-infinite probability constraint (SI). The formal representations of the p-box (U_B) and p-ellipsoidal (U_E) ambiguity sets are given as

U_B = {ξ ∈ R^|S| : ℓ ≤ ξ ≤ u, Σ_{s∈S} ξ_s = 0}

and

U_E = {ξ ∈ R^|S| : ||Qξ||_2^2 ≤ σ^2, Σ_{s∈S} ξ_s = 0},

where p^0 denotes the nominal probability vector given that p^0 ≥ 0 and Σ_{s∈S} p^0_s = 1; ξ is the uncertainty vector that determines the dispersion from the nominal probability vector; ℓ and u are the lower and upper bound vectors for the p-box; σ denotes the radius (or the worst-case dispersion from the nominal vector p^0) of the p-ellipsoid; and Q ∈ R^{|S|×|S|} (:= diag(q)) is a diagonal matrix where the vector q ∈ R^|S| contains the scaling parameters that determine the shape of the ellipsoid, e.g., when q = e, the resulting uncertainty set is a ball. Notice that the sum of the elements of the uncertainty vector ξ is zero (i.e., [Σ_{s∈S} ξ_s = 0]) in order to guarantee that the sum of the elements of the unknown (or ambiguous) probability vector is 1 (i.e., [Σ_{s∈S} p_s = 1]). Independent versus dependent data. In some cases, it may be known that the uncertainty parameters ζ^s_j are independent for j ∈ {1, . . ., L}. If this is the case, then the joint probability p_s that the uncertainty vector [ζ^s_1, ζ^s_2, . . ., ζ^s_L] is realized is equivalent to the product of the marginal probabilities of the uncertainty parameters for the given scenario, i.e., [p_s = Π_{j=1}^L p^{(j)}_{i_j}], where i_j is the index of the realization of the j-th parameter under scenario s. This results in complex ambiguity sets. We shall delve into the details of such reformulations later in §3.3.
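The inner (worst-case) problem of (SI) over the p-box is a small LP. The sketch below states the box directly in terms of p, i.e., with the bounds on ξ shifted by p^0; all numbers are hypothetical:

```python
import numpy as np
from scipy.optimize import linprog

def worst_case_prob_box(y, lo, up):
    """Inner problem of (SI) for the p-box, written in terms of p:
    minimize y^T p over {p : lo <= p <= up, sum_s p_s = 1}."""
    res = linprog(y, A_eq=np.ones((1, len(y))), b_eq=[1.0],
                  bounds=list(zip(lo, up)))
    return res.fun

y = np.array([1.0, 1.0, 0.0])       # scenarios 1 and 2 satisfied, scenario 3 not
lo = np.array([0.3, 0.3, 0.0])      # shifted box bounds (hypothetical)
up = np.array([0.6, 0.6, 0.2])
wc = worst_case_prob_box(y, lo, up) # adversary pushes mass to scenario 3
```

Here the adversary gives the unsatisfied scenario its maximum probability 0.2, so the worst-case satisfied probability is 0.8; Theorem 1 below obtains this value via the dual of exactly this LP.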
In the remainder, for the sake of simplicity, we assume that the elements of the ambiguous probability vector p are independent and the uncertainty parameter (ζ^s) is realized as a vector. Other assumptions are listed below. Assumptions. (A1) We assume that f (x, ζ) is linear in the decision variable x (extended in §3.2). (A2) Without loss of generality, we assume that U_{B,E} ⊆ [0, 1]^|S|, i.e., the parameters ℓ, u, Q, and σ are selected such that [p ≥ 0 ∀p ∈ U_{B,E}] is satisfied.
Selecting the parameters of the p-ellipsoidal uncertainty set. The standard form of the ellipsoid inequality is

Σ_{i=1}^{|S|} ξ_i^2 / α_i^2 ≤ 1,

where α_i denotes the half length of the i-th principal semiaxis of the ellipsoid, i.e., α_i > 0 for all i ∈ {1, . . ., |S|}. Let the parameter γ ∈ (0, 1]^|S| be such that γ_i is the percentage of dispersion [(1 − γ_i)p^0_i, (1 + γ_i)p^0_i] from the nominal probability p^0_i for i ∈ {1, . . ., |S|}; then we may set the half length α_i to the associated dispersion, i.e., α_i = γ_i p^0_i for all i ∈ {1, . . ., |S|}. Eventually, the p-ellipsoidal uncertainty set U_E that satisfies (A2) can be generated by setting σ = 1 and q_i = 1/(γ_i p^0_i) for all i ∈ {1, . . ., |S|}, where Q = diag(q). Notice that one can also set σ and Q using other methods as long as (A2) is satisfied; here we present a systematic method that can be easily implemented in practice.
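This parameter choice can be sanity-checked numerically; the nominal vector and dispersion percentages below are illustrative:

```python
import numpy as np

# Half-axis alpha_i = gamma_i * p0_i, scaling q_i = 1 / alpha_i, sigma = 1.
# Any xi with ||Q xi||_2 <= 1 then has |xi_i| <= alpha_i, which keeps
# p0 + xi componentwise in [(1 - gamma_i) p0_i, (1 + gamma_i) p0_i] >= 0 (A2).
p0 = np.array([0.5, 0.3, 0.2])
gamma = np.array([0.4, 0.5, 0.5])
alpha = gamma * p0
q = 1.0 / alpha
Q = np.diag(q)

rng = np.random.default_rng(1)
u = rng.standard_normal((1000, 3))
u /= np.linalg.norm(u, axis=1, keepdims=True)   # points on the unit sphere
xi = u / q                                      # boundary points: ||Q xi||_2 = 1
all_nonneg = bool(np.all(p0 + xi >= 0))
```

Note that the check above only verifies nonnegativity; the ambiguity set U_E additionally imposes Σ ξ_s = 0, which restricts ξ to a hyperplane section of this ellipsoid.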
Theorems 1 and 2 provide the tractable RCs of (SI) for the p-box and p-ellipsoidal uncertainty sets, respectively.
Theorem 1. The vector y ∈ {0, 1}^|S| satisfies

(SI_B) y⊤(p^0 + ξ) ≥ 1 − ǫ ∀ξ ∈ U_B

if and only if y, θ, w and λ satisfy the following RC problem:

y⊤p^0 + ℓ⊤θ − u⊤w ≥ 1 − ǫ, θ − w + λe = y, θ ≥ 0, w ≥ 0,

where e is the all-one vector, and w, θ, and λ denote the dual variables.
Proof. As in Step 1 of the three-step procedure in §2.1, (SI_B) can be equivalently reformulated as

min_{ξ∈U_B} y⊤(p^0 + ξ) ≥ 1 − ǫ.

The above expression is equivalent to the following:

y⊤p^0 + min_{ξ∈U_B} y⊤ξ ≥ 1 − ǫ.

Notice that y⊤p^0 is taken out of the minimization because it is a fixed term for given y. Let us now focus on the minimization problem in the constraint LHS:

min_ξ {y⊤ξ : ξ ≥ ℓ (θ), ξ ≤ u (w), Σ_{s∈S} ξ_s = 0 (λ)},

where each term in the parentheses denotes the dual variable of the associated constraint.
Next we take the dual of the problem as in Step 2:

max_{θ≥0, w≥0, λ} {ℓ⊤θ − u⊤w : θ − w + λe = y}.

Finally, we remove the maximization and obtain the tractable reformulation of (SI_B):

y⊤p^0 + ℓ⊤θ − u⊤w ≥ 1 − ǫ, θ − w + λe = y, θ ≥ 0, w ≥ 0. □

Consequently, the tractable RC of the robust reformulation of the ambiguous chance constraint (5) with respect to the p-box uncertainty set becomes

(RC_B) y⊤p^0 + ℓ⊤θ − u⊤w ≥ 1 − ǫ, θ − w + λe = y, f (x, ζ^s) ≤ M(1 − y_s) ∀s ∈ S, θ ≥ 0, w ≥ 0, y ∈ {0, 1}^|S|, x ∈ X.

The resulting problem (RC_B) is a mixed-integer linear problem (MILP) that can be solved by commercial solvers (e.g., CPLEX) for moderate-sized instances, given that there exists a linear objective function [c⊤x] that we either minimize or maximize.
Similarly, Theorem 2 presents the tractable RC of (SI) with respect to the p-ellipsoidal uncertainty set.
Theorem 2. The vector y ∈ {0, 1}^|S| satisfies

(SI_E) y⊤(p^0 + ξ) ≥ 1 − ǫ ∀ξ ∈ U_E

if and only if y, β, and λ satisfy the following RC problem:

p^{0⊤}y_β − 0.25 (y + λ^2 e + 2y_λ)⊤q^{-2} − σ^2 β^2 ≥ (1 − ǫ)β, β ≥ ε,

together with the big-M linearization constraints given in the proof, where e is the all-one vector, y_λ and y_β are the auxiliary variables used to linearize λy and βy, respectively, β and λ are the dual variables, and ε is a small positive constant.
Proof. Similar to the proof of Theorem 1, (SI_E) can be reformulated as

y⊤p^0 + min_{ξ∈U_E} y⊤ξ ≥ 1 − ǫ.

The Lagrangian dual problem of the minimization problem in the constraint LHS is given by

max_{β≥0, λ} min_ξ y⊤ξ + β(||Qξ||_2^2 − σ^2) + λ Σ_{s∈S} ξ_s,

where β ≥ 0 and λ denote the dual variables for the p-ellipsoidal constraints [||Qξ||_2^2 ≤ σ^2] and [Σ_{s∈S} ξ_s = 0], respectively. From the first-order condition, we have [y + 2βQ^2 ξ + λe = 0], i.e., ξ* = −Q^{-2}(y + λe)/(2β). Consequently, we substitute ξ* for ξ in the dual problem. Next, we can remove the maximization as in Step 3 and the LHS becomes

y⊤p^0 − (y + λe)⊤Q^{-2}(y + λe)/(4β) − σ^2 β ≥ 1 − ǫ.

Next we multiply both sides of the inequality by β (> 0), which results in the following inequality:

0.25 [y^2 + λ^2 e + 2λy]⊤q^{-2} + σ^2 β^2 + (1 − ǫ)β − β y⊤p^0 ≤ 0.
Notice that since y is a vector of binary variables, the element-wise square y^2 is equivalent to y (in particular, y⊤y = y⊤e), and Q = diag(q). Moreover, we introduce a vector of auxiliary variables y_β ∈ R^|S|_+ (:= βy) and the following group of constraints to linearize βy:

y_β ≤ My, y_β ≤ βe, y_β ≥ βe − M(e − y), y_β ≥ 0,

where M is a large constant. Similarly, we define y_λ ∈ R^|S| (:= λy) and the following constraints to linearize λy:

−My ≤ y_λ ≤ My, λe − M(e − y) ≤ y_λ ≤ λe + M(e − y).

The final formulation of the RC is obtained after the above-mentioned substitutions are made and the additional constraints are included. □
Eventually, the tractable RC of (5) with respect to the p-ellipsoidal uncertainty set becomes

(RC_E) p^{0⊤}y_β − 0.25 (y + λ^2 e + 2y_λ)⊤q^{-2} − σ^2 β^2 ≥ (1 − ǫ)β, f (x, ζ^s) ≤ M(1 − y_s) ∀s ∈ S, β ≥ ε, y ∈ {0, 1}^|S|, x ∈ X,

together with the linearization constraints for y_β and y_λ. Notice that (RC_E) is a mixed-integer second-order cone problem (MISOCP) that yields a convex relaxation; SOCP is a tractable mathematical optimization class that is supported by commercial solvers such as CPLEX.
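For a fixed y, the worst-case decrease min{y⊤ξ : ||Qξ||_2^2 ≤ σ^2, Σ_s ξ_s = 0} that underlies (SI_E) also admits a closed form (a projection of Q^{-1}y orthogonal to Q^{-1}e), which gives an independent way to sanity-check the reformulation. The data below and the use of a generic solver for the cross-check are illustrative assumptions:

```python
import numpy as np
from scipy.optimize import minimize

def worst_case_drop_ellipsoid(y, q, sigma):
    """min { y^T xi : ||Q xi||_2 <= sigma, sum_s xi_s = 0 },  Q = diag(q).
    Substituting u = Q xi reduces this to minimizing a linear function over a
    ball intersected with a hyperplane; with a = y/q and v = e/q the optimum
    is -sigma * || a - (a.v / v.v) v ||_2."""
    a = y / q
    v = np.ones_like(q) / q
    a_perp = a - (a @ v) / (v @ v) * v
    return -sigma * float(np.linalg.norm(a_perp))

y = np.array([1.0, 1.0, 0.0, 1.0])      # hypothetical scenario indicators
q = np.array([10.0, 8.0, 5.0, 4.0])     # hypothetical ellipsoid scaling
sigma = 1.0
closed = worst_case_drop_ellipsoid(y, q, sigma)

# numerical cross-check of the same minimization
cons = [{"type": "ineq", "fun": lambda xi: sigma**2 - np.sum((q * xi) ** 2)},
        {"type": "eq", "fun": lambda xi: np.sum(xi)}]
num = minimize(lambda xi: y @ xi, np.zeros(4), constraints=cons, method="SLSQP")
```

The worst-case LHS of (SI_E) is then y⊤p^0 + `closed`, matching the value obtained from the Lagrangian dual in the proof of Theorem 2 at optimality.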
Remark 1. We assume that M is large enough to guarantee that the big-M constraints are equivalent to the constraints in the original formulation. On the other hand, M should also be small enough to avoid excessive branching; i.e., it must be generated systematically in the numerical experiments.
Remark 2. The nonlinear constraint in (RC_E) can be equivalently reformulated as the following second-order cone constraint:

||(0.5 (e⊤q^{-2})^{1/2} λ, σβ, (1 − r)/2)||_2 ≤ (1 + r)/2, where r := p^{0⊤}y_β − (1 − ǫ)β − 0.25 (y + 2y_λ)⊤q^{-2}.

In summary, using Theorems 1 and 2, one may obtain exact robust reformulations of individual ambiguous chance constraints with discrete probability distributions when the unknown scenario probabilities [p_1, p_2, . . ., p_|S|] are independent. The mathematical optimization complexity of the resulting RC is MILP for the p-box and MISOCP for the p-ellipsoidal uncertainty set, given that f is linear in the decision variable x ∈ X.
In §§3.1-3.3, we shall relax some of these properties in order to tackle more general classes of ambiguous chance constraints. The good news is that our approach straightforwardly extends to these new classes, even though it is not necessarily exact but rather a safe approximation method when the elements of the probability vector are dependent.

Safe Approximation Algorithm
Notice that when the number of scenarios (|S|) increases, (RC_B) and (RC_E) may be challenging mathematical optimization problems to solve within a given time limit. This is why, in this section, we present an efficient iterative algorithm that yields a safe approximation for ambiguous chance constraints with discrete probability distributions. We illustrate our approach using the following (implicit) notation of the individual ambiguous chance constrained problem:

min_{x∈X} {c⊤x : Pr ζ∼P {ζ : f (x, ζ) ≤ 0} ≥ 1 − ǫ ∀P ∈ P}, (9)

where 1 − ǫ is the prescribed probability and ζ is the primitive uncertainty vector. The four main steps of the algorithm are explained in detail below.
Step 1. Derive the RC of the problem at hand with respect to the ball uncertainty set of radius Ω:

min_{x∈X} {c⊤x : f (x, ζ) ≤ 0 ∀ζ : ||ζ||_2 ≤ Ω}. (10)

It is important to point out that the associated RC yields an SOCP when f is linear in x and ζ; it is also tractable for the nonlinear case when f is concave in ζ for any given x (see [50]).
Step 2. Next, we find the uncertainty parameter realizations that are satisfied by the optimal solution x* of (10). Namely, we set y*_s = 1 if f (x*, ζ^s) ≤ 0, and y*_s = 0 otherwise, for all s ∈ S. Step 3. Fix y = y* and solve (RC_B) if the ambiguity set of the uncertain probability vector p is the p-box, or solve (RC_E) if the associated set is the p-ellipsoidal. If there exists a feasible solution for the RC at hand (for given y = y*), then we terminate the algorithm and record x* as the final solution of the algorithm. Else, we go to Step 4.
Step 4. We increase Ω by the step size ω and go to Step 1. The table representation of the so-called stepwise ellipsoidal algorithm (SEA) that safely approximates the ambiguous chance constrained problem is given below.
Algorithm 1 (SEA):
Step 1: Solve the robust counterpart of the given problem with respect to the uncertainty set ||ζ||_2 ≤ Ω and find the optimal solution x*.
Step 2: Check the satisfied scenarios by x*; record these scenarios to vector y*.
Step 3: Solve (RC_E) for the fixed y = y*. If (RC_E) yields a feasible solution, then terminate the algorithm and report x*. Else go to Step 4.
Step 4: Set Ω := Ω + ω and go to Step 1.
Remarks on Algorithm 1. 1) The associated algorithm can be easily extended to joint chance constraints and nonlinear uncertain inequalities as long as the original problem is tractable, i.e., the algorithm does not introduce additional mathematical optimization complexity to that of the original problem. 2) One may also adopt a box uncertainty set in Step 1. We prefer using the ellipsoidal uncertainty set while generating our ambiguity set because it yields a more diversified solution to the RC compared with the box uncertainty set; see also [51]. 3) If the ambiguity set is the p-box, in Step 3 we solve (RC_B) instead of (RC_E). Complexity. 1) It is important to point out that the mathematical optimization complexities of (RC_B) and (RC_E) in Step 3 are reduced to LP and SOCP, respectively, because we fix the binary variable y to the y* that comes out of Step 2.
2) In an L-dimensional uncertainty space, the radius of the ellipsoid (or ball) Ω can be at most √L, since the ball contains the support of the uncertainty parameter ζ, i.e., ζ ∈ [−1, 1]^L, once Ω exceeds √L. Therefore, Ω is updated in at most O(ω^{-1}√L) iterations of the algorithm. In addition, we need to check |S| scenarios at each iteration of the algorithm in Step 2. Eventually, the total workload of the algorithm is bounded above by O(ω^{-1}√L|S|).
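A toy run of SEA can be sketched on a hypothetical one-dimensional problem: maximize x subject to Pr{ζx ≤ 1} ≥ 1 − ǫ with scenarios ζ^s ∈ [−1, 1]. All data are illustrative; in this special case the Step 1 RC reduces to x = 1/Ω, and the Step 3 feasibility check uses the closed-form worst case over a p-box (stated directly in terms of p, with shifted bounds):

```python
import numpy as np

scen = np.array([0.15, 0.45, 0.75, 0.95])    # hypothetical scenario values
p_lo = np.array([0.10, 0.20, 0.20, 0.05])    # p-box bounds on scenario probs
p_hi = np.array([0.40, 0.50, 0.50, 0.15])
eps, omega_step = 0.2, 0.1

for k in range(1, 100):
    Omega = k * omega_step
    x = 1.0 / Omega                          # Step 1: RC solution for radius Omega
    y = (scen * x <= 1.0).astype(float)      # Step 2: scenarios satisfied by x
    # Step 3: worst-case satisfied probability over the p-box; the adversary
    # gives unsatisfied scenarios their upper bounds, subject to sum(p) = 1
    wc = max(1.0 - float(p_hi @ (1 - y)), float(p_lo @ y))
    if wc >= 1.0 - eps:
        break                                # feasible: stop with this x
    # Step 4: otherwise enlarge the uncertainty set (next k)
```

Each pass through the loop enlarges the ball, so the decision becomes more conservative and more scenarios are satisfied, until the ambiguous probability threshold is met.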

Extensions
In this section, we present extensions of our approach to joint chance constraints, nonlinear inequalities, and dependent probability vectors. It is important to stress that such extensions are not straightforward in general; nevertheless, our approach easily extends to the joint chance constraint with a slight modification, it is unaffected by nonlinear inequalities, and we propose safe approximations for the dependent case.

Joint chance constraints
If we have more than one uncertain constraint and aim to find a solution that jointly satisfies these constraints with probability at least 1 − ǫ, then we have the following joint (ambiguous) chance constrained formulation:

Pr ζ∼P {ζ : f_k (x, ζ) ≤ 0, ∀k ∈ K} ≥ 1 − ǫ ∀P ∈ P, (11)

where K denotes the set of constraint indexes. Unlike classic approaches, our approach straightforwardly extends to the joint chance constraint. More precisely, one can equivalently reformulate (11) as

Σ_{s∈S} p_s y_s ≥ 1 − ǫ ∀p ∈ U, f_k (x, ζ^s) ≤ M(1 − y_s) ∀s ∈ S, k ∈ K, y ∈ {0, 1}^|S|. (12)

The tractable RC of (12) when U is the p-box uncertainty set is given in Corollary 1.
Corollary 1. The vectors x ∈ X and y ∈ {0, 1}^|S| satisfy (12) if and only if x, y, θ, w, and λ satisfy the following RC problem:

(jRC_B) y⊤p^0 + ℓ⊤θ − u⊤w ≥ 1 − ǫ, θ − w + λe = y, f_k (x, ζ^s) ≤ M(1 − y_s) ∀s ∈ S, k ∈ K, θ ≥ 0, w ≥ 0,

where w, θ, and λ denote the dual variables.
Proof. The first group of constraints [f_k (x, ζ^s) ≤ M(1 − y_s)] is not affected while we are deriving the RC. The tractable RC of the semi-infinite probability threshold constraint is then derived exactly as in Theorem 1. □ The mathematical optimization complexity of the robust reformulation of the joint chance constraint (jRC_B) is the same as that of (RC_B); the RC of (11) for the p-ellipsoidal ambiguity set can be derived similarly by adopting Theorem 2.
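The only change for the joint case is the scenario indicator: y_s = 1 only if the scenario satisfies all K constraints; the probability-threshold part then reuses y exactly as in the individual case. A sketch with hypothetical constraint values f_k(x, ζ^s):

```python
import numpy as np

def joint_scenario_indicator(F):
    """Scenario indicators for the joint chance constraint: given a matrix F with
    F[s, k] = f_k(x, zeta^s), y_s = 1 iff scenario s satisfies ALL K constraints."""
    return (F <= 0).all(axis=1).astype(float)

F = np.array([[-1.0, -0.5],    # scenario 1: both constraints hold
              [-0.2,  0.3],    # scenario 2: second constraint violated
              [-0.1, -0.1]])   # scenario 3: both hold
y = joint_scenario_indicator(F)
```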

Nonlinear inequalities
The mathematical optimization complexity of our approach is unaffected when we have nonlinearity in the uncertainty parameter ζ, provided that f remains linear in x. In this case, the optimization complexities of the resulting RCs for the p-box and the p-ellipsoidal ambiguity sets are MILP and MISOCP, respectively, as in (RC_B) and (RC_E).
On the other hand, when f is nonlinear in x, the structure of the nonlinearity in x determines the problem complexity, i.e., (SI) does not introduce additional mathematical optimization complexity beyond that of the original problem.

Dependent probability vector
In the previous reformulations, we have assumed the elements of the ambiguous joint probability vector p that reside in the ambiguity set U to be independent, e.g., the off-diagonal elements of Q are 0 in the p-ellipsoidal ambiguity set. Nevertheless, the probabilities in the ambiguity set U may be dependent in two ways: (D1) the probabilities [p_1, p_2, . . ., p_|S|] of some realizations {ζ^{(1)}, ζ^{(2)}, . . ., ζ^{(|S|)}} are correlated (e.g., when the probability of high demand increases in period t, the probability of high demand may increase or decrease in period t + 1), or (D2) the uncertainty parameters ζ_i are independent for i ∈ {1, . . ., L} and the marginal probabilities follow ambiguous discrete probability distributions.
The dependency in (D1) can be easily formulated by setting the value of Q such that (Q⊤Q)^{-1} corresponds to the covariance matrix in the p-ellipsoidal ambiguity set, and the RC may be derived similarly to Theorem 2; for the sake of brevity, it is skipped. However, the latter dependency (D2) is somewhat more complex, and we propose safe approximations for the associated case in the remainder of this section. Notice that, when the uncertainty parameters are independent as in (D2), the joint probability is equivalent to the product of the individual probabilities, i.e., the probabilities of different uncertainty realizations become dependent because the joint probabilities [p_1, p_2, . . ., p_|S|] share common or complementary terms. The extension of the semi-infinite representation (5) of the ambiguous chance constraint for (D2) is given as follows:

Σ_{(i_1, i_2, . . ., i_L)∈S} [Π_{j=1}^{L} p^{(j)}_{i_j}] Y_{i_1, i_2, . . ., i_L} ≥ 1 − ǫ ∀ξ^{(j)} ∈ U^{(j)}_{B,E}, j ∈ {1, 2, . . ., L},

where an aggregate scenario is indexed by (i_1, i_2, . . ., i_L) for i_j ∈ S_j for all j ∈ {1, . . ., L}; S_j denotes the set of indexes (i_j) for possible realizations of the j-th uncertainty parameter ζ^{(j)}_{i_j}, i.e., the aggregate scenario set S is equivalent to the Cartesian product of the individual sets, S := S_1 × S_2 × . . . × S_L; an aggregate uncertainty vector [ζ^{(1)}_{i_1}, . . ., ζ^{(L)}_{i_L}] is a collection of uncertainty parameter realizations at scenario (i_1, i_2, . . ., i_L); p^{(j)}_{i_j} denotes the probability that the i_j-th realization of the j-th uncertainty parameter ζ^{(j)}_{i_j} is realized; ξ^{(j)}_{i_j} denotes the random perturbation in probability p^{(j)}_{i_j}, where U^{(j)}_B and U^{(j)}_E are the associated uncertainty sets; and Y_{i_1, i_2, . . ., i_L} is a 0-1 variable that takes value 1 if the solution x satisfies the constraint for the associated scenario or 0 otherwise. The formal representations of the p-box and p-ellipsoidal ambiguity sets for the probability distributions of the independent uncertainty parameters are as follows:

U^{(j)}_B = {ξ^{(j)} : ℓ^{(j)} ≤ ξ^{(j)} ≤ u^{(j)}, Σ_{i_j∈S_j} ξ^{(j)}_{i_j} = 0}, U^{(j)}_E = {ξ^{(j)} : ||Q^{(j)}ξ^{(j)}||_2^2 ≤ σ_j^2, Σ_{i_j∈S_j} ξ^{(j)}_{i_j} = 0}, j ∈ {1, . . ., L}.
The probability that scenario (i_1, i_2, ..., i_L), i.e., [ζ^{(1)}_{i_1}, ζ^{(2)}_{i_2}, ..., ζ^{(L)}_{i_L}], is realized equals the product of the individual probabilities: P_{i_1, i_2, ..., i_L} = ∏_{j=1}^{L} p^{(j)}_{i_j}. Eventually, the semi-infinite probability threshold constraint for the joint distribution can be reformulated as the following mathematical optimization problem for the p-box uncertainty set.
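For concreteness, the product structure of the joint distribution can be materialized as a tensor of outer products of the marginals; a small numpy sketch with hypothetical marginal distributions (the numbers below are illustrative, not from the paper):

```python
import numpy as np

# Marginal distributions of L = 3 independent uncertainty parameters
# (hypothetical values, for illustration only).
marginals = [
    np.array([0.5, 0.5]),        # p^(1)
    np.array([0.2, 0.3, 0.5]),   # p^(2)
    np.array([0.1, 0.9]),        # p^(3)
]

# Joint probability tensor: P[i1, i2, i3] = p^(1)[i1] * p^(2)[i2] * p^(3)[i3],
# built by repeated outer products over the aggregate scenario set
# S = S_1 x S_2 x S_3.
P = marginals[0]
for p in marginals[1:]:
    P = np.multiply.outer(P, p)

# The entries of P over all aggregate scenarios sum to 1.
assert np.isclose(P.sum(), 1.0)
```

Summing P over any axis recovers the remaining marginals, which is exactly the dependence between joint probabilities described in (D2): perturbing one marginal probability moves every joint probability that contains it as a factor.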
(IB) min_{P, p^{(·)}} ∑_{(i_1, i_2, ..., i_L) ∈ S} P_{i_1, i_2, ..., i_L} Y_{i_1, i_2, ..., i_L}, where (13) formulates the joint probability as an equality constraint for ease of exposition, (14) ensures that the probabilities of each independent uncertainty parameter sum to 1, (15) determines the bounds of the box uncertainty sets, and [val(IB) ≥ 1 − ǫ]. It is easy to see that (IB) is nonconvex and highly nonlinear because of (13), which is why it cannot be dualized in its current form. The following theorem relaxes the nonlinear structure of (IB) and provides a lower bound for val(IB).
Proof. Let (P, p^{(1)}, ..., p^{(L)}) be a feasible solution of (IB). If we prove that P ∈ R^{|S_1| × |S_2| × ... × |S_L|} is feasible for (rIB), then we can conclude that (rIB) is a reduced relaxation of (IB). To begin with, (14) implies that ∑_{i_j ∈ S_j} p^{(j)}_{i_j} = 1 for all j ∈ {1, ..., L}, given that 0 ≤ p^{(j)}_{i_j} ≤ 1 for all i_j ∈ S_j and all j ∈ {1, ..., L} (by Assumption 2). Notice that the probability P_{i_1, i_2, ..., i_L} is the product of the independent probabilities p^{(j)}_{i_j}, i.e., the summation of the elements of P over all combinations of the individual probabilities equals 1. Constraints (13) and (14) imply that the corresponding bounds hold for all i_j ∈ S_j and all j ∈ {1, ..., L}. As a result, P of (IB) satisfies all the constraints in (rIB).
Next, the semi-infinite probability threshold constraint for the joint distribution can be reformulated as the following mathematical optimization problem for the p-ellipsoidal uncertainty set.
(IE) min_{P, p^{(·)}} ∑_{(i_1, i_2, ..., i_L) ∈ S} P_{i_1, i_2, ..., i_L} Y_{i_1, i_2, ..., i_L}. Similarly, (IE) is a nonlinear and nonconvex optimization problem that is intractable in its current form. The following theorem yields a tractable reduced relaxation of (IE).
Proof. Constraints (17) and (18) imply that the largest value ξ^{(j)}_{i_j} can take is σ_j / q^{(j)}_{i_j} for all i_j ∈ S_j and all j ∈ {1, ..., L}; similarly, the smallest value is −σ_j / q^{(j)}_{i_j}. The remainder of the proof follows similarly to Theorem 3.

Remark 3. The ambiguity set U of the unknown true probability vector p of the independent uncertainty parameters may also be defined by first calculating the nominal joint probabilities of the uncertainty parameter realizations and then adopting the p-box or p-ellipsoidal ambiguity set over the nominal joint probability vector; notice that this results in the RCs proposed in §2. The extension presented in §3.2 instead defines the ambiguity sets with respect to the marginal probability vectors of the individual uncertainty parameters and proposes safe approximations for the resulting, more complex ambiguity set of the joint probability vector.

Numerical example: knapsack problem
In this section, we demonstrate the validity of our approach on toy-sized instances of an ambiguous chance constrained knapsack problem. We compare the optimality and feasibility performance of the classic and the ambiguous chance constrained versions of the problem. We focus on the chance constrained knapsack problem

max { ∑_{i=1}^{m} v_i x_i : P(∑_{i=1}^{m} ζ_i x_i ≤ W) ≥ 1 − ǫ, x ∈ {0, 1}^m },

where v_i denotes the value and ζ_i the uncertain weight of item i ∈ {1, ..., m}, x_i is the decision variable that is 1 if item i is included in the knapsack and 0 otherwise, and W is the capacity of the knapsack. The uncertainty parameter ζ follows a discrete probability distribution P, where the true probability distribution p is unknown (ambiguous) but is assumed to reside in a p-box (U_B) or a p-ellipsoidal (U_E) ambiguity set. The p-box uncertainty set is symmetric around the nominal probability distribution p^0 and its size is determined by the scaling parameter γ ∈ [0, 1]; σ denotes the radius of the p-ellipsoidal uncertainty set, which is again centered at the nominal data p^0. Notice that the ambiguous chance constrained knapsack problem with a discrete probability distribution can be equivalently reformulated as a scenario-based problem with 0-1 indicator variables y_s, and the tractable RCs of the knapsack problem can be derived as in (RC_B) and (RC_E).
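Because the scenario reformulation only involves finitely many 0-1 indicators, toy instances of the classic (nominal) chance constrained knapsack can be solved by plain enumeration; a minimal sketch with hypothetical data (not the paper's Table 2 instance):

```python
from itertools import product

import numpy as np


def solve_cc_knapsack(v, W, scenarios, p, eps):
    """Brute-force the chance constrained knapsack
         max v^T x  s.t.  sum_s p_s * 1[zeta_s^T x <= W] >= 1 - eps
    over all 0-1 vectors x (viable only for small m)."""
    m = len(v)
    best_val, best_x = -np.inf, None
    for bits in product([0, 1], repeat=m):
        x = np.array(bits)
        satisfied = scenarios @ x <= W       # y_s for each scenario s
        if p @ satisfied >= 1 - eps:         # probability threshold
            val = v @ x
            if val > best_val:
                best_val, best_x = val, x
    return best_x, best_val


# Hypothetical 2-item, 3-scenario instance.
v = np.array([3.0, 2.0])
W = 5.0
scenarios = np.array([[2.0, 2.0], [4.0, 4.0], [6.0, 1.0]])
p = np.full(3, 1 / 3)
x_star, val_star = solve_cc_knapsack(v, W, scenarios, p, eps=0.25)
```

Tightening ǫ forces the solution to satisfy more scenarios, trading objective value for reliability; this is exactly the behavior compared across Tables 3 and 4.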
Illustrative example. We solve an instance of the knapsack problem with ten items (i.e., m = 10). The scenarios for the item weights (i.e., S = {1, ..., 10}) are given in Table 2. Notice that when p = p^0, the knapsack problem reduces to its classic chance constrained version. The optimal solution and the objective function value for the classic problem are given in Table 3. The optimal solution of the chance constrained knapsack problem satisfies the constraint with probability 1 − ǫ* = 0.775.
Table 3. Optimal solution for the chance constrained problem

Next, we solve the ambiguous chance constrained knapsack problem with respect to the p-box (U_B) and p-ellipsoidal (U_E) ambiguity sets when γ = 0.4. Notice that σ is determined according to the tightest ball inside the box region, i.e., the ellipsoid is less conservative than the box. The optimal solutions are given in Table 4.
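Evaluating a fixed solution against the worst case over the p-box set, min_{p ∈ U_B} ∑_s p_s y_s, is a small LP that can be solved greedily because every objective coefficient y_s is 0 or 1; a sketch under the box bounds (1±γ)p^0 from the set definition above:

```python
import numpy as np


def worst_case_lhs_pbox(y, p0, gamma):
    """min_p sum_s p_s * y_s over the p-box
         { p : (1-gamma)*p0 <= p <= (1+gamma)*p0, sum(p) = 1 }.
    Greedy: start every p_s at its lower bound, then pour the remaining
    probability mass into the scenarios with the smallest coefficient
    y_s first (i.e., the violated scenarios)."""
    lb, ub = (1 - gamma) * p0, (1 + gamma) * p0
    p = lb.copy()
    slack = 1.0 - p.sum()
    for s in np.argsort(y, kind="stable"):   # y_s = 0 scenarios first
        add = min(slack, ub[s] - lb[s])
        p[s] += add
        slack -= add
        if slack <= 1e-12:
            break
    return float(p @ y), p
```

Applying this to the nominal solution reproduces the kind of worst-case LHS check discussed below (e.g., a value falling short of 1 − ǫ certifies that the nominal solution is not robust-feasible).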
It is easy to see that the optimal objective function value of the chance constrained problem that uses the nominal probability vector is greater than those of the ambiguous chance constrained versions with respect to the p-box and p-ellipsoidal ambiguity sets. The classic chance constraint approach is not immunized against the ambiguity in the probability distribution, which is why it yields an optimistic objective function value. On the other hand, when the worst-case probability distribution p* is realized, the nominal solution cannot satisfy the given probability threshold 1 − ǫ = 0.75. More precisely, the LHS value of the probability threshold constraint [∑_{s∈S} p*_s y_s ≥ 1 − ǫ] is 0.65 for the p-box and 0.746 for the p-ellipsoidal uncertainty set when y is the nominal optimal solution; compare these values with the "∑_{s∈S} p*_s y_s" column in Table 4. As anticipated, the classic approach outperforms the robust approach when the nominal probability distribution is used, and the robust approach outperforms the classic approach when the worst-case distribution is used. Nevertheless, both comparisons are based on single probability vector realizations, namely the nominal and the worst-case, out of infinitely many options in the ambiguity set, i.e., technically both data points might never be realized. This is why we also test the average feasibility performance of the nominal and robust solutions via Monte Carlo sampling. Even though the robust approach does not require any distributional information, for the sampling experiment we assume that the probabilities are uniform between [(1 − γ_i) p^0_i, (1 + γ_i) p^0_i]. Algorithm 1 and Algorithm 2 are used to sample from the p-box and p-ellipsoidal ambiguity sets, respectively.
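Algorithms 1 and 2 themselves are not reproduced in this excerpt; the following is only a plausible sketch of such samplers (uniform draws in the box or ball, renormalized onto the probability simplex, which may push a coordinate slightly outside the original set, accepted here for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)


def sample_pbox(p0, gamma, n):
    """Draw n probability vectors from the p-box around p0: sample each
    coordinate uniformly in [(1-gamma)*p0_s, (1+gamma)*p0_s], then
    rescale so each vector sums to 1."""
    lo, hi = (1 - gamma) * p0, (1 + gamma) * p0
    P = rng.uniform(lo, hi, size=(n, len(p0)))
    return P / P.sum(axis=1, keepdims=True)


def sample_pellipsoid(p0, q, sigma, n):
    """Draw n vectors p = p0 + xi with ||diag(q) xi|| <= sigma,
    renormalized onto the simplex."""
    L = len(p0)
    u = rng.normal(size=(n, L))                    # direction on the sphere
    u /= np.linalg.norm(u, axis=1, keepdims=True)
    r = sigma * rng.uniform(size=(n, 1)) ** (1.0 / L)  # radius within the ball
    P = np.clip(p0 + (r * u) / q, 0.0, 1.0)
    return P / P.sum(axis=1, keepdims=True)
```

Any such sampler only needs to return valid probability vectors inside (approximately) the chosen ambiguity set; the Monte Carlo feasibility statistics below average ∑_s p_s y_s over these draws.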
We sample 1000 probability vectors from the p-box and p-ellipsoidal uncertainty sets; the simulation outcomes are presented in Tables 5 and 6. Table 5 reports the average LHS value of the probability threshold constraint [(1/1000) ∑_{i=1}^{1000} ∑_{s∈S} p^i_s y_s] when y is fixed to the solution at hand. It is easy to see that the optimal solutions for the p-box and the p-ellipsoidal uncertainty sets outperform the feasibility performance of the nominal solution.
As pointed out before, the p-box uncertainty set is larger than the p-ellipsoidal set, which is why its solution is immunized against uncertainty more than that of the p-ellipsoidal set. Comparing the average feasibility performance, the p-box approach is again better than the p-ellipsoidal approach. It is important to point out that all approaches, namely p-box, p-ellipsoidal, and nominal, yield results that are on average feasible and far from binding at the given probability threshold 1 − ǫ = 0.75. On the other hand, the solutions are not feasible for all of the sampled instances. In Table 6, we report the average performance of the solutions on the violating instances. The first three columns of Table 6 give the number of violating instances (out of 1000); e.g., the entry at row "p-box" and column "#Vio(ellip)" is the number of violated constraints when the solution is fixed to the optimal solution of the RC with respect to the p-box uncertainty set and the probability vectors are sampled from the p-ellipsoidal uncertainty set. The last three columns report the average LHS value of the probability threshold constraint over the violating instances.

The optimal solution of the randomized approach (RA) satisfies the following probabilistic guarantee: P^N{ V(x*_N) > ǫ } ≤ ∑_{i=0}^{L−1} C(N, i) ǫ^i (1 − ǫ)^{N−i}, where the LHS of the inequality is the probability that the chance constraint is violated by the solution at hand (x*_N) when N samples are drawn for an L-dimensional decision space, and the RHS is the worst-case bound for the associated probability; e.g., when ǫ = 0.2, L = 5, and the RHS is set to 0.01, then N = 54 samples have to be drawn. We refer the reader to [15] for further details on RA, and to [16] for the formal proof of the tightness of the given probability bound. The data used in the experiment is
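The sample size N implied by the guarantee can be obtained by increasing N until the binomial tail bound drops below the desired confidence; a sketch:

```python
from math import comb


def scenario_sample_size(eps, L, beta):
    """Smallest N with sum_{i=0}^{L-1} C(N,i) eps^i (1-eps)^(N-i) <= beta,
    i.e. the binomial tail bound quoted in the text."""
    N = L
    while sum(comb(N, i) * eps**i * (1 - eps)**(N - i) for i in range(L)) > beta:
        N += 1
    return N
```

For ǫ = 0.2, L = 5, and confidence 0.01 this loop terminates in the mid-fifties, consistent with the N ≈ 54 quoted in the text (the exact value depends on rounding and on whether the inequality is taken strictly).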
generated as follows: 1) We randomly sample the lower (ℓ_i) and upper (u_i) bounds of the item-weight intervals from U[1, 10] and U[11, 20], respectively, i.e., ζ_i ∈ [ℓ_i, u_i] for all i ∈ {1, ..., L}. 2) Next, we randomly sample |S| scenario vectors {ζ^1, ζ^2, ..., ζ^{|S|}} using the associated intervals. 3) For the sake of simplicity, we assume that all scenarios are equally likely, i.e., p^0_s = 1/|S| for all s ∈ S. 4) Item values (v) are randomly sampled from U[10, 20] and the bag capacity (W) is set to 80% of the total nominal weight of the items. The confidence level [∑_{i=0}^{L−1} C(N, i) ǫ^i (1 − ǫ)^{N−i}] of RA is set to 0.01, which determines the number of samples to be drawn for ǫ ∈ {0.1, 0.2}. 5) Here L denotes the number of items and |S| the number of scenario realizations of the discrete probability distribution.
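Steps 1)-4) can be sketched as follows; taking the scenario average as each item's nominal weight is our assumption, since the text does not specify how the nominal weight is computed:

```python
import numpy as np

rng = np.random.default_rng(42)


def generate_instance(L, n_scen):
    """Knapsack test data following steps 1)-4) of the text."""
    lo = rng.uniform(1, 10, size=L)     # lower weight bounds ~ U[1, 10]
    hi = rng.uniform(11, 20, size=L)    # upper weight bounds ~ U[11, 20]
    # |S| weight scenarios; item i's weight is drawn from [lo_i, hi_i]
    scenarios = rng.uniform(lo, hi, size=(n_scen, L))
    p0 = np.full(n_scen, 1.0 / n_scen)  # equally likely scenarios
    v = rng.uniform(10, 20, size=L)     # item values ~ U[10, 20]
    # capacity = 80% of total nominal weight (scenario average assumed
    # as the nominal weight -- an assumption for this sketch)
    W = 0.8 * scenarios.mean(axis=0).sum()
    return v, W, scenarios, p0
```

Each L/|S| pair is then instantiated repeatedly (20 times in the experiments) by calling such a generator with fresh random draws.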
For each L/|S| pair, 20 different data sets have been generated following the data generation procedure described above. Using these data, the optimality and CPU performances of RA, SEA (Algorithm 1), and the exact reformulation (RC_E) are compared in Tables 7 and 8.
As pointed out above, we generated 20 instances for each L/|S| combination. The asterisk (*) indicates that the associated algorithm solves all instances to optimality for the given parameter combination, or marks the best performing algorithm when the exact reformulation cannot be solved. The tilde (∼) indicates that optimality cannot be attained within 7200 seconds. "Gap" denotes the average optimality gap percentage of an algorithm over the instances that cannot be solved to optimality, or the average optimality gap percentage with respect to the best performing algorithm when the exact reformulation cannot be solved within 7200 seconds; the hash symbol (#) denotes the number of instances (out of 20) that cannot be solved to optimality (or are worse than the best performing algorithm) by the given algorithm. Finally, CPU is the total computation time of a given algorithm in seconds. All experiments are run on a 64-bit Windows machine equipped with an Intel Quad-Core i7-6700HQ processor and 16 GB of RAM. The numerical results in Tables 7 and 8 show that SEA outperforms RA on all instances, both in CPU time and in optimality gap. The improved optimality performance of SEA is expected, since the algorithm systematically includes the uncertain parameters in the uncertainty set in order to obtain the tightest ambiguity set, and hence a better objective function value, compared with RA, which is fully randomized without a systematic objective performance consideration. The computational and optimality advantages of SEA become more pronounced when the probability threshold ǫ, and hence the size of the feasible region, increases, as can be seen by comparing the numerical results (Gap, CPU) in Tables 7 and 8.
Notice that the L/|S| = 5/10 instances in Table 7 are somewhat redundant, since in this case all uncertainty realizations must be satisfied (i.e., the fully robust solution) to meet the 1 − ǫ = 0.9 probability threshold. Nevertheless, these are the only instances where we see no significant difference among the alternative approaches. According to these numerical results, we conclude that SEA is an efficient approximation algorithm for solving the ambiguous chance constrained problem on medium to large sized instances. SEA is almost as good as the exact reformulation in optimality performance (except for one instance in Table 7 at L/|S| = 10/20 with a 1% gap) while yielding high quality solutions in less than a second. Moreover, SEA significantly outperforms RA in higher dimensions, where the exact approach cannot be adopted (see rows L/|S| = 25/50 in Tables 7 and 8).

Conclusion
In this paper, we have proposed robust reformulations of ambiguous chance constraints with discrete probability distributions. We have derived tractable robust counterparts of this class of ambiguous chance constraints using p-box and p-ellipsoidal uncertainty sets that support the ambiguous family of distributions. Our approach is readily applicable in practice, where one often works with a finite number of scenarios that follow a discrete probability distribution; the associated probabilities are ambiguous by nature because they are forecast or elicited from expert opinion. The proposed methodology finds a solution that satisfies the given probability requirement of being feasible while maximizing the utility (or minimizing the cost) and, at the same time, accounting for the distributional ambiguity. The approach can easily be extended to joint chance constraints, nonlinear inequalities, and different data structures without introducing additional mathematical optimization complexity beyond that of tackling a single ambiguous chance constraint with a discrete probability distribution using our method.

Table 2. Uncertainty parameter realizations with probabilities

Table 4. Optimal solutions for the ambiguous chance constrained problem

Table 5. Feasibility performance of solutions

Table 7. Comparison of solution approaches when ǫ = 0.1

Table 8. Comparison of solution approaches when ǫ = 0.2