Constrained Non-Monotone Submodular Maximization: Offline and Secretary Algorithms

Constrained submodular maximization problems have long been studied, with near-optimal results known under a variety of constraints when the submodular function is monotone. The case of non-monotone submodular maximization is less understood: the first approximation algorithms even for the unconstrained setting were given by Feige et al. (FOCS '07). More recently, Lee et al. (STOC '09, APPROX '09) showed how to approximately maximize non-monotone submodular functions when the constraints are given by the intersection of p matroid constraints; their algorithm is based on local-search procedures that consider p-swaps, and hence the running time may be n^{Ω(p)}, implying their algorithm is polynomial-time only for constantly many matroids. In this paper, we give algorithms that work for p-independence systems (which generalize constraints given by the intersection of p matroids), where the running time is poly(n, p). Our algorithm essentially reduces the non-monotone maximization problem to multiple runs of the greedy algorithm previously used in the monotone case. Our idea of using existing algorithms for monotone functions to solve the non-monotone case also works for maximizing a submodular function with respect to a knapsack constraint: we get a simple greedy-based constant-factor approximation for this problem. With these simpler algorithms, we are able to adapt our approach to constrained non-monotone submodular maximization to the (online) secretary setting, where elements arrive one at a time in random order, and the algorithm must make irrevocable decisions about whether or not to select each element as it arrives. We give constant approximations in this secretary setting when the algorithm is constrained subject to a uniform matroid or a partition matroid, and give an O(log k) approximation when it is constrained by a general matroid of rank k.


Introduction
We present algorithms for maximizing (not necessarily monotone) non-negative submodular functions satisfying f(∅) = 0 under a variety of constraints considered earlier in the literature. Lee et al. [LMNS10,LSV09] gave the first algorithms for these problems via local search; in this paper, we consider greedy approaches that have been successful for monotone submodular maximization, and show how these algorithms can be adapted very simply to non-monotone maximization as well. Using this idea, we show the following results:
• We give an O(p)-approximation for maximizing submodular functions subject to a p-independence system.
This extends the result of Lee et al. [LMNS10,LSV09], which applied to constraints given by the intersection of p matroids for constant p. (Intersections of p matroids give p-independence systems, but the converse is not true.) Our greedy-based algorithm has running time polynomial in p, and hence gives the first polynomial-time algorithms for non-constant values of p.
• We give a constant-factor approximation for maximizing submodular functions subject to a knapsack constraint. This greedy-based algorithm gives an alternate approach to this problem; Lee et al. [LMNS10] gave LP-rounding-based algorithms that achieve a (5 + ε)-approximation for constraints given by the intersection of p knapsack constraints, where p is a constant.
Armed with simpler greedy algorithms for non-monotone submodular maximization, we are able to perform constrained non-monotone submodular maximization in several special cases of the secretary setting as well: items arrive online in random order, and the algorithm must make irrevocable decisions as they arrive.
• We give an O(1)-approximation for maximizing submodular functions subject to a cardinality constraint and subject to a partition matroid. (Using a reduction of [BDG+09], the latter implies O(1)-approximations to, e.g., graphical matroids.) Our secretary algorithms are simple and efficient.
• We give an O(log k)-approximation for maximizing submodular functions subject to an arbitrary rank k matroid constraint. This matches the known bound for the matroid secretary problem, in which the function to be maximized is simply linear.
No prior results were known for submodular maximization in the secretary setting, even for monotone submodular maximization; for a discussion of independent work, see §1.3.1. Compared to previous offline results, we trade off small constant factors in the approximation ratios of our algorithms for exponential improvements in running time: maximizing non-monotone submodular functions subject to (constant) p ≥ 2 matroid constraints currently has a (p²/(p−1) + ε)-approximation due to a paper of Lee, Sviridenko and Vondrák [LSV09], using an algorithm with running time exponential in p. For p = 1 the best result is a 3.23-approximation by Vondrák [Von09]. In contrast, our algorithms have running time only linear in p, but our approximation factors are worse by constant factors for the small values of p where previous results exist. We have not tried to optimize our constants, but it seems likely that matching or improving on the previous results for constant p will need more than just choosing the parameters carefully. We leave such improvements as an open problem.

Submodular Maximization and Secretary Problems in an Economic Context
Submodular maximization and secretary problems have both been widely studied in economic contexts. The problem of selecting a subset of people in a social network to maximize their influence in a viral marketing campaign can be modeled as a constrained submodular maximization problem [KKT03,MR07]. When costs are introduced, the influence minus the cost gives a non-monotone submodular maximization problem; prior to this work, online algorithms for non-monotone submodular maximization were not known. Asadpour et al. studied the problem of adaptive stochastic (monotone) submodular maximization with applications to budgeting and sensor placement [ANS08], and Agrawal et al. showed that the correlation gap of submodular functions is bounded by a constant using an elegant cost-sharing argument, relating this result to social-welfare-maximizing auctions [ADSY09]. Finally, secretary problems, in which elements arriving in random order must be selected so as to maximize some constrained objective function, have well-known connections to online auctions [Kle05,BIK07,BIKK07,HKP04]. Our simpler offline algorithms allow us to generalize these results to give the first secretary algorithms capable of handling a non-monotone submodular objective function.

Our Main Ideas
At a high level, the simple yet crucial observation behind the offline results is this: many of the previous algorithms and proofs for constrained monotone submodular maximization can be adapted to show that the set S they produce satisfies f(S) ≥ β f(S ∪ C*), where 0 < β ≤ 1 and C* is an optimal solution. In the monotone case, the right-hand side is at least f(C*) = OPT and we are done. In the non-monotone case, we cannot conclude this. However, we observe that if f(S ∩ C*) is a reasonable fraction of OPT, then (approximately) finding the most valuable set within S would give us a large value, and since we work with constraints that are downward closed, finding such a set is just unconstrained maximization of f(·) restricted to S, for which Feige et al. [FMV07] give good algorithms! On the other hand, if f(S ∩ C*) ≤ εOPT and f(S) is also too small, then one can show that deleting the elements of S and running the procedure again to find another set S′ ⊆ Ω \ S with f(S′) ≥ β f(S′ ∪ (C* \ S)) guarantees a good solution! Details for the specific problems appear in the following sections; we first consider the simplest case, a cardinality constraint, in Section 2 to illustrate the general idea, and then give more general results in Sections 3.1 and 3.2.
For the secretary case, where the elements arrive in random order, algorithms were not known even for the monotone case. The main complication is that we cannot run a greedy algorithm (since the elements arrive in random order), and moreover the value of an incoming element depends on the previously chosen set of elements. Furthermore, to extend the results to the non-monotone case, one needs to avoid the local-search algorithms (which, in fact, motivated the above results), since these algorithms necessarily make multiple passes over the input, while the secretary model allows only a single pass. The details are given in Section 4.

Related Work
Monotone Submodular Maximization. The (offline) monotone submodular optimization problem has long been studied: Fisher, Nemhauser, and Wolsey [NWF78,FNW78] showed that the greedy and local-search algorithms give an e/(e−1)-approximation under a cardinality constraint, and a (p + 1)-approximation under p matroid constraints. In another line of work, [Jen76,KH78,HKJ80] showed that the greedy algorithm is a p-approximation for maximizing a modular (i.e., additive) function subject to a p-independence system. This proof extends to show a (p + 1)-approximation for monotone submodular functions under the same constraints (see, e.g., [CCPV09]). A long-standing open problem was to improve on these results; nothing better than a 2-approximation was known even for monotone maximization subject to a single partition matroid constraint. Calinescu et al. [CCPV07] showed how to maximize monotone submodular functions representable as weighted matroid rank functions subject to any matroid with an approximation ratio of e/(e−1), and soon thereafter, Vondrák extended this result to all submodular functions [Von08]; these highly influential results appear jointly in [CCPV09]. Subsequently, Lee et al. [LSV09] gave algorithms that beat the (p + 1) bound for p ≥ 2 matroid constraints, obtaining a (p²/(p−1) + ε)-approximation.
Knapsack Constraints. Sviridenko [Svi04] extended results of Wolsey [Wol82] and Khuller et al. [KMN99] to show that a greedy-like algorithm with partial enumeration gives an e/(e−1)-approximation to monotone submodular maximization subject to a knapsack constraint. Kulik et al. [KST09] showed that one can get essentially the same approximation subject to a constant number of knapsack constraints. Lee et al. [LMNS10] give a 5-approximation for the same problem in the non-monotone case.
Mixed Matroid-Knapsack Constraints. Chekuri et al. [CVZ09] give strong concentration results for dependent randomized rounding with many applications; one of these applications is an (e/(e−1) + ε)-approximation for monotone maximization subject to a matroid and any constant number of knapsack constraints. [GNR09, Section F.1] extends ideas from [CK05] to give polynomial-time algorithms for non-monotone submodular maximization subject to a p-system and q knapsacks: these algorithms achieve a (p + q + O(1))-approximation for constant q (since the running time is n^{poly(q)}), or a (p + 2)(q + 1)-approximation for arbitrary q; at a high level, their idea is to "emulate" a knapsack constraint by a polynomial number of partition matroid constraints.
Non-Monotone Submodular Maximization. In the non-monotone case, even the unconstrained problem is NP-hard (it captures max-cut). Feige, Mirrokni and Vondrák [FMV07] first gave constant-factor approximations for this problem. Lee et al. [LMNS10] gave the first approximation algorithms for constrained non-monotone maximization (subject to p matroid constraints, or p knapsack constraints); the approximation factors were improved by Lee et al. [LSV09]. The algorithms in the previous two papers are based on local search with p-swaps and take n^{Θ(p)} time. Recent work by Vondrák [Von09] gives much further insight into the approximability of submodular maximization problems.
Secretary Problems. The original secretary problem seeks to maximize the probability of picking the element of maximum value in a collection, given that the elements are examined in random order [Dyn63,Fre83,Fer89]. The problem was used to model item-pricing problems by Hajiaghayi et al. [HKP04]. Kleinberg [Kle05] showed that the problem of maximizing a modular function subject to a cardinality constraint in the secretary setting admits a (1 + Θ(1)/√k)-approximation, where k is the cardinality. (We show that maximizing a submodular function subject to a cardinality constraint cannot be approximated to better than some universal constant, independent of the value of k.) Babaioff et al. [BIK07] considered maximizing modular functions subject to matroid constraints, again in a secretary setting, and gave constant-factor approximations for some special matroids, and an O(log k)-approximation for general matroids of rank k. This line of research has seen several recent developments [BIKK07, DP08, KP09, BDG+09].

Independent Work on Submodular Secretaries
Concurrently and independently of our work, Bobby Kleinberg has given an algorithm similar to that in §4.1 for monotone submodular secretary maximization under a cardinality constraint [Kle09]. Again independently, Bateni et al. consider the problem of non-monotone submodular maximization in the secretary setting [BHZ10]; they give a different O(1)-approximation subject to a cardinality constraint, an O(L log² k)-approximation subject to L matroid constraints, and an O(L)-approximation subject to L knapsack constraints. While we do not consider multiple constraints, it is easy to extend our results to obtain O(L log k) and O(L) respectively using standard techniques.

Matroids.
A matroid is a pair M = (Ω, I), with I ⊆ 2^Ω, such that (i) ∅ ∈ I; (ii) if A ∈ I and B ⊆ A, then B ∈ I; and (iii) for every A, B ∈ I with |A| < |B|, there exists e ∈ B \ A such that A + e ∈ I. The sets in I are called independent, and the rank of a matroid is the size of any maximal independent set (base) in M. In a uniform matroid, I contains all subsets of size at most k. In a partition matroid, we have groups g_1, g_2, . . . , g_k ⊆ Ω with g_i ∩ g_j = ∅ for i ≠ j and ∪_j g_j = Ω; the independent sets are those S ⊆ Ω such that |S ∩ g_i| ≤ 1 for all i.
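As a concrete illustration of the partition-matroid definition above, here is a minimal Python sketch of the independence test (the representation of groups as Python sets is our own choice, not from the paper):

```python
def is_independent_partition(S, groups):
    """A set S is independent in the partition matroid defined by the
    disjoint groups g_1, ..., g_k iff |S ∩ g_i| <= 1 for every group."""
    return all(len(S & g) <= 1 for g in groups)
```

For example, with groups {1, 2} and {3, 4}, the set {1, 3} is independent but {1, 2} is not.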
Unconstrained (Non-Monotone) Submodular Maximization. We use FMV_α(S) to denote an approximation algorithm of Feige, Mirrokni, and Vondrák [FMV07] for unconstrained submodular maximization in the non-monotone setting: it returns a set T ⊆ S such that f(T) ≥ (1/α) · max_{T′⊆S} f(T′). In fact, Feige et al. present several such algorithms; the best approximation ratio among them is α = 2.5, via a local-search algorithm, and the simplest is a 4-approximation that just returns a uniformly random subset of S.
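To make the FMV_α oracle concrete, the following sketch shows the simplest variant (α = 4): return a uniformly random subset. The cut function used alongside it is our own illustrative choice of a non-monotone submodular function with f(∅) = 0; the names are ours.

```python
import random

def cut_value(edges, S):
    # Cut functions are canonical non-monotone submodular functions with
    # f(∅) = 0: count the edges with exactly one endpoint inside S.
    return sum(1 for (u, v) in edges if (u in S) != (v in S))

def fmv_random_subset(S, rng):
    # FMV's simplest algorithm: a uniformly random subset of S is,
    # in expectation, a 4-approximation to max_{T ⊆ S} f(T).
    return {e for e in S if rng.random() < 0.5}
```

Note that the returned set is always a subset of S, so any downward-closed constraint satisfied by S remains satisfied.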

Submodular Maximization subject to a Cardinality Constraint
We first give an offline algorithm for submodular maximization subject to a cardinality constraint: this illustrates our simple approach, upon which we build in the following sections. Formally, we are given a subset X ⊆ Ω and a non-negative submodular function f that is potentially non-monotone but has f(∅) = 0, and we want to approximate max_{S⊆X:|S|≤k} f(S). The greedy algorithm starts with S ← ∅, and repeatedly picks an element e with maximum marginal value f_S(e) until it has k elements.
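The greedy rule just described can be sketched in a few lines of Python (a sketch under the assumption that f is given as an oracle on sets; the names are ours):

```python
def greedy_cardinality(f, X, k):
    # Repeatedly add the element of maximum marginal value
    # f_S(e) = f(S ∪ {e}) − f(S), until k elements have been picked
    # (even if the best marginal value is negative, as in the paper).
    S = set()
    for _ in range(min(k, len(X))):
        e = max(set(X) - S, key=lambda x: f(S | {x}) - f(S))
        S.add(e)
    return S
```

For a modular f this is just picking the k largest elements, which is a useful sanity check.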
Lemma 2.1. For any set C with |C| ≤ k, the greedy algorithm returns a set S that satisfies f(S) ≥ (1/2) f(S ∪ C).
Proof. Suppose not, so that f(S) < (1/2) f(S ∪ C). Then the total marginal value f_S(C \ S) = f(S ∪ C) − f(S) exceeds f(S), and since |C \ S| ≤ k, some element e ∈ C \ S has marginal value f_S(e) > f(S)/k. Since we ran the greedy algorithm, at each step this element e would have been a contender to be added, and by submodularity, e's marginal value would have been only higher then. Hence the elements actually added in each of the k steps would have had marginal value more than e's marginal value at that time, which is more than f(S)/k. This implies that f(S) > k · f(S)/k = f(S), a contradiction.
This lemma is existentially tight: observe that if the function f is just the cardinality function f(S) = |S|, and if S and C happen to be disjoint sets of size k, then f(S) = (1/2) f(S ∪ C).

Lemma 2.2 (Special Case of Claim 2.7 in [LMNS10]). Given sets C, S_1 ⊆ U, let C′ = C \ S_1, and let S_2 ⊆ U \ S_1. Then f(S_1 ∪ C) + f(S_2 ∪ C′) + f(S_1 ∩ C) ≥ f(C).
Proof. By submodularity, it follows that f(S_1 ∪ C) + f(S_2 ∪ C′) ≥ f(S_1 ∪ S_2 ∪ C) + f(C′), since (S_1 ∪ C) ∩ (S_2 ∪ C′) = C′; also f(C′) + f(S_1 ∩ C) ≥ f(C) + f(∅) = f(C), since C′ and S_1 ∩ C partition C. Putting these together and using non-negativity of f(·), the lemma follows.
1: let X_1 ← X
2: for i = 1 to 2 do
3:    let S_i ← Greedy(X_i)
4:    let S_i′ ← FMV_α(S_i)
5:    let X_{i+1} ← X_i \ S_i
6: end for
7: return best of S_1, S_1′, S_2

Figure 1: Algorithm Submod-Max-Cardinality

We now give our algorithm Submod-Max-Cardinality (Figure 1) for submodular maximization: it has the same multi-pass structure as that of Lee et al., but uses the greedy analysis above instead of a local-search algorithm.
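Putting the pieces together, the multi-pass structure of Submod-Max-Cardinality can be sketched as follows (a sketch only, with the greedy subroutine and the random-subset FMV_4 inlined for self-containment; all names are ours):

```python
import random

def submod_max_cardinality(f, X, k, rng=random.Random(0)):
    def greedy(ground):
        # Greedy: add the element of maximum marginal value, k times.
        S = set()
        for _ in range(min(k, len(ground))):
            e = max(ground - S, key=lambda x: f(S | {x}) - f(S))
            S.add(e)
        return S

    S1 = greedy(set(X))                           # first greedy pass
    S1p = {e for e in S1 if rng.random() < 0.5}   # FMV_4 inside S1
    S2 = greedy(set(X) - S1)                      # second pass, S1 deleted
    return max((S1, S1p, S2), key=f)              # best of the three sets
```

Since S1p ⊆ S1 and |S1|, |S2| ≤ k, all three candidates are feasible for the cardinality constraint.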
Theorem 2.3. Algorithm Submod-Max-Cardinality is a (4 + α)-approximation for maximizing a non-negative submodular function subject to a cardinality constraint.
Proof. Let C* be the optimal solution with f(C*) = OPT, and let C′ = C* \ S_1. By Lemma 2.1, f(S_1) ≥ (1/2) f(S_1 ∪ C*) and f(S_2) ≥ (1/2) f(S_2 ∪ C′); by the guarantee of FMV_α, f(S_1′) ≥ (1/α) f(S_1 ∩ C*). Hence
2 f(S_1) + 2 f(S_2) + α f(S_1′) ≥ f(S_1 ∪ C*) + f(S_2 ∪ C′) + f(S_1 ∩ C*) ≥ f(C*) = OPT,
where the last inequality is Lemma 2.2 with C = C*. The best of the three sets therefore has value at least OPT/(4 + α).
Using the known value of α = 2.5 from Feige et al. [FMV07], we get a 6.5-approximation for submodular maximization under cardinality constraints. While this is weaker than the 3.23-approximation of Vondrák [Von09], or even the 4-approximation we could get from Lee et al. [LMNS10] for this special case, the algorithm is faster, and the idea behind the improvement works in several other contexts, as we show in the following sections.

Fast Algorithms for p-Systems and Knapsacks
In this section, we show our greedy-style algorithms which achieve an O(p)-approximation for submodular maximization over p-systems, and a constant-factor approximation for submodular maximization over a knapsack. Due to space constraints, many proofs are deferred to the appendices.

Submodular Maximization for Independence Systems
Let Ω be a universe of elements and consider a collection I ⊆ 2^Ω of subsets of Ω. (Ω, I) is called an independence system if (a) ∅ ∈ I, and (b) if X ∈ I and Y ⊆ X, then Y ∈ I as well. The subsets in I are called independent; for any set S of elements, an inclusion-wise maximal independent subset T of S is called a basis of S. For brevity, we say that T is a basis if it is a basis of Ω.
Definition 3.1. Given an independence system (Ω, I) and a subset S ⊆ Ω, the rank r(S) is defined as the cardinality of the largest basis of S, and the lower rank ρ(S) is the cardinality of the smallest basis of S. The independence system is called a p-independence system (or a p-system) if max_{S⊆Ω} r(S)/ρ(S) ≤ p.

See, e.g., [CCPV09] for a discussion of independence systems and their relationship to other families of constraints; it is useful to recall that the intersection of p matroids forms a p-independence system.

The Algorithm for p-Independence Systems
Suppose we are given an independence system (Ω, I), a subset X ⊆ Ω, and a non-negative submodular function f that is potentially non-monotone but has f(∅) = 0. We want to find (or at least approximate) max_{S⊆X:S∈I} f(S). The greedy algorithm for this problem is the natural one: start with the set S = ∅, and at each step pick an element e ∈ X \ S that maximizes f_S(e) such that S + e is also independent. If no such element exists, the algorithm terminates; else we set S ← S + e and repeat. (Ideally, we would also check whether f_S(e) ≤ 0 and terminate the first time this happens; we do not do that, and instead add elements even when the marginal gain is negative, until we cannot add any more elements without violating independence.) The proof of the following lemma appears in Appendix A, and closely follows that for the monotone case from [CCPV09].
Lemma 3.2. For a p-independence system, if S is the independent set returned by the greedy algorithm, then for any independent set C, f(S) ≥ (1/(p+1)) · f(C ∪ S).
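Under the assumption that independence is given by a membership oracle, the greedy procedure analyzed in Lemma 3.2 looks like this (a minimal sketch; the names are ours):

```python
def greedy_p_system(f, X, is_independent):
    # Add the feasible element of maximum marginal value; stop only when
    # no element can be added without violating independence (marginal
    # gains may be negative, matching the variant described above).
    S = set()
    while True:
        feasible = [x for x in X if x not in S and is_independent(S | {x})]
        if not feasible:
            return S
        S.add(max(feasible, key=lambda x: f(S | {x}) - f(S)))
```

With the uniform-matroid oracle `is_independent = lambda A: len(A) <= k`, this specializes to the cardinality greedy of Section 2.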
1: X_1 ← X
2: for i = 1 to p + 1 do
3:    S_i ← Greedy(X_i)
4:    S_i′ ← FMV_α(S_i)
5:    X_{i+1} ← X_i \ S_i
6: end for
7: return best of S_1, S_1′, . . . , S_{p+1}, S_{p+1}′

Figure 2: Algorithm Submod-Max-p-Systems

The algorithm Submod-Max-p-Systems (Figure 2) for maximizing a non-monotone submodular function f with f(∅) = 0 over a p-independence system now immediately suggests itself.
Theorem 3.3. Algorithm Submod-Max-p-Systems gives an O(p)-approximation for maximizing a non-negative submodular function over a p-independence system.
Proof. Let C* be an optimal solution with OPT = f(C*), and let C_i = C* ∩ X_i for all i ∈ [p + 1]; hence C_1 = C*. Note that C_i is a feasible solution to the greedy optimization in Step 3. Hence, by Lemma 3.2, we know that f(S_i) ≥ (1/(p+1)) f(S_i ∪ C_i). If f(S_i ∩ C_i) ≥ εOPT for some i (for a parameter ε to be chosen later), then the guarantees of FMV_α ensure that f(S_i′) ≥ (εOPT)/α, and we get an (α/ε)-approximation. Else f(S_i ∩ C_i) ≤ εOPT holds for all i. Now we can add all these inequalities, divide by p + 1, and use the argument from [LMNS10, Claim 2.7] to infer that the best of the sets S_i has value Ω(OPT/p). (While Claim 2.7 of [LMNS10] is used there in the context of a local-search algorithm, it relies only on the submodularity of the function f and the fact that C_{i+1} = C_i \ S_i.) Choosing ε to balance the two cases, we get the claimed approximation ratio.
Note that even using α = 1, our approximation factors differ from the ratios in Lee et al. [LMNS10,LSV09] only by a small constant factor. However, the proof here is somewhat simpler, and it works seamlessly for all p-independence systems instead of just intersections of matroids. Moreover, our running time is only linear in p, instead of exponential as for the local-search algorithms: previously, no polynomial-time algorithms were known for this problem when p is super-constant. Note that running the loop just twice instead of p + 1 times reduces the running time further; we can then use Lemma 2.2 instead of the full power of [LMNS10, Claim 2.7], at the cost of slightly worse constants.

Submodular Maximization over Knapsacks
The paper of Sviridenko [Svi04] gives a greedy algorithm with partial enumeration that achieves an e/(e−1)-approximation for monotone submodular maximization with respect to a knapsack constraint. In particular, each element e ∈ X has a size c_e, and we are given a bound B: the goal is to maximize f(S) over subsets S ⊆ X such that Σ_{e∈S} c_e ≤ B. His algorithm is the following: for each possible subset S_0 ⊆ X of at most three elements, start with S_0 and iteratively include the element that maximizes the gain in function value per unit size, subject to the resulting set still fitting in the knapsack. (If none of the remaining elements gives a positive gain or fits in the knapsack, stop.) Finally, from among these O(|X|³) solutions, choose the best one; Sviridenko shows that in the monotone submodular case, this is an e/(e−1)-approximation algorithm. One can modify Sviridenko's algorithm and proof to show the following result for non-monotone submodular functions. (The details are in Appendix B.)
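Sviridenko's greedy-with-partial-enumeration can be sketched as below (our own minimal rendering; the density rule and the three-element enumeration follow the description above, and all names are ours):

```python
from itertools import combinations

def knapsack_greedy_enum(f, X, cost, B):
    best = set()
    # Enumerate every seed set of at most three elements.
    for r in range(4):
        for seed in combinations(X, r):
            S = set(seed)
            if sum(cost[e] for e in S) > B:
                continue
            while True:
                used = sum(cost[e] for e in S)
                # Candidates must fit in the knapsack and give positive gain.
                cand = [x for x in X if x not in S
                        and used + cost[x] <= B
                        and f(S | {x}) > f(S)]
                if not cand:
                    break
                # Greedy rule: maximize marginal gain per unit size.
                S.add(max(cand, key=lambda x: (f(S | {x}) - f(S)) / cost[x]))
            if f(S) > f(best):
                best = S
    return best
```

The enumeration makes the procedure take O(|X|³) greedy runs, matching the count of candidate solutions mentioned above.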
Theorem 3.4. There is a polynomial-time algorithm that, given the above input, outputs a polynomial-sized collection of sets such that for any valid solution C, the collection contains a set S satisfying f(S) ≥ (1/2) f(S ∪ C).
Note that the tight example for cardinality constraints shows that we cannot hope to do better than a factor of 1/2. Now using an argument very similar to that in Theorem 2.3 gives us the following result for non-monotone submodular maximization with respect to a knapsack constraint.
Theorem 3.5. There is a (4 + α)-approximation for the problem of maximizing a submodular function with respect to a knapsack constraint, where α is the approximation guarantee for unconstrained (non-monotone) submodular maximization.

Constrained Submodular Maximization in the Secretary Setting
In this section, we give algorithms for submodular maximization in the secretary setting: first subject to a cardinality constraint, then subject to a partition matroid, and finally for general matroids. The main algorithmic concerns tackled in this section when developing secretary algorithms are: (a) previous algorithms for non-monotone maximization required local search, which seems difficult in an online secretary setting, so we developed greedy-style algorithms; (b) we need multiple passes for non-monotone optimization, and while this can be achieved using randomization and running algorithms in parallel, these parallel runs may have correlations that we need to control (or better still, avoid); and of course (c) the marginal value function changes over the course of the algorithm's execution as we pick more elements; in the case of partition matroids, for example, this ever-changing function creates several complications.
We also show an information-theoretic lower bound: no secretary algorithm can approximately maximize a submodular function subject to a cardinality constraint k to a factor better than some universal constant greater than 1, independent of k. (This ignores computational constraints, and so the computational inapproximability of offline submodular maximization does not apply.) This is in contrast to the additive secretary problem, for which Kleinberg gives a secretary algorithm achieving a 1/(1 − 5/√k)-approximation [Kle05]. The lower bound appears in Appendix D. (For a discussion of independent work on submodular secretary problems, see §1.3.1.)

Subject to a Cardinality Constraint
The offline algorithm presented in Section 2 builds three potential solutions and chooses the best amongst them. We now want to build just one solution in an online fashion: elements arrive in random order, and once an element is added to the solution, it is never discarded. We first give an online algorithm that is given the optimal value OPT as input but where the elements can come in worst-case order (we call this an "online algorithm with advice"). Using sampling ideas we can estimate OPT, and hence use this advice-taking online algorithm in the secretary model where elements arrive in random order.
To get the advice-taking online algorithm, we make two changes. First, we do not use the greedy algorithm, which selects elements of highest marginal utility, but instead a threshold algorithm, which selects any element whose marginal utility is above a certain threshold. Second, we change Step 4 of algorithm Submod-Max-Cardinality to use FMV_4, which simply selects a random subset of the elements to get a 4-approximation to the unconstrained submodular maximization problem [FMV07]. The threshold algorithm with inputs (τ, k) simply selects each element as it appears if it has marginal utility at least τ, up to a maximum of k elements.
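The threshold algorithm is simple enough to state directly in code (a sketch; f is a set-function oracle, the stream order is whatever the adversary chooses, and the names are ours):

```python
def threshold_select(stream, f, tau, k):
    # Irrevocably accept any arriving element whose marginal value is at
    # least tau, up to a maximum of k elements.
    S = set()
    for e in stream:
        if len(S) == k:
            break
        if f(S | {e}) - f(S) >= tau:
            S.add(e)
    return S
```

Note that every accepted element contributes marginal value at least τ, so if the algorithm fills up, f(S) ≥ kτ.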
Lemma 4.1. Given inputs (τ, k), the threshold algorithm returns a set S such that either |S| = k, or f(S) ≥ f(S ∪ C*) − kτ.
Proof. The claim is immediate if the algorithm picks k elements, so suppose it does not pick k elements, and suppose also that f(S ∪ C*) − f(S) > kτ. By averaging, this implies there exists an element e ∈ C* such that f_S(e) > τ; this element cannot have been chosen into S (otherwise the marginal value would be 0), but it would have been chosen into S when it was considered by the algorithm (since at that time its marginal value would only have been higher). This gives the desired contradiction.

Lemma 4.2. Suppose we modify algorithm Submod-Max-Cardinality to use the threshold algorithm with threshold τ = OPT/(7k) in Step 3, to use the random sampling algorithm FMV_4 in Step 4, and to return a (uniformly) random one of S_1, S_1′, S_2 in Step 7. Then the expected value of the returned set is at least OPT/21.
Proof (sketch). We show that f(S_1) + f(S_1′) + f(S_2) ≥ τk = OPT/7, and picking a random one of these gets a third of that in expectation. Indeed, if S_1 or S_2 has k elements, then its value is at least kτ, since each selected element has marginal value at least τ; otherwise, we combine Lemma 4.1 with Lemma 2.2 as in the offline analysis.

Lemma 4.3. The algorithm of Lemma 4.2 can be implemented online, given the value OPT.
Proof. We can randomly choose which one of S_1, S_1′, S_2 we want to output before observing any elements. Clearly S_1 can be determined online, as can S_2, by choosing any element that has high marginal value and is not chosen in S_1. Moreover, S_1′ just selects elements from S_1 independently with probability 1/2.

Finally, it will be convenient to recall Dynkin's algorithm [Dyn63]: given a stream of n numbers in random order, it samples the first 1/e fraction of the numbers and picks the next element that is larger than all elements in the sample.
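For reference, Dynkin's rule can be sketched as follows (our rendering; ties and the empty-sample corner case are handled arbitrarily):

```python
import math

def dynkin(stream, value):
    # Sample the first ~n/e elements, then accept the first later element
    # beating everything in the sample (defaulting to the last element).
    n = len(stream)
    m = int(n / math.e)
    best_sample = max((value(e) for e in stream[:m]), default=float("-inf"))
    for e in stream[m:]:
        if value(e) > best_sample:
            return e
    return stream[-1]
```

Over a uniformly random arrival order, this rule picks the maximum-value element with probability at least 1/e.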

The Secretary Algorithm for the Cardinality Case
Let Solution ← ∅.
Flip a fair coin.
if heads then
    Solution ← most valuable item, using Dynkin's algorithm
else
    Let m ∼ B(n, 1/2) be a draw from the binomial distribution.
    A_1 ← ρ_off-approximate offline algorithm on the first m elements.
    A_2 ← ρ_on-approximate advice-taking online algorithm on the remaining elements, with f(A_1) as the guess for OPT.
    Return A_2.
end if

Figure 3: Algorithm SubmodularSecretaries

For a constrained submodular optimization problem, if we are given (a) a ρ_off-approximate offline algorithm, and (b) a ρ_on-approximate online advice-taking algorithm that works given an estimate of OPT, we can obtain an algorithm in the secretary model as follows: we use the offline algorithm to estimate OPT on (roughly) the first half of the elements, and then run the advice-taking online algorithm with that estimate on the rest. The formal algorithm appears in Figure 3. Because of space constraints, we have deferred the proof of the following theorem to Appendix C.

Theorem 4.4. Algorithm SubmodularSecretaries gives an O(1)-approximation for submodular maximization subject to a cardinality constraint in the secretary setting.

Subject to a Partition Matroid Constraint
In this section, we give a constant-factor approximation for maximizing submodular functions subject to a partition matroid. Recall that in such a matroid, the universe is partitioned into k "groups", and the independent sets are those which contain at most one element from each group. To get a secretary-style algorithm for modular (additive) function maximization subject to a partition matroid, we could simply run Dynkin's algorithm on each group independently. However, for a submodular function, the marginal value of an element depends on the elements previously picked, and hence the marginal value of an element as seen by the online algorithm and as seen by the adversary become very different.
We first build some intuition by considering a simpler "contiguous partitions" model where all the elements of each group arrive together (in random order), but the groups of the partition are presented in some arbitrary order g 1 , g 2 , . . . , g r . We then go on to handle the case when all the elements indeed come in completely random order, using what is morally a reduction to the contiguous partitions case.

A Special Case: Contiguous Partitions
For the contiguous case, one can show that executing Dynkin's algorithm with the obvious marginal valuation function is a good algorithm: this is not immediate, since the valuation function changes as we pick elements, but it works out, since the groups arrive contiguously. Now, as in the previous section, one wants to run two parallel copies of this algorithm (with the second one picking elements from among those not picked by the first), but the correlation causes the second algorithm to no longer see a random permutation! We get around this by coupling the two together as follows. Initially, the algorithm determines which of three modes (A, B, or C) it is in, uniformly at random. The algorithm maintains a set of selected elements, initially S_0 = ∅. When group g_i of the partition arrives, it runs Dynkin's secretary algorithm on the elements of this group using the valuation function f_{S_{i−1}}. If Dynkin's algorithm selects an element x, our algorithm flips a coin. In modes A and B, we let S_i ← S_{i−1} ∪ {x} if the coin is heads, and S_i ← S_{i−1} otherwise; in mode C, we do the reverse, letting S_i ← S_{i−1} ∪ {x} if the coin is tails, and S_i ← S_{i−1} otherwise. Finally, after the algorithm has completed, if we are in mode B, we discard each element of S_r independently with probability 1/2. (Note that we can actually implement this step online, by 'marking' but not selecting elements with probability 1/2 when they arrive.)
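A sketch of the mode-based algorithm just described (our own rendering, with illustrative names: f is a set-function oracle, each group is a list in arrival order, and for simplicity the mode-B subsampling is done at the end rather than via online marking):

```python
import math
import random

def contiguous_partition_secretary(groups, f, mode, rng):
    S = set()
    for g in groups:
        # Run Dynkin's rule within the group, valuing each element by its
        # marginal value with respect to the current solution S.
        marginal = lambda e: f(S | {e}) - f(S)
        m = int(len(g) / math.e)
        best_sample = max((marginal(e) for e in g[:m]), default=float("-inf"))
        x = next((e for e in g[m:] if marginal(e) > best_sample), None)
        if x is None:
            continue
        heads = rng.random() < 0.5
        # Modes A and B keep the winner on heads; mode C keeps it on tails.
        take = heads if mode in ("A", "B") else not heads
        if take:
            S.add(x)
    if mode == "B":
        # Mode B discards each selected element with probability 1/2.
        S = {e for e in S if rng.random() < 0.5}
    return S
```

Coupling a mode-A and a mode-C run on shared coins is what makes the two selected sets disjoint, which is exactly the setup needed for Lemma 2.2 in the analysis below.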
Lemma 4.6. The above algorithm is a (3 + 6e)-approximation for the submodular maximization problem under partition matroids, when each group of the partition comes as a contiguous segment.
Proof. We first analyze the case in which the algorithm is in mode A or C. Consider a hypothetical run of two versions of our algorithm simultaneously, one in mode A and one in mode C, which share coins and produce sets S_r^A and S_r^C. The two algorithms run with identical marginal distributions, but are coupled such that whenever both algorithms attempt to select the same element (each with probability 1/2), we flip only one coin, so one succeeds while the other fails. Note that S_r^C ⊆ U \ S_r^A, and so we will be able to apply Lemma 2.2. For a fixed permutation π, let S_r^A(π) be the set chosen by the mode-A algorithm for that particular permutation; taking expectations over π gives inequality (6). Now, for any e ∈ X, let j(e) be the index of the group containing e; we then obtain the chain of inequalities (7), where the first inequality is just subadditivity, the second submodularity, the third follows from the fact that Dynkin's algorithm is an e-approximation for the secretary problem (so selecting the element that Dynkin's algorithm selects with probability 1/2 gives a 2e-approximation), and the resulting telescoping sum gives the final equality. Now substituting (7) into (6) and rearranging, we get E[f(S_r^A)] ≥ (1/(1 + 2e)) f(S_r^A ∪ C*). An identical analysis applies to the mode-C algorithm. It remains to analyze the case in which the algorithm runs in mode B. In this case, the algorithm generates a set S_r^B by selecting each element of S_r^A independently and uniformly at random. By the theorem of [FMV07], uniform random sampling achieves a 4-approximation to the problem of unconstrained submodular maximization, which gives the corresponding bound in this case. Since our algorithm outputs one of these three sets uniformly at random, it gets a (3 + 6e)-approximation to f(C*).

General Case
We now consider the general secretary setting, in which the elements come in random order, not necessarily grouped by partition. Our previous approach will not work: we cannot simply run Dynkin's secretary algorithm on contiguous chunks of elements, because some elements may be blocked by our previous choices. We instead do something similar in spirit: we divide the elements up into k 'epochs', and attempt to select a single element from each. We treat every element that arrives before the current epoch as part of a sample, and, according to the valuation function fixed at the beginning of the epoch, we select the first element we encounter that has higher value than any element from its own partition group in the sample, so long as we have not already selected something from that group. Our algorithm is as follows. Initially, the algorithm determines which of three modes (A, B, or C) it runs in, uniformly at random. The algorithm maintains a set of selected elements, initially S_0, and observes the first N_0 ∼ B(n, 1/2) of the elements without selecting anything. The algorithm then considers k epochs, where the ith epoch is the set of N_i ∼ B(n, 1/(100k)) contiguous elements after the (i−1)th epoch. At epoch i, we use the valuation function f_{S_{i−1}}. If an element x has higher value than any element from its own partition group that arrived before epoch i (and nothing from that group has been selected), we flip a coin. In modes A and B, we let S_i ← S_{i−1} ∪ {x} if the coin is heads, and let S_i ← S_{i−1} otherwise. In mode C, we do the reverse, and let S_i ← S_{i−1} ∪ {x} if the coin is tails, and let S_i ← S_{i−1} otherwise. After all k epochs have passed, we ignore the remaining elements. Finally, after the algorithm has completed, if we are in mode B, we discard each element of S_r with probability 1/2. (Note that we can actually implement this step online, by 'marking' but not selecting elements with probability 1/2 when they arrive.)
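A sketch of the epoch-based variant, under the same illustrative conventions as before. The decision to end an epoch after the first qualifying element (whatever the coin shows) is one possible reading of the description; `group_of` is an assumed helper returning an element's partition group.

```python
import random

def epoch_partition_secretary(stream, group_of, f, k, mode=None, rng=random):
    """Sketch of the epoch-based algorithm for a general arrival order.
    Epoch lengths are drawn as in the text: N_0 ~ B(n, 1/2) sampled
    elements, then k epochs of N_i ~ B(n, 1/(100k)) elements each."""
    if mode is None:
        mode = rng.choice("ABC")
    n = len(stream)
    pos = sum(rng.random() < 0.5 for _ in range(n))    # N_0 ~ B(n, 1/2)
    seen = list(stream[:pos])                          # observed sample
    S, marked, taken = set(), set(), set()
    for _ in range(k):
        N = sum(rng.random() < 1.0 / (100 * k) for _ in range(n))
        epoch, pos = stream[pos: pos + N], pos + N
        base = frozenset(S)                            # f_{S_{i-1}} is fixed
        for x in epoch:
            g = group_of(x)
            best_prior = max((f(base | {e}) - f(base)
                              for e in seen if group_of(e) == g),
                             default=float("-inf"))
            if g not in taken and f(base | {x}) - f(base) > best_prior:
                take = rng.random() < 0.5              # the coin flip
                if mode == "C":
                    take = not take
                if take:
                    S.add(x)
                    taken.add(g)
                    if mode == "B" and rng.random() < 0.5:
                        marked.add(x)                  # online discarding
                break                                  # one attempt per epoch
        seen.extend(epoch)
    return S - marked
```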
If we were guaranteed to select, in every epoch i, the highest-valued element according to f_{S_{i−1}}, then the analysis of this algorithm would be identical to the analysis in the contiguous case. This is of course not guaranteed. However, we prove a technical lemma saying that we are "close enough" to this case. Lemma 4.7. For all partition groups i and epochs j, the algorithm selects the highest element from group i (according to the valuation function f_{S_{j−1}} used during epoch j) during epoch j with probability at least Ω(1/k). Because of space constraints, we defer the proof of this technical lemma to Appendix C. Note an immediate consequence of the above lemma: summing over the elements of the optimal set C* (one from each of the k partition groups) bounds the expected gain of each epoch. Summing the expected contributions to S_r over the k epochs and applying submodularity yields a bound analogous to inequality (7). Using this derivation in place of inequality (7) in the proof of Lemma 4.6 proves that our algorithm gives an O(1)-approximation to the non-monotone submodular maximization problem subject to a partition matroid constraint.

Subject to a General Matroid Constraint
We consider matroid constraints where the matroid is M = (Ω, I) with rank k. Let w_1 = max_{e∈Ω} f({e}) be the maximum value obtained by any single element, and let e_1 be the element achieving this maximum. (Note that we do not know these values up-front in the secretary setting.) In this section, we first give an algorithm that obtains a set of fairly high value given a threshold τ. We then show how to choose this threshold, assuming we know the value w_1 of the most valuable element, and why this yields an advice-taking online algorithm with a logarithmic approximation. Finally, we show how to implement this in a secretary framework.
A Threshold Algorithm. Given a value τ, run the following algorithm. Initialize S_1, S_2 ← ∅. Go over the elements of the universe Ω in arbitrary order: when considering element e, add it to S_1 if f_{S_1}(e) ≥ ετ and S_1 ∪ {e} is independent; else add it to S_2 if f_{S_2}(e) ≥ ετ and S_2 ∪ {e} is independent; else discard it. (We will choose the value of ε later.) Finally, output one of S_1 or S_2, chosen uniformly at random.
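In code, the threshold algorithm is just a single pass with two buckets. Here `independent` is an assumed membership oracle for the matroid, and `eps` defaults to the value 2/5 chosen in the analysis below.

```python
import random

def threshold_algorithm(elements, f, independent, tau, eps=0.4, rng=random):
    """Two-bucket threshold rule: add e to the first bucket whose value it
    raises by at least eps*tau while staying independent; else discard."""
    S1, S2 = set(), set()
    for e in elements:
        if f(S1 | {e}) - f(S1) >= eps * tau and independent(S1 | {e}):
            S1.add(e)
        elif f(S2 | {e}) - f(S2) >= eps * tau and independent(S2 | {e}):
            S2.add(e)
    return S1 if rng.random() < 0.5 else S2
```

For an additive f on a rank-2 uniform matroid with weights 10, 9, 8, 1 and τ = 10, the pass fills S_1 = {10, 9} and overflows into S_2 = {8}, discarding the low-value item.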
To analyze this algorithm, let C* be the optimal set with f(C*) = OPT. Order the elements of C* by picking them greedily based on marginal values. Given τ > 0, let C*_τ ⊆ C* be the elements whose marginal benefit was at least τ when added in this greedy order; note that f(C*_τ) ≥ |C*_τ| · τ.
Lemma 4.8. For any τ > 0, the threshold algorithm with ε = 2/5 outputs a set of expected value at least τ|C*_τ|/20.
Proof. If either |S_1| or |S_2| is at least |C*_τ|/4, that set has value at least (|C*_τ|/4) · ετ. Else both sets have small cardinality. Since we are in a matroid, there must be a set A ⊆ C*_τ of cardinality |A| ≥ |C*_τ| − |S_1| − |S_2| ≥ |C*_τ|/2 such that A is disjoint from both S_1 and S_2, and both S_1 ∪ A and S_2 ∪ A lie in I (i.e., they are independent).
We claim that f(S_1) ≥ f(S_1 ∪ A) − |A| · ετ. Indeed, each element e ∈ A was not added by the threshold algorithm; since it could have been added while maintaining independence, it must have been discarded because its marginal value was less than ετ. Hence f_{S_1}({e}) < ετ, and so f(S_1 ∪ A) ≤ f(S_1) + |A| · ετ by submodularity; the same holds for S_2. And by disjointness, f(S_1 ∩ A) = f(∅) = 0. Hence, summing these and applying Lemma 2.2, we get f(S_1) + f(S_2) ≥ f(A) − 2|A| · ετ. Since the marginal values of all the elements in C*_τ were at least τ when they were added in the greedy ordering, and A ⊆ C*_τ, submodularity implies f(A) ≥ |A|τ, which in turn implies f(S_1) + f(S_2) ≥ (1 − 2ε)τ|A| ≥ (1 − 2ε)τ|C*_τ|/2. A random one of S_1, S_2 gets half of that in expectation. Taking the minimum of (|C*_τ|/4) · ετ and (1 − 2ε)τ|C*_τ|/2, and setting ε = 2/5 to balance the two cases, we get the claim.

Lemma 4.9. If τ is chosen uniformly at random from {w_1, w_1/2, w_1/4, ..., w_1/2k}, then the threshold algorithm outputs a set of expected value Ω(OPT/log k).
Proof. Consider the greedy enumeration {e_1, e_2, ..., e_t} of C*, and let w_j = f_{{e_1,e_2,...,e_{j−1}}}({e_j}). First consider the infinite summation Σ_{i=0}^∞ |C*_{w_1/2^i}| · (w_1/2^i): each element e_j contributes at least w_j/2 to it, and hence the summation is at least (1/2) Σ_j w_j. But f(C*) = Σ_{j=1}^t w_j, which says the infinite sum is at least f(C*)/2 = OPT/2. The finite sum merely drops a contribution of at most w_1/4k from each of at most |C*| ≤ k elements, and clearly OPT is at least w_1, so removing these contributions leaves the finite sum at least OPT/4. Hence, if we choose a value τ uniformly from w_1, w_1/2, w_1/4, ..., w_1/2k and run the above threshold algorithm with that setting of τ, the expected value of the set output by the algorithm is Ω(OPT/log k).
The Secretary Algorithm. The secretary algorithm for general matroids is the following: sample half the elements, and let W be the value of the most valuable element in this first half. Choose a value i ∈ {0, 1, ..., 2 + log 2k} uniformly at random, and run the threshold algorithm with W/2^i as the threshold. We claim this gives an O(log k)-approximation.
Proof. With probability Θ(1/log k), we choose the value i = 0. In this case, with constant probability the element with the second-highest value comes in the first half and the highest-value element e_1 comes in the second half; hence our (conditional) expected value in this case is at least w_1. In case this single element accounts for more than half of the optimal value, we get Ω(OPT/log k). We ignore the case i = 1. If we choose i ≥ 2, then with constant probability e_1 comes in the first half, implying that W = w_1. Moreover, each element of C* − e_1 appears in the second half with probability slightly higher than 1/2. Since e_1 accounts for at most half the optimal value, the expected optimal value in the second half is at least OPT/4. The above argument then ensures that we get value Ω(OPT/log k) in expectation.
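The full secretary algorithm wraps the threshold rule with the sample-half and random-scale steps. This sketch inlines the two-bucket pass; `independent` is again an assumed membership oracle, and the split of the stream into exact halves is an illustrative simplification.

```python
import math
import random

def matroid_secretary(stream, f, independent, k, rng=random):
    """Sample half the stream; W = best singleton value in the sample;
    pick a random scale i in {0, ..., 2 + ceil(log2(2k))} and run the
    two-bucket threshold rule with tau = W / 2**i on the rest."""
    n = len(stream)
    sample, rest = stream[: n // 2], stream[n // 2:]
    if not sample or not rest:
        return set()
    W = max(f({e}) for e in sample)
    i = rng.randrange(3 + math.ceil(math.log2(2 * k)))
    tau, eps = W / 2 ** i, 0.4
    S1, S2 = set(), set()
    for e in rest:                      # the threshold pass from above
        if f(S1 | {e}) - f(S1) >= eps * tau and independent(S1 | {e}):
            S1.add(e)
        elif f(S2 | {e}) - f(S2) >= eps * tau and independent(S2 | {e}):
            S2.add(e)
    return S1 if rng.random() < 0.5 else S2
```

Note that the sample is only used to calibrate W: every selected element comes from the second half of the stream, so the decisions are implementable online.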

A Proof of Main Lemma for p-Systems
Let e_1, e_2, ..., e_k be the elements added to S by the greedy algorithm, and let S_i be the first i elements in this order, with δ_i = f(S_i) − f(S_{i−1}) the ith marginal gain, which may be positive or negative. Since f(∅) = 0, we have f(S = S_k) = Σ_i δ_i. And since f is submodular (and greedy always picks the element of maximum gain), δ_i ≥ δ_{i+1} for all i.
Lemma A.1 (following [CCPV09]). For any independent set C, it holds that f(S_k) ≥ (1/(p+1)) f(C ∪ S_k). Proof. We show the existence of a partition of C into C_1, C_2, ..., C_k with the following two properties: first, |C_1 ∪ C_2 ∪ ... ∪ C_i| ≤ p · i for each i; second, δ_i · |C_i| ≥ f_{S_k}(C_i) for each i. Assuming such a partition, we can complete the proof by a chain of inequalities in which the first inequality follows from [CCPV09, Claim A.1] (using the first property above, and that the δ's are non-increasing), the second from the second property of the partition of C, the third from subadditivity of f_{S_k}(·) (which is implied by the submodularity of f and applications of both facts in Proposition 1.1), and the fourth from the definition of f_{S_k}(·). Using the fact that Σ_i δ_i = f(S_k), and rearranging, we get the lemma. Now to prove the existence of such a partition of C. Define A_i = {e ∈ C \ S_i : S_i ∪ {e} ∈ I} for i = 0, 1, ..., k. Note that since C ∈ I, it follows that A_0 = C; since the independence system is closed under subsets, we have A_i ⊆ A_{i−1}; and since the greedy algorithm stops only when there are no more elements to add, we get A_k = ∅. Defining C_i = A_{i−1} \ A_i ensures we have a partition C_1, C_2, ..., C_k of C.
Fix a value i. We claim that S_i is a basis (a maximal independent set) for S_i ∪ (C_1 ∪ C_2 ∪ ... ∪ C_i) = S_i ∪ (C \ A_i). Clearly S_i ∈ I by construction; moreover, any e ∈ (C \ A_i) \ S_i was considered but not added because S_i ∪ {e} ∉ I. Moreover, (C_1 ∪ C_2 ∪ ... ∪ C_i) ⊆ C is clearly independent by subset-closure. Since I is a p-independence system, |C_1 ∪ C_2 ∪ ... ∪ C_i| ≤ p · |S_i| = p · i, proving the first property.
For the second property, note that each e ∈ C_i = A_{i−1} \ A_i satisfies S_{i−1} ∪ {e} ∈ I; hence e does not belong to S_{i−1}, could have been added to S_{i−1} whilst maintaining independence, and was considered by the greedy algorithm. Since greedy chose the e_i maximizing the "gain", δ_i ≥ f_{S_{i−1}}({e}) for each e ∈ C_i. Summing over all e ∈ C_i, we get δ_i · |C_i| ≥ Σ_{e∈C_i} f_{S_{i−1}}({e}) ≥ f_{S_{i−1}}(C_i), where the last inequality is by the subadditivity of f_{S_{i−1}}. Again, by submodularity, f_{S_{i−1}}(C_i) ≥ f_{S_k}(C_i), which gives the second property. Clearly, the greedy algorithm works no worse if we stop it when the best "gain" is negative, but the above proof does not use that fact.
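The greedy algorithm analyzed in this appendix can be sketched as follows; `independent` is an assumed membership oracle for the p-independence system, and (as noted above) we do not stop at negative gains, though doing so never hurts.

```python
def greedy_independence_system(universe, f, independent):
    """Standard greedy: repeatedly add the feasible element of maximum
    marginal gain delta_i (possibly negative), until nothing can be added."""
    S, candidates = set(), set(universe)
    while True:
        feasible = [e for e in candidates if independent(S | {e})]
        if not feasible:
            return S
        best = max(feasible, key=lambda e: f(S | {e}) - f(S))
        S.add(best)
        candidates.discard(best)
```

For an additive f under a cardinality-2 constraint (a 1-independence system), this simply collects the two heaviest elements.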

B Proofs for Knapsack Constraints
The proof is similar to that in [Svi04] and to the proof of Lemma 2.1; we use notation similar to [Svi04] for consistency. Let f be a non-negative submodular function with f(∅) = 0. Let I = [n]: we are given n items with weights c_i ∈ Z_+ and a budget B ≥ 0; let F = {S ⊆ I : c(S) ≤ B}, where c(S) = Σ_{i∈S} c_i. Our goal is to solve max_{S∈F} f(S). To that end, we want to prove the following result: Theorem B.1. There is a polynomial-time algorithm that outputs a collection of sets such that for any C ∈ F, the collection contains a set S satisfying f(S) ≥ (1/2) f(S ∪ C).

B.1 The Algorithm
The algorithm is the following: it constructs a polynomial number of solutions and chooses the best among them (and in case of ties, outputs the lexicographically smallest one of them).
• First, the family contains all feasible solutions of cardinality at most 3: clearly, if |C| ≤ 3 then the family contains C itself, which satisfies the condition of the theorem.
• Now for each solution U ⊆ I of cardinality 3, we greedily extend it as follows. Set S_0 = U, I_0 = I. At step t, we have a partial solution S_{t−1}. Compute θ_t = max_{i ∈ I_{t−1}} f_{S_{t−1}}({i})/c_i, and let the maximum be achieved on index i_t. If θ_t ≤ 0, terminate the algorithm. Else check whether c(S_{t−1}) + c_{i_t} ≤ B: if so, set S_t = S_{t−1} ∪ {i_t}; in either case set I_t = I_{t−1} \ {i_t} and continue. The family of sets we output consists of all sets of cardinality at most three, together with all the sets S_t created during each such greedy extension. Since each set can have at most n elements, we get O(n^4) sets output by the algorithm.
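A sketch of the enumeration in code. This is one simple variant of the greedy extension (we restrict the density rule to items of positive marginal value that still fit the budget); the helper name `knapsack_greedy_family` and the dictionary encodings of weights are our own.

```python
from itertools import combinations

def knapsack_greedy_family(items, f, c, B):
    """Partial enumeration plus density greedy: emit every feasible set of
    size at most 3; for each size-3 seed, repeatedly add the remaining item
    of maximum marginal value per unit weight that fits, recording every
    intermediate set along the way."""
    family = [frozenset(s)
              for r in range(4)
              for s in combinations(items, r)
              if sum(c[i] for i in s) <= B]
    for seed in [T for T in family if len(T) == 3]:
        S = set(seed)
        remaining = [i for i in items if i not in seed]
        while True:
            fitting = [i for i in remaining
                       if f(S | {i}) - f(S) > 0
                       and sum(c[j] for j in S) + c[i] <= B]
            if not fitting:
                break
            best = max(fitting, key=lambda i: (f(S | {i}) - f(S)) / c[i])
            S.add(best)
            remaining.remove(best)
            family.append(frozenset(S))
    return family
```

Every emitted set is budget-feasible by construction, and each of the O(n^3) seeds contributes at most n intermediate sets, matching the O(n^4) count.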

B.2 The Analysis
Let us assume that |C| = t > 3, and order C as j_1, j_2, ..., j_t such that each j_ℓ maximizes the marginal value f_{{j_1,...,j_{ℓ−1}}}({j_ℓ}), i.e., index the elements in the order they would be considered by the greedy algorithm that picks items of maximum marginal value (and does not consider their weights c_i). Let Y = {j_1, j_2, j_3}. Submodularity and the ordering of C give us the following: Lemma B.2. For any j_k ∈ C with k ≥ 4 and any Z ⊆ I \ {j_1, j_2, j_3, j_k}, the marginal value f_{Y∪Z}({j_k}) is at most each of f({j_1}), f_{{j_1}}({j_2}), and f_{{j_1,j_2}}({j_3}); summing these three inequalities, we get that f_{Y∪Z}({j_k}) ≤ (1/3) f(Y). For the rest of the discussion, consider the iteration of the algorithm which starts with S_0 = Y.

C Proofs from the Submodular Secretaries Section
In this section, we give the missing proofs from Section 4.

C.1 Proof for Cardinality Constrained Submodular Secretaries
Theorem 4.5. The algorithm for the cardinality-constrained submodular maximization problem in the secretary setting gives an O(1)-approximation to OPT.
The proof basically shows that with reasonable probability, both the first and the second half of the stream have a reasonable fraction of OPT, so when we run the offline algorithm on the first half, using its output to extract value from the second half gives us a constant fraction of OPT.
Proof. Let C* = {e_1, ..., e_{k′}} denote some set with k′ ≤ k elements such that f(C*) = OPT. Without loss of generality, we normalize so that OPT = 1. Suppose the elements of C* are listed in the "greedy order" (i.e., in order of decreasing marginal utility), and let a_i denote the marginal utility of e_i when it is added to {e_1, e_2, ..., e_{i−1}}. We consider two cases. In the first case, a_1 ≥ 1/c, where c ≥ 1 is some constant to be determined. In this case, with probability 1/2 the algorithm runs Dynkin's secretary algorithm, which selects the most valuable element with probability at least 1/e, achieving a 1/(2ce)-approximation.
In the other case, a_i < 1/c for all i. We imagine randomly partitioning the elements of the input set X into two sets, X_1 and X_2, with each element belonging to X_1 independently with probability 1/2. This corresponds to the algorithm's division of σ into the first (random) m elements σ_m and the remaining elements σ − σ_m. Let C*_1 and C*_2 denote the optimal solution restricted to X_1 and X_2 respectively. For each i, let Y_i = +1 if e_i ∈ X_1 and Y_i = −1 otherwise, and define the random variable A = Σ_i Y_i a_i. We wish to lower bound min(f(C*_1), f(C*_2)), and to do this it is sufficient to upper bound the absolute value |A|: by submodularity and the greedy ordering, f(C*_1) ≥ Σ_{i: Y_i=+1} a_i = (1+A)/2 and f(C*_2) ≥ Σ_{i: Y_i=−1} a_i = (1−A)/2. Hence, we would like to upper bound |A| with high probability. Since each Y_i is independent with expectation 0, we have Var(A) = Σ_i a_i^2 ≤ (1/c) Σ_i a_i = 1/c. By Chebyshev's inequality, for any d ≥ 0, Pr[|A| ≥ d/√c] ≤ 1/d^2. That is, except with probability 1/d^2, min(f(C*_1), f(C*_2)) ≥ (1 − d/√c)/2. Now for some calculations. With probability 1/2, we do not run Dynkin's algorithm. Independently of this, with probability 1/2, f(C*_1) ≤ f(C*_2), i.e., the value min(f(C*_1), f(C*_2)) is achieved on σ_m. With probability (1 − 1/d^2), this value is at least (1 − d/√c)/2. Now we run a ρ_off-approximation on σ_m, and thus with probability (1/4)(1 − 1/d^2), the resulting set A_1 satisfies f(A_1) ≥ (1 − d/√c)/(2ρ_off). If we use this as a lower bound for f(C*_2) (which is fine since we are in the case where f(A_1) ≤ f(C*_1) ≤ f(C*_2)), the semi-online algorithm gives us a value of at least f(A_1)/ρ_on. Combining both cases and optimizing over the parameters d and c (d ← 3.08, c ← 260.24) gives the claimed constant-factor approximation.

C.2 Proof for Partition Matroid Submodular Secretaries
Let S_0 be the set of the first N_0 elements, and let S_j denote the elements in epoch j. Since the input permutation itself is random, the distribution over the sets S_0, ..., S_k is identical to one resulting from the following process: each element e independently chooses a real number r_e in (0, 1), and is placed in S_0 if r_e ≤ 1/2, and in S_j if r_e ∈ (1/2 + (j−1)/100k, 1/2 + j/100k]. We shall use this observation to simplify our analysis. For the following lemma, we need to keep track of several events:
1. L_{i,j}: The highest element from group i (under the valuation function at epoch j) does not arrive before epoch j.
2. S_{i,j}: The second-highest element from group i arrives in the sample S_0.
3. H_{i,j}: The highest element from group i arrives during epoch j.
4. F_{i′,j}: Some element from group i′ arrives during epoch j that is higher than every element of group i′ arriving before epoch j.
5. P_{i,j}: Some element from partition group i has already been selected before epoch j.
In the definitions above, we assume that a fixed tie-breaking rule is used to ensure that each group has a unique highest and second-highest element. Lemma 4.7. For all partition groups i and epochs j, the algorithm selects the highest element from group i (according to the valuation function during epoch j) during epoch j with probability at least Ω(1/k), where the probability is over the random permutation of the elements.
Proof. We observe that the event (H_{i,j} ∧ S_{i,j} ∧ ¬P_{i,j}) ∧ (∧_{i′≠i} ¬F_{i′,j}) implies that the algorithm selects the highest element from group i in epoch j. We will lower bound the probability of this event by considering the events P_{i,j}, S_{i,j}, L_{i,j}, ∧_{i′≠i} ¬F_{i′,j}, H_{i,j} in this order, and lower bounding the probability of each conditioned on the previous ones. Under any (arbitrary) valuation function, the events L_{i,j} and S_{i,j} depend only on the real numbers chosen by the highest and the second-highest elements; thus Pr[L_{i,j} ∧ S_{i,j}] ≥ (1/2)(1/2 − (j−1)/100k) ≥ 1/5. Let Q_{i,j} denote the number of elements from group i that do not appear in S_0, ..., S_{j−1}, but are higher (under the valuation function at epoch j) than any group-i element in S_0, ..., S_{j−1}. It is easy to see that the random variable Q_{i,j} is dominated by a geometric random variable with parameter 1/2, so E[Q_{i,j}] ≤ 2. Moreover, any element contributing to Q_{i,j} appears in epoch j with probability at most (1/100k)/(2/5) = 1/40k. For convenience, let us define the event E_{i,j} = (L_{i,j} ∧ S_{i,j} ∧ ¬P_{i,j}); together with a similar bound for ¬P_{i,j}, the above shows that E_{i,j} occurs with constant probability. We next upper bound the probability that groups i′ ≠ i have elements in epoch j that the algorithm might select, conditioned on E_{i,j}. With Q_{i′,j} defined as above, we have Pr[F_{i′,j} | E_{i,j}] ≤ E[Q_{i′,j}]/40k ≤ 1/20k. Since there are at most k groups i′, by a union bound, Pr[∧_{i′≠i} ¬F_{i′,j} | E_{i,j}] ≥ 1 − k · (1/20k) = 19/20. To complete the proof, we observe that Pr[H_{i,j} | E_{i,j} ∧ (∧_{i′≠i} ¬F_{i′,j})] ≥ (1/100k)/(2/5) = 1/40k, and multiplying these bounds gives the claimed Ω(1/k).

D Lower Bounds for the Constrained Submodular Maximization Problem in the Secretary Setting
In this section we show lower bounds for the secretary problem over submodular functions. We first note that Kleinberg [Kle05] showed that for additive functions, the maximization problem in the online setting with a k-uniform matroid constraint can be approximated within a factor of 1 − 5/√k. We show that this is not the case for submodular functions, even in the information-theoretic, semi-online setting (where the algorithm knows the value of OPT), by exhibiting a gap for arbitrarily large k.
Theorem D.1. No algorithm approximates submodular maximization in the semi-online setting with a k-uniform matroid constraint better than a factor of 8/9 for k = 2, or 17/18 for any even k.
No non-trivial bound is possible for k = 1, because the algorithm knows OPT; thus the standard secretary lower bounds do not apply here.
Let R, S be two finite sets such that S ⊆ R. We define the instance COVER(R, S) as follows: the universe is U = {i_j : i ∈ R, j ∈ {B, T}}, and the collection of elements W contains the set i_B = {i_B} for each i ∈ R and the set i_TB = {i_T, i_B} for each i ∈ S. Define the submodular function f(C) = |∪_{A∈C} A| for C ⊆ W; this is a coverage function, hence non-negative and submodular with f(∅) = 0.
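The COVER construction is easy to instantiate and check. The tuple encoding of the ground elements i_T, i_B and the helper name `cover_instance` are our own illustrative choices.

```python
def cover_instance(R, S):
    """COVER(R, S): ground elements (i, 'T') and (i, 'B'); the collection W
    holds the singleton {i_B} for every i in R and the pair {i_T, i_B} for
    every i in S. f is coverage: the number of ground elements covered."""
    W = [frozenset({(i, "B")}) for i in R]
    W += [frozenset({(i, "T"), (i, "B")}) for i in S]
    f = lambda C: len(frozenset().union(*C)) if C else 0
    return W, f
```

For the k = 2 instance COVER({1, 2}, {r}) with r = 1, the offline optimum {r_TB, r̄_B} covers three ground elements, as used in the case analysis below.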
We first prove the case k = 2 with a small example and case analysis. We choose a uniformly random r ∈ {1, 2}, and in the semi-online setting require the algorithm to pick at most k = 2 of the sets of COVER({1, 2}, {r}), appearing in random order, while trying to maximize f. Let r̄ = 3 − r; then the offline optimum is C* = {r_TB, r̄_B} with f(C*) = 3. Claim D.2. No algorithm has expected payoff more than 8/3 in the semi-online setting on instances of COVER({1, 2}, {r}) with r ∈ {1, 2} uniformly random, when restricted to picking at most 2 sets. Because OPT = 3, Claim D.2 implies no algorithm can do better than an 8/9 fraction of OPT, which gives us the first part of the theorem.
Proof. We proceed by case analysis. In the case where the first element to arrive is r_TB, the algorithm learns r and can obtain OPT = 3; this happens with probability 1/3. In the case where the first element to arrive is 1_B, the algorithm can accept or reject the element. If the algorithm rejects, then it may as well take the next two elements that arrive; since r = 1 with probability 1/2, the expected payoff is at most 5/2. Now suppose the algorithm accepts 1_B in the first position. The algorithm should now pick r_TB (rejecting 2_B if it comes before r_TB), because the marginal value of r_TB is at least as large as that of 2_B. Since r is random, this marginal value is 3/2 in expectation, and hence the expected payoff of the algorithm is once again 5/2. Similarly, if the first element is 2_B, the payoff is bounded by 5/2 in expectation. Thus the total expected payoff of the algorithm is at most (1/3) · 3 + (1/3) · (5/2) + (1/3) · (5/2) = 8/3.
Now we would like to show that something similar is true for much larger k. The basic idea is to combine many disjoint instances of COVER({1, 2}, {r}), and show that if the algorithm does well overall, it must do well on each instance, violating Claim D.2. Claim D.3. For any even k, no algorithm has expected payoff more than 17k/12 in the semi-online setting on instances of COVER({1, ..., k}, S), maximizing f and restricted to picking at most k sets, when S is drawn uniformly at random among subsets of {1, ..., k} with k/2 elements.
Because OPT = 3k/2, Claim D.3 implies no algorithm can do better than a 17/18 fraction of OPT, which gives us the second part of the theorem.
Proof. For the sake of analysis, we think of the instance of COVER({1, . . . , k}, S) being created by first choosing a matching on the set {1, . . . , k} and then within each edge e = (i, j) of the matching choosing e r ∈ {i, j} to include in S.
We can then think of the sets of COVER({1, . . . , k}, S) being generated by taking the sets of the instance COVER({i, j}, {e r }) for each edge e = (i, j) in the matching. Call each of these k/2 instances of COVER({i, j}, {e r }) a puzzle.
Fix a semi-online algorithm A. Let C be the set of elements chosen by A. For 0 ≤ i ≤ 3, let P_i be the set of puzzles such that C contains exactly i elements from the puzzle; let x_i be the expected size of P_i (over the randomness of the assignments of puzzles, the ordering, and A); and let E_i be the expected payoff from all the puzzles in P_i. Note that E_0 = 0 and E_3 = 3x_3. Claim D.4. E_1 + E_2 ≤ 4k/3 − 2x_3. Proof. Given an instance of COVER({1, 2}, {r}), construct a random instance of COVER({1, ..., k}, S) by generating a random matching and randomly picking a special edge e = (i, j), where i and j are randomly ordered. For each other edge e′ = (i′, j′), pick e′_r ∈ {i′, j′} to include in S. Now run A on this instance of COVER({1, ..., k}, S), except that whenever an element of the puzzle corresponding to edge e comes along, we replace it with the next element from the given instance of COVER({1, 2}, {r}) (replacing 1 with i, and 2 with j); and whenever A chooses an element from the embedded instance of COVER({1, 2}, {r}), we choose that element (it may be that A selects more than 2 such elements, in which case we just select the first 2).
We combine the above claim with the fact that E_0 = 0 and E_3 = 3x_3 to get E[f(C)] = E_1 + E_2 + E_3 ≤ 4k/3 + x_3 (Equation 18). Note also that A receives payoff at most 2 from any puzzle in P_1, and at most 0 from any puzzle in P_0; the maximum payoff from each puzzle is 3, which occurs in OPT. Thus E[f(C)] ≤ 2x_1 + 3x_2 + 3x_3 (Equation 19). Finally, there are k/2 puzzles, so x_0 + x_1 + x_2 + x_3 = k/2. Additionally, because the algorithm never picks more than k elements, we have x_1 + 2x_2 + 3x_3 ≤ k. Solving the first equation for x_2 and substituting for x_2 in the second, we get x_3 ≤ 2x_0 + x_1 (Equation 20). Adding Equations 18, 19, and 20 (after substituting x_2 = k/2 − x_0 − x_1 − x_3 into Equation 19), we see that 2E[f(C)] ≤ 17k/6 − x_0, which implies E[f(C)] ≤ 17k/12.