Maximizing Submodular Functions (Part 2)

Continuing from my last post, today I will talk about recent work by Niv Buchbinder, Moran Feldman, Seffi Naor, and Roy Schwartz that gives a simple 1/2 approximation to the (unconstrained) submodular maximization problem, matching the hardness. Do see the paper (which should be available in a couple of weeks) for full details. Apologies in advance for any errors.

First consider the greedy algorithm that starts with the empty set, and keeps adding the most profitable item, as long as there is an improvement in the objective. For most variants of set cover type problems, such algorithms give the best possible approximation guarantees.
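A minimal sketch of this greedy rule, with f given as an oracle mapping frozensets to numbers (the interface and names here are mine, just for illustration):

```python
def greedy(f, universe):
    """Plain greedy: repeatedly add the element with the largest
    marginal gain, stopping when no addition improves the objective.
    `f` maps a frozenset to a number; `universe` is an iterable."""
    S = frozenset()
    remaining = set(universe)
    while remaining:
        # element whose addition increases f the most
        best = max(remaining, key=lambda e: f(S | {e}) - f(S))
        if f(S | {best}) - f(S) <= 0:
            break
        S = S | {best}
        remaining.discard(best)
    return S
```

For a coverage function (f(S) = number of points covered by the sets indexed by S) this recovers the classic greedy set-cover behavior.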

Unfortunately, it is not hard to construct set functions where the approximation ratio of greedy is unbounded. How about reverse greedy: starting with everything and throwing out elements while we improve? The function \tilde{f} defined as \tilde{f}(A) = f(U-A) is also submodular when f is, and running reverse greedy on \tilde{f} is the same as running greedy on f. Thus a bad example for greedy can be transformed into one for reverse greedy.
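The complementation trick is a one-liner; this tiny helper (names are mine) builds \tilde{f} from f, so reverse greedy on f is literally greedy on the complemented function:

```python
def complement_function(f, universe):
    """Given a set function f on `universe`, return the function
    A -> f(universe - A).  If f is submodular, so is the result;
    greedy on it corresponds to reverse greedy (deletion) on f."""
    U = frozenset(universe)
    return lambda A: f(U - A)
```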

The first idea that BFNS try is to do both greedy and reverse greedy, in a correlated way. Let’s start with X = \phi and Y = U. In the ith step, we will make X and Y agree in the ith coordinate, by either adding i to X or deleting i from Y. A greedy approach to doing this is to add i to X if the marginal benefit a_i = f(X+i) - f(X) exceeds b_i = f(Y - i) - f(Y). It can also be shown using the definition of submodularity that the sum a_i + b_i is non-negative, so at least one of the sets improves in value. In |U| steps, we have X=Y and that is the output of the algorithm.
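The coupled walk above can be sketched as follows, again with f as a set-function oracle (the interface is mine; the rule is the one just described):

```python
def double_greedy(f, universe):
    """Coupled greedy: X grows from the empty set, Y shrinks from the
    full set; after step i they agree on element i.  Returns the
    common set once every element has been fixed."""
    order = list(universe)
    X = frozenset()
    Y = frozenset(order)
    for e in order:
        a = f(X | {e}) - f(X)   # marginal gain of adding e to X
        b = f(Y - {e}) - f(Y)   # marginal gain of deleting e from Y
        if a >= b:
            X = X | {e}         # keep e
        else:
            Y = Y - {e}         # discard e
    assert X == Y
    return X
```

Cut functions of graphs are a standard non-monotone submodular example to try this on.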

Is this “coupled greedy algorithm” any good? BFNS first show that this gives a 1/3 approximation. Towards a short proof, here is a little more notation. Let O be the optimal set. Let U_{-[i]} denote the tail set \{i+1,\ldots,n\} and let O_{-[i]} = O \cap U_{-[i]} denote the part of O in the tail. Let X_i and Y_i denote the sets X and Y at the end of iteration i. Thus X_0 = \phi, Y_0=U and Y_i = X_i \cup U_{-[i]}. For simplicity of presentation, we will assume that f(\phi) = f(U) = 0.

Now here is the crucial definition. Consider the hybrid set Z_i = X_i \cup O_{-[i]}. Thus we have Z_0 = O and Z_n = X is the output of the algorithm. We don’t know Z_0 but we know Z_n. In the ith step, we learn a little more about our set Z_i, and let l_i denote the quantity f(Z_{i-1}) - f(Z_{i}). The total loss \sum_i l_i is f(O) - f(X). We will show that the total loss is at most 2f(X). This would imply the claimed approximation ratio.
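Spelling out why the total loss is f(O) - f(X): the terms telescope,

\sum_{i=1}^n l_i = \sum_{i=1}^n [f(Z_{i-1}) - f(Z_i)] = f(Z_0) - f(Z_n) = f(O) - f(X),

so the bound \sum_i l_i \leq 2f(X) rearranges to f(X) \geq \frac{1}{3}f(O).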

Suppose that a_i \geq b_i in step i, so that we add i. We claim that in this case, l_i \leq a_i = f(X_i)-f(X_{i-1}). Two cases: if i is in O, then l_i is zero since Z_i = Z_{i-1}. On the other hand a_i = max(a_i,b_i) \geq (a_i+b_i)/2 \geq 0. If i is not in O, then Z_{i} = Z_{i-1} \cup \{i\}. Thus (-l_i) is the marginal value of i with respect to Z_{i-1}. Since Z_{i-1} \subseteq Y_{i-1} - i, by submodularity this marginal value is at least the marginal value of i with respect to Y_{i-1} - i, which is -b_i. This in turn means that l_i \leq b_i \leq a_i. Summed over all iterations where a_i \geq b_i, the total loss is then at most the sum of a_i's over these iterations. This is exactly f(X).

An essentially identical argument handles the iterations where b_i > a_i and we delete i from Y: there l_i \leq a_i \leq b_i = f(Y_i)-f(Y_{i-1}), so the loss summed over these iterations is at most f(Y_n) - f(Y_0) = f(X) (recall that f(U) = 0 and Y_n = X). This finishes the proof.

But 1/3 is not what I promised above. How do we improve on this? The answer to this question was in the last post: we will use the multilinear relaxation, which extends the definition of a submodular function to the whole of [0,1]^n. BFNS modify the above algorithm to do the coupled walks in [0,1]^n instead of on \{0,1\}^n. Start with x = 0^n and y = 1^n. In the ith step, we will make x_i and y_i equal, so that x and y agree after n steps. Defining a_i as F(x + 1_i) - F(x) and b_i as F(y - 1_i) - F(y), we make x_i and y_i agree as follows. If only one of a_i and b_i is positive, the choice is obvious: we set x_i = y_i = 1 when a_i alone is positive, and x_i = y_i = 0 when b_i alone is positive. In case they are both positive, it is natural to interpolate linearly: we set x_i = y_i = a_i/(a_i+b_i). Instead of the sets X, Y, Z above, we will have their fractional counterparts: vectors x, y, z. Here z is defined as being equal to the optimum o on the tail, and equal to x on the first i coordinates.
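One way to realize the fractional rule in code is its randomized reading, which BFNS also give: keep element i with probability a_i/(a_i+b_i) (clipping negative marginals to zero), using the original oracle f directly. A sketch of that reading (interface and names are mine):

```python
import random

def randomized_double_greedy(f, universe, rng=random):
    """Randomized reading of the fractional double-greedy rule:
    keep element e with probability a/(a+b), where a and b are the
    (clipped) marginal gains of adding e to X and deleting e from Y."""
    order = list(universe)
    X = frozenset()
    Y = frozenset(order)
    for e in order:
        a = max(f(X | {e}) - f(X), 0.0)   # clipped gain of adding e
        b = max(f(Y - {e}) - f(Y), 0.0)   # clipped gain of deleting e
        # a + b == 0 only when both marginals are 0; either choice works
        p = 1.0 if a + b == 0 else a / (a + b)
        if rng.random() < p:
            X = X | {e}
        else:
            Y = Y - {e}
    return X
```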

The proof is now very similar to the previous case, except that we save a factor of two. The following inequality implies that the total loss is only F(x), which will lead to a factor \frac{1}{2}:

l_i \leq \frac{1}{2}[F(x_{i}) - F(x_{i-1})] + \frac{1}{2}[F(y_i) - F(y_{i-1})].

Proving this is not that hard once we know that this is what is to be proved. In the case that one of a_i or b_i is negative, this is fairly straightforward given the analysis in the \frac{1}{3} approximation above. In the case that they are both non-negative, we need to compute the various terms. Write p = \frac{a_i}{a_i+b_i} for the common value given to the ith coordinate. By multilinearity, raising the ith coordinate of x from 0 to p changes F by p \cdot a_i, so [F(x_{i}) - F(x_{i-1})] equals \frac{a_i^2}{a_i+b_i}; similarly [F(y_{i}) - F(y_{i-1})] equals (1-p) b_i = \frac{b_i^2}{a_i+b_i}. The quantity l_i can be upper bounded, again using arguments similar to those above, by p\, b_i = \frac{a_i b_i}{a_i+b_i}. Since 2a_ib_i \leq a_i^2 + b_i^2, we are done.

6 thoughts on “Maximizing Submodular Functions (Part 2)”

  1. Thanks for the post. For the 1/2 approximation, when calculating $a_i, b_i$, we need to calculate $F(x)$ for some fractional $x$. But when $x$ is fractional, don’t we need an exponential number of calls to the oracle?

    1. Good catch. I tried to gloss over this, but this is indeed an issue that has to be handled. Exactly computing F would indeed require an exponential number of calls. So one uses an estimate of F based on a small number of samples and argues that this suffices. This can be done, and all works that use the multilinear relaxation have to deal with this. Here is one approach: using Chernoff bounds, you can show that if you take polynomially many random samples and average to estimate F, the additive error is at most M/poly(n) except with negligible probability, where M is an upper bound on f. Intuitively, if an a_i or a b_i above is large, our estimates will be fairly accurate; if they are really small, we make a mistake, but then making a mistake shouldn’t affect us so much. One should be able to modify the analysis slightly to make this formal and argue that if we run the algorithm with these estimates with small additive errors, the final result is still a (1/2 – 1/poly) approximation.

      1. Thanks for your reply. Also, for the 1/3-approximation proof, when $a_i \ge b_i$ and $i \not \in O$, I feel $-l_i$ should be at least $-b_i$ instead of $b_i$, since $-b_i$ is the marginal value of $i$ with respect to $Y_i - i$. Also, in this case we can still say $l_i \le b_i \le a_i$.
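The sampling estimate discussed in the reply above can be sketched as follows: F(x) is the expected value of f on a random set that includes each element e independently with probability x_e, so a simple Monte Carlo average suffices (this helper and its names are mine):

```python
import random

def estimate_F(f, x, samples=1000, rng=random):
    """Monte Carlo estimate of the multilinear extension
    F(x) = E[f(R)], where R contains each element e independently
    with probability x[e].  `x` maps elements to probabilities."""
    total = 0.0
    for _ in range(samples):
        R = frozenset(e for e, p in x.items() if rng.random() < p)
        total += f(R)
    return total / samples
```

At integral points the estimate is exact; at fractional points, Chernoff bounds control the additive error as in the reply above.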
