Continuing on my last post, today I will talk about recent work by Niv Buchbinder, Moran Feldman, Seffi Naor, and Roy Schwartz that gives a simple 1/2 approximation to the (unconstrained) submodular maximization problem, matching the hardness. Do see the paper (which should be available in a couple of weeks) for full details. Apologies in advance for any errors.

First consider the greedy algorithm that starts with the empty set, and keeps adding the most profitable item, as long as there is an improvement in the objective. For most variants of set cover type problems, such algorithms give the best possible approximation guarantees.
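As a concrete sketch, here is what this greedy loop looks like in Python (the value oracle `f` and the names are my own, not from the paper):

```python
def greedy(f, universe):
    """Start from the empty set; repeatedly add the element with the
    largest marginal gain, stopping once no addition improves f."""
    S = set()
    while True:
        # Find the most profitable element not yet in S.
        best_gain, best_e = 0, None
        for e in universe - S:
            gain = f(S | {e}) - f(S)
            if gain > best_gain:
                best_gain, best_e = gain, e
        if best_e is None:  # no improving element remains
            return S
        S.add(best_e)
```

For a coverage function (a typical set-cover-type objective), this recovers the usual greedy set-cover behaviour.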

Unfortunately, it is not hard to construct set functions where the approximation ratio of greedy is unbounded. How about reverse greedy: starting with everything and throwing out elements while we improve? The function $\tilde{f}$ defined as $\tilde{f}(A) = f(U-A)$ is also submodular when $f$ is, and running the “reverse greedy” on $\tilde{f}$ is the same as running greedy on $f$. Thus a bad example for greedy can be transformed into one for reverse greedy.
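To make this concrete, here is one standard-style bad instance (my own construction for illustration, not taken from the paper): on the ground set $\{a, b_1,\ldots,b_k\}$, let $f(S)$ be $1+\epsilon$ whenever $a \in S$, and $|S \cap B|$ otherwise. One can check that this $f$ is non-negative and submodular; greedy grabs $a$ first and then stops, while the optimum takes all of $B$:

```python
def greedy(f, universe):
    """Plain greedy, stopping when no element has positive marginal gain."""
    S = set()
    while True:
        best_gain, best_e = 0, None
        for e in universe - S:
            gain = f(S | {e}) - f(S)
            if gain > best_gain:
                best_gain, best_e = gain, e
        if best_e is None:
            return S
        S.add(best_e)

k, eps = 100, 0.01
B = {('b', i) for i in range(k)}
U = B | {'a'}

def f(S):
    # Worth 1 + eps as soon as 'a' is in; otherwise counts the b's.
    return 1 + eps if 'a' in S else len(S & B)

S = greedy(f, U)
# Greedy takes 'a' (its gain of 1 + eps beats any single b's gain of 1),
# after which every marginal gain is 0, so it stops at value 1 + eps,
# while f(B) = k: the ratio k / (1 + eps) is unbounded in k.
```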

The first idea that BFNS try is to run greedy and reverse greedy together, in a correlated way. Let’s start with $X = \phi$ and $Y = U$. In the ith step, we will make $X$ and $Y$ agree in the ith coordinate, by either adding $i$ to $X$ or deleting $i$ from $Y$. A greedy way of deciding is to add $i$ to $X$ if the marginal benefit $a_i = f(X+i) - f(X)$ exceeds $b_i = f(Y - i) - f(Y)$, and to delete $i$ from $Y$ otherwise. It can also be shown from the definition of submodularity that the sum $a_i + b_i$ is non-negative, so at least one of the sets improves in value. In $|U|$ steps, we have $X=Y$, and that is the output of the algorithm.
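Sketched in Python (names are my own; `f` is a value oracle and `elements` fixes the processing order):

```python
def coupled_greedy(f, elements):
    """BFNS coupled greedy: X grows from the empty set, Y shrinks from
    the full set, and step i makes them agree on element i."""
    X, Y = set(), set(elements)
    for i in elements:
        a = f(X | {i}) - f(X)   # marginal benefit of adding i to X
        b = f(Y - {i}) - f(Y)   # marginal benefit of deleting i from Y
        if a >= b:
            X.add(i)
        else:
            Y.remove(i)
    return X  # at this point X == Y
```

On the cut function of a 3-vertex path (a standard non-monotone submodular function), for example, this walk recovers the maximum cut $\{1,3\}$.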

Is this “coupled greedy algorithm” any good? BFNS first show that this gives a 1/3 approximation. Towards a short proof, here is a little more notation. Let $O$ be the optimal set. Let $U_{-[i]}$ denote the tail set $\{i+1,\ldots,n\}$ and let $O_{-[i]} = O \cap U_{-[i]}$ denote the part of $O$ in the tail. Let $X_i$ and $Y_i$ denote the sets $X$ and $Y$ at the end of iteration $i$. Thus $X_0 = \phi, Y_0=U$ and $Y_i = X_i \cup U_{-[i]}$. For simplicity of presentation, we will assume that $f(\phi) = f(U) = 0$.

Now here is the crucial definition. Consider the hybrid set $Z_i = X_i \cup O_{-[i]}$. Thus we have $Z_0 = O$, and $Z_n = X$ is the output of the algorithm. We don’t know $Z_0$, but we know $Z_n$. In the ith step, we learn a little more about our set $Z_i$; let $l_i$ denote the loss $f(Z_{i-1}) - f(Z_{i})$. The total loss $\sum_i l_i$ telescopes to $f(O) - f(X)$. We will show that the total loss is at most $2f(X)$; since then $f(O) - f(X) \leq 2f(X)$, this implies the claimed approximation ratio of 1/3.

Suppose that $a_i \geq b_i$ in step $i$, so that we add $i$. We claim that in this case, $l_i \leq a_i = f(X_i)-f(X_{i-1})$. There are two cases. If $i$ is in $O$, then $l_i$ is zero since $Z_i = Z_{i-1}$; on the other hand, $a_i = \max(a_i,b_i) \geq (a_i+b_i)/2 \geq 0$. If $i$ is not in $O$, then $Z_{i} = Z_{i-1} \cup \{i\}$, so $(-l_i)$ is the marginal value of $i$ with respect to $Z_{i-1}$. Since $Z_{i-1} \subseteq Y_{i-1} - i$, by submodularity this marginal value is at least $f(Y_{i-1}) - f(Y_{i-1} - i) = -b_i$, which in turn means that $l_i \leq b_i \leq a_i$. Summed over all iterations where $a_i \geq b_i$, the total loss is then at most the sum of the $a_i$‘s over these iterations. This is exactly $f(X)$.

An essentially identical argument shows that over the iterations where $a_i < b_i$, the loss $l_i$ is at most $b_i = f(Y_i)-f(Y_{i-1})$, so the total loss over these iterations is also at most $f(X)$. This finishes the proof.

But 1/3 is not what I promised above. How do we improve on this? The answer to this question was in the last post: we will use the multilinear relaxation $F$, which extends the definition of a submodular function to the whole of $[0,1]^n$. BFNS modify the above algorithm to do the coupled walks in $[0,1]^n$ instead of on $\{0,1\}^n$. Start with $x = 0^n$ and $y = 1^n$. In the ith step, we will make $x_i$ and $y_i$ equal, so that $x$ and $y$ agree after $n$ steps. Define $a_i$ as $F(x +1_i) - F(x)$ and $b_i$ as $F(y-1_i) - F(y)$; as before, $a_i + b_i \geq 0$, so at most one of them is negative. We make $x_i$ and $y_i$ agree as follows. If only one of $a_i$ and $b_i$ is positive, the choice is obvious: we set $x_i=y_i = 1$ when $a_i$ is the positive one, and make them both zero when $b_i$ is. In case they are both positive, it is natural to interpolate linearly: we set $x_i=y_i = a_i/(a_i+b_i)$. Instead of the sets $X,Y,Z$ above, we will have their fractional counterparts: vectors $x,y,z$. Here $z$ is defined as being equal to the optimum $o$ on the tail, and equal to $x$ on the first $i$ coordinates.
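Here is a sketch of this fractional walk, computing the multilinear extension $F$ exactly by brute-force enumeration, which is only sensible for tiny instances (the sampling question for larger $n$ is taken up in the comments below):

```python
import itertools

def F(f, x):
    """Multilinear extension F(x) = E[f(R)], where R contains i
    independently with probability x[i].  Exact, hence exponential in n;
    for illustration on tiny instances only."""
    n = len(x)
    val = 0.0
    for bits in itertools.product([0, 1], repeat=n):
        p = 1.0
        for i, b in enumerate(bits):
            p *= x[i] if b else 1 - x[i]
        val += p * f({i for i, b in enumerate(bits) if b})
    return val

def fractional_coupled_greedy(f, n):
    x, y = [0.0] * n, [1.0] * n
    for i in range(n):
        xi = x[:]; xi[i] = 1.0      # x with the ith coordinate raised to 1
        yi = y[:]; yi[i] = 0.0      # y with the ith coordinate lowered to 0
        a = F(f, xi) - F(f, x)
        b = F(f, yi) - F(f, y)
        if b <= 0:
            t = 1.0                 # only a is positive (or both are zero)
        elif a <= 0:
            t = 0.0                 # only b is positive
        else:
            t = a / (a + b)         # both positive: interpolate linearly
        x[i] = y[i] = t
    return x                        # x == y, with F(x) at least OPT/2
```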

The proof is now very similar to the previous case, except that we save a factor of two. The following inequality implies that the total loss is only $F(x)$, which will lead to a factor $\frac{1}{2}$:

$l_i \leq \frac{1}{2}[F(x_{i}) - F(x_{i-1})] + \frac{1}{2}[F(y_i) - F(y_{i-1})].$

Proving this is not that hard once we know that this is what needs to be proved. In the case that one of $a_i$ or $b_i$ is non-positive, it is fairly straightforward given the analysis in the $\frac{1}{3}$ approximation above. In the case that they are both positive, we need to compute the various terms. It is easy to check that in this case $[F(x_{i}) - F(x_{i-1})]$ is equal to $\frac{a_i^2}{a_i+b_i}$, and that $[F(y_{i}) - F(y_{i-1})]$ is equal to $\frac{b_i^2}{a_i+b_i}$. The quantity $l_i$ can be upper bounded, again using arguments similar to those above, by $\frac{a_i b_i}{a_i+b_i}$. Since $2a_ib_i \leq a_i^2 + b_i^2$, we are done.
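For instance, the first of these identities follows directly from multilinearity: as the ith coordinate of $x$ moves from $0$ to $t = a_i/(a_i+b_i)$, the value of $F$ changes linearly in that coordinate, so

$F(x_i) - F(x_{i-1}) = t \cdot [F(x_{i-1} + 1_i) - F(x_{i-1})] = \frac{a_i}{a_i+b_i} \cdot a_i = \frac{a_i^2}{a_i+b_i},$

and symmetrically the ith coordinate of $y$ falls from $1$ to $t$, giving $F(y_i) - F(y_{i-1}) = (1-t)\, b_i = \frac{b_i^2}{a_i+b_i}$.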

6 Comments
1. Chaoxu Tong
September 9, 2012 7:25 pm

Thanks for the post. For the 1/2 approximation, when calculating $a_i, b_i$, we need to calculate $F(x)$ for some fractional $x$. But when $x$ is fractional, do we need an exponential number of calls to the oracle?

• Kunal Talwar
September 9, 2012 9:23 pm

Good catch. I tried to gloss over this, but this is indeed an issue that has to be handled. Exactly computing F would indeed require an exponential number of calls. So one uses an estimate of F based on a small number of samples and argues that this suffices. This can be done, and all works that use the multilinear relaxation have to deal with it. Here is one approach: using Chernoff bounds, you can show that if you take polynomially many random samples and average to estimate F, the additive error is at most M/poly(n) except with negligible probability, where M is an upper bound on f. Intuitively, if an a_i or a b_i above is large, our estimates will be fairly accurate; if they are really small, we may make a mistake, but then making a mistake shouldn’t affect us much. One should be able to modify the analysis slightly to make this formal and argue that if we run the algorithm with these estimates with small additive errors, the final result is still a (1/2 - 1/poly) approximation.
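Such an estimator is only a few lines; here is a minimal sketch (names are illustrative):

```python
import random

def estimate_F(f, x, samples=2000, rng=random.Random(0)):
    """Estimate F(x) = E[f(R)] by sampling: include each i in R
    independently with probability x[i], and average f over the draws."""
    total = 0.0
    for _ in range(samples):
        R = {i for i, p in enumerate(x) if rng.random() < p}
        total += f(R)
    return total / samples
```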

• Chaoxu Tong
September 9, 2012 9:36 pm

Thanks for your reply. Also, for the 1/3-approximation proof, when $a_i \ge b_i$ and $i \not \in O$, I feel $-l_i$ should be at least $-b_i$ instead of $b_i$, since $-b_i$ is the marginal value of $i$ with respect to $Y_i - i$. Also in this case, we can still say $l_i \le b_i \le a_i$.