In this post and the next, I will talk about the problem of maximizing a submodular function. Submodularity is a natural property of set functions, that captures the diminishing returns property. Formally, let $f$ be a set function $f : 2^{U} \rightarrow \Re$, and let us assume that $U=[n]$.  Then $f$ is submodular if for any sets $A \subseteq B \subseteq U$, and any $i \in U$, we have the inequality

$f(B\cup \{i\}) - f(B) \leq f(A\cup \{i\}) - f(A)$.

In other words, the marginal value of an element $i$  to a set $A$ is a non-increasing function of the set $A$. Let’s start with some examples of submodular functions:

1. $f(A) = g(|A|)$ for any concave function $g : \Re \rightarrow \Re$.
2. Coverage function: we have a set $S_i \subseteq V$ associated with each $i \in U$ (so $U$ corresponds to sets in a set cover instance). $f(A)$ is the number of elements covered by sets in $A$, i.e. $f(A) = |\cup_{i \in A} S_i|$. More generally, if we have a non-negative weight $w_u$ associated with each $u \in U$, then we can set $f(A)$ to be the total weight of the elements covered by sets in $A$.
3. Cut function: $U$ is the vertex set of a graph. $f(A)$ is the number of edges in the cut $(A, U\setminus A)$.
4. Matroid rank function: we have a matroid on the set $U$, and $f(A)$ is the size of the largest independent set contained in $A$.

Given a non-negative submodular function, it is natural to ask if we can find the set $A$ that maximizes $f(A)$. How is $f$ given to us? We assume we have oracle access to the value of the function: given a set $A$, we can compute $f(A)$ efficiently. Thus for example, we may have a succinct structure (such as the graph in example 3 above) that allows us to compute $f$ quickly.  As example 3 captures the MAXCUT problem, this problem at hand is APX hard.

Note that in this abstract form, we have really very little to work with. The algorithm can compute f on a polynomial number of sets, possibly chosen adaptively. The analysis can use non-negativity of f, and decreasing marginals inequality above. On the positive side, this means both the algorithms and the analyses are usually simple and short.

Uri Feige, Vahab Mirrokni and Jan Vondrák studied this general problem in 2007 and showed that one cannot hope to get better than a 1/2 approximation: a better-than-half approximation would require exponentially many queries to the oracle (in fact, Dobzinski and Vondrák  recently showed that this is not just an informational bottelneck: even for succinctly represented functions, it is NP hard to get a better-than-half approximation). On the positive side, Feige, Mirrokni and Vondrák  (FMV) showed that a random set gives a 1/4 approximation, and gave a beautiful 0.4 approximation algorithm based on smoothed local search. Subsequent works improved the approximation ratio to 0.41 and 0.42.

Today I will sketch the $\frac{1}{4}$ approximation. FMV first prove a simple inequality. Let $A(p)$ denote a random set where each element in $A$ is chosen with probability $p$. Then they show that for any submodular function $f$, and for any $A$ and $p$

$E[f(A(p))] \geq (1-p)f(\phi) + pf(A)$.

Intuitively, this is a concavity type statement. The proof is in fact a very simple induction on $A$: its true for $A$ being empty, and it is an easy exercise to do the induction step using submodularity. Using this inequality, we get that for any $A,B$, and any $p,q$

$E[f(A(p) \cup B(q))] \geq E[(1-q)f(A(p)) + q f(A(p) \cup B)]$.

Here we have used the fact that conditioned on any value of $A(p)=R$, the function $g(T) = f(R \cup T)$ is submodular so we can apply the above inequality to average over $B(q)$.  Similarly  if we now use the fact that $g(S) = f(S\cup B)$ is submodular and apply the inequality again, we get that the above right hand side is lower bounded by

$(1-q)(1-p)f(\phi) + (1-q)p f(A) + q(1-p) f(B) + qp f(A\cup B)$.

Armed with this inequality, the expected value of a random set can be easily lower bounded. Let $S$ denote the optimal set and let $\bar{S}$ denote its complement. Then a random set is $S(\frac{1}{2}) \cup \bar{S}(\frac{1}{2})$. Thus

$E[f(S(\frac{1}{2}) \cup \bar{S}(\frac{1}{2}))] \geq \frac{1}{4}\left(f(\phi) + f(S) + f(\bar{S}) + f(X)\right)$.

Since $f(\phi), f(\bar{S})$ and $f(X)$ are all non-negative, we conclude that a random set has at least  $\frac{1}{4}$ as much value as the optimal set. In the case that $f$ is symmetric (i.e. $f(A) = f(\bar{A})$ for all $A$), this gives a half approximation, which is the best possible.

With an eye towards the next post, let me define the multilinear relaxation of a submodular function.  A subset $A \subseteq U$ naturally corresponds to a 0-1 vector in $n= |U|$ dimensions, so a submodular function can be viewed as a map from the hypercube $\{0,1\}^n$ to the reals. The mulitlinear relaxation, defined by Calinescu, Chekuri, Pál and Vondrák, is an extension of this map to the whole cube $[0,1]^{n}$.  For a vector $x \in [0,1]^n$, the multilinear relxation $F(x)$ is defined to be the expectation of $f(A)$, where $A$  is a random set defined by picking $i$ with probability $x_i$, independently for each $i$. This has several useful properties and turns out to be “right” extension of $f$ for submodular maximization problems. The concavity type statement above is in fact a special case of a more general concavity property of $F$ (see this paper for a proof). In the next post, we will see a recent improvement to the above results, and this relaxation will come in handy.

1. March 22, 2012 5:11 pm

Thanks for the post, Kunal/Omer. One of the curious facts about submodularity to me is that while it naturally exhibits concavity, it also inherits a property from convexity: minimization is possible in polytime. Is there some basic or higher-level fact that leads to this nice confluence?

March 22, 2012 9:34 pm

I think one explanation of that confluence is the Lovasz extension. This is a different and older extension which can be defined as the expected value of a different rounding scheme that picks a random lambda uniformly in [0,1], and rounds to one the coordinates with value at least lambda. This extension turns out to be convex, and can be optimized using Ellipsoid.

March 22, 2012 8:36 pm

I like CCPV’s probabilistic view of the multilinear relaxation. I think it is also instructive to view set functions as real-valued multilinear polynomials using the Möbius transform. Here one uses the natural indicator string in {0,1}^n for a set S \subseteq [n], interpolates the 2^n values of the function, and expands. The resulting multilinear polynomial will agree with the function on inputs in {0,1}^n, but one can also evaluate the polynomial on any point in R^n. This is of course completely equivalent to CCPV’s definition, but the Möbius view avoids any talk of probability.

One can do the same thing using the (somewhat less natural) indicator string in {-1,+1}^n for a set S\subseteq [n], and this real-valued multilinear polynomial is of course the Fourier transform. We can also evaluate this polynomial on points outside of {-1,+1}^n, and in particular when we evaluate this polynomial on points in {-p,p}^n, this is the action of the noise operator on a set function. Viewed in this light, the FMV inequality becomes a statement about the noise stability of submodular functions. (Plug! http://eccc.hpi-web.de/report/2011/090/)

Instead of applying the same noise rate to every coordinate, one can apply different noise rates to different coordinates, and one recovers the 2-part inequality of FMV and both the 1/4 and the 1/3 approximation. In fact one can recover the 3-part inequality as well, and the proof of the 0.4-approximation becomes greatly simplified (at least notationally).

March 24, 2012 2:19 am

Thanks Homin for the comment. Your paper has been on my list, and I hope to read and understand it very soon.

March 23, 2012 4:44 pm

Kunal, you didn’t mention it, but I assume that the multilinear extension is also submodular (say with respect to the max/min lattice on the reals?)

Suresh: I assume you mean that F(x+delta 1_i) – F(x) is non increasing in x. The last paper linked above, by Calinescu et al. shows that this is true in [0,1]^n, as the mixed second order partial derivatives $\partial^2 F/ \partial x_i \partial x_j$ are negative.