In this post and the next, I will talk about the problem of maximizing a submodular function. Submodularity is a natural property of set functions that captures the diminishing returns property. Formally, let $f : 2^N \rightarrow \mathbb{R}$ be a set function on a ground set $N$, and let us assume that $f(S) \geq 0$ for all $S$. Then $f$ is submodular if for any sets $S \subseteq T \subseteq N$, and any $i \in N \setminus T$, we have the inequality $f(S \cup \{i\}) - f(S) \geq f(T \cup \{i\}) - f(T)$.
In other words, the marginal value of an element to a set $S$ is a non-increasing function of the set $S$. Let’s start with some examples of submodular functions:
- $f(S) = g(|S|)$ for any concave function $g$.
- Coverage function: we have a set $A_i \subseteq U$ associated with each $i \in N$ (so $N$ corresponds to the sets in a set cover instance). $f(S)$ is the number of elements covered by the sets in $S$, i.e. $f(S) = |\bigcup_{i \in S} A_i|$. More generally, if we have a non-negative weight $w_u$ associated with each $u \in U$, then we can set $f(S)$ to be the total weight of the elements covered by the sets in $S$.
- Cut function: $N$ is the vertex set of a graph $G = (N, E)$. $f(S)$ is the number of edges in the cut $(S, N \setminus S)$.
- Matroid rank function: we have a matroid on the set $N$, and $f(S)$ is the size of the largest independent set contained in $S$.
Given a non-negative submodular function, it is natural to ask if we can find the set that maximizes $f$. How is $f$ given to us? We assume we have oracle access to the value of the function: given a set $S$, we can compute $f(S)$ efficiently. Thus for example, we may have a succinct structure (such as the graph in example 3 above) that allows us to compute $f(S)$ quickly. As example 3 captures the MAXCUT problem, the problem at hand is APX-hard.
Note that in this abstract form, we really have very little to work with. The algorithm can compute $f$ on a polynomial number of sets, possibly chosen adaptively. The analysis can use only the non-negativity of $f$ and the decreasing-marginals inequality above. On the positive side, this means both the algorithms and the analyses are usually simple and short.
Uri Feige, Vahab Mirrokni and Jan Vondrák studied this general problem in 2007 and showed that one cannot hope to get better than a 1/2 approximation: a better-than-half approximation would require exponentially many queries to the oracle (in fact, Dobzinski and Vondrák recently showed that this is not just an informational bottleneck: even for succinctly represented functions, it is NP-hard to get a better-than-half approximation). On the positive side, Feige, Mirrokni and Vondrák (FMV) showed that a random set gives a 1/4 approximation, and gave a beautiful 0.4 approximation algorithm based on smoothed local search. Subsequent works improved the approximation ratio to 0.41 and 0.42.
Today I will sketch the $1/4$ approximation. FMV first prove a simple inequality. Let $S(p)$ denote a random set where each element in $S$ is chosen with probability $p$. Then they show that for any submodular function $f$, and for any $S$ and $p \in [0,1]$,

$\mathbb{E}[f(S(p))] \geq (1-p)\,f(\emptyset) + p\,f(S)$.
Intuitively, this is a concavity-type statement. The proof is in fact a very simple induction on $|S|$: it’s true for $S$ being empty, and it is an easy exercise to do the induction step using submodularity (a sketch follows).
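To spell that out (my reconstruction of the exercise, assuming for simplicity that elements are included independently): pick any $i \in S$ and let $S' = S \setminus \{i\}$. Conditioning on whether $i$ is picked,

$\mathbb{E}[f(S(p))] = (1-p)\,\mathbb{E}[f(S'(p))] + p\,\mathbb{E}[f(S'(p) \cup \{i\})]$.

Since $S'(p) \subseteq S'$ and $i \notin S'$, submodularity gives $f(S'(p) \cup \{i\}) - f(S'(p)) \geq f(S' \cup \{i\}) - f(S') = f(S) - f(S')$, so $\mathbb{E}[f(S(p))] \geq \mathbb{E}[f(S'(p))] + p\,(f(S) - f(S'))$. Applying the induction hypothesis $\mathbb{E}[f(S'(p))] \geq (1-p)\,f(\emptyset) + p\,f(S')$ and simplifying gives exactly $(1-p)\,f(\emptyset) + p\,f(S)$.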
Using this inequality, we get that for any $S$ and $T$, and any $p$ and $q$,

$\mathbb{E}[f(S(p) \cup T(q))] \geq (1-p)\,\mathbb{E}[f(T(q))] + p\,\mathbb{E}[f(S \cup T(q))]$.

Here we have used the fact that conditioned on any value of $T(q)$, the function $f(\cdot \cup T(q))$ is submodular, so we can apply the above inequality to average over $S(p)$. Similarly, if we now use the fact that $(1-p)\,f(\cdot) + p\,f(S \cup \cdot)$ is submodular and apply the inequality again, we get that the above right hand side is lower bounded by

$(1-p)(1-q)\,f(\emptyset) + p(1-q)\,f(S) + (1-p)q\,f(T) + pq\,f(S \cup T)$.
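As a numerical sanity check (again my own illustration, reusing `f` and `subsets` from the snippet above), one can verify this inequality exactly on the small cut instance by enumerating all outcomes of $S(p)$ and $T(q)$:

```python
# Reuses f and subsets from the earlier snippet. S, T, p, q are arbitrary
# illustrative choices.
S, T, p, q = {0, 1}, {2, 3}, 0.3, 0.7

def expected_union_value():
    # E[f(S(p) ∪ T(q))], computed exactly by enumerating all outcomes.
    total = 0.0
    for A in subsets(sorted(S)):
        for B in subsets(sorted(T)):
            pr = (p ** len(A) * (1 - p) ** (len(S) - len(A))
                  * q ** len(B) * (1 - q) ** (len(T) - len(B)))
            total += pr * f(set(A) | set(B))
    return total

bound = ((1 - p) * (1 - q) * f(set()) + p * (1 - q) * f(S)
         + (1 - p) * q * f(T) + p * q * f(S | T))
assert expected_union_value() >= bound - 1e-9
print(expected_union_value(), ">=", bound)  # 2.0 >= 1.16 here
```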
Armed with this inequality, the expected value of a random set can be easily lower bounded. Let $S^*$ denote the optimal set and let $\overline{S^*}$ denote its complement. Then a random set is distributed as $S^*(\frac{1}{2}) \cup \overline{S^*}(\frac{1}{2})$. Thus

$\mathbb{E}[f(N(\tfrac{1}{2}))] \geq \tfrac{1}{4}\,f(\emptyset) + \tfrac{1}{4}\,f(S^*) + \tfrac{1}{4}\,f(\overline{S^*}) + \tfrac{1}{4}\,f(N)$.
Since $f(\emptyset)$, $f(\overline{S^*})$ and $f(N)$ are all non-negative, we conclude that a random set has at least $\frac{1}{4}$ as much value as the optimal set. In the case that $f$ is symmetric (i.e. $f(S) = f(\overline{S})$ for all $S$), we additionally have $f(\overline{S^*}) = f(S^*)$, so this gives a half approximation, which is the best possible.
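The resulting algorithm is almost trivially simple. Here is a hedged sketch (again reusing the illustrative cut function `f` and `subsets` from above): it samples uniformly random sets and compares their average value to the brute-force optimum.

```python
import random

# Reuses f and subsets from the earlier snippet.
elements = list(range(4))
opt = max(f(S) for S in subsets(elements))

def random_set(elems, p=0.5):
    # Include each element independently with probability p.
    return {e for e in elems if random.random() < p}

trials = 100_000
avg = sum(f(random_set(elements)) for _ in range(trials)) / trials
print(f"E[f(random set)] ~ {avg:.3f}, OPT = {opt}, ratio ~ {avg / opt:.3f}")
# FMV guarantee: ratio >= 1/4 in general, and >= 1/2 for symmetric f;
# the cut function is symmetric, and here the ratio is in fact about 2/3.
```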
With an eye towards the next post, let me define the multilinear relaxation of a submodular function. A subset naturally corresponds to a 0-1 vector in $|N|$ dimensions, so a submodular function can be viewed as a map from the hypercube $\{0,1\}^N$ to the reals. The multilinear relaxation, defined by Calinescu, Chekuri, Pál and Vondrák, is an extension of this map to the whole cube $[0,1]^N$. For a vector $x \in [0,1]^N$, the multilinear relaxation $F(x)$ is defined to be the expectation of $f(S_x)$, where $S_x$ is a random set defined by picking $i$ with probability $x_i$, independently for each $i$. This has several useful properties and turns out to be the “right” extension of $f$ for submodular maximization problems. The concavity-type statement above is in fact a special case of a more general concavity property of $F$
(see this paper for a proof). In the next post, we will see a recent improvement to the above results, and this relaxation will come in handy.
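To make the multilinear relaxation concrete before we get there, here is a small sketch (my own illustration, once more reusing `f` and `subsets` from above): for a small ground set one can evaluate $F$ exactly by summing over all sets, rather than by sampling.

```python
from math import prod

def multilinear_F(f, elements, x):
    # F(x) = sum over all sets S of f(S) * Pr[S_x = S], where S_x includes
    # each i independently with probability x[i].
    total = 0.0
    for S in subsets(elements):
        S = set(S)
        total += f(S) * prod(x[i] if i in S else 1 - x[i] for i in elements)
    return total

# At x = (1/2, ..., 1/2), F(x) is the expected value of a uniformly random
# set; for the illustrative cut function this is half the number of edges.
print(multilinear_F(f, list(range(4)), {i: 0.5 for i in range(4)}))  # -> 2.0
```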
Thanks for the post, Kunal/Omer. One of the curious facts about submodularity to me is that while it naturally exhibits concavity, it also inherits a property from convexity: minimization is possible in polytime. Is there some basic or higher-level fact that leads to this nice confluence?
I think one explanation of that confluence is the Lovász extension. This is a different and older extension, which can be defined as the expected value of a different rounding scheme: pick a random lambda uniformly in [0,1], and round to one the coordinates with value at least lambda. This extension turns out to be convex, and can be minimized using the ellipsoid method.
I like CCPV’s probabilistic view of the multilinear relaxation. I think it is also instructive to view set functions as real-valued multilinear polynomials using the Möbius transform. Here one uses the natural indicator string in {0,1}^n for a set S \subseteq [n], interpolates the 2^n values of the function, and expands. The resulting multilinear polynomial will agree with the function on inputs in {0,1}^n, but one can also evaluate the polynomial on any point in R^n. This is of course completely equivalent to CCPV’s definition, but the Möbius view avoids any talk of probability.
One can do the same thing using the (somewhat less natural) indicator string in {-1,+1}^n for a set S\subseteq [n], and this real-valued multilinear polynomial is of course the Fourier transform. We can also evaluate this polynomial on points outside of {-1,+1}^n, and in particular when we evaluate this polynomial on points in {-p,p}^n, this is the action of the noise operator on a set function. Viewed in this light, the FMV inequality becomes a statement about the noise stability of submodular functions. (Plug! http://eccc.hpi-web.de/report/2011/090/)
Instead of applying the same noise rate to every coordinate, one can apply different noise rates to different coordinates, and one recovers the 2-part inequality of FMV and both the 1/4 and the 1/3 approximation. In fact one can recover the 3-part inequality as well, and the proof of the 0.4-approximation becomes greatly simplified (at least notationally).
Thanks Homin for the comment. Your paper has been on my list, and I hope to read and understand it very soon.
Kunal, you didn’t mention it, but I assume that the multilinear extension is also submodular (say with respect to the max/min lattice on the reals?)
Suresh: I assume you mean that F(x + delta 1_i) - F(x) is non-increasing in x. The last paper linked above, by Calinescu et al., shows that this is true in [0,1]^n, as the mixed second order partial derivatives ∂^2 F / ∂x_i ∂x_j are non-positive.
Yes, that’s what I meant. Thanks !