Sum-of-Squares seminar: lecture notes and open problems
I just gave the final lecture in my seminar on Sum of Squares Upper Bounds, Lower Bounds, and Open Questions. (see also this previous post). The lectures notes are available on the web page and also as a single pdf file. They are extremely rough but I hope they would still be useful, as some of this material is not covered (to my knowledge) elsewhere. (Indeed two of the papers we covered are “hot off the press”: the work of Meka-Potechin-WIgderson on the planted clique problem hasn’t yet been posted online, and the work of Lee-Raghavendra-Steurer on semidefinite extension complexity was just posted online two weeks ago.)
Some of the topics we covered included the SDP based algorithms for problems such as Max-Cut, Sparsest-Cut, and Small-Set Expansion, lower bounds for Sum-of-Squares: 3XOR/3SAT and planted clique, using SOS for unsupervised learning, how might (and of course also might not) the SOS algorithm be used refute the Unique Games Conjecture, linear programming and semidefinite programming extension complexity.
Thanks to everyone that participated in the course and in particular to the students who scribed the notes (both in this course and my previous summer course) and to Jon Kelner and Ankur Moitra that gave guest lectures!
I thought I might post here the list of open problems I ended with- feel free to comment on those or add some of your own below.
In most cases I phrased the problem as asking to show a particular statement, though of course showing the opposite statement would be very interesting as well. This is not meant to be a complete or definitive list, but could perhaps spark your imagination to think of those or other research problems of your own. The broader themes these questions are meant to explore are:
- Can we understand in what cases do SOS programs of intermediate degree (larger than but much smaller than ) yield non-trivial guarantees?
It seems that for some problems (such as 3SAT) the degree/quality curve has a threshold behavior, where we need to get to degree roughly to beat the performance of the degree algorithm, while for other problems (such as UNIQUE GAMES) the degree/quality curve seems much smoother, though we don’t really understand it.
Understanding this computation/quality tradeoff in other settings, such as average case complexity, would be very interesting as well for areas such as learning, statistical physics, and cryptography.
- Can we give more evidence to, or perhaps refute, the intuition that the SOS algorithm is optimal in some broad domains?
- Can we understand the performance of SOS in average-case setting, and whether there are justifications to consider it optimal in this setting as well? This is of course interesting for both machine learning and cryptography.
- Can we understand the role of noise in the performance of the SOS algorithm? Is noise a way to distinguish between “combinatorial” and “algebraic” problems in the sense of my previous post?
Well posed problems
Show that for every constant there is some and a quasipolynomial () time algorithm that on input a subspace , can distinguish between the case that contains the characteristic vector of a set of measure at most , and the case that for every . Extend this to a quasipolynomial time algorithm to solve the small-set expansion problem (and hence refute the small set expansion hypothesis). Extend this to a quasipolynomial time algorithm to solve the unique-games problem (and hence refute the unique games conjecture). If you think this cannot be done then even showing that the (in fact, even ) SOS program does not solve the unique-games problem (or the norms ratio problem as defined above) would be very interesting.
Show that there is some constant such that the degree- SOS problem can distinguish between a random graph and a graph in which a clique of size was planted for some , or prove that this cannot be done. Even settling this question for would be very interesting.
Show that the SOS algorithm is optimal in some sense for “pseudo-random” constraint satisfaction problems, by showing that for every predicate , and pairwise independent distribution over , it is NP hard to distinguish, given an instance of MAX- (i.e., a set of constraints each of which corresponds to applying to literals of some Boolean variables ), between the case that one can satisfy fraction of the constraints, and the case that one can satisfy at most fraction of them. (In a recent, yet unpublished, work with Chan and Kothari, we show that small degree SOS programs cannot distinguish between these two cases.)
More generally, can we obtain a “UGC free Raghavendra Theorem”? For example, can we show (without relying on the UGC) that for every predicate , and , if there is an -variable instance of MAX- whose value is at most but on which the degree SOS program outputs at least , then distinguishing between the case that a CSP- instance as value at least and the case that it has value at most is NP-hard?
Show that there is some and such that for sufficiently small , the degree SOS program for Max-Cut can distinguish, given a graph , between the case that has a cut of value and the case that has a cut of value . (Note that Kelner and Parrilo have a conjectured approach to achieve this.) Can you do this with arbitrarily small ?
If you think the above cannot be done, even showing that the degree (or even better, ) SOS program cannot achieve this, even for the more general Max-2-LIN problem, would be quite interesting. As an intermediate step, settle Khot-Moshkovitz’s question whether for an arbitrarily large constant , the Max-2-LIN instance they construct (where the degree (for some constant ) SOS value is ) has actual value at most . Some intermediate steps that could be significantly easier are: the Khot-Moshkovitz construction is a reduction from a -CSP on variables that first considers all -sized subsets of the original variables and then applies a certain encoding to each one of those “cloud”. Prove that if this is modified to a single -sized cloud then the reduction would be “sound” in the sense that there would be no integral solution of value larger than . (This should be significantly easier to prove than the soundness of the Khot-Moshkovitz construction since it completely does away with their consistency test; still to my knowledge it is not proven in their paper. The reduction will not be “complete” in this case, since it will have more than exponential blowup and will not preserve SOS solutions but I still view this as an interesting step. Also if this step is completed, perhaps one can think of other ways than the “cloud” approach of KM to reduce the blowup of this reduction to for some small , maybe a “biased” version of their code could work as well.)
The following statement, if true, demonstrates one of the challenges in proving the soundness of KM construction: Recall that the KM boundary test takes a function and checks if where and have standard Gaussian coordinates that are each correlated for some . Their intended solution for will fail the test with probability . Prove that there is a function that passes the test with for some but such that for every constant and function of the form where a polynomial of degree at most , .
Show that there are some constant and , such that the degree -SOS program yields an approximation to the Sparsest Cut problem. If you think this can’t be done, even showing that the algorithm doesn’t beat would be very interesting.
Give a polynomial-time algorithm that for some sufficiently small , can (approximately) recover a planted -sparse vector inside a random subspace of dimension . That is, we choose as random Gaussian vectors, and the algorithm gets an arbitrary basis for the span of . Can you extend this to larger dimensions? Can you give a quasipolynomial time algorithm that works when has dimension ? Can you give a quasipolynomial time algorithm for certifying the Restricted Isometry Property (RIP) of a random matrix?
Improve the dictionary learning algorithm of [Barak-Kelner-Steurer] (in the setting of constant sparsity) from quasipolynomial to polynomial time.
(Suggested by Prasad Raghavendra.) Can SDP relaxations simulate local search?
While sum of squares SDP relaxations yield the best known approximations for CSPs, the same is not known for bounded degree CSPs. For instance, MAXCUT on bounded degree graphs can be approximated better than the Goemans-Willamson constant via a combination of SDP rounding and local search. Here local search refers to improving the value of the solution by locally modifying the values. Show that for every constant , there is some such that rounds of SOS yield an approximation for MAXCUT on graphs of maximum degree . Another problem to consider is maximum matching in 3-uniform hypergraphs. This can be approximated to a 3/4 factor using only local search (no LP/SDP relaxations), and some natural relaxations have a 1/2 integrality gap for it. Show that for every , rounds of SOS give a approximation for this problem, or rule this out via an integrality gap.
[Update: Prasad notes that the first problem for Max-Cut actually is solved as stated, since the paper also shows that the SDP integrality gap is better than for Max-Cut on bounded degree graphs. The second question is still open to my knowledge, and more generally understanding if SOS can always simulate local search.]
(Suggested by Ryan O’Donnell) Let be the vertex graph on where we connect every two vertices such that their distance (mod ) is at most for some constant . The set of vertices with least expansion is an arc. Can we prove this with an SOS proof of constant (independent of ) degree? For every there is a such that if we let be the graph with vertices corresponding to where we connect vertices if their Hamming distance is at most , then for every subsets of satisfying , there is an edge between and . Can we prove this with an SOS proof of constant degree?
[Update: William Perry showed a degree 4 proof (using the triangle inequality) for the fact that the least expanding sets a power of the cycle. Sangxia Huang proved independently a similar result.]
The following problems are not as well-defined, but this does not mean they are less important.
Find more problems in the area of unsupervised learning where one can obtain an efficient algorithm by giving a proof of identifiability using low degree SOS.
The notion of pseudo-distributions gives rise to a computational analog of Bayesian reasoning about the knowledge of a computationally-bounded observer. Can we give any interesting applications of this? Perhaps in economics? Or cryptography?
SOS, Cryptography, and . It sometimes seems as if in the context of combinatorial optimization it holds that “”, or in other words that all proof systems are automatizable. Can the SOS algorithm give any justification to this intuition? In contrast note that we do not believe that this assertion is actually true in general. Indeed, many of our candidates for public key encryption (though not all— see discussion in [Applebaum,Barak, Wigderson]) fall inside (or ). Can SOS shed any light on this phenonmenon? A major issue in cryptography is (to quote Adi Shamir) the lack of diversity in the “gene pool” of problems that can be used as a basis for public key encryption. If quantum computers are built, then essentially the only well-tested candidates are based on a single problem— Regev’s “Learning With Errors” (LWE) assumption (closely related to various problems on integer lattices). Some concrete questions along these lines are:
Find some evidence to the conjecture of Barak-Kindler-Steurer (or other similar conjectures) that the SOS algorithm might be optimal even in an average case setting. Can you find applications for this conjecture in cryptography?
Can we use a conjectured optimality of SOS to give public key encryption schemes? Perhaps to justify the security of LWE? One barrier for the latter could be that breaking LWE and related lattice problems is in fact in or .
Understand the role of noise in the performance of the SOS algorithm. The algorithm seems to be inherently noise robust, and it also seems that this is related to both its power and its weakness– as is demonstrated by cases such as solving linear equations where it cannot get close to the performance of the Gaussian elimination algorithm, but the latter is also extremely sensitive to noise.
Can we get any formal justifications to this intuition? What is the right way to define noise robustness in general? If we believe that the SOS algorithm is optimal (even in some average case setting) for noisy problems, can we get any quantitative predictions to the amount of noise needed for this to hold? This may be related to the question above of getting public key cryptography from assuming the optimality of SOS in the average case (see Barak-Kindler-Steurer and Applebaum-Barak-Wigderson).
Related to this: is there a sense in which SOS is an optimal noise-robust algorithm or proof system? Are there natural stronger proof systems that are still automatizable (maybe corresponding to other convex programs such as hyperbolic programming, or maybe using a completely different paradigm)? Are there natural noise-robust algorithms for combinatorial optimizations that are not captured by the
SOS framework? Are there natural stronger proof systems than SOS (even non
automatizable ones) that are noise-robust and are stronger than SOS for natural combinatorial optimization problems?
Can we understand better the role of the feasible interpolation property in this context?
I have suggested that the main reason that a “robust” proof does not translate into an SOS proof is by use of the probabilistic method, but this is by no means a universal law and getting better intuition as to what types of arguments do and don’t translate into low degree SOS proofs is an important research direction. Ryan O’Donnell’s problems above present one challenge to this viewpoint. Another approach is to try to use techniques from derandomization such as use of additive combinatorics or the Zig-Zag product to obtain “hard to SOS” proofs. In particular, is there an SOS proof that the graph constructed by Capalbo, Reingold, Vadhan and Wigderson (STOC 2002) is a “lossless expander” (expansion larger than )? Are there SOS proofs for the pseudorandom properties of the condensers we construct in the work with Impagliazzo and Wigderson (FOCS 2004, SICOMP 2006) or other constructions using additive combinatorics? I would suspect the answer might be “no”. (Indeed, this may be related to the planted clique question, as these tools were used to construct the best known Ramsey graphs.)