There’s been some discussion of the NIPS experiment, enough of it that even my neuroscientist brother sent me a link to Eric Price’s blog post. The gist of it is that the program chairs duplicated the reviewing process for 10% of the papers, to see how many would get inconsistent decisions, and it turned out that 25.9% of them did (one of the program chairs predicted that it would be 20% and the other that it would be 25%; see also here, here and here). Eric argues that the right way to measure disagreement is to look at the fraction of papers that process A accepted that would have been rejected by process B, which comes out to more than 50%.
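The arithmetic behind that >50% figure is simple enough to check. Below is a back-of-the-envelope sketch in Python; the 25.9% disagreement rate is from the experiment, while the roughly 22.5% acceptance rate and the assumption that both committees accept equally many papers are approximations, not exact values:

```python
# Approximate, publicly reported figures from the NIPS 2014 experiment.
disagree = 0.259   # fraction of duplicated papers with inconsistent decisions
accept = 0.225     # acceptance rate (assumed equal for both committees)

# If both committees accept the same fraction of papers, each disagreement
# pairs one accept with one reject, split evenly between the two committees,
# so half the disagreements are papers that A accepted and B rejected.
fraction = (disagree / 2) / accept
print(f"{fraction:.1%} of process A's accepted papers rejected by process B")
```

With these numbers the fraction comes out to roughly 58%, consistent with Eric’s “more than 50%” estimate.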

It’s hard for me to interpret this number. One interpretation is that it’s a failure of the refereeing process that people can’t agree on more than half of the list of accepted papers. Another viewpoint is that since the disagreement is not much larger than predicted beforehand, we shouldn’t be that surprised by it. It’s tempting in such discussions to model papers as having some inherent quality score, with the goal of the program committee being to find all papers above a certain threshold. The truth is that different papers have different, incomparable qualities that appeal to different subsets of people. The goal of the program committee is to curate a diverse and intellectually stimulating program for the conference. This is an inherently subjective task, and it’s not surprising that different committees would arrive at different conclusions. I do not know what the “optimal” amount of variance in this process is, but I would have been quite worried if it were zero, since that would be a clear sign of groupthink. Lastly, I think this experiment actually points to an important benefit of the conference system. Unlike journals, where the editorial board tends to stay constant for a long period, in conferences one gets a fresh draw of the committee every 6 months or a year.

——————–

I truly believe that theoretical computer science, and computer science at large, will grow significantly in importance over the coming decades, as it occupies a much more central place in many of the sciences and other human activities. We’ll have many more problems to solve, and we can’t do it without using half of the world’s talent pool.

——————–

Some of the topics we covered included: SDP-based algorithms for problems such as Max-Cut, Sparsest-Cut, and Small-Set Expansion; lower bounds for Sum-of-Squares (3XOR/3SAT and planted clique); using SOS for unsupervised learning; how the SOS algorithm might (and of course also might not) be used to refute the Unique Games Conjecture; and linear programming and semidefinite programming extension complexity.

Thanks to everyone who participated in the course, and in particular to the students who scribed the notes (both in this course and my previous summer course) and to Jon Kelner and Ankur Moitra, who gave guest lectures!

I thought I might post here the list of open problems I ended with; feel free to comment on them or add some of your own below.

In most cases I phrased the problem as asking to show a particular statement, though of course showing the opposite statement would be very interesting as well. This is not meant to be a complete or definitive list, but could perhaps spark your imagination to think of those or other research problems of your own. The broader themes these questions are meant to explore are:

- Can we understand in what cases SOS programs of intermediate degree (larger than but much smaller than ) yield non-trivial guarantees? It seems that for some problems (such as 3SAT) the degree/quality curve has a threshold behavior, where we need to get to degree roughly to beat the performance of the degree algorithm, while for other problems (such as UNIQUE GAMES) the degree/quality curve seems much smoother, though we don’t really understand it. Understanding this computation/quality tradeoff in other settings, such as average-case complexity, would also be very interesting for areas such as learning, statistical physics, and cryptography.

- Can we give more evidence for, or perhaps refute, the intuition that the SOS algorithm is *optimal* in some broad domains?

- Can we understand the performance of SOS in the *average-case* setting, and whether there are justifications to consider it optimal in this setting as well? This is of course interesting for both machine learning and cryptography.

- Can we understand the role of *noise* in the performance of the SOS algorithm? Is noise a way to distinguish between “combinatorial” and “algebraic” problems in the sense of my previous post?

Show that for every constant there is some and a quasipolynomial-time () algorithm that, on input a subspace, can distinguish between the case that the subspace contains the characteristic vector of a set of measure at most , and the case that for every . Extend this to a quasipolynomial-time algorithm to solve the small-set expansion problem (and hence refute the small-set expansion hypothesis). Extend this to a quasipolynomial-time algorithm to solve the unique-games problem (and hence refute the Unique Games Conjecture). If you think this cannot be done, then even showing that the (in fact, even ) SOS program does not solve the unique-games problem (or the norms-ratio problem as defined above) would be very interesting.

Show that there is some constant such that the degree- SOS program can distinguish between a random graph and a graph in which a clique of size was planted for some , or prove that this cannot be done. Even settling this question for would be very interesting.
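For concreteness, the planted clique distribution mentioned here can be sampled in a few lines (a minimal sketch; `planted_clique` and its parameters are illustrative, not from any particular paper):

```python
import numpy as np

def planted_clique(n, k, rng):
    """Sample G(n, 1/2) and plant a clique on a random k-subset of vertices."""
    A = rng.random((n, n)) < 0.5
    A = np.triu(A, 1)              # keep the upper triangle, no self-loops
    A = A | A.T                    # symmetrize
    S = rng.choice(n, size=k, replace=False)
    A[np.ix_(S, S)] = True         # plant the clique
    A[np.diag_indices(n)] = False
    return A, S

rng = np.random.default_rng(0)
A, S = planted_clique(200, 20, rng)
assert A[np.ix_(S, S)].sum() == 20 * 20 - 20   # every pair in S is connected
```

The algorithmic question is then to find S (or merely detect its presence) given only A.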

Show that the SOS algorithm is optimal in some sense for “pseudo-random” constraint satisfaction problems, by showing that for every predicate , and pairwise independent distribution over , it is NP-hard to distinguish, given an instance of MAX- (i.e., a set of constraints, each of which corresponds to applying to literals of some Boolean variables ), between the case that one can satisfy a fraction of the constraints and the case that one can satisfy at most a fraction of them. (In a recent, yet unpublished, work with Chan and Kothari, we show that small-degree SOS programs cannot distinguish between these two cases.)

More generally, can we obtain a “UGC-free Raghavendra theorem”? For example, can we show (without relying on the UGC) that for every predicate , and , if there is an -variable instance of MAX- whose value is at most but on which the degree SOS program outputs at least , then distinguishing between the case that a CSP- instance has value at least and the case that it has value at most is NP-hard?

Show that there is some and such that for sufficiently small , the degree SOS program for Max-Cut can distinguish, given a graph , between the case that has a cut of value and the case that has a cut of value . (Note that Kelner and Parrilo have a conjectured approach to achieve this.) Can you do this with arbitrarily small ?

If you think the above cannot be done, even showing that the degree (or even better, ) SOS program cannot achieve this, even for the more general Max-2-LIN problem, would be quite interesting. As an intermediate step, settle Khot-Moshkovitz’s question of whether, for an arbitrarily large constant , the Max-2-LIN instance they construct (where the degree (for some constant ) SOS value is ) has actual value at most . Some intermediate steps that could be significantly easier: the Khot-Moshkovitz construction is a reduction from a -CSP on variables that first considers all -sized subsets of the original variables and then applies a certain encoding to each one of those “clouds”. Prove that if this is modified to a single -sized cloud, then the reduction would be “sound” in the sense that there would be no integral solution of value larger than . (This should be significantly easier to prove than the soundness of the Khot-Moshkovitz construction, since it completely does away with their consistency test; still, to my knowledge, it is not proven in their paper. The reduction will not be “complete” in this case, since it will have a more than exponential blowup and will not preserve SOS solutions, but I still view this as an interesting step. Also, if this step is completed, perhaps one can think of ways other than the “cloud” approach of KM to reduce the blowup of this reduction to for some small ; maybe a “biased” version of their code could work as well.)

The following statement, if true, demonstrates one of the challenges in proving the soundness of KM construction: Recall that the KM boundary test takes a function and checks if where and have standard Gaussian coordinates that are each correlated for some . Their intended solution for will fail the test with probability . Prove that there is a function that passes the test with for some but such that for every constant and function of the form where a polynomial of degree at most , .

Show that there are some constant and , such that the degree -SOS program yields an approximation to the *Sparsest Cut* problem. If you think this can’t be done, even showing that the algorithm doesn’t beat would be very interesting.

Give a polynomial-time algorithm that for some sufficiently small , can (approximately) recover a planted -sparse vector inside a random subspace of dimension . That is, we choose as random Gaussian vectors, and the algorithm gets an arbitrary basis for the span of . Can you extend this to larger dimensions? Can you give a quasipolynomial time algorithm that works when has dimension ? Can you give a quasipolynomial time algorithm for certifying the *Restricted Isometry Property* (RIP) of a random matrix?
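The instance described here is easy to generate, which makes the problem a convenient target for experiments. A minimal sketch (the dimensions and sparsity below are arbitrary choices for illustration, not the regime of any particular result):

```python
import numpy as np

n, d, k = 500, 20, 25     # ambient dimension, subspace dimension, sparsity
rng = np.random.default_rng(1)

# The planted k-sparse direction.
v0 = np.zeros(n)
support = rng.choice(n, size=k, replace=False)
v0[support] = rng.standard_normal(k)

# Span of v0 and d-1 random Gaussian directions; the algorithm only sees
# an orthonormal basis Q of this span, not the planted vector itself.
B = np.column_stack([v0] + [rng.standard_normal(n) for _ in range(d - 1)])
Q, _ = np.linalg.qr(B)

# Sanity check: v0 really lies in the span of Q.
residual = v0 - Q @ (Q.T @ v0)
assert np.linalg.norm(residual) < 1e-8 * np.linalg.norm(v0)
```

The task is then to (approximately) recover v0 from Q alone.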

Improve the dictionary learning algorithm of [Barak-Kelner-Steurer] (in the setting of constant sparsity) from *quasipolynomial* to *polynomial* time.

(Suggested by Prasad Raghavendra.) Can SDP relaxations simulate local search?

While sum-of-squares SDP relaxations yield the best known approximations for CSPs, the same is not known for bounded-degree CSPs. For instance, MAXCUT on bounded-degree graphs can be approximated better than the Goemans-Williamson constant via a combination of SDP rounding and local search. Here, local search refers to improving the value of the solution by locally modifying the values. Show that for every constant , there is some such that rounds of SOS yield an approximation for MAXCUT on graphs of maximum degree . Another problem to consider is maximum matching in 3-uniform hypergraphs. This can be approximated to a 3/4 factor using only local search (no LP/SDP relaxations), and some natural relaxations have a 1/2 integrality gap for it. Show that for every , rounds of SOS give a approximation for this problem, or rule this out via an integrality gap.
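The local search referred to here is the textbook one, which can be made concrete in a few lines (a minimal Max-Cut sketch, not the SDP-plus-local-search hybrid): repeatedly flip any vertex with fewer than half of its incident edges cut. Each flip strictly increases the cut, so the process terminates, and at a local optimum at least half of all edges are cut.

```python
import random

def local_search_maxcut(n, edges, seed=0):
    """Flip vertices until no single flip improves the cut."""
    adj = [[] for _ in range(n)]
    for a, b in edges:
        adj[a].append(b)
        adj[b].append(a)
    rng = random.Random(seed)
    side = [rng.randrange(2) for _ in range(n)]
    improved = True
    while improved:
        improved = False
        for v in range(n):
            cut_deg = sum(side[u] != side[v] for u in adj[v])
            if 2 * cut_deg < len(adj[v]):   # flipping v gains at least one edge
                side[v] = 1 - side[v]
                improved = True
    cut = sum(side[a] != side[b] for a, b in edges)
    return side, cut

edges = [(i, j) for i in range(8) for j in range(i + 1, 8)]  # complete graph K8
side, cut = local_search_maxcut(8, edges)
assert cut >= len(edges) / 2   # local optimality guarantees half the edges cut
```

The SOS question is whether a constant number of rounds of the relaxation can match (or beat) guarantees of this flavor on bounded-degree graphs.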

(Suggested by Ryan O’Donnell) Let be the vertex graph on where we connect every two vertices such that their distance (mod ) is at most for some constant . The set of vertices with least expansion is an arc. Can we prove this with an SOS proof of constant (independent of ) degree? For every there is a such that if we let be the graph with vertices corresponding to where we connect vertices if their Hamming distance is at most , then for every subsets of satisfying , there is an edge between and . Can we prove this with an SOS proof of constant degree?

The following problems are not as well-defined, but this does not mean they are less important.

Find more problems in the area of unsupervised learning where one can obtain an efficient algorithm by giving a proof of identifiability using low degree SOS.

The notion of pseudo-distributions gives rise to a computational analog of Bayesian reasoning about the knowledge of a computationally-bounded observer. Can we give any interesting applications of this? Perhaps in economics? Or cryptography?

**SOS, Cryptography, and .** It sometimes seems as if in the context of combinatorial optimization it holds that “”, or in other words that all proof systems are automatizable. Can the SOS algorithm give any justification to this intuition? In contrast, note that we do not believe this assertion is actually true in general. Indeed, many of our candidates for public-key encryption (though not all; see the discussion in [Applebaum-Barak-Wigderson]) fall inside (or ). Can SOS shed any light on this phenomenon? A major issue in cryptography is (to quote Adi Shamir) the lack of diversity in the “gene pool” of problems that can be used as a basis for public-key encryption. If quantum computers are built, then essentially the only well-tested candidates are based on a single problem: Regev’s “Learning With Errors” (LWE) assumption (closely related to various problems on integer lattices). Some concrete questions along these lines are:

Find some evidence to the conjecture of Barak-Kindler-Steurer (or other similar conjectures) that the SOS algorithm might be optimal even in an *average case* setting. Can you find applications for this conjecture in cryptography?

Can we use a conjectured optimality of SOS to give *public key encryption schemes*? Perhaps to justify the security of LWE? One barrier for the latter could be that breaking LWE and related lattice problems is in fact in or .

Understand the role of *noise* in the performance of the SOS algorithm. The algorithm seems to be inherently noise-robust, and it also seems that this is related to both its power and its weakness, as demonstrated by cases such as solving linear equations, where it cannot get close to the performance of the Gaussian elimination algorithm, but the latter is also extremely sensitive to noise.
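The contrast can be seen in a toy experiment (a sketch; the GF(2) solver and parameters are illustrative): elimination recovers the solution exactly from noiseless equations, while corrupting a single bit of the right-hand side typically changes the recovered solution in many coordinates.

```python
import numpy as np

def solve_gf2(A, b):
    """Gauss-Jordan elimination over GF(2); assumes A is invertible."""
    A, b = A.copy() % 2, b.copy() % 2
    n = len(b)
    for col in range(n):
        pivot = next(r for r in range(col, n) if A[r, col])  # StopIteration if singular
        A[[col, pivot]] = A[[pivot, col]]
        b[[col, pivot]] = b[[pivot, col]]
        for r in range(n):
            if r != col and A[r, col]:
                A[r] ^= A[col]
                b[r] ^= b[col]
    return b

rng = np.random.default_rng(2)
n = 60
while True:   # resample until the matrix is invertible over GF(2)
    A = rng.integers(0, 2, size=(n, n))
    x = rng.integers(0, 2, size=n)
    try:
        recovered = solve_gf2(A, A @ x % 2)
        break
    except StopIteration:
        continue
assert (recovered == x).all()      # exact recovery from noiseless equations

b_noisy = A @ x % 2
b_noisy[0] ^= 1                    # a single corrupted equation
x_noisy = solve_gf2(A, b_noisy)
print((x_noisy != x).sum(), "coordinates changed by one flipped bit")
```

The perturbed solution differs from x by a column of the inverse matrix, which is dense for a random system: the hallmark of an algorithm that is powerful but not noise-robust.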

Can we get any formal justifications to this intuition? What is the right way to define noise robustness in general? If we believe that the SOS algorithm is optimal (even in some average case setting) for noisy problems, can we get any quantitative predictions to the amount of noise needed for this to hold? This may be related to the question above of getting *public key cryptography* from assuming the optimality of SOS in the average case (see Barak-Kindler-Steurer and Applebaum-Barak-Wigderson).

Related to this: is there a sense in which SOS is an optimal noise-robust algorithm or proof system? Are there natural stronger proof systems that are still automatizable (maybe corresponding to other convex programs such as hyperbolic programming, or maybe using a completely different paradigm)? Are there natural noise-robust algorithms for combinatorial optimization that are *not* captured by the SOS framework? Are there natural proof systems stronger than SOS (even non-automatizable ones) that are noise-robust and stronger than SOS for natural combinatorial optimization problems?

Can we understand better the role of the *feasible interpolation property* in this context?

I have suggested that the main reason a “robust” proof does not translate into an SOS proof is the use of the probabilistic method, but this is by no means a universal law, and getting better intuition as to what types of arguments do and don’t translate into low-degree SOS proofs is an important research direction. Ryan O’Donnell’s problems above present one challenge to this viewpoint. Another approach is to try to use techniques from derandomization, such as additive combinatorics or the Zig-Zag product, to obtain “hard to SOS” proofs. In particular, is there an SOS proof that the graph constructed by Capalbo, Reingold, Vadhan and Wigderson (STOC 2002) is a “lossless expander” (expansion larger than )? Are there SOS proofs for the pseudorandom properties of the condensers we construct in the work with Impagliazzo and Wigderson (FOCS 2004, SICOMP 2006) or other constructions using additive combinatorics? I would suspect the answer might be “no”. (Indeed, this may be related to the planted clique question, as these tools were used to construct the best known Ramsey graphs.)

——————–

**What?**

On Thursday 09/18/2014, an urgent meeting was announced for all but a few in MSR-SV. The short meeting marked the immediate closing of the lab. By the time the participants came back to their usual building, cardboard boxes were waiting for the prompt packing of personal items (to be evacuated by the end of that weekend). This harsh style of layoffs was one major cause for shock, and it indeed seemed unprecedented for research labs of this sort. But I find the following much more dramatic: Microsoft, like many other big companies, frequently evaluates its employees. A group of researchers who were immensely valuable according to Microsoft’s own metric just a couple of months before were thrown into the hands of Microsoft’s competitors, who were more than happy to oblige. Similarly, previously valued research projects were carelessly lost (quite possibly to be picked up by others). Excellence as defined by Microsoft did not protect you; impact did not protect you (among the positions eliminated were researchers who saved Microsoft ridiculously large sums of money, enough to pay for the entire lab for many years). Since Microsoft is publicly claiming “business as usual” (which should mean that the evaluation metric didn’t change), and since Microsoft was performing a relatively moderate force reduction (of roughly 5% of its non-Nokia workforce), I still find it all very hard to understand.

**Why MSR-SV and Why not?**

It is my opinion that no substantial explanation for the closing was given by Microsoft’s representatives to the general public or (as far as I have been told) to current Microsoft employees. In the absence of a reliable official explanation, rumors and speculations flourished. What should be made absolutely clear is that **MSR-SV was not closed for lack of impact.** The lab had huge impact in all dimensions, including impact measured in dollars.

It is true that some cannot understand how the academic-style model of MSR-SV could be beneficial for a company. But it seems amateurish to base business decisions on perception rather than reality. Indeed, previous management of MSR and Microsoft resisted pressures from outside of Microsoft to change MSR. The current management seems to be changing course.

This is indeed my own speculation – MSR is changing its nature and therefore chose to close the lab that embodied in the purest form what MSR is moving away from, sending a strong internal and external signal. I don’t know that this is the case, but any other explanation I heard characterizes parts of the management of MSR and Microsoft as either incompetent or malicious. There is every reason to believe that these are all bright individuals, and that the decision was carefully weighed (taking into account all the obvious internal and external consequences). I only wish they would own up to it.

**Don’t Call it the “MSR Model”**

There was a lot of discussion about the MSR model vs. the model of other industrial research labs. This is somewhat misguided: MSR is very large and hosts a lot of models. This is absolutely fine – a company like Microsoft has the need for all sorts of research, and different talents need different outlets. But this also means that the claim that “nothing really happened, we still have about 1000 PhDs” is not the whole truth. There is no other MSR-SV in MSR. There are of course other parts of MSR that share the MSR-SV philosophy, but they are now more isolated than before.

**Empower Researchers and Engineers Alike**

I encourage you to read Roy Levin’s paper on academic-style industrial labs. This is a time-tested formula which Roy, with his unique skills and vision and his partnership with Mike Schroeder, managed to perfect over the years. Microsoft’s action takes nothing away from Roy’s success. See Roy’s addendum below giving due credit to Bob Taylor.

If I want to summarize the approach, I would do it in two words: empower researchers. Empower them to follow their curiosity and to think long term. Empower them to collaborate freely. Empower them to stay an active part of the scientific community. When brilliant researchers with a desire to impact have such freedom to innovate, then great things happen (as proven by MSR-SV).

On the other hand, to be fair, other companies seem to be much better than Microsoft in empowering engineers to innovate and explore. This is wonderful and I admire these companies for it. In fact, the impediment for even more impact by MSR-SV was not the lack of incentive for researchers to contribute (we were highly motivated), but rather the incentive structure of some of the product groups we interacted with in which innovation was not always sufficiently rewarded.

**The Cycle of Industry Labs**

Different companies need different things out of their research organizations (and some are successful without any research organization to speak of). I have no shred of criticism of other models, as long as companies are honest about them when recruiting employees.

I would still argue that the success of MSR-SV is evidence that “Roy’s model” is extremely powerful. This model facilitated impact that would have been impossible in other models.

Some companies cannot afford such long term investment but other companies cannot afford not making such an investment. Indeed, in many of the companies I talked with there is a growing understanding of the need for more curiosity-driven long-term research.

I am reminded that when AT&T Research (of which I was a part) suffered brutal layoffs, people mourned the end of “academic-style research” in industry. This didn’t happen then and it will not happen now, simply because the need for this style of research persists.

**Job Security is the Security to Find a New Job**

Given the above, it should be clear that being applied, or even having impact, does not guarantee job security in industry. I saw it in the collapse of AT&T Research many years ago. People who did wonderful applied work were the first to go (once corporate decided to change its business model). Predicting the future in industry is impossible, and there are many examples. I do not trust the prophecies of experts (they are mainly correct in retrospect). I also think that industry employees should avoid the danger of trusting internal PR. Even when the “internal stories” are the result of good intentions (rather than cynical manipulation), they do not hold water when the time comes. If I blame myself for anything, it is only for taking some of the MSR management statements at face value.

So what should industry employees do? First, decide if you can live with uncertainty. Then, make sure that your current job serves your next job search (whether in industry or academia). Don’t trust management to take care of you, nor should you wait for Karma to kick in. This in particular means that not every industry job serves every career path and that one should be vigilant in preserving external visibility – whether it is via publishing, open source, or just contribution to projects that are understood externally.

**Academia’s Reaction**

Finally, one point about the open letter to Microsoft from a group of outstanding academics. This letter was not about the group of former MSR-SV employees. It is true that individuals whose career path is inconsistent with other industry jobs were put in a difficult position. But we will all land on our feet eventually, and we have no justification to indulge in self-pity. The letter was about the unwritten contract between academia and MSR, which has arguably been broken. It was about understanding in which direction MSR is going, and accordingly what new form of collaboration between academia and MSR is possible. It was an attempt to start a discussion, and it is a shame it was not answered more seriously.

——–

Addendum by Roy Levin:

I want to add a small but important clarification to Omer’s post. The research environment of MSR Silicon Valley, which Mike Schroeder and I had the privilege of founding and managing, was inspired by Bob Taylor, for whom both of us worked at Xerox PARC and DEC SRC. The paper I wrote about research management, which Omer cited, describes how we applied Taylor’s deep understanding of research management in MSR Silicon Valley. (Indeed, my paper is chiefly an elaboration of a short paper Taylor co-authored in 2004.) Thus, MSR Silicon Valley was founded on proven models for corporate research, and they were not dramatically different from the broader MSR model that had been in place since Rick Rashid started MSR in 1991. Mike and I refined and reinterpreted what we had learned from Bob Taylor in previous labs (which Omer generously calls “perfecting” the model). Bob was the master, and we were his disciples.


——————–

This blog post is a report about a special 80 min session on the future shape of STOC/FOCS, organized by David Shmoys (IEEE TCMF Chair) and Paul Beame (ACM Sigact Chair) on the Saturday before FOCS in Philadelphia. Some 100+ people attended.

The panelists: Boaz Barak, Tim Roughgarden, and me. Joan Feigenbaum couldn’t attend but sent a long email that was read aloud by David. Avi Wigderson had to cancel last minute.

**For those who don’t want to read further **(spoiler alert): The panelists all agreed about the need to create an annual week-long event to be held during a convenient week in summer, which would hopefully attract a larger crowd than STOC/FOCS currently do. The decision was to study how to organize such an annual event, likely starting June 2017. Now read on.

Sole ground rule from David and Paul was: **no discussion** of open access/copyright, nor of moving STOC/FOCS out of ACM/IEEE. (Reason: these are orthogonal to the other issues and would derail the discussion.)

**Boaz and Omer’s proposal in a nutshell** (details are here): Fold STOC/FOCS into this annual event. Submissions and PC work for these two would work just as now, with the same timetable. Actual presentations would happen at this annual event. But the annual event would be planned by a third PC that would decide how much time to allocate to each paper’s presentation; *not all papers would be treated equally*. This PC would also plan a multi-day program of plenary talks: invited speakers, and selected papers drawn from theory conferences of the past year, including STOC/FOCS. (Some people expressed discomfort with creating different classes of STOC/FOCS papers. See Boaz and Omer’s blog post for more discussion, and also my proposal below.)

**Tim’s ideas: **It’s very beneficial to have such a mega event in some form. Logistics may be formidable and need discussing, but it would be good for the field to have a single clearing point for major results and place to catch up with others (for which it is important that the event is attractive enough to draw everybody). His other main point: the event should give a large number of people “something to do” by which he meant “something to present.” (Could be poster presentations, talks, workshops, etc.) This helps draw people into the event rather than make them feel like bystanders.

**Joan’s email:** Started off by saying that we should not be afraid of experimentation. Case in point: she tried a 2-tier PC a few years ago, and while many people railed against it, nobody could pinpoint any impact on the quality of the final program. She thinks STOC/FOCS currently focus too much on technical wizardry. While this has its place, other aspects should be valued as well. With this preamble, her main proposal was: there should be an inclusive annual mega event that showcases good work in many different aspects of TCS, possibly trading off some mathematical depth for inclusiveness and intellectual breadth. Secondary proposal: to somehow fix the problem of incomplete papers. (She mentioned the VLDB model, where the conference is also a journal.) Interestingly, I don’t detect such a crisis in TCS today; most people post full versions on arxiv. I do support looking at the VLDB model, but for a different reason: it’s our journal process that seems broken.

**My proposal:** Though it was a panel discussion, I prepared powerpoint slides, which are available here. My proposal has evolved from my earlier blog post, which turned into a B. EATCS article. A guiding principle: *“Add rather than subtract; build upon structures that already work well.”* The STOC/FOCS PC process works well, with efficient reviewing and decision-making, though not everybody is happy with the decisions themselves. But the journal process is sclerotic and possibly broken, so proposals (such as Fortnow’s) that replace conferences with journals seem risky. Finally, let’s design any new system to maximize buy-in from our community.

So here’s the plan in brief: Keep STOC/FOCS as now, possibly increasing the number of acceptances to 100-ish, which still fit in 3 days with 2 parallel sessions but no plenary talks. (“If you are content with your current STOC/FOCS, you don’t need to change anything.”) Then add 3-4 days of activity around STOC including workshops, poster sessions, and lots of plenary sessions. Encourage other theory conferences to co-locate with this event.

See my article and slides for further details.

**A Few Meta Points that I made.**

Here are a few meta points that I made, which are interrelated:

**We are a part of computer science.** I hope to be a realist here, not controversial. Our work involves and touches upon other disciplines: math, economics, physics, sociology, statistics, biology, operations research, etc. But most of our students will find jobs in CS departments or industrial labs, and practically none in these allied disciplines. CS is also the field (biology possibly excepted) with the most growth and new jobs in the foreseeable future. Our system should be most attuned to the CS way of doing things. To shoot down an obvious straw man: we should avoid the Math mode of splitting into small sub-communities and addressing papers and research to a small group of experts. Our papers and talks should remain comprehensible and interesting to a broad TCS audience, and a significant fraction of our collective work should look interesting to a general CS audience. (Joan’s email made a similar point about the danger of what she calls “mathematization.”)

**Senior people in TCS have been dropping out of the STOC/FOCS system.** I am, at 46 years of age, a regular attendee, but most people my age and older aren’t. I have talked to them, and they often feel that STOC/FOCS values specialization: technical improvements to past work, and that sort of thing. Any reform should try to address their concerns, and I hope the mega event will bring them back. (My advice to these senior people: if you want to change STOC/FOCS, be willing to serve on the PC, and speak up.)

**Short papers are better. **There’s a strong trend towards preferring long papers with full proofs. I consider this the “Math model” because it rewards research topics and presentation aimed at a handful of experts. I favor an old-fashioned approach that’s still in fashion at top journals like *Nature* and *Science*: force authors to explain their ideas in 8 double-column pages (or some other reasonable page limit). *No appendices allowed*, though reviewers who need more details should be able to look up a time-stamped detailed version on arxiv. In other words, use arxiv to the fullest, but force authors to also write clean, self-contained and terse versions. This is my partial answer to the question “What is the value added by conferences?” (NB: I don’t sense a crisis of incorrect papers in STOC/FOCS right now. Plus it’s not the end of the world if a couple percent of conference papers turn out to be wrong; Science and Nature have a worse track record and are doing OK!)

Towards the end of the session, David and Paul solicited further ideas from the audience. Sensing general approval of the June mega event, they announced that they will study this idea further, and possibly implement it starting in 2017, without waiting for other theory conferences to co-locate. Paul pointed to logistical hurdles, which necessitate careful planning. David observed that putting the spotlight on STOC may cause FOCS to wither away. Personally, I think FOCS will do fine and may even find a devoted audience of those who prefer a more intimate event.

So dear readers, please comment away with your reactions and thoughts. This issue creates strong opinions, but let’s keep it civilized. If you have a counter proposal, please put it on the web and send us the link; Paul and David are following this debate.

ps: I am skeptical of the value of anonymous comments and will tend to ignore them (and hope that the other commenters will too).

]]>

1) Applied mathematicians are not nearly as mathematical as TCS researchers. By which I mean, the careful formal problem statements, the rigorous definitions, the proofs of correctness for an algorithm, the analysis of the use of resources, the definition of resources, etc. are not nearly as developed nor as important to applied mathematicians.

Here are two examples on the importance of clear, formal problem statements and the definition of resources. There are a number of different ways to formulate sparse approximation problems; some instantiations are NP-complete and some are not. Some are amenable to convex relaxation and others aren’t. For example, exact sparse approximation of an arbitrary input vector over an arbitrary redundant dictionary is NP-complete, but if we draw a dictionary at random and seek a sparse approximation of an arbitrary input vector, this problem is essentially the compressed sensing problem, for which we do have efficient algorithms (for suitable distributions on random matrices). Stated this way, it’s clear to TCS what the difference is between the problem formulations, but this is not the way many applied mathematicians think about these problems. To the credit of the TCS community, it recognized that randomness is a resource—generating the random matrix in the above example costs something and, should one want to design a compressed sensing hardware device, generating or instantiating that matrix “in hardware” will cost you resources beyond simple storage. Pseudo-random number generators are a central part of TCS and yet, for many applied mathematicians, they are a small implementation detail easily handled by a function call. Similarly, electrical engineers well-versed in hardware design will use a linear feedback shift register (LFSR) to build such random matrices without making any “use” of the theory of pseudo-random number generators. The gap between the mathematics of random matrices and the LFSR is precisely where pseudo-random number generators, small-space constructions of pseudo-random variables, random variables with limited independence, etc. fit, but forming that bridge and, more importantly, convincing both sides that they need a bridge rather than a simple function call or a simple hardware circuit, is a hard thing to do and not one the TCS community has been successful at. (Perhaps it’s not even something they are aware of.)
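To make the LFSR point concrete, here is a minimal sketch (my own illustration, not from any particular hardware design) using the standard maximal-length 16-bit Fibonacci LFSR to fill a small ±1 “measurement matrix”; the matrix dimensions are arbitrary:

```python
import numpy as np

def lfsr_bits(seed, taps, n):
    """Fibonacci LFSR over a 16-bit register: output the low bit, then
    shift right and feed the XOR of the tapped bits into the top bit."""
    state = seed
    out = []
    for _ in range(n):
        out.append(state & 1)
        fb = 0
        for t in taps:
            fb ^= (state >> t) & 1
        state = (state >> 1) | (fb << 15)
    return out

# Taps (0, 2, 3, 5) give a maximal-length 16-bit LFSR (feedback polynomial
# x^16 + x^14 + x^13 + x^11 + 1); the seed is arbitrary but nonzero.
bits = lfsr_bits(seed=0xACE1, taps=(0, 2, 3, 5), n=6 * 16)

# Map {0,1} bits to a +/-1 matrix: a hardware-friendly stand-in for a
# "random" measurement matrix, with no appeal to PRG theory.
M = (2 * np.array(bits) - 1).reshape(6, 16)
```

Whether such deterministically generated entries are “random enough” for a given recovery guarantee is exactly the kind of question that the theory of limited independence and small-space pseudo-randomness addresses.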

2) Many TCS papers, whether they provide algorithms or discuss models of computation that could/should appeal to applied mathematicians, are written or communicated in a way that applied mathematicians can’t/don’t/won’t understand. And, sometimes, the problems that TCS folks address do not resonate with applied mathematicians because they are used to asking questions differently.

My biggest example here is sparse signal recovery as done by TCS versus compressed sensing. For TCS, it is very natural to ask to **design** both a measurement matrix and a decoding algorithm so that the algorithm returns a good approximation to the sparse representation of the measured signal. For mathematicians, it is much more natural to ask what conditions on a given measurement matrix guarantee that a decoding algorithm recovers a good sparse approximation.

3) Finally, for many applied mathematicians, computation is a means to an end (e.g., solve the problem, better, faster) as opposed to an equal component of the mathematical problem, one to be studied rigorously for its own sake. And, for a number of TCS researchers, actually making progress on a complicated, real-world problem takes a back seat to the intricate mathematical analysis of the computation. In order for both communities to talk to one another, it helps to understand what matters to each of them.

I think that Michael Mitzenmacher’s response to Boaz’s post is similar to this last point, when he says “I think the larger issue is the slow but (over long periods) not really subtle shift of the TCS community away from algorithmic work and practical applications.” Although I am not sure either model is better: TCS research can be practical, applied math isn’t as useful as we’d like to think, and solving a problem better, faster can be done only after thorough, deep understanding, the type that TCS excels at.

]]>

Here is one way this FOCS *has* evolved: FOCS now has an app – you can use it to keep track of the schedule, add personal reminders to the talks you want to see, and more. In particular I am thankful to Nadia Heninger and Aaron Roth for the restaurant recommendations – apparently that area of Philadelphia has an amazing selection of great places. (If you don’t see me around during the talks, now you know why…)

To get the app, install the “Guidebook” app on your phone and then search for the “**FOCS 2014**” guide. While the Guidebook app prompts you to sign up, this is not mandatory, and you can use most of the features without it, though some features, such as connecting with other participants or seeing your reminders and schedule through web access, require it. You may want to download the app and the guide prior to the trip.

I hope to see many of you starting Saturday in Philadelphia!

Boaz

]]>

————————————————————————————

The debate about the future of FOCS/STOC has been long and heated. A wide range of criticism (containing at times contradictory complaints) was answered with one simple truth: FOCS/STOC have played, and still play, an invaluable role for the TOC community. Indeed, the authors of this proposal have a deep connection to FOCS/STOC. Nevertheless, though the concerns regarding FOCS/STOC are often exaggerated, we do acknowledge the validity of many of them. As the community evolves, we feel the need to evolve its central meeting place. So while FOCS/STOC are not broken, and not in urgent need of being fixed, we put forth a proposal to improve (perhaps revive) them.

FOCS/STOC play a dual role in our community: as publication venues and as meeting places. In the former role, FOCS/STOC have been incredibly successful: every year many of the best papers in TOC appear in these conferences, and FOCS/STOC papers (including recent ones) have led to major awards, including ACM dissertation awards, the Grace Murray Hopper award, the Rolf Nevanlinna prize, the MacArthur fellowship and the Turing award.

Thus, while undoubtedly FOCS/STOC are not perfect publication venues, our thesis is that their main shortcomings are as meeting places. Indeed, attendance has been flat over the last decade or so, even as the field has seen significant growth, and specialized workshops (such as those at the Simons Institute) often draw an audience half the size of FOCS/STOC. In particular, we feel that while FOCS/STOC provide an opportunity for social meetings and small-group collaborations, they fall short in terms of the wider-range exchange of ideas (specifically, the ideas in the papers published in these conferences). We believe it is possible to revise STOC/FOCS to make them a significantly more attractive event (in a sense, a “must-attend” item on every theoretician’s schedule), and a better forum for exchanging ideas across subfields of TOC, while preserving their nature as publication venues (in particular, no dramatic changes in the number of accepted papers, nor in the selection process).

The crux of our proposal is a single combined FOCS/STOC meeting that will be longer (and scheduled appropriately with respect to the academic year) and that will be specifically designed to allow the spread of ideas of appeal to the general community (thus countering the fragmentation of the community) as well as forums for sub-communities to exchange more specialized ideas. While many details can be open to tweaking, in a nutshell we suggest to have an annual weeklong “Theory Festival”. This theory festival would contain presentations of the STOC and FOCS papers, as well as many other activities, including invited talks, tutorials, mini-courses, workshops, and more. The organizers of the theory festival, which would be logically separate from the FOCS/STOC PC’s, would take as input the paper selection by the PCs, but would have considerable latitude in using this input to assemble an attractive program, including a mix of plenary and highly parallel sessions, or any other way they see fit.

**Our Proposal in More Detail**

The core of our proposal is to collocate FOCS and STOC (and possibly additional events) into a single, somewhat longer event at an appropriate time of the year (for example, after the end of the academic year). At least at first, the two PCs of FOCS and STOC will operate similarly to their current operation. In particular, a list of accepted papers (including links to online versions) will be made public in a timely fashion. In addition, a separate organizing committee will be responsible for the selection and scheduling of the joint event. The committee will have representation from the two PCs but will have a separate agenda: to create the most effective program, optimizing for the TOC audience rather than the authors. In particular, it is natural to expect that part of the program will be in a plenary session whereas the rest will be organized as a collection of sub-conferences/workshops in multiple parallel sessions.

Attendees of the joint conference should get an opportunity to catch up on the most exciting developments in TOC (research trends, results and techniques) that are ready for a general TOC audience, as well as get a more complete perspective on their specialized area of research. For this purpose, in either the plenary session or the parallel sessions, the organizing committee will not be limited to talks by authors of FOCS/STOC accepted papers. Important results that appeared elsewhere should be represented. In addition, surveys of collections of papers may at times be more effective than talks on individual results.

Let us emphasize that at this point, we are suggesting merely to change the event and **make no changes to the paper selection process**. That is, there will be two separate FOCS and STOC PC’s that will work on a similar schedule as they do today, where at the end of each PC’s process, the list of accepted papers and the electronic proceedings will be published. The only difference would be that the paper presentations would be deferred to the annual “Theory Festival” that is organized by a third committee. Of course, we are not ruling out making changes to the selection process as well. In fact we believe that decoupling to some extent the event from the selection might open some possibilities for improving the latter that would not be otherwise possible.

**Advantages, Concerns and Possible Future Extensions**

- As mentioned, the only change necessitated by this proposal is to the meetings, but FOCS/STOC can keep their character as publication venues (both for the authors as well as for external committees that evaluate TOC researchers). On the other hand, the organizing committee will be free to optimize the meetings for the audience experience and for the exchange of ideas. The meetings could also evolve and reflect developments in TOC as a growing research field.
- FOCS/STOC PCs will not need to select papers in multiple tiers. In addition, the organizing committee will also be free from choosing the “strongest” papers. The plenary session (while hopefully a prestigious talk opportunity) will not be intended as an award for papers (as again, the focus is on the audience not the authors).
- The scheduling choices will be intended to be “ephemeral”. The organizing committee will be free to use non-scientific considerations, including diversity of areas or speakers, in making these choices. It can be conveyed to the speakers that all FOCS/STOC papers were equally selected by the PC, and it would be “poor form” to list in the publication list on your CV or webpage the fact that the talk was presented in one session rather than the other. (Of course one can worry that people will still do that, but the risk in alienating potential evaluators will probably outweigh any benefit, and in any case we believe we should not make our events unattractive just to protect against the possibility of abuse.)
- The quantity and high quality of papers accepted to FOCS and STOC together go a long way towards the effect desired by a federated theory conference (which may not be easy to obtain otherwise; also, FOCS/STOC together may provide enough “critical mass” to encourage other conferences to collocate).
- A single event could significantly increase attendance. In particular, it could be easier for researchers with limited travel budgets or other travel constraints (e.g. young children) to remain part of the community. Moreover, by “network effects”, with people knowing that this is the place they will meet most theorists, it may well be that the number of attendees at this event would be larger than the union of STOC and FOCS.
- Sub-areas that have grown distant from FOCS/STOC could be welcomed back. As a first step, they could be incorporated as invited talks without asking authors to give up on their more specialized venues. With time one may hope that more papers from these sub areas will be submitted to FOCS/STOC. Similarly, papers submitted to venues with inconsistent publication rules (e.g., some ECON journals) could be easily incorporated in the major meeting of the TOC community.
- A major concern is increased fragmentation of the community due to additional parallel sessions in the non-plenary part of the program. We argue that having a substantial part of the program (say half) in a single session more than compensates for this effect.
- The organizing committee will also have the flexibility to schedule plenary survey talks that expose attendees to ideas from outside their area. Furthermore, attendees who used to focus on a few areas of interest (which, in our opinion, characterizes most attendees) are more likely to be exposed to talks outside of their area, given the more restrictive filtering offered by the plenary session.
- Some areas (for example Cryptography and Quantum Computing) are more likely to see increased attendance in talks, as at least some of the papers will be in the plenary session, and in any case, we believe there will be increased attendance over the current state. An important concern is the attention to papers that are in more isolated areas, and are not of wide enough appeal to appear in the plenary session. Care should be given to such papers in the program design. It is important to note that these kinds of papers suffer from lack of audience in the current system as well.
- One could worry that by moving to an annual publication cycle, papers presented will be more “stale” than in the current model. We agree that this is a concern. However, we posit that FOCS and STOC are primarily meant to educate researchers about progress *outside their immediate area*. While even a few months could be too long a wait to hear about the latest improvement on the problem you’re working on (which is one more reason to be grateful for the arXiv), waiting 6 months to a year to hear a (perhaps more mature and well-digested) talk about exciting results in another area may well be acceptable (note that even specialized workshops find value in presentations of papers that are one or two years old). The organizing committee will have considerable latitude in selecting the program; in particular, if the conferences contained a sequence of papers that improved on one another, it may decide to schedule a single talk that surveys all these papers.

**On changes to FOCS/STOC as a publishing venue**

We acknowledge that, despite their success, many of the critiques of FOCS/STOC concern them as publication venues, including suggestions that they have become too selective, or not selective enough, that papers are too specialized, or too shallow, that the deadline-driven process yields “half-baked” papers, and more. These issues deserve discussion, but we note that our proposal is largely independent of any modifications to the selection process to address them, and we believe it would yield a more attractive event regardless. Moreover, as we mentioned, decoupling the selection from the event naturally allows some modifications, such as selecting more papers or having more deadlines, that may be infeasible in the current model.

]]>

I am teaching a seminar series at MIT with the title above, and thought I would post the introduction to the course here. For the complete notes on the first lecture (and notes of future lectures), please see the course webpage, where you can also sign up for the mailing list to get future updates. While the SOS algorithm is widely studied and used for a great many applications (see for example the courses of Parrilo and Laurent and the book of Lasserre), this seminar will offer a different perspective: that of theoretical computer science. One tongue-in-cheek tagline for this course can be

Rescuing the Sum-of-Squares algorithm from its obscurity as an algorithm that keeps planes up in the sky into a way to refute computational complexity conjectures.

Consider the following questions:

- Do we need a different algorithm to solve every computational problem, or can a single algorithm give the best performance for a large class of problems?
- In statistical physics and other areas, many people believe in the existence of a *computational threshold* effect, where a small change in the parameters of a computational problem seems to lead to a huge change in its computational complexity. Can we give rigorous evidence for this intuition?
- In machine learning there often seem to be tradeoffs between sample complexity, error probability, and computation time. Is there a way to map the curve of this tradeoff?
- Suppose you are given a 3SAT formula $\varphi$ with a unique (but unknown) satisfying assignment $x$. Is there a way to make sense of statements such as “The probability that $x_i = 1$ is $p$” or “The entropy of $x$ is $k$” (even though of course information-theoretically $x$ is completely determined by $\varphi$, and hence that probability is either $0$ or $1$, and $x$ has zero entropy)?
- Is Khot’s Unique Games Conjecture true?

If you learn the answers to these questions by the end of this seminar series, then I hope you’ll explain them to me, since I definitely don’t know them. However, we will see that, despite these questions a priori having nothing to do with Sums of Squares, the SOS algorithm can yield a powerful lens to shed light on some of them, and perhaps be a step towards providing some of their answers.

Theoretical computer science studies many computational models for different goals. There are some models, such as bounded-depth (i.e., $\mathbf{AC}^0$) circuits, that we can prove unconditional lower bounds on, but that do not aim to capture all relevant algorithmic techniques for a given problem. (For example, we don’t view the results of Furst–Saxe–Sipser and Håstad as evidence that computing the parity of $n$ bits is a hard problem.) Other models, such as bilinear circuits for matrix multiplication, are believed to be strong enough to capture all known algorithmic techniques for some problems, but then we often can’t prove lower bounds on them.

The *Sum of Squares (SOS)* algorithm (discovered independently by researchers from different communities, including Shor, Parrilo, Nesterov and Lasserre) can be thought of as another example of a concrete computational model. On one hand, it is sufficiently weak for us to know at least some unconditional lower bounds for it. In fact, there is a sense in which it is weaker than $\mathbf{AC}^0$, since for a given problem and input length, SOS is a *single algorithm* (as opposed to an exponential-sized family of circuits). Despite this fact, proving lower bounds for SOS is by no means trivial, even for a single distribution over instances (such as a random graph) or even a single instance. On the other hand, while this deserves more investigation, it does seem that for many interesting problems, SOS does encapsulate all the algorithmic techniques we are aware of, and there is some hope that SOS is an **optimal algorithm** for some interesting family of problems, in the sense that no other algorithm with similar efficiency can beat SOS’s performance on these problems. (I should note that we currently have more in the way of intuitions than hard evidence for this, though the unique games conjecture, if true, would imply that SOS is an optimal approximation algorithm for every constraint satisfaction problem and many other problems as well; that said, the SOS algorithm also yields the strongest evidence to date that the UGC may be false…)

The possibility of the existence of such an *optimal algorithm* is very exciting. Even if at the moment we can’t hope to prove its optimality unconditionally, this means that we can (modulo some conjectures) reduce analyzing the difficulty of a problem to analyzing a single algorithm, and this has several important implications. For starters, it reduces the need for creativity in designing the algorithm, making it required only for the algorithm’s *analysis*. In some sense, much of the progress in science can be described as attempting to automate and make routine and even boring what was once challenging. Just as we today view as commonplace calculations that past geniuses such as Euler and Gauss spent much of their time on, it is possible that in the future much of algorithm design, which now requires an amazing amount of creativity, would be systematized and made routine. Another application of optimality is automating *hardness results*— if we prove the optimal algorithm *can’t* solve a problem X then that means that X can’t be solved by any efficient algorithm.

Beyond just systematizing what we already can do, optimal algorithms could yield qualitatively new insights on algorithms and complexity. For example, in many problems arising in statistical physics and machine learning, researchers believe that there exist *computational phase transitions*, where a small change in the parameters of a problem causes a huge jump in its computational complexity. Understanding these phase transitions is of great interest for both researchers in these areas and theoretical computer scientists. The problem is that they involve *random inputs* (i.e., *average case complexity*) and so, based on the current state of the art, we have no way of proving the existence of such phase transitions based on assumptions such as $\mathbf{P} \neq \mathbf{NP}$. In some cases, such as the planted clique problem, the problem has been so well studied that the existence of a computational phase transition has been proposed as a conjecture in its own right, but we don’t know of good ways to reduce such problems to one another, and we clearly don’t want to have as many conjectures as there are problems. If we assume that an algorithm is optimal for a class of problems, then we can prove a computational phase transition by analyzing the running time of this algorithm as a function of the parameters. While by no means trivial, this is a tractable approach to understanding this question and getting very precise estimates as to the location of the threshold where the phase transition occurs. (Note that in some sense the existence of a computational phase transition implies the existence of an optimal algorithm, since in particular it means that there is a single algorithm $A$ such that beating $A$’s performance by even a little bit requires an algorithm taking much more resources.)

Perhaps the most exciting thing is that an optimal algorithm gives us a *new understanding* of just what it is about a problem that makes it easy or hard, and a new way to look at efficient computation. I don’t find explanations such as “Problem A is easy because it has an algorithm” or “Problem B is hard because it has a reduction from SAT” very satisfying. I’d rather get an explanation such as “Problem A is easy because it has property P” and “Problem B is hard because it doesn’t have P”, where P is some meaningful property (e.g., being convex, supporting some polymorphisms, etc.) such that every problem (in some domain) with P is easy and every problem without it is hard. For that, we would want an algorithm that will solve all problems with P and a proof (or other evidence) that it is optimal. Such understanding of computation could bear other fruits as well. For example, as we will see in this seminar series, if the SOS algorithm is optimal in a certain domain, then we can use this to build a theory of “computational Bayesian reasoning” that can capture the “computational beliefs” of a bounded-time agent about a certain quantity, just as traditional Bayesian reasoning captures the beliefs of an unbounded-time agent about quantities on which it is given partial information.

I should note that while much of this course is very specific to the SOS algorithm, not all of it is, and it is possible that even if the SOS algorithm is superseded by another one, some of the ideas and tools we develop will still be useful. Also, note that I have deliberately ignored the question of *what* family of problems SOS would be optimal for. This is clearly a crucial issue: every computational model (even $\mathbf{AC}^0$) is optimal for *some* problems, and every model falling short of general polynomial-time Turing machines will not be optimal for *all* problems. It definitely seems that some algebraic problems, such as integer factoring, have very special structure that makes it hard to conjecture that any generic algorithm (and definitely not the SOS algorithm) would be optimal for them. (See also my previous blog post on this topic.) The reason I don’t discuss this issue is that we still don’t have a good answer for it, and one of the research goals in this area is to understand what the right *conjecture* about the optimality of SOS should be. However, we do have some partial evidence and intuition, including those arising from the SOS algorithm’s complex (and not yet fully determined) relation to Khot’s Unique Games Conjecture, that lead us to believe that SOS could be an optimal algorithm for a non-trivial and interesting class of problems.

In this course we will see:

- A description of the SOS algorithm from different viewpoints— the traditional semidefinite programming/convex optimization view, as well as the proof system view, and the “pseudo-distribution” view.
- Discussion of positive results (aka “upper bounds”) using the SOS algorithms to solve graph problems such as sparsest cut, and problems in machine learning.
- Discussion of known negative results (aka “lower bounds” / “integrality gaps”) for this algorithm.
- Discussion of the interesting (and not yet fully understood) relation of the SOS algorithm to Khot’s *Unique Games Conjecture* (UGC). On one hand, the UGC implies that the SOS algorithm is *optimal* for a large class of problems. On the other hand, the SOS algorithm is currently the main candidate to refute the UGC.

The SOS algorithm is an algorithm for solving a computational problem. Let us now define what this problem is:

Definition 1. A *polynomial equation* is either an equation of the form $P(x) \geq 0$ (in which case it is called an *inequality*) or an equation of the form $P(x) = 0$ (in which case it is called an *equality*), where $P$ is a multivariate polynomial mapping $\mathbb{R}^n$ to $\mathbb{R}$. The equation $P(x) \geq 0$ (resp. $P(x) = 0$) is *satisfied* by $x \in \mathbb{R}^n$ if indeed $P(x) \geq 0$ (resp. $P(x) = 0$). A set $\mathcal{E}$ of polynomial equations is *satisfiable* if there exists an $x$ that satisfies all equations in $\mathcal{E}$. The *polynomial optimization* problem is to output, given a set $\mathcal{E}$ of polynomial equations as input, either an $x$ satisfying all equations in $\mathcal{E}$ or a proof that $\mathcal{E}$ is unsatisfiable.

(Note: throughout this seminar we will ignore all issues of numerical accuracy—assume the polynomials always have rational coefficients with bounded numerator and denominator, and that all equalities/inequalities only need to be satisfied up to some small error $\epsilon$.)

Here are some examples for polynomial optimization problems:

- **Linear programming.** If all the polynomials are *linear* then this is of course linear programming, which can be done in polynomial time.
- **Least squares.** If the equations consist of a single quadratic then this is the least squares algorithm. Similarly, one can capture computing eigenvalues by two quadratics.
- **3SAT.** One can encode a 3SAT formula as degree-3 polynomial equations: the equation $x_i^2 = x_i$ is equivalent to $x_i \in \{0,1\}$, and the equation $(1-x_i)(1-x_j)x_k = 0$ is equivalent to the clause $x_i \vee x_j \vee \bar{x}_k$.
- **Clique.** Given a graph $G$ on $n$ vertices, the following equations encode that $x$ is an indicator vector of a $k$-clique: $x_i^2 = x_i$ for all $i$, $\sum_{i=1}^n x_i = k$, and $x_i x_j = 0$ for all pairs $i,j$ that are not adjacent in $G$.
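As a quick sanity check on the clique encoding, the following sketch (the function and the example graph are my own illustrative choices) evaluates the three families of polynomial constraints on concrete 0/1 vectors:

```python
import itertools

def check_clique_encoding(n, edges, x, k):
    """Evaluate the polynomial constraints encoding "x is the indicator
    vector of a k-clique": x_i^2 = x_i, sum_i x_i = k, and x_i*x_j = 0
    for every non-edge {i, j}."""
    E = {frozenset(e) for e in edges}
    boolean_ok = all(xi * xi == xi for xi in x)
    size_ok = sum(x) == k
    nonedge_ok = all(x[i] * x[j] == 0
                     for i, j in itertools.combinations(range(n), 2)
                     if frozenset((i, j)) not in E)
    return boolean_ok and size_ok and nonedge_ok

# A triangle on vertices {0, 1, 2} plus a pendant vertex 3.
edges = [(0, 1), (0, 2), (1, 2), (2, 3)]
print(check_clique_encoding(4, edges, [1, 1, 1, 0], k=3))  # triangle: True
print(check_clique_encoding(4, edges, [1, 1, 0, 1], k=3))  # {0,1,3}: False
```

The second vector fails because $x_0 x_3 \neq 0$ while $\{0,3\}$ is not an edge, which is exactly the constraint that rules out non-cliques.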

The SOS algorithm is designed to solve the polynomial optimization problem. As we can see from these examples, the full polynomial optimization problem is NP-hard, and hence we can’t expect SOS (or any other algorithm) to efficiently solve it on every instance.

**Exercise 1:** Prove that this is the case even if all polynomials are quadratic, i.e., of degree at most $2$.

Understanding how close the SOS algorithm gets in particular cases is the main technical challenge we will be dealing with.

These examples also show that polynomial optimization is an extremely versatile formalism, and many other computational problems (including SAT and CLIQUE) can be directly and easily phrased as instances of it. Henceforth we will ignore the question of *how* to formalize a problem as a polynomial optimization problem, and either assume the problem is already given in this form, or use the simplest, most straightforward translation if it isn’t. While there are examples where choosing between different natural formulations could make a difference in complexity, this is not the case (to my knowledge) in the questions we will look at.

**Note:** We can always assume without loss of generality that all our equations are *equalities*, since we can replace the inequality $P(x) \geq 0$ by the equality $P(x) = y^2$, where $y$ is some new auxiliary variable.

Also, we will sometimes ask the question of minimizing (or maximizing) a polynomial $P_0$ subject to satisfying a set of equations $\mathcal{E}$, which (for maximization) can be captured by looking for the largest $\lambda$ such that $\mathcal{E} \cup \{P_0 \geq \lambda\}$ is satisfiable.
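This reduction from optimization to a sequence of feasibility questions can be illustrated with a toy sketch. Here feasibility is checked by brute force over a grid, which has nothing to do with the actual SOS machinery, and the objective $P_0(x) = -x^2 + 2x$ is my own example (its true maximum is $1$, attained at $x = 1$):

```python
import numpy as np

def satisfiable(lmbda, xs):
    """Brute-force feasibility of {P0(x) >= lambda} over a grid, with the
    toy objective P0(x) = -x^2 + 2x."""
    return bool(np.any(-xs**2 + 2 * xs >= lmbda))

xs = np.linspace(-3.0, 3.0, 6001)   # step 0.001; the maximizer x = 1 lies on the grid
lo, hi = -10.0, 10.0                # bracket for the optimal value
for _ in range(60):
    mid = (lo + hi) / 2
    if satisfiable(mid, xs):
        lo = mid                    # lambda is achievable: search higher
    else:
        hi = mid                    # lambda is too large: search lower
# lo now approximates max P0 = 1
```

The bisection converges to the largest satisfiable $\lambda$, i.e. the optimum value; in the SOS setting the grid search is replaced by a semidefinite-programming feasibility check.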

The Sum of Squares algorithm is an algorithm to solve the polynomial optimization problem. Given that the problem is NP-hard, the SOS algorithm cannot run in polynomial time on all instances. The main focus of this course is trying to understand in which cases the SOS algorithm takes a small (say polynomial or quasipolynomial) amount of time, and in which cases it takes a large (say exponential) amount. An equivalent form of this question (which is the one we’ll mostly use) is that, for some small $d$ (e.g., a constant or logarithmic), we want to understand in which cases the “$d$-capped” version of SOS succeeds in solving the problem and in which cases it doesn’t, where the “$d$-capped” version of the SOS algorithm halts in time $n^{O(d)}$ regardless of whether or not it solved the problem.

In fact, we will see that for every value of $d$, the SOS algorithm always returns some type of meaningful output. The main technical challenge is to understand whether that output can be transformed into an exact or approximate solution for the polynomial optimization problem.
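For a taste of what such an output looks like in the very simplest case: a degree-2 SOS certificate of non-negativity amounts to a positive semidefinite Gram matrix over a vector of monomials (this standard fact is developed later in the course). A minimal numerical check on my own toy example $p(x) = x^2 - 2x + 1 = (x-1)^2$, with monomial vector $z = [1, x]$:

```python
import numpy as np

# p(x) = x^2 - 2x + 1 over the monomial vector z = [1, x]:
# z^T Q z = Q[0,0] + 2*Q[0,1]*x + Q[1,1]*x^2.
Q = np.array([[1.0, -1.0],
              [-1.0, 1.0]])
coeffs = [Q[0, 0], 2 * Q[0, 1], Q[1, 1]]   # constant, x, x^2 coefficients
eigvals = np.linalg.eigvalsh(Q)
assert min(eigvals) >= -1e-9               # Q is PSD => p is a sum of squares
assert coeffs == [1.0, -2.0, 1.0]          # z^T Q z really equals p

# Eigendecomposing Q reads off the squares explicitly: here the single
# nonzero eigenvalue reconstructs p(x) = (1 - x)^2.
w, V = np.linalg.eigh(Q)
L = V * np.sqrt(np.clip(w, 0.0, None))     # p(x) = sum_j (L[0,j] + L[1,j]*x)^2
```

Finding such a $Q$ in general is a semidefinite program, which is where the $n^{O(d)}$ running time of the degree-$d$ algorithm comes from.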

Definition 2 (Sum of Squares – informal definition)

The SOS algorithm gets a parameter $d$ and a set of equations $\mathcal{E}$, runs in time $n^{O(d)}$, and outputs either:

- An object we will call a “degree-$d$ pseudo-solution” (or more accurately, a degree-$d$ *pseudo-distribution* over solutions), or
- A proof that a solution doesn’t exist.

We will later make this more precise: what exactly a degree-$d$ pseudo-solution is, what exactly the form of the proof is, and how the algorithm works.

**History.** *[Note: this is mostly from memory and not from the primary sources, so double-check this before quoting elsewhere. The introduction of this paper of O’Donnell and Zhou is a good starting point for the history.]* The SOS algorithm has its roots in questions raised in the late 19th century by Minkowski and Hilbert of whether any non-negative polynomial can be represented as a sum of squares of other polynomials. Hilbert realized that except for some special cases (most notably univariate polynomials and quadratic polynomials), the answer is negative, and that there is an example (which he constructed by non-constructive means) of a non-negative polynomial that cannot be represented in this way. It was only in the 1960’s that Motzkin gave a very concrete example of such a polynomial:

$m(x, y) = x^4 y^2 + x^2 y^4 - 3 x^2 y^2 + 1. \qquad (1)$

In his famous 1900 address, Hilbert asked as his 17th problem whether every non-negative polynomial can be represented as a sum of squares of *rational* functions. (For example, Motzkin’s polynomial (1) can be shown to be the sum of squares of (I think) four rational functions of small numerator and denominator degree.) This was answered positively by Artin in 1927. His approach can be summarized as follows: given a hypothetical polynomial that cannot be represented in this form, use the fact that the rational functions form a field to extend the reals into a “pseudo-real” field containing an element on which the polynomial is negative, and then use a “transfer principle” to show that there is an actual real point on which it is negative. (This description is not meant to be understandable but to make you curious enough to look it up.) Later, in the 60s and 70s, Krivine and Stengle extended this result to show that any unsatisfiable system of polynomial equations can be certified to be unsatisfiable via a Sum of Squares proof, a result known as the Positivstellensatz.
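The Motzkin polynomial is easy to explore numerically. The following quick sanity check (my own sketch, not part of the original notes) confirms on a grid that it is non-negative and that its minimum value of 0 is attained at the four points $(\pm 1, \pm 1)$, even though it is provably not a sum of squares of polynomials:

```python
import itertools

def motzkin(x, y):
    # m(x, y) = x^4 y^2 + x^2 y^4 - 3 x^2 y^2 + 1: non-negative by AM-GM,
    # yet provably not a sum of squares of polynomials.
    return x**4 * y**2 + x**2 * y**4 - 3 * x**2 * y**2 + 1

# Evaluate on a grid over [-3, 3]^2; the minimum 0 is attained at |x| = |y| = 1.
grid = [i / 10 for i in range(-30, 31)]
vals = [motzkin(x, y) for x, y in itertools.product(grid, grid)]
print(min(vals))  # 0.0, at the four points (±1, ±1)
```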

In the late 1990s / early 2000s, there were two separate efforts to get quantitative or algorithmic versions of this result. On the one hand, Grigoriev and Vorobjov asked the question of *how large* the degree of an SOS proof needs to be, and in particular Grigoriev proved several lower bounds on this degree for some interesting polynomials. On the other hand, Parrilo and Lasserre (independently) came up with hierarchies of algorithms for polynomial optimization based on the Positivstellensatz using semidefinite programming. (Something along those lines was also described by Naum Shor in a 1987 Russian paper, and mentioned by Nesterov as well.)

It took some time for people to realize the connection between all these works; in particular, the relation between the Grigoriev–Vorobjov line of work and the works from the optimization literature took some time to be discovered, and even 10 years later some results of Grigoriev were still being rediscovered and reproven in the Lasserre language.

**Applications of SOS** SOS has applications to: equilibrium analysis of dynamics and control (robotics, flight controls, …), robust and stochastic optimization, statistics and machine learning, continuous games, software verification, filter design, quantum computation and information, automated theorem proving, packing problems, etc…

**Remark: the TCS vs. Mathematical Programming view of SOS**

While the SOS algorithm is intensively studied in several communities, there are some differences in emphasis between them. While I am not an expert on all SOS works, my impression is that the main characteristics of the TCS viewpoint, as opposed to others, are:

- In the TCS world, we typically think of the number of variables $n$ as large and tending to infinity (as it corresponds to our input size), and the degree $\ell$ of the SOS algorithm as relatively small: a constant or logarithmic.

In contrast, in the optimization and control world, the number of variables can often be very small (e.g. around ten or so, maybe even smaller), and hence $\ell$ may be large compared to it. Note that since both the time and space complexity of the general SOS algorithm scale roughly like $n^{O(\ell)}$, even moderate values of $n$ and $\ell$ would take something like a petabyte of memory (in practice, though we didn’t try to optimize too much, David Steurer and I had a hard time executing a program with moderate parameters on a Cornell cluster).

This may justify the optimization/control view of keeping $\ell$ small, although if we show that SOS yields a polynomial-time algorithm for a particular problem, then we can hope to optimize further and obtain an algorithm that doesn’t require a full-fledged SOS solver.
- Typically, in TCS our inputs are discrete and the polynomials are simple, with integer coefficients etc. Often we have constraints such as $x_i^2 = 1$ that restrict attention to the Boolean cube, and so we are less concerned with issues of numerical accuracy, boundedness, etc.
- Traditionally, people have been concerned with *exact convergence* of the SOS algorithm: when does it yield an exact solution to the optimization problem? This often precludes $\ell$ from being much smaller than $n$. In contrast, as TCS’ers we often want to understand *approximate convergence*: when does the algorithm yield an “approximate” solution (in some problem-dependent sense)? Since the output of the algorithm in this case is not actually in the form of a solution to the equations, this raises the question of obtaining *rounding* algorithms, which are procedures to translate the output of the algorithm into an approximate solution.
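The memory blowup mentioned above can be made concrete with a back-of-the-envelope calculation: the moment matrix underlying degree-$\ell$ SOS is indexed by the monomials of degree at most $\ell/2$, so its side length is $\binom{n + \ell/2}{\ell/2}$. A small sketch (the helper name is mine):

```python
from math import comb

def moment_matrix_bytes(n, l):
    # The degree-l moment matrix is indexed by monomials of degree <= l/2
    # in n variables, so its side length is C(n + l/2, l/2).
    side = comb(n + l // 2, l // 2)
    return side * side * 8  # dense matrix of 8-byte floats

# Even modest parameters are already enormous:
for n, l in [(10, 4), (100, 4), (100, 8)]:
    print(n, l, moment_matrix_bytes(n, l) / 1e9, "GB")
```

For instance $n = 100$, $\ell = 8$ already needs on the order of $10^{14}$ bytes for the dense matrix alone, before the SDP solver does any work.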

**4. Several views of the SOS algorithm**

We now describe the SOS algorithm more formally. For simplicity, we consider the case that the set $\mathcal{E}$ consists only of equalities (which is without loss of generality, as we mentioned before). When convenient, we will assume all equalities involve homogeneous polynomials of the same degree. (This can always be arranged by multiplying the constraints.) You can restrict attention to quadratic equations; this will capture all of the main issues of the general case.

**4.1. SOS Algorithm: convex optimization view**

We start by presenting one view of the SOS algorithm which is technically perhaps the simplest, though at first not the most conceptually insightful.

Definition 3
Let $\mathbb{R}[x]_{\leq d}$ denote the set of $n$-variate polynomials of degree at most $d$. Note that this is a linear subspace of dimension roughly $n^d$. We will sometimes also write this as $\mathbb{R}[x_1, \ldots, x_n]_{\leq d}$ when we want to emphasize that these polynomials take the formal input $x = (x_1, \ldots, x_n)$.

Definition 4
Let $\mathcal{E} = \{p_1 = 0, \ldots, p_m = 0\}$ be a set of polynomial equations where $\deg(p_i) \leq d$ for all $i$. Let $\ell$ be some integer multiple of $2d$. The *degree-$\ell$ SOS algorithm* either outputs **‘fail’** or a bilinear operator $M : \mathbb{R}[x]_{\leq \ell/2} \times \mathbb{R}[x]_{\leq \ell/2} \to \mathbb{R}$ satisfying:

- Normalization: $M(1, 1) = 1$ (where $1$ is simply the polynomial that always equals $1$).
- Symmetry: If $p q = p' q'$ then $M(p, q) = M(p', q')$.
- Non-negativity (positive semidefiniteness): For every $p$, $M(p, p) \geq 0$.
- Feasibility: For every $i \in [m]$ and every $p, q$ with $\deg(p_i p) \leq \ell/2$ and $\deg(q) \leq \ell/2$, $M(p_i p, q) = 0$.

**Exercise 2:** Show that if the symmetry and feasibility constraints hold for monomials then they hold for all polynomials as well.

**Exercise 3:** Show that the set of $M$’s satisfying the conditions above is convex and has an efficient separation oracle.

Indeed, such an $M$ can be represented as a PSD matrix satisfying some linear constraints. (Can you see why?) Thus, by semidefinite programming, finding such an $M$ if it exists can be done in time $n^{O(\ell)}$ (throughout this seminar we ignore issues of precision etc.).
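A minimal sketch of what the separation step can look like, restricted to degree-2 moments in two variables (the function name and numeric tolerance are my own; this is not a full SOS solver): a candidate operator is a moment matrix indexed by the monomials $\{1, x_1, x_2\}$, it is valid only if that matrix is PSD, and an eigenvector for a negative eigenvalue is exactly a polynomial whose square gets negative pseudo-expectation, i.e., a separating hyperplane.

```python
import numpy as np

def separate(M):
    """Return None if M is PSD, else the coefficient vector of a polynomial
    p (in the monomial basis 1, x1, x2) whose square gets negative value."""
    eigvals, eigvecs = np.linalg.eigh(M)  # eigenvalues in ascending order
    if eigvals[0] >= -1e-9:
        return None
    return eigvecs[:, 0]

# A genuine distribution (a point mass at x = (1, -1)) gives a PSD moment matrix:
x = np.array([1.0, -1.0])
v = np.array([1.0, x[0], x[1]])  # monomial vector (1, x1, x2)
M_good = np.outer(v, v)
print(separate(M_good))  # None

# An invalid "operator" claiming the square x1^2 has value -1 is caught:
M_bad = M_good.copy()
M_bad[1, 1] = -1.0
p = separate(M_bad)
print(p @ M_bad @ p < 0)  # True: the polynomial p witnesses the violation
```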

The question is why this has anything to do with solving our equations; one answer is given by the following lemma:

Lemma 5
Suppose that $\mathcal{E}$ is satisfiable. Then there exists an operator $M$ satisfying the conditions above.

*Proof:* Let $x^*$ be a solution for the equations and let $M(p, q) = p(x^*) q(x^*)$. Note that this $M$ clearly satisfies all the conditions.

Since the set of such operators is convex, for every distribution $\mu$ over solutions of $\mathcal{E}$, the operator $M(p, q) = \mathbb{E}_{x \sim \mu}\, p(x) q(x)$ also satisfies the conditions. As $\ell$ grows, eventually the only operators that satisfy the conditions will be of this form.

For this reason we will call $M$ a *degree-$\ell$ pseudo-expectation operator*. For a polynomial $p$ of degree at most $\ell$, we define $M[p]$ as follows: we write $p = \sum_\alpha c_\alpha m_\alpha$ where each $m_\alpha$ is a *monomial* of degree at most $\ell$, decompose $m_\alpha = q_\alpha r_\alpha$ where the degree of $q_\alpha$ and $r_\alpha$ is at most $\ell/2$, and then define $M[p] = \sum_\alpha c_\alpha M(q_\alpha, r_\alpha)$. We will often use the suggestive notation $\tilde{\mathbb{E}}\, p$ for $M[p]$.
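The construction in the proof of Lemma 5 can be sketched directly in code (a toy illustration of mine; the helper name is not from the notes): averaging monomial evaluations over any set of actual solutions yields a valid pseudo-expectation.

```python
def pseudo_expectation(points, p):
    # E~[p] = average of p over the given solutions (uniform distribution).
    return sum(p(x) for x in points) / len(points)

# Uniform distribution over two Boolean solutions, (1, 1) and (-1, -1):
solutions = [(1, 1), (-1, -1)]
print(pseudo_expectation(solutions, lambda x: x[0]))              # 0.0
print(pseudo_expectation(solutions, lambda x: x[0] * x[1]))       # 1.0
print(pseudo_expectation(solutions, lambda x: (x[0] + x[1])**2))  # 4.0
```

Note that the last line illustrates the non-negativity condition: the square $(x_1 + x_2)^2$ gets a non-negative value, as it must under any such averaging operator.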

**Exercise 4: ** Show that is well defined and does not depend on the decomposition.

To get some intuition, we now focus attention on the special case where our goal is to maximize some polynomial over the Boolean cube (i.e., the set of $x$’s satisfying $x_i^2 = 1$ for all $i$).

This case is not so special in the sense that (a) it captures much of what we want to do in TCS and (b) the intuition it yields largely applies to more general settings.

Recall that we said that for every distribution $\mu$ over $x$’s satisfying the constraints, we can get an operator as above by looking at $\tilde{\mathbb{E}}\, p = \mathbb{E}_{x \sim \mu}\, p(x)$. We now show that in some sense *every* such operator has this form if, in a manner related to and very reminiscent of quantum information theory, we allow the probabilities to go negative.

Definition 6
A function $\mu : \{\pm 1\}^n \to \mathbb{R}$ is a *degree-$\ell$ pseudo-distribution* if it satisfies:

- Normalization: $\sum_{x \in \{\pm 1\}^n} \mu(x) = 1$.
- Restricted non-negativity: For every polynomial $p$ of degree at most $\ell/2$, $\tilde{\mathbb{E}}_\mu\, p^2 \geq 0$,

where we define $\tilde{\mathbb{E}}_\mu\, q$ as $\sum_{x \in \{\pm 1\}^n} \mu(x) q(x)$.

Note that if $\mu$ was actually pointwise non-negative then it would be an actual distribution on the cube. Thus an actual distribution over the cube is always a pseudo-distribution.

**Exercise 5:** Show that a degree-$2n$ pseudo-distribution is an actual distribution.

**Exercise 6:** Show that if $\mu$ is a degree-$\ell$ pseudo-distribution, then there exists a degree-$\ell$ pseudo-distribution $\mu'$ such that $\tilde{\mathbb{E}}_{\mu'}\, p = \tilde{\mathbb{E}}_\mu\, p$ for every polynomial $p$ of degree at most $\ell$, and such that $\mu'$ is itself a degree-$\ell$ polynomial in the variables of $x$. (Hence for our purposes we can always represent such pseudo-distributions with $n^{O(\ell)}$ numbers.)

**Exercise 7:** Show that for every polynomial $p$ of degree at most $\ell$ and every number $c$, there exists a degree-$\ell$ pseudo-distribution $\mu$ on the cube satisfying $\tilde{\mathbb{E}}_\mu\, p \geq c$ if and only if there exists a degree-$\ell$ pseudo-expectation operator $M$ as above, consistent with the cube constraints, such that $M[p] \geq c$.

Therefore, we can say that the degree-$\ell$ SOS algorithm outputs either a degree-$\ell$ pseudo-distribution over the solutions to $\mathcal{E}$ or **‘fail’**, and only outputs the latter if the former doesn’t exist. In particular, if it outputs **‘fail’** then there isn’t any *actual* distribution over the solutions, and so the fact that the algorithm outputs **‘fail’** is a *proof* that the original equations are unsatisfiable. We will see that by convex duality, the algorithm actually outputs an explicit proof of this fact that has a natural interpretation.

**Exercise 8:** (Optional; for people who have heard about the Sherali–Adams linear programming hierarchy.) Show that the variant of pseudo-distributions where we replace the condition that the expectation is non-negative on all squares of degree-$\ell/2$ polynomials with the condition that it should be non-negative on all non-negative functions that depend on at most $\ell$ variables can be optimized over using linear programming, and is equivalent to $\ell$ rounds of the Sherali–Adams LP.

**Are all pseudo-distributions distributions?**

For starters, we can always find a distribution matching all the quadratic moments.

Lemma 7 (Gaussian Sampling Lemma)
Let $\tilde{\mathbb{E}}$ be a degree-$2$ pseudo-expectation operator. Then there exists a distribution $G$ over $\mathbb{R}^n$ such that for every polynomial $p$ of degree at most $2$, $\mathbb{E}_{x \sim G}\, p(x) = \tilde{\mathbb{E}}\, p$. Moreover, $G$ is a (correlated) Gaussian distribution.

Note that even if $\tilde{\mathbb{E}}$ comes from a pseudo-distribution over the cube, the output of $G$ will be real numbers that, although satisfying $\mathbb{E}\, x_i^2 = 1$, will be in $\mathbb{R}^n$ rather than $\{\pm 1\}^n$.
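A sketch of the lemma in code, assuming the degree-2 pseudo-moments are handed to us as a mean vector and a second-moment matrix (names are mine; the PSD-ness of the pseudo-expectation is what makes the derived covariance legitimate):

```python
import numpy as np

rng = np.random.default_rng(0)

def gaussian_sample(mean, second_moments, n_samples):
    # Covariance = E~[x x^T] - E~[x] E~[x]^T, which is PSD whenever the
    # pseudo-expectation operator is, so it is a valid Gaussian covariance.
    cov = second_moments - np.outer(mean, mean)
    return rng.multivariate_normal(mean, cov, size=n_samples)

# Degree-2 pseudo-moments of the uniform distribution over {(1, 1), (-1, -1)}:
mean = np.array([0.0, 0.0])
S = np.array([[1.0, 1.0],
              [1.0, 1.0]])  # S[i][j] = E~[x_i x_j]
xs = gaussian_sample(mean, S, 100_000)

# Empirical second moments match S up to sampling error, but the samples are
# real-valued rather than in {-1, 1}:
print((xs.T @ xs / len(xs)).round(1))
```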

Unfortunately, we don’t have an analogous result for higher moments:

**Exercise 9:** Prove that if there was an analog of the Gaussian Sampling Lemma for every polynomial of degree at most some constant, then P=NP. (Hint: show that you could solve 3SAT. How small can you make the degree?)

Unfortunately, this will not be our way to get fame and fortune:

**Exercise 10:** Prove that there exists a low-degree pseudo-distribution over the cube such that there does not exist any actual distribution that matches its expectation on all polynomials of that degree. (Can you improve the parameters?)

**5. Sum of Squares Proofs**

As we said, when the SOS algorithm outputs **‘fail’**, this can be interpreted as a proof that the system of equations is unsatisfiable. However, it turns out this proof actually has a special form that is known as an SOS proof or *Positivstellensatz* proof.

An SOS proof uses the following rules of inference
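One standard formulation of these rules (reconstructed from memory, so, like the history above, double-check before quoting): from $p \geq 0$ and $q \geq 0$ one may derive $p + q \geq 0$ and $p \cdot q \geq 0$, and for any polynomial $p$ one may derive $p^2 \geq 0$ with no premises:

```latex
\frac{p \geq 0 \qquad q \geq 0}{p + q \geq 0}
\qquad\qquad
\frac{p \geq 0 \qquad q \geq 0}{p \cdot q \geq 0}
\qquad\qquad
\frac{\phantom{p \geq 0}}{p^2 \geq 0}
```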

They should be interpreted as follows: if you know that a set of polynomial conditions is satisfied on some set $S \subseteq \mathbb{R}^n$, then any condition derived from them by the rules above holds on that set as well. (Note that we only mentioned inequalities above, but of course $p = 0$ is equivalent to the pair of conditions $p \geq 0$ and $-p \geq 0$.)

Definition 8
Let $\mathcal{E}$ be a set of equations. We say that $\mathcal{E}$ implies $q \geq 0$ via a degree-$\ell$ SOS proof, denoted $\mathcal{E} \vdash_\ell q \geq 0$, if $q \geq 0$ can be inferred from the constraints in $\mathcal{E}$ via a sequence of applications of the rules above where all intermediate polynomials are of syntactic degree at most $\ell$. The *syntactic degree* of the polynomials in $\mathcal{E}$ is their degree, while the syntactic degree of $p + q$ (resp. $p \cdot q$) is equal to the maximum (resp. the sum) of the syntactic degrees of $p$ and $q$. That is, the syntactic degree tracks the degrees of the intermediate polynomials without accounting for cancellations.

**(Note:** If we kept track of the actual degree instead of the syntactic degree, we would get a much stronger proof system for which we don’t have a static equivalent form, and which can prove some things that the static system cannot. See the paper of Grigoriev, Hirsch and Pasechnik for a discussion of this other system.**)**

Definition 9
Let $\mathcal{E}$ be a set of polynomial equalities. We say that $\mathcal{E}$ has a *degree-$\ell$ SOS refutation* if $\mathcal{E} \vdash_\ell -1 \geq 0$.

It turns out that a degree-$\ell$ refutation can always be put in a particular compact *static* form.

**Exercise 11:** For every $\ell$, prove that $\mathcal{E} = \{p_1 = 0, \ldots, p_m = 0\}$ (where all $p_i$’s are of degree at most $d$) has a *degree-$\ell$ SOS refutation* if and only if there exist polynomials $\lambda_1, \ldots, \lambda_m$ of degree at most $\ell - d$ and a polynomial $\sigma$ of degree at most $\ell$ such that

$\sum_{i=1}^m \lambda_i p_i = 1 + \sigma, \qquad (2)$

where $\sigma = \sum_j s_j^2$, i.e. it is a *sum of squares*.

(It’s OK if you lose a bit in each direction, i.e., in the “if” direction the refutation degree could be $O(\ell)$, while in the “only if” direction the degrees of the $\lambda_i$’s and $\sigma$ could be $O(\ell)$.)
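As a toy illustration of this static form (my own example, using the convention that a refutation exhibits $\sum_i \lambda_i p_i = 1 + \sigma$ with $\sigma$ a sum of squares): the obviously unsatisfiable system $\{x = 0,\ x - 1 = 0\}$ has a degree-$1$ refutation with $\lambda_1 = 1$, $\lambda_2 = -1$, and $\sigma = 0$ (the empty sum of squares):

```latex
1 \cdot \underbrace{x}_{p_1} \;+\; (-1) \cdot \underbrace{(x - 1)}_{p_2}
\;=\; x - x + 1 \;=\; 1 \;=\; 1 + \sigma, \qquad \sigma = 0.
```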

**Exercise 12:** Show that we can take the number of squares in $\sigma$ to be at most $n^{O(\ell)}$.

**Exercise 13:** Show that the set of $(\lambda_1, \ldots, \lambda_m, \sigma)$’s satisfying (2) is a convex set with an efficient separation oracle.

Positivstellensatz (Krivine 64, Stengle 74): For every unsatisfiable system $\mathcal{E}$ of equalities there exists a finite $\ell$ such that $\mathcal{E}$ has a degree-$\ell$ proof of unsatisfiability.

**Exercise 14:** Prove the P-satz for systems that include the constraint $x_i^2 = 1$ for all $i$. In this case, show that $\ell$ needs to be at most $2n$ (where $n$ is the number of variables). As a corollary, we get that the SOS algorithm does not need more than exponential time to solve polynomial equations on Boolean variables. (Not a very impressive bound, but good to know. In all TCS applications I am aware of, it’s easy to show that the SOS algorithm will solve the problem in exponential time.)

**Exercise 15:** Show that if there exists a degree-$\ell$ SOS proof that $\mathcal{E}$ is unsatisfiable, then there is no degree-$\ell$ pseudo-distribution consistent with $\mathcal{E}$.

SOS Theorem (Shor, Nesterov, Parrilo, Lasserre): Under some mild conditions (see Theorem 2.7 in my survey with Steurer), there is an $n^{O(\ell)}$-time algorithm that given a set $\mathcal{E}$ of polynomial equalities either outputs:

- A degree-$\ell$ pseudo-distribution consistent with $\mathcal{E}$, or
- A degree-$\ell$ SOS proof that $\mathcal{E}$ is unsatisfiable.

**The different views of pseudo distributions:** The notion of pseudo-distribution is somewhat counter-intuitive and takes a bit of time to get used to. It can be viewed from the following perspectives:

- Pseudo-distributions are simply a fancy name for a PSD matrix satisfying some linear constraints, which is the dual object to SOS proofs.
- SOS proofs of unbounded degree are a sound and complete proof system, in the sense that they can prove any true fact (phrased as polynomial equations) about actual distributions over the cube. SOS proofs of degree $\ell$ are a sound but not complete proof system for actual distributions, but a (sound and) complete system for degree-$\ell$ pseudo-distributions, in the sense that any true fact that holds not merely for actual distributions but also for degree-$\ell$ pseudo-distributions has a degree-$\ell$ SOS proof.
- In statistical learning problems (and economics) we often capture our knowledge (or lack thereof) by a distribution. If an unknown quantity is selected and we are given observations about it, we often describe our knowledge of the quantity by its distribution conditioned on the observations. In computational problems, the observations often completely determine the unknown value, but a pseudo-distribution can still capture our “computational knowledge”.
- The proof system view can also be considered as a way to capture our limited computational abilities. In the example above, a computationally unbounded observer can deduce from the observations all the true facts they imply, and hence completely determine the unknown value. One way to capture the limits of a computationally bounded observer is that it can only deduce facts using a more limited, sound but not complete, proof system.

**Lessons from History** It took about 80 years from the time Hilbert showed that polynomials that are not SOS exist non-constructively until Motzkin came up with an explicit example, and even that example has a low degree SOS proof of positivity. One lesson from that is the following:

“Theorem”: If a polynomial is non-negative and “natural” (i.e., constructed by methods known to Hilbert, not including the probabilistic method), then there should be a low-degree SOS proof for this fact.

Corollary (Marley, 1980): If you analyze the performance of an SOS-based algorithm pretending pseudo-distributions are actual distributions, then, unless you used Chernoff + union bound type arguments, every little thing gonna be alright.

We will use Marley’s corollary extensively in analyzing SOS algorithms. That is, we will pretend that the pseudo-distributions are actual distributions, and then cross our fingers and hope that our analysis will carry over when the algorithm actually works with pseudo-distributions. Thus one can think of pseudo-distributions as a *“non type safe”* notation that perhaps is not always sound, but makes it easier to phrase and prove theorems that we might not be able to do otherwise.

There is a recurring theme in mathematics of “power from weakness”. For example, we can often derandomize certain algorithms by observing that they fall in some restricted complexity class and hence can be fooled by a certain pseudorandom generator. Another example, perhaps closer to ours, is that even though the original way people did calculus with “infinitesimal” quantities was based on false premises, much of what they deduced was still correct. One way to explain this is that they used a weak proof system that cannot prove all true facts about the real numbers, and in particular cannot detect if the real numbers are replaced with an object that does have such an “infinitesimal” quantity added to it. In a similar way, if you analyze an algorithm using a weak proof system (e.g. one that is captured by a small-degree SOS proof), then the analysis will still hold even if we replace actual distributions with a pseudo-distribution of sufficiently large degree.
