I am teaching a seminar series at MIT with the title above, and thought I would post the introduction to the course here. For the complete notes on the first lecture (and notes of future lectures), please see the course webpage, where you can also sign up for the mailing list to get future updates. While the SOS algorithm is widely studied and used for a great many applications (see for example the courses of Parrilo and Laurent and the book of Lasserre), this seminar will offer a different perspective: that of theoretical computer science. One tongue-in-cheek tagline for this course could be

Rescuing the Sum-of-Squares algorithm from its obscurity as an algorithm that keeps planes up in the sky into a way to refute computational complexity conjectures.

Consider the following questions:

- Do we need a different algorithm to solve every computational problem, or can a single algorithm give the best performance for a large class of problems?
- In statistical physics and other areas, many people believe in the existence of a *computational threshold* effect, where a small change in the parameters of a computational problem seems to lead to a huge change in its computational complexity. Can we give rigorous evidence for this intuition?
- In machine learning there often seem to be tradeoffs between sample complexity, error probability, and computation time. Is there a way to map the curve of this tradeoff?
- Suppose you are given a 3SAT formula with a unique (but unknown) satisfying assignment $x^*$. Is there a way to make sense of statements such as "the probability that a given coordinate of $x^*$ equals 1" or "the entropy of $x^*$"? (Even though of course information-theoretically $x^*$ is completely determined by the formula, and hence that probability is either 0 or 1 and $x^*$ has zero entropy.)
- Is Khot’s Unique Games Conjecture true?

If you learn the answers to these questions by the end of this seminar series, then I hope you'll explain them to me, since I definitely don't know them. However, we will see that, despite these questions a priori having nothing to do with Sums of Squares, the SOS algorithm can yield a powerful lens to shed light on some of them, and perhaps be a step towards providing some of their answers.

Theoretical computer science studies many computational models for different goals. There are some models, such as bounded-depth (i.e., $AC^0$) circuits, for which we can prove unconditional lower bounds, but which do not aim to capture all relevant algorithmic techniques for a given problem. (For example, we don't view the results of Furst-Saxe-Sipser and Håstad as evidence that computing the parity of $n$ bits is a hard problem.) Other models, such as bilinear circuits for matrix multiplication, are believed to be strong enough to capture all known algorithmic techniques for some problems, but then we often can't prove lower bounds on them.

The *Sum of Squares (SOS)* algorithm (discovered independently by researchers from different communities, including Shor, Parrilo, Nesterov and Lasserre) can be thought of as another example of a concrete computational model. On one hand, it is sufficiently weak for us to know at least some unconditional lower bounds for it. In fact, there is a sense in which it is weaker than bounded-depth circuits, since for a given problem and input length, SOS is a *single algorithm* (as opposed to an exponential-sized family of circuits). Despite this fact, proving lower bounds for SOS is by no means trivial, even for a single distribution over instances (such as a random graph) or even a single instance. On the other hand, while this deserves more investigation, it does seem that for many interesting problems SOS does encapsulate all the algorithmic techniques we are aware of, and there is some hope that SOS is an **optimal algorithm** for some interesting family of problems, in the sense that no other algorithm with similar efficiency can beat SOS's performance on these problems. (I should note that we currently have more in the way of intuitions than hard evidence for this, though the Unique Games Conjecture, if true, would imply that SOS is an optimal approximation algorithm for every constraint satisfaction problem and many other problems as well; that said, the SOS algorithm also yields the strongest evidence to date that the UGC may be false…)

The possibility of the existence of such an *optimal algorithm* is very exciting. Even if at the moment we can’t hope to prove its optimality unconditionally, this means that we can (modulo some conjectures) reduce analyzing the difficulty of a problem to analyzing a single algorithm, and this has several important implications. For starters, it reduces the need for creativity in designing the algorithm, making it required only for the algorithm’s *analysis*. In some sense, much of the progress in science can be described as attempting to automate and make routine and even boring what was once challenging. Just as we today view as commonplace calculations that past geniuses such as Euler and Gauss spent much of their time on, it is possible that in the future much of algorithm design, which now requires an amazing amount of creativity, would be systematized and made routine. Another application of optimality is automating *hardness results*— if we prove the optimal algorithm *can’t* solve a problem X then that means that X can’t be solved by any efficient algorithm.

Beyond just systematizing what we already can do, optimal algorithms could yield qualitatively new insights on algorithms and complexity. For example, in many problems arising in statistical physics and machine learning, researchers believe that there exist *computational phase transitions*— where a small change in the parameters of a problem causes a huge jump in its computational complexity. Understanding these phase transitions is of great interest both for researchers in these areas and for theoretical computer scientists. The problem is that these questions involve *random inputs* (i.e., *average case complexity*) and so, based on the current state of the art, we have no way of proving the existence of such phase transitions based on assumptions such as $P \neq NP$. In some cases, such as the planted clique problem, the problem has been so well studied that the existence of a computational phase transition has been proposed as a conjecture in its own right, but we don't know of good ways to reduce such problems to one another, and we clearly don't want to have as many conjectures as there are problems. If we assume that an algorithm is optimal for a class of problems, then we can prove a computational phase transition by analyzing the running time of this algorithm as a function of the parameters. While by no means trivial, this is a tractable approach to understanding this question and getting very precise estimates as to the location of the threshold where the phase transition occurs. (Note that in some sense the existence of a computational phase transition implies the existence of an optimal algorithm, since in particular it means that there is a single algorithm $A$ such that beating $A$'s performance by a little bit requires much more resources.)

Perhaps the most exciting thing is that an optimal algorithm gives us a *new understanding* of just what it is about a problem that makes it easy or hard, and a new way to look at efficient computation. I don't find explanations such as "Problem A is easy because it has an algorithm" or "Problem B is hard because it has a reduction from SAT" very satisfying. I'd rather get an explanation such as "Problem A is easy because it has property P" and "Problem B is hard because it doesn't have P", where P is some meaningful property (e.g., being convex, supporting some polymorphisms, etc.) such that every problem (in some domain) with P is easy and every problem without it is hard. For that, we would want an algorithm that solves all problems with property P, together with a proof (or other evidence) that it is optimal. Such understanding of computation could bear other fruits as well. For example, as we will see in this seminar series, if the SOS algorithm is optimal in a certain domain, then we can use this to build a theory of "computational Bayesian reasoning" that can capture the "computational beliefs" of a bounded-time agent about a certain quantity, just as traditional Bayesian reasoning captures the beliefs of an unbounded-time agent about quantities on which it is given partial information.

I should note that while much of this course is very specific to the SOS algorithm, not all of it is, and it is possible that even if the SOS algorithm is superseded by another one, some of the ideas and tools we develop will still be useful. Also, note that I have deliberately ignored the question of *what* family of problems the SOS algorithm would be optimal for. This is clearly a crucial issue— every computational model (even a very weak one) is optimal for *some* problems, and every model falling short of general polynomial-time Turing machines will not be optimal for *all* problems. It definitely seems that some algebraic problems, such as integer factoring, have very special structure that makes it hard to conjecture that any generic algorithm (and certainly not the SOS algorithm) would be optimal for them. (See also my previous blog post on this topic.) The reason I don't discuss this issue is that we still don't have a good answer for it, and one of the research goals in this area is to understand what the right *conjecture* about the optimality of SOS should be. However, we do have some partial evidence and intuition, including those arising from the SOS algorithm's complex (and not yet fully determined) relation to Khot's Unique Games Conjecture, that lead us to believe that SOS could be an optimal algorithm for a non-trivial and interesting class of problems.

In this course we will see:

- A description of the SOS algorithm from different viewpoints— the traditional semidefinite programming/convex optimization view, as well as the proof system view, and the “pseudo-distribution” view.
- Discussion of positive results (aka "upper bounds") using the SOS algorithm to solve graph problems such as sparsest cut, and problems in machine learning.
- Discussion of known negative results (aka “lower bounds” / “integrality gaps”) for this algorithm.
- Discussion of the interesting (and not yet fully understood) relation of the SOS algorithm to Khot's *Unique Games Conjecture* (UGC). On one hand, the UGC implies that the SOS algorithm is *optimal* for a large class of problems. On the other hand, the SOS algorithm is currently the main candidate to refute the UGC.

The SOS algorithm is an algorithm for solving a computational problem. Let us now define what this problem is:

Definition 1: A *polynomial equation* is an equation of the form $p \ge 0$ (in which case it is called an *inequality*) or an equation of the form $p = 0$ (in which case it is called an *equality*), where $p$ is a multivariate polynomial mapping $\mathbb{R}^n$ to $\mathbb{R}$. The equation $p \ge 0$ (resp. $p = 0$) is *satisfied* by $x \in \mathbb{R}^n$ if $p(x) \ge 0$ (resp. $p(x) = 0$). A set $\mathcal{E}$ of polynomial equations is *satisfiable* if there exists an $x$ that satisfies all equations in $\mathcal{E}$.

The *polynomial optimization* problem is to output, given a set $\mathcal{E}$ of polynomial equations as input, either an $x$ satisfying all equations in $\mathcal{E}$ or a proof that $\mathcal{E}$ is unsatisfiable.

(Note: throughout this seminar we will ignore all issues of numerical accuracy— assume the polynomials always have rational coefficients with bounded numerator and denominator, and that all equalities/inequalities only need to be satisfied up to some small error $\epsilon$.)

Here are some examples of polynomial optimization problems:

- **Linear programming**: If all the polynomials are *linear* then this is of course linear programming, which can be done in polynomial time.
- **Least squares**: If the equations consist of a single quadratic equality then this is the least squares problem. Similarly, one can capture computing eigenvalues by two quadratic equations.
- **3SAT**: One can encode a 3SAT formula as degree-3 polynomial equations: the equation $x_i^2 - x_i = 0$ is equivalent to $x_i \in \{0,1\}$, and the equation $(1-x_i)(1-x_j)(1-x_k) = 0$ is equivalent to the clause $x_i \vee x_j \vee x_k$ (negated literals are handled by using $x$ in place of $1-x$).
- **Clique**: Given a graph $G$, the following equations encode that $x$ is an indicator vector of a $k$-clique: $\sum_i x_i = k$, $x_i^2 = x_i$ for every $i$, and $x_i x_j = 0$ for every pair $\{i,j\}$ that is not an edge of $G$.
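To make these encodings concrete, here is a minimal sketch (my own toy instance, not from the course materials, and it assumes the sympy package) that builds the constraint polynomials for a single 3SAT clause and for a 2-clique in a 4-vertex graph:

```python
import sympy as sp

x1, x2, x3, x4 = sp.symbols('x1 x2 x3 x4')
xs = [x1, x2, x3, x4]

# Booleanity: x_i^2 - x_i = 0 forces x_i to be 0 or 1.
booleanity = [xi**2 - xi for xi in xs]

# The clause (x1 OR x2 OR x3) becomes the degree-3 equation (1-x1)(1-x2)(1-x3) = 0.
clause = (1 - x1) * (1 - x2) * (1 - x3)

# A 2-clique in the 4-vertex graph whose only non-edges are {1,3} and {2,4}:
k = 2
non_edges = [(x1, x3), (x2, x4)]
clique = [sum(xs) - k] + [xi * xj for (xi, xj) in non_edges]

print(sp.expand(clause))                          # the clause constraint, expanded
print([sp.expand(p) for p in booleanity + clique])
```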

The SOS algorithm is designed to solve the polynomial optimization problem. As we can see from these examples, the full polynomial optimization problem is NP-hard, and hence we can't expect SOS (or any other algorithm) to efficiently solve it on every instance.

**Exercise 1:** Prove that this is the case even if all polynomials are quadratic, i.e., of degree at most 2.

Understanding how close the SOS algorithm gets in particular cases is the main technical challenge we will be dealing with.

These examples also show that polynomial optimization is an extremely versatile formalism, and many other computational problems (including SAT and CLIQUE) can be directly and easily phrased as instances of it. Henceforth we will ignore the question of *how* to formalize a problem as a polynomial optimization problem, and either assume the problem is already given in this form, or use the simplest, most straightforward translation if it isn't. While there are examples where choosing between different natural formulations could make a difference in complexity, this is not the case (to my knowledge) for the questions we will look at.

**Note:** We can always assume without loss of generality that all our equations are *equalities*, since we can always replace the inequality $p \ge 0$ by the equality $p = y^2$ where $y$ is some new auxiliary variable.

Also, we will sometimes ask the question of minimizing (or maximizing) a polynomial $p_0$ subject to satisfying a set of equations $\mathcal{E}$, which can be captured by looking for the largest $c$ such that $\mathcal{E} \cup \{p_0 \ge c\}$ is satisfiable.
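In symbols (my own phrasing of the two reductions just mentioned, with $p_0$ the objective and $\mathcal{E}$ the constraint set):

```latex
\[
  p \ge 0 \text{ is satisfiable over } x
  \quad\Longleftrightarrow\quad
  p - y^2 = 0 \text{ is satisfiable over } (x, y),
\]
\[
  \max \{\, p_0(x) : x \text{ satisfies } \mathcal{E} \,\}
  \;=\;
  \sup \{\, c : \mathcal{E} \cup \{ p_0 - c \ge 0 \} \text{ is satisfiable} \,\}.
\]
```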

The Sum of Squares algorithm is an algorithm to solve the polynomial optimization problem. Given that this problem is NP-hard, the SOS algorithm cannot run in polynomial time on all instances. The main focus of this course is trying to understand in which cases the SOS algorithm takes a small (say polynomial or quasipolynomial) amount of time, and in which cases it takes a large (say exponential) amount. An equivalent form of this question (which is the one we'll mostly use) is that, for some small degree parameter $\ell$ (e.g., a constant or logarithmic in the number of variables), we want to understand in which cases the "$\ell$-capped" version of SOS succeeds in solving the problem and in which cases it doesn't, where the "$\ell$-capped" version of the SOS algorithm halts in time $n^{O(\ell)}$ regardless of whether or not it solved the problem.

In fact, we will see that for every value of $\ell$, the SOS algorithm always returns some type of meaningful output. The main technical challenge is to understand whether that output can be transformed into an exact or approximate solution for the polynomial optimization problem.

Definition 2 (Sum of Squares – informal definition)

The SOS algorithm gets a parameter $\ell$ and a set of equations $\mathcal{E}$, runs in time $n^{O(\ell)}$, and outputs either:

- An object we will call a "degree-$\ell$ pseudo solution" (or, more accurately, a degree-$\ell$ *pseudo-distribution* over solutions), or
- A proof that a solution doesn't exist.

We will later make this more precise: what exactly a degree-$\ell$ pseudo solution is, what exactly the form of the proof is, and how the algorithm works.

**History.** *[Note: this is mostly from memory and not from the primary sources, so double-check this before quoting elsewhere. The introduction of this paper of O'Donnell and Zhou is a good starting point for the history.]* The SOS algorithm has its roots in questions raised in the late 19th century by Minkowski and Hilbert of whether every non-negative polynomial can be represented as a sum of squares of other polynomials. Hilbert realized that except for some special cases (most notably univariate polynomials and quadratic polynomials), the answer is negative, and that there is an example (which he showed to exist by non-constructive means) of a non-negative polynomial that cannot be represented in this way. It was only in the 1960's that Motzkin gave a very concrete example of such a polynomial:

$$ x^4 y^2 + x^2 y^4 - 3 x^2 y^2 + 1 \qquad\qquad (1) $$

In his famous 1900 address, Hilbert asked as his 17th problem whether every non-negative polynomial can be represented as a sum of squares of *rational* functions. (For example, Motzkin's polynomial (1) can be shown to be a sum of squares of (I think) four rational functions of small numerator and denominator degree.) This was answered positively by Artin in 1927. His approach can be summarized as follows: given a hypothetical non-negative polynomial $p$ that cannot be represented in this form, use the fact that the rational functions form a field to extend the reals into a "pseudo-real" field in which there would actually be an element $x$ such that $p(x) < 0$, and then use a "transfer principle" to show that there is an actual real $x$ such that $p(x) < 0$. (This description is not meant to be understandable but to make you curious enough to look it up..) Later, in the 60's and 70's, Krivine and Stengle extended this result to show that any unsatisfiable system of polynomial equations can be certified to be unsatisfiable via a Sum of Squares proof, a result known as the Positivstellensatz.

In the late 90's / early 2000's, there were two separate efforts to get quantitative or algorithmic versions of this result. On one hand, Grigoriev and Vorobjov asked the question of *how large* the degree of an SOS proof needs to be, and in particular Grigoriev proved several lower bounds on this degree for some interesting polynomials. On the other hand, Parrilo and Lasserre (independently) came up with hierarchies of algorithms for polynomial optimization based on the Positivstellensatz using semidefinite programming. (Something along those lines was also described by Naum Shor in a 1987 Russian paper, and mentioned by Nesterov as well.)

It took some time for people to realize the connection between all these works; in particular, the relation between Grigoriev-Vorobjov's work and the works from the optimization literature took some time to be discovered, and even 10 years later it was still the case that some results of Grigoriev were rediscovered and reproven in the Lasserre language.

**Applications of SOS.** SOS has applications to: equilibrium analysis of dynamics and control (robotics, flight controls, …), robust and stochastic optimization, statistics and machine learning, continuous games, software verification, filter design, quantum computation and information, automated theorem proving, packing problems, etc.

**Remark: the TCS vs. Mathematical Programming view of SOS**

While the SOS algorithm is intensively studied in several communities, there are some differences in emphases between them. While I am not an expert on all SOS works, my impression is that the main characteristics of the TCS viewpoint, as opposed to others, are:

- In the TCS world, we typically think of the number of variables $n$ as large and tending to infinity (as it corresponds to our input size), and the degree $\ell$ of the SOS algorithm as being relatively small— a constant or logarithmic. In contrast, in the optimization and control world, the number of variables can often be very small (e.g., around ten or so, maybe even smaller) and hence the degree may be large compared to it. Note that since both the time and space complexity of the general SOS algorithm scale roughly like $n^{O(\ell)}$, even moderate values of $n$ and $\ell$ would take something like a petabyte of memory (in practice, though we didn't try to optimize too much, David Steurer and I had a hard time executing a program with rather modest $n$ and $\ell$ on a Cornell cluster); see the back-of-the-envelope sketch after this list. This may justify the optimization/control view of keeping $\ell$ small, although if we show that SOS yields a polynomial-time algorithm for a particular problem, then we can hope to optimize further and obtain an algorithm that doesn't require a full-fledged SOS solver.
- Typically in TCS our inputs are discrete and the polynomials are simple, with integer coefficients etc. Often we have constraints such as $x_i^2 = x_i$ that restrict attention to the Boolean cube, and so we are less concerned with issues of numerical accuracy, boundedness, etc.
- Traditionally, people have been concerned with *exact convergence* of the SOS algorithm— when does it yield an exact solution to the optimization problem. This often precludes $\ell$ from being much smaller than $n$. In contrast, as TCS'ers we often want to understand *approximate convergence*— when does the algorithm yield an "approximate" solution (in some problem-dependent sense). Since the output of the algorithm in this case is not actually in the form of a solution to the equations, this raises the question of obtaining *rounding* algorithms, which are procedures to translate the output of the algorithm into an approximate solution.
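As a back-of-the-envelope illustration (my own numbers, not the ones from the Cornell experiment): the degree-$\ell$ moment matrix is indexed by monomials of degree at most $\ell/2$, so its dimension is about $\binom{n+\ell/2}{\ell/2}$, and storing it densely already becomes prohibitive for fairly modest parameters.

```python
from math import comb

def moment_matrix_size(n, ell):
    """Dimension of the degree-ell moment matrix and a rough dense-storage estimate in bytes."""
    dim = comb(n + ell // 2, ell // 2)   # number of monomials of degree <= ell/2
    return dim, 8 * dim * dim            # 8 bytes per double-precision entry

for n, ell in [(100, 4), (1000, 4), (1000, 6)]:
    dim, nbytes = moment_matrix_size(n, ell)
    print(f"n={n:5d}, ell={ell}: dim={dim:,}, memory ~ {nbytes / 1e9:.2e} GB")
```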

**4. Several views of the SOS algorithm**

We now describe the SOS algorithm more formally. For simplicity, we consider the case that the set $\mathcal{E}$ only consists of equalities (which is without loss of generality, as we mentioned before). When convenient, we will assume all equalities are homogeneous polynomials of the same degree. (This can always be arranged by multiplying the constraints.) You can restrict attention to the degree-2 case— this will capture all of the main issues of the general case.

**4.1. SOS Algorithm: convex optimization view**

We start by presenting one view of the SOS algorithm which, technically, might be the simplest, though perhaps at first not the most conceptually insightful.

Definition 3: Let $P_d$ denote the set of $n$-variate polynomials of degree at most $d$. Note that this is a linear subspace of dimension roughly $n^d$. We will sometimes also write this as $P_d(x_1,\ldots,x_n)$ when we want to emphasize that these polynomials take the formal input $x = (x_1,\ldots,x_n)$.

Definition 4: Let $\mathcal{E} = \{p_1 = 0, \ldots, p_m = 0\}$ be a set of polynomial equations where $p_i \in P_d$ for all $i$. Let $\ell$ be some integer multiple of $2d$. The *degree-$\ell$ SOS algorithm* either outputs **'fail'** or a bilinear operator $M : P_{\ell/2} \times P_{\ell/2} \rightarrow \mathbb{R}$ satisfying:

- Normalization: $M(1,1) = 1$ (where $1$ is simply the constant polynomial $1$).
- Symmetry: If $p, q, p', q' \in P_{\ell/2}$ satisfy $pq = p'q'$ then $M(p,q) = M(p',q')$.
- Non-negativity (positive semidefiniteness): For every $p \in P_{\ell/2}$, $M(p,p) \ge 0$.
- Feasibility: For every $i \in [m]$ and every $p, q \in P_{\ell/2}$ such that $p_i \cdot p \in P_{\ell/2}$, $M(p_i \cdot p, q) = 0$.

**Exercise 2:** Show that if the symmetry and feasibility constraints hold for monomials then they hold for all polynomials as well.

**Exercise 3:** Show that the set of $M$'s satisfying the conditions above is convex and has an efficient separation oracle.

Indeed, such an $M$ can be represented as a PSD matrix (indexed by the monomials of degree at most $\ell/2$) satisfying some linear constraints. (Can you see why?) Thus, by semidefinite programming, finding such an $M$ if it exists can be done in time $n^{O(\ell)}$ (throughout this seminar we ignore issues of precision etc.).
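For concreteness, here is a minimal sketch (my own toy example, not code from the course, and it assumes the cvxpy package) of this semidefinite program for degree $\ell = 2$ with the Boolean constraints $x_i^2 = x_i$: the matrix variable is indexed by the monomials $1, x_1, \ldots, x_n$, represents the operator $M$ restricted to linear polynomials, and is used to maximize a quadratic form over the cube.

```python
import cvxpy as cp
import numpy as np

n = 4
A = np.random.randn(n, n)
A = (A + A.T) / 2                      # an arbitrary symmetric quadratic objective x^T A x

# M[i, j] stands for the pseudo-expectation of the product of the i-th and j-th
# monomials in (1, x_1, ..., x_n); index 0 is the constant monomial 1.
M = cp.Variable((n + 1, n + 1), symmetric=True)

constraints = [M >> 0,                 # non-negativity on squares (PSD)
               M[0, 0] == 1]           # normalization: M(1, 1) = 1
for i in range(1, n + 1):
    constraints.append(M[i, i] == M[0, i])   # feasibility for x_i^2 - x_i = 0

objective = cp.Maximize(cp.sum(cp.multiply(A, M[1:, 1:])))   # pseudo-expectation of x^T A x
prob = cp.Problem(objective, constraints)
prob.solve()
print(prob.value, M.value[0, 1:])      # relaxation value and the pseudo-expectations of the x_i's
```

At this degree the symmetry condition reduces to the matrix being symmetric; for higher degrees one would index the matrix by all monomials of degree at most $\ell/2$ and add the corresponding linear identifications.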

The question is why this has anything to do with solving our equations, and one answer is given by the following lemma:

Lemma 5: Suppose that $\mathcal{E}$ is satisfiable. Then there exists an operator $M$ satisfying the conditions above.

*Proof:* Let $x$ be a solution for the equations and let $M(p,q) = p(x) q(x)$. Note that $M$ clearly satisfies all the conditions.

Since the set of such operators is convex, for every distribution $\mu$ over solutions of $\mathcal{E}$, the operator $M(p,q) = \mathbb{E}_{x \sim \mu}[p(x) q(x)]$ also satisfies the conditions. As $\ell$ grows, eventually the only operators that satisfy the conditions will be of this form.

For this reason we will call $M$ a *degree-$\ell$ pseudo-expectation operator*. For a polynomial $p$ of degree at most $\ell$, we define $M[p]$ as follows: we write $p = \sum_\alpha c_\alpha m_\alpha$ where each $m_\alpha$ is a *monomial* of degree at most $\ell$, then decompose each $m_\alpha = m'_\alpha \cdot m''_\alpha$ where the degrees of $m'_\alpha$ and $m''_\alpha$ are at most $\ell/2$, and then define $M[p] = \sum_\alpha c_\alpha M(m'_\alpha, m''_\alpha)$. We will often use the suggestive notation $\tilde{\mathbb{E}}[p]$ for $M[p]$.

**Exercise 4:** Show that $M[p]$ is well defined and does not depend on the decomposition.

To get some intuition, we now focus attention on the special case where our goal is to maximize some polynomial over the Boolean cube (i.e., the set of $x$'s satisfying $x_i^2 = x_i$ for all $i$).

This case is not so special in the sense that (a) it captures much of what we want to do in TCS and (b) the intuition it yields largely applies to more general settings.

Recall that we said that for every distribution $\mu$ over $x$'s satisfying the constraints, we can get an operator as above by looking at $M(p,q) = \mathbb{E}_{x \sim \mu}[p(x) q(x)]$. We now show that in some sense *every* such operator has this form if, in a manner related to and very reminiscent of quantum information theory, we allow the probabilities to go negative.

Definition 6: A function $\mu : \{0,1\}^n \rightarrow \mathbb{R}$ is a *degree-$\ell$ pseudo-distribution* if it satisfies:

- Normalization: $\sum_{x \in \{0,1\}^n} \mu(x) = 1$.
- Restricted non-negativity: For every polynomial $p$ of degree at most $\ell/2$, $\mathbb{E}_\mu[p^2] \ge 0$, where we define $\mathbb{E}_\mu[q]$ as $\sum_{x \in \{0,1\}^n} \mu(x) q(x)$.

Note that if $\mu$ were actually pointwise non-negative then it would be an actual distribution on the cube. Conversely, an actual distribution over the cube is always a pseudo-distribution.

**Exercise 5:** Show that a degree-$2n$ pseudo-distribution is an actual distribution.

**Exercise 6:** Show that if $\mu$ is a degree-$\ell$ pseudo-distribution, then there exists a degree-$\ell$ pseudo-distribution $\mu'$ such that $\mathbb{E}_\mu[p] = \mathbb{E}_{\mu'}[p]$ for every polynomial $p$ of degree at most $\ell$, and such that $\mu'$ is a degree-$\ell$ polynomial in the variables $x_1, \ldots, x_n$. (Hence for our purposes we can always represent such pseudo-distributions with $n^{O(\ell)}$ numbers.)

**Exercise 7:** Show that for every polynomial $p_0$ of degree at most $\ell$, there exists a degree-$\ell$ pseudo-distribution $\mu$ on the cube satisfying $\mathbb{E}_\mu[p_0] \ge c$ if and only if there exists a degree-$\ell$ pseudo-expectation operator $\tilde{\mathbb{E}}$ as above, satisfying the cube constraints $\{x_i^2 = x_i\}$, such that $\tilde{\mathbb{E}}[p_0] \ge c$.

Therefore, we can say that the degree-$\ell$ SOS algorithm outputs either a degree-$\ell$ pseudo-distribution over the solutions to $\mathcal{E}$ or **'fail'**, and only outputs the latter if the former doesn't exist. In particular, if it outputs **'fail'** then there isn't any *actual* distribution over the solutions, and so the fact that the algorithm outputs **'fail'** is a *proof* that the original equations are unsatisfiable. We will see that, by convex duality, the algorithm actually outputs an explicit proof of this fact that has a natural interpretation.

**Exercise 8:** (optional– for people who have heard about the Sherali-Adams linear programming hierarchy) Show that the variant of pseudo-distributions where we replace the condition that the expectation is non-negative on all squares of degree-$\ell/2$ polynomials with the condition that it should be non-negative on all non-negative functions that depend on at most $\ell$ variables can be optimized over using linear programming, and is equivalent to $\ell$ rounds of the Sherali-Adams LP.

**Are all pseudo-distributions distributions?**

For starters, we can always find a distribution matching all the quadratic moments.

Lemma 7 (Gaussian Sampling Lemma): Let $\tilde{\mathbb{E}}$ be a degree-$\ell$ pseudo-expectation operator (for any $\ell \ge 2$) on the variables $x_1, \ldots, x_n$. Then there exists a distribution $\mathcal{Y}$ over $\mathbb{R}^n$ such that for every polynomial $p$ of degree at most $2$, $\mathbb{E}_{y \sim \mathcal{Y}}[p(y)] = \tilde{\mathbb{E}}[p]$. Moreover, $\mathcal{Y}$ is a (correlated) Gaussian distribution.

Note that even if $\tilde{\mathbb{E}}$ comes from a pseudo-distribution over the cube, the samples from $\mathcal{Y}$ will be real numbers that, although matching the quadratic moments, will generally not lie in $\{0,1\}$.
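Here is a minimal sketch of this sampling step (my own code, assuming numpy), where `mu` and `S` hold the degree-1 and degree-2 pseudo-moments $\tilde{\mathbb{E}}[x_i]$ and $\tilde{\mathbb{E}}[x_i x_j]$:

```python
import numpy as np

def gaussian_sample(mu, S, num_samples=1000):
    """Sample y ~ N(mu, S - mu mu^T), so that E[y_i] = mu_i and E[y_i y_j] = S_ij."""
    cov = S - np.outer(mu, mu)   # PSD because the moment matrix [[1, mu^T], [mu, S]] is PSD
    return np.random.multivariate_normal(mu, cov, size=num_samples)

# Toy usage: the moments of the uniform distribution over {0,1}^2.
mu = np.array([0.5, 0.5])
S = np.array([[0.5, 0.25], [0.25, 0.5]])
samples = gaussian_sample(mu, S)
print(samples.mean(axis=0), (samples[:, 0] * samples[:, 1]).mean())
```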

Unfortunately, we don’t have an analogous result for higher moments:

**Exercise 9:** Prove that if there were an analog of the Gaussian Sampling Lemma for all polynomials of some larger constant degree, then P=NP. (Hint: show that you could solve 3SAT; how small a degree suffices for this?)

Unfortunately, this will not be our way to get fame and fortune:

**Exercise 10:** Prove that there exists a degree-$\ell$ pseudo-distribution over the cube, for some small constant $\ell$, such that there does not exist any actual distribution that matches its expectation on all polynomials of degree at most $\ell$. (How small can you make $\ell$?)

**5. Sum of Squares Proofs**

As we said, when the SOS algorithm outputs **'fail'** this can be interpreted as a proof that the system of equations is unsatisfiable. However, it turns out this proof actually has a special form that is known as an SOS proof or *Positivstellensatz*.

An SOS proof uses the following rules of inference: for polynomials $p, q, r$,

$$ \{\, p \ge 0,\; q \ge 0 \,\} \vdash p + q \ge 0, \qquad \{\, p \ge 0,\; q \ge 0 \,\} \vdash p \cdot q \ge 0, \qquad \vdash r^2 \ge 0. $$

They should be interpreted as follows: if you know that a set of conditions is satisfied on some set of points, then any condition derived by the rules above holds on that set as well. (Note that we only mentioned inequalities above, but of course the equality $p = 0$ is equivalent to the pair of conditions $p \ge 0$ and $-p \ge 0$.)

Definition 8: Let $\mathcal{E}$ be a set of equations. We say that $\mathcal{E}$ implies $q \ge 0$ via a *degree-$\ell$ SOS proof*, denoted $\mathcal{E} \vdash_\ell q \ge 0$, if $q \ge 0$ can be inferred from the constraints in $\mathcal{E}$ via a sequence of applications of the rules above in which all intermediate polynomials are of syntactic degree at most $\ell$.

The *syntactic degree* of the polynomials in $\mathcal{E}$ is their degree, while the syntactic degree of $p + q$ (resp. $p \cdot q$) is equal to the maximum (resp. the sum) of the syntactic degrees of $p$ and $q$. That is, the syntactic degree tracks the degrees of the intermediate polynomials without accounting for cancellations.

**(Note:** If we kept track of the actual degree instead of the syntactic degree we get a much stronger proof system for which we don’t have a static equivalent form, and can prove some things that the static system cannot. See the paper of Grigoriev, Hirsch and Pasechnik for discussion of this other system.**)**

Definition 9: Let $\mathcal{E}$ be a set of polynomial equalities. We say that $\mathcal{E}$ has a *degree-$\ell$ SOS refutation* if $\mathcal{E} \vdash_\ell -1 \ge 0$.

It turns out that a degree-$\ell$ refutation can always be put in a particular compact *static* form.

**Exercise 11:** For every $\ell$, prove that $\mathcal{E} = \{p_1 = 0, \ldots, p_m = 0\}$ (where all $p_i$'s are of degree at most $d$) has a *degree-$\ell$ SOS refutation* if and only if there exist polynomials $q_1, \ldots, q_m$ of degree at most $\ell - d$ and a polynomial $s$ of degree at most $\ell$ such that

$$ -1 = \sum_{i=1}^m q_i p_i + s, \qquad\qquad (2) $$

where $s = \sum_j r_j^2$, i.e., it is a *sum of squares*. (It's OK if you lose a bit in each direction, i.e., in the "if" direction the refutation could have degree $O(\ell)$, while in the "only if" direction the $q_i$'s and $s$ could have degree $O(\ell)$.)

**Exercise 12:** Show that we can take the number of squares in $s$ to be at most $\dim(P_{\ell/2})$, i.e., $n^{O(\ell)}$.

**Exercise 13:** Show that the set of tuples $(q_1, \ldots, q_m, s)$ satisfying (2) is a convex set with an efficient separation oracle.
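As a small illustration of this convexity (my own toy example, not from the notes, and again assuming cvxpy), here is a sketch that finds a degree-4 SOS certificate for the non-negativity of a univariate quartic by searching for a positive semidefinite Gram matrix $Q$ with $p(x) = v(x)^T Q\, v(x)$ for $v(x) = (1, x, x^2)$:

```python
import cvxpy as cp

# p(x) = x^4 - x^2 - 2x + 2, which equals (x^2 - 1)^2 + (x - 1)^2, so a certificate exists.
coeffs = {0: 2, 1: -2, 2: -1, 3: 0, 4: 1}

Q = cp.Variable((3, 3), symmetric=True)
constraints = [Q >> 0,
               Q[0, 0] == coeffs[0],                # constant term
               2 * Q[0, 1] == coeffs[1],            # coefficient of x
               2 * Q[0, 2] + Q[1, 1] == coeffs[2],  # coefficient of x^2
               2 * Q[1, 2] == coeffs[3],            # coefficient of x^3
               Q[2, 2] == coeffs[4]]                # coefficient of x^4
cp.Problem(cp.Minimize(0), constraints).solve()
print(Q.value)   # any PSD solution Q = R^T R writes p as the sum of squares of the entries of R v(x)
```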

Positivstellensatz (Krivine 64, Stengle 74): For every unsatisfiable system $\mathcal{E}$ of equalities there exists a finite $\ell$ such that $\mathcal{E}$ has a degree-$\ell$ proof of unsatisfiability.

**Exercise 14:** Prove the Positivstellensatz for systems that include the constraint $x_i^2 = x_i$ for all $i$. In this case, show that $\ell$ needs to be at most $2n$ (where $n$ is the number of variables). As a corollary, we get that the SOS algorithm does not need more than $n^{O(n)}$ time to solve polynomial equations on Boolean variables. (Not a very impressive bound, but good to know. In all TCS applications I am aware of, it's easy to show that the SOS algorithm will solve the problem in exponential time.)

**Exercise 15:** Show that if there exists a degree-$\ell$ SOS proof that $\mathcal{E}$ is unsatisfiable, then there is no degree-$\ell$ pseudo-distribution consistent with $\mathcal{E}$.
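One way to see this (a sketch of the argument, using the static form (2) above): if $\tilde{\mathbb{E}}$ were a degree-$\ell$ pseudo-expectation consistent with $\mathcal{E}$, then applying it to both sides of the certificate would give a contradiction:

```latex
\[
  -1 \;=\; \tilde{\mathbb{E}}[-1]
      \;=\; \sum_{i} \tilde{\mathbb{E}}[\, q_i p_i \,] + \sum_{j} \tilde{\mathbb{E}}[\, r_j^2 \,]
      \;\ge\; 0 ,
\]
```

since consistency with $\mathcal{E}$ makes each $\tilde{\mathbb{E}}[q_i p_i]$ vanish and restricted non-negativity makes each $\tilde{\mathbb{E}}[r_j^2]$ non-negative.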

SOS Theorem (Shor, Nesterov, Parrilo, Lasserre): Under some mild conditions (see Theorem 2.7 in my survey with Steurer), there is an $n^{O(\ell)}$-time algorithm that, given a set $\mathcal{E}$ of polynomial equalities, either outputs:

- A degree-$\ell$ pseudo-distribution consistent with $\mathcal{E}$, or
- A degree-$\ell$ SOS proof that $\mathcal{E}$ is unsatisfiable.

**The different views of pseudo distributions:** The notion of pseudo-distribution is somewhat counter-intuitive and takes a bit of time to get used to. It can be viewed from the following perspectives:

- A pseudo-distribution is simply a fancy name for a PSD matrix satisfying some linear constraints, which is the dual object to an SOS proof.
- SOS proofs of unbounded degree form a sound and complete proof system, in the sense that they can prove any true fact (phrased as polynomial equations) about actual distributions over $\{0,1\}^n$. SOS proofs of degree $\ell$ form a sound but not complete proof system for actual distributions, but a (sound and) complete system for degree-$\ell$ pseudo-distributions, in the sense that any true fact that holds not merely for actual distributions but also for degree-$\ell$ pseudo-distributions has a degree-$\ell$ SOS proof.
- In statistical learning problems (and economics) we often capture our knowledge (or lack thereof) by a distribution. If an unknown quantity $x$ is selected and we are given observations $o$ about it, we often describe our knowledge of $x$ by the conditional distribution of $x$ given $o$. In computational problems, the observations often completely determine the value $x$, but a pseudo-distribution can still capture our "computational knowledge" of it.
- The proof system view can also be considered as a way to capture our limited computational abilities. In the example above, a computationally unbounded observer can deduce from the observations all the true facts they imply, and hence completely determine $x$. One way to capture the limits of a computationally bounded observer is that it can only deduce facts using a more limited, sound but not complete, proof system.

**Lessons from History.** It took about 80 years from the time Hilbert showed non-constructively that polynomials that are not SOS exist until Motzkin came up with an explicit example, and even that example has a low-degree SOS proof of positivity. One lesson from that is the following:

"Theorem": If a polynomial is non-negative and "natural" (i.e., constructed by methods known to Hilbert— not including the probabilistic method), then there should be a low-degree SOS proof for this fact.

Corollary (Marley, 1980): If you analyze the performance of an SOS-based algorithm pretending that pseudo-distributions are actual distributions, then unless you used Chernoff+union-bound type arguments, every little thing gonna be alright.

We will use Marley’s corollary extensively in analyzing SOS algorithms. That is, we will pretend that the pseudo-distributions are actual distributions, and then cross our fingers and hope that our analysis will carry over when the algorithm actually works with pseudo-distributions. Thus one can think of pseudo-distributions as a *“non type safe”* notation that perhaps is not always sound, but makes it easier to phrase and prove theorems that we might not be able to do otherwise.

There is a recurring theme in mathematics of "power from weakness". For example, we can often derandomize certain algorithms by observing that they fall in some restricted complexity class and hence can be fooled by certain pseudorandom generators. Another example, perhaps closer to ours, is that even though the original way people defined calculus with "infinitesimal" quantities was based on false premises, much of what they deduced was still correct. One way to explain this is that they used a weak proof system that cannot prove all true facts about the real numbers, and in particular cannot detect if the real numbers are replaced with an object that does have such an "infinitesimal" quantity added to it. In a similar way, if you analyze an algorithm using a weak proof system (e.g., one that is captured by a small-degree SOS proof), then the analysis will still hold even if we replace actual distributions with a pseudo-distribution of sufficiently large degree.


Since the news broke on Thursday, I’ve been searching for a right model to apply to the lab’s sudden demise: Shall we sit shiva? Hold a wake? Eulogize? Stumble through denial, anger, bargaining, and depression towards acceptance? Different cultures deal with a loss in remarkably varied ways. Which one is the most applicable to ours? Reflecting back on the history of the lab that by some accounts goes back more than thirty years and spans several companies, I realized that the right answer had been staring at me all along: the document authored by Naughton and Taylor that beautifully summarized the main principles on which our lab was founded was called “Zen and Art of Research Management” [1]. How very true! One cycle of many reincarnations and rebirths has just completed.

Buddhists process loss in a manner that may look insensitive and heartless. Instead of grieving they celebrate the chance for a new beginning. Even if no one entity may eventually claim to be MSR SVC’s rightful heir, its spirit of cross-area collaboration, mutual respect, commitment to fundamental research and support of technology transfer will live on.

I am truly grateful for the opportunities afforded to me by 11 years with MSR SVC, the most cherished of which is the list of collaborators that includes dear friends, bright Ph.D. students, some of the strongest minds in CS, and all around excellent fellows. Thank you and good luck!

[1] Elaboration of Naughton and Taylor’s principles in a piece by the last lab director Roy Levin can be found here.


My theory colleagues left me in absolute awe! Being surrounded by the creativity and brilliance of a unique collection of young scientists was such a rush. I initiated Windows on Theory because I thought that this rush must be shared with everyone. I hope that the readers of this blog got a glimpse of the breadth and depth of my theory colleagues. I am confident that they will make many departments and research groups much better in the following months and years. My only regret is every free minute I didn’t spend learning from these wonderful colleagues and friends.

My email for now will be omer.reingold@gmail.com, so drop me a line.


Congratulations!


Congratulations to Yin Tat Lee and Aaron Sidford for winning the best paper and the best student paper awards for their paper "Solving Linear Programs in Õ(√rank) Iterations and Faster Algorithms for Maximum Flow". They made an important advance in the theory of interior point methods by showing that you can actually converge faster, and match the non-constructive iteration bound of Nesterov and Nemirovsky, if you modify on the fly the path the algorithm is taking. On top of that (and with a lot of extra work) they showed that these ideas can yield faster algorithms for Max-Flow in a broad range of parameters. It's always nice to see how trying to solve one problem such as Max-Flow can often yield unexpected payoffs in areas that at first sight may seem unrelated (Max-Cut is another great example of this phenomenon).

Of course, as I and some others mentioned, there are many other great papers in the conference, and the workshop/tutorial day is looking very good too. The schedule is also perhaps a bit saner this time around, with a bit less parallelism, and somewhat longer breaks than usual, so I am hoping to see many of this blog’s readers in Philadelphia in October! (Deadline for early registration and discounted hotel rate is September 22nd.)

Keeping up with the times, FOCS now has a more mobile-friendly website (thanks to Wolfgang Richter, who gave me access to the codebase of the SOSP 2013 website) and even a twitter account ( @focs14 ). We might even have an app – more on that later.


*Guest post by Mark Braverman*

My survey covers recent developments in the area of interactive coding theory. This area has been quite active recently, with at least 4 papers on the topic appearing in the next FOCS. This level of activity means that parts of the survey will probably become obsolete within a few years (in fact, I had to rewrite parts of it when the separation result by Ganor, Kol, and Raz was announced in April). *[See also this newer result that was posted after Mark sent me his text --Boaz]*

The basic premise of interactive coding theory is extending the reach of classical coding and information theory to interactive scenarios. Broadly speaking “coding” encompasses compression (aka noiseless coding), error correction (over both adversarial and randomized channels), and cryptography. The latter does not really fit with the rest of the agenda, since cryptographic protocols have always been interactive.

The interactive version of noiseless coding is communication complexity – and taking the information-theoretic view to it yields information complexity, which behaves as the interactive analogue of Shannon’s entropy. The analogue of Shannon’s Noiseless Coding Theorem holds in the interactive case. To what extent interactive compression is possible (i.e. to what extent the interactive analogue of Huffman Coding exists) is a wide-open problem.

On the noisy side, much progress has been made in the adversarial model, starting with the seminal work of Schulman in the 1990s. Many problems surrounding the interactive analogue of Shannon’s channel capacity, even for simple channels, such as the Binary Symmetric Channel remain open.

For the current state of affairs (surveyed for a Math audience) see my ICM survey which is available here.


Further details and application instructions can be found at simons.berkeley.edu/fellows-summer2015. General information about the Simons Institute can be found at simons.berkeley.edu, and about the Cryptography program at simons.berkeley.edu/programs/crypto2015.

Deadline for applications: 30 September, 2014.


For TCS the big news was of course that Subhash Khot won the Nevanlinna Prize for his work on the Unique Games Conjecture. As Omer mentioned, this is a research topic that I am extremely interested in, and so I am very happy about this well-deserved choice. Subhash also gave a fantastic talk, which I highly recommend. Like many others, I was also excited to witness the first time a female mathematician, Maryam Mirzakhani, was awarded the Fields Medal, and I hope we won't have to wait too long for the first female Nevanlinna medalist.

All the plenary talks were videotaped, and I believe that sooner or later they will be available on this website, so I thought I would mention a few talks that TCS folks might want to look at. Every (plenary or section) talk also had an accompanying survey paper, which again I hope will be available online in the not-too-distant future. (Some people, like many of the TCS folks, have already posted the papers on the arXiv/ECCC etc., and I hope we will see some more blog posts about them.)

Two talks that I particularly recommend are Emmanuel Candes’s talk on the “Mathematics of sparsity” and Manjul Bhargava’s talk on “Rational points on elliptic and hyperelliptic curves”.

Candes's talk was an amazing exposition of the power and importance of algorithms. He showed how efficient algorithms can actually make the difference in treating kids with cancer! Specifically, one of the challenges in taking MRI images is that traditionally they take two minutes to make, during which the patient cannot take a single breath. You can imagine that this would be dangerous to nearly impossible to achieve for young children. The crucial difference is made by using a sublinear-samples algorithm (i.e., compressed sensing), which allows one to recover the images from many fewer samples, reducing the time to about 15 seconds. Another approach to dealing with this issue is to allow the patient to breathe but try to algorithmically correct for this movement. Here what they use [as far as I recall] is a decomposition into a low-rank plus a sparse matrix, which they achieve via a semidefinite program related to the famous Goemans-Williamson max cut algorithm. Interestingly, the latter question is also related to the well-known lower bound question of matrix rigidity, and the parameters they achieve roughly correspond to the best known values for this question– somewhat (extremely) speculatively, one can wonder if perhaps an improved rigidity lower bound would end up being useful for this application..

Hearing Candes's talk I couldn't help thinking that some of those advances could perhaps have been made sooner if the TCS community had closer ties to the applied math community, and had realized the relevance of concepts such as property testing and tools such as the Goemans-Williamson algorithm to these kinds of questions. Such missed opportunities are unfortunate for our community and (given the applications) also for society at large, which is another reason you should always try to go to talks in other areas..

Bhargava's talk just blew me away. I can't remember when I last went to a talk in an area so far from my own and felt that I learned so much. I can't recommend it enough, and of course, given the use of elliptic curves in cryptography, it's not completely unrelated to TCS. I will not attempt any technical description of the talk [just watch it, or read the accompanying paper] but let me mention a TCS-related theme, which actually seems to appear in the works of some of the other Fields medalists as well.

One example of the "unreasonable effectiveness" of algorithms is that they often capture our notion of "mathematical understanding". For example, a priori, the fact that the clique problem is NP-hard does not mean that we should not be able to figure out the clique number of a particular graph such as the Cayley graph, but it turns out that this is actually a real obstacle to doing so. Similarly, in other areas of mathematics, whether it is figuring out the solution of a differential equation or the number of points on an elliptic curve, a priori the non-existence of an algorithm should not preclude us from answering the question, but it often does. (I am deliberately conflating here the notion of non-existence of an algorithm and the notion of non-existence of an *efficient* algorithm; indeed for any finite problem there is some trivial brute-force algorithm, but its existence does not help at all in achieving mathematical understanding.)

While we have a difficult time determining the clique number of any specific graph, we do have many tools to determine the clique number of a *random* graph. Such problems are still by no means trivial: e.g., rigorously determining the precise satisfiability threshold of a random 3SAT formula is still open. Bhargava tackled the problem of trying to determine the number of rational points on a random elliptic curve. In particular he proved that with some nonzero constant probability this number is infinite, and with some nonzero constant probability the number is zero [at least I think so; perhaps it's only guaranteed to be finite]. This can be viewed as progress towards the Birch and Swinnerton-Dyer conjecture, which is one of the Clay math problems.

One interpretation of the "natural proofs" barrier is that to make progress on lower bounds, we need to develop more "non-constructive" proof techniques, ideally going beyond those that apply only to "random" objects. Perhaps some of these advanced probabilistic tools can still be used in this effort. Also, there have been some "non-constructive" results showing that deterministic objects have a certain "pseudo-random" property even in a setting where we don't have algorithms to certify that a random object has that property. In particular, Bourgain (see this exposition by Rao) showed that a graph somewhat similar to the Paley graph has small clique size, even though we still don't have an algorithm for the planted clique problem that can certify a comparable bound on the clique number of a random graph.

Two other plenary talks that I liked at ICM were János Kollár's talk on "The structure of algebraic varieties" and James Arthur's talk on "L-functions and automorphic representations". I can't say I understood much of the latter, but I am now slightly less terrified of the Langlands program (though I still wouldn't like to meet it in a dark alley..). While I also couldn't follow much of the former, it still gave me an overview of the effort of classifying algebraic varieties, which I could imagine might have TCS applications.


This is also a good opportunity to recall Boaz's post on the unique games and other conjectures.


The 2014 International Congress of Mathematicians (ICM 2014) is coming up in a few days, and (like Boaz said) we have a great collection of speakers in the "Mathematical Aspects of Computer Science" section. As it is the weekend, and I am sure that you are looking for excuses to avoid sunlight and socializing, let me point you to my survey on homomorphic encryption and obfuscation, intriguingly entitled "Computing on the Edge of Chaos: Structure and Randomness in Encrypted Computation". Let this post also serve as a gentle and timely reminder to the other ICM speakers to hype their surveys.

As you read it, I think you will be surprised and delighted by the clarity of the concepts. Sadly (for me), this will not be due to the quality of my exposition (which is notoriously poor). Rather, despite everything you have heard, homomorphic encryption schemes have become embarrassingly simple. A couple of years ago, Boaz and Zvika remarked on this blog that homomorphic encryption schemes “have been simplified enough so that their description can fit, well, in a blog post…”. Since then, they have become even simpler. (As for obfuscation schemes, well, that’s a different story, and my survey keeps to the high-level concepts.) Enjoy!
