This is part 2 of a series on clustering, Gaussian mixtures, and Sum of Squares (SoS) proofs. If you have not read it yet, I recommend starting with Part 1. Also, if you find errors please mention them in the comments (or otherwise get in touch with me) and I will fix them ASAP.
In the last post, we introduced Gaussian mixture models and the clustering problem for Gaussian mixtures. We described identifiability proofs for unsupervised learning problems. Then we set ourselves some goals:
- Design a simple identifiability proof for clustering in Gaussian mixtures, saying that if are (enough) samples from a -dimensional mixture of Gaussians, the ground-truth clustering of the ‘s by which Gaussian they were drawn from is identifiable from the samples.
- Formalize the simplicity of that identifiability proof by showing that it is captured by a formal proof system of restricted power: the Sum of Squares (SoS) proof system.
- Guided by our SoS identifiability proof, design an algorithm for clustering in Gaussian mixtures. (And hence prove Theorem 1 from last time.)
In the last post, we accomplished task (1) in -dimensional case. In this post we will get started on task (2), again in the -dimensional case.
Recalling from last time, in our identifiability proof we remembered only two things things about our samples :
(1) They break up into clusters of equal size, , such that for some , each cluster obeys the empirical moment bound,
where is the empirical mean of the cluster , and
(2) Those means are separated: .1
The key tool we used in our identifiability proof was Fact 1, which we restate here for convenience.
Fact 1. Let have . Let denote a uniform sample from and similarly for . Let and . Suppose satisfy the -th moment bound
We are going to give a Sum of Squares proof of this fact. Or rather: we are going to state a very similar fact, which concerns inequalities among low-degree polynomials, and give a Sum of Squares proof of that.
We are going to do things in a slightly unusual order, delaying definition of the SoS proof system till we have something concrete in mind to prove in it.
First, because SoS is a proof system to reason about inequalities among low-degree polynomials, we are going to formulate Fact 2, which is like Fact 1 except it will be explicitly about low-degree polynomials. Proving Fact 2 will be our goal.
Second, we will define the SoS proof system.
Finally, we will prove this Fact 2 in the low-degree SoS proof system.
Fact 2: an SoS version of Fact 1
Fact 1 concerns two subsets of . When we used Fact 1 to prove Lemma 1 in part 1, one of those sets was one of the ground-truth clusters among , and one of them was a “candidate” cluster — Lemma 1 showed that the candidate cluster must in fact have been close to one of the true clusters .
We will design a system of polynomial equations whose solutions are in correspondence with candidate clusters . This is probably unavoidably foreign at first. We will offer some attempts at demystifying remarks later on, but for now we will forge ahead.
Let . Let be some indeterminates; they will be the variables in our polynomials. We are going to think of them as indicators of a subset which is a candidate cluster.
First, let’s enforce that is the indicator vector of a set of size . Consider the equations
Any solution to these equations over is a indicator of a subset of of size .
The second hypothesis Fact 1 places on a candidate cluster is the -th moment bound. Let’s enforce that with a polynomial inequality. First, we need some notation for the empirical mean of . Denote by the polynomial
Often we drop the and just write . And now consider the inequality:
Belaboring the point somewhat: any solution over to the equations and inequalities we have described would correspond to a subset of of which obeys the -th empirical moment bound. (Here we assume that is even, so that .)
Although we don’t have all the definitions we need in place yet, we will go ahead and state Fact 2. We introduce some suggestive notation. If , we define the polynomial
Often we just write .
Fact 2. Let . Let have ; let be its mean. Let be a power of . Suppose satisfies
Let be indeterminates. Let be the following set of equations and inequalities.
We have not yet defined the notation . This notation means that that there is a degree SoS proof of the inequality on the right-hand side using the axioms — we will define this momentarily.
The purpose of stating Fact 2 now was just to convince the reader that there is a plausible version of Fact 1 which may be stated entirely as inequalities among polynomials. The reader is encouraged to compare the hypotheses and conclusions of Facts 1 and 2 (ignoring this for now).2 There are two main differences:
(a) The inequality in the conclusion of Fact 2 is raised to the -th power, as compared to the conclusion of Fact 1.
(b) The inequality in the conclusion of Fact 2 seems to have extra factors of on both sides.
Point (a) is needed just to make both sides of the inequality into polynomials in ; otherwise there would be fractional powers. The need for (b) is more subtle, and we will not be able to fully understand it for a while. For now, what we can say is: the inequality
would be true for any which solves , but that might not have an SoS proof.
One last difference is the factor on the right-hand side of the conclusion. This is probably not inherent; it arises because we will use an easy-to-prove but lossy version of the triangle inequality in our SoS proof. In any case, it is dwarfed by the term , so we are not too worried about it.
Enough dancing around these SoS proofs — in order to stop with the hand-waiving, we need to set up our proof system.
Sum of Squares Proofs
We cannot go any further without defining the formal proof system we will work in. Since for now we are sticking with a one-dimensional setting, things can be little simpler than for the proof system we will need when become high-dimensional, but developing the proof system still incurs a little notational burden. That is life.
The Sum of Squares proof system, henceforth “SoS”, is a formal proof system for reasoning about systems of polynomial equations and inequalities over the real numbers. At its heart is the simple fact that if is a polynomial in indeterminates with real coefficients, then for all .
Plenty of elementary expositions on SoS proofs are available (see e.g. SoS on the hypercube, SoS on general domains, and the first several sections of this paper by Ryan O’Donnell and Yuan Zhou.) We will define as much as we need to keep this tutorial self-contained but for expanded discussion we refer the reader to those resources and references therein.
Let be some indeterminates. Let be some polynomial inequalities in those indeterminates. If is some other polynomial with real coefficients, it may be the case that for any satisfying , also ; we would say that implies .
The key concept for us will be that implies with a sum of squares proof. We say that SoS-proves if there exist polynomials for such that
and the polynomials are sums of squares—i.e. each of them has the form for some polynomials . Notice that if this equation holds, for any which satisfies the inequalities the right-hand side must be nonnegative, so is also nonnegative.
The polynomials form an SoS proof that implies . If are all at most , we say that the proof has degree , and we write
When we often just write .
We commonly use a few shorthand notations.
- We write , by which we mean .
- We include polynomial equations such as in , by which we mean that contains both and .
- We write , by which we mean that both and .
Although the definition suggests something static — there is a fixed collection of polynomials forming an SoS proof — in practice we treat SoS proofs as dynamic objects, building them line by line much as we would any other proof. We are going to see an example of this very soon when we prove Fact 2.
It is time to begin to make good on the promise from the last section that we would get substantial mileage out of proving identifiability of mixtures of Gaussians using only simple inequalities. While it will take us several sections to completely make good on our promise, we can begin by giving SoS versions of the triangle and Holder’s inequalities we used in our identifiability proof. We will prove one of these now to give the reader a sense of how such arguments go; since there is usually not much novelty in SoS proofs of such basic inequalities we will defer others till later.
We would like to emphasize that the following SoS proofs themselves have nothing to do with mixtures of Gaussians; instead they are part of a growing problem-independent toolkit of basic inequalities useful in designing SoS proofs of more interesting mathematical statements. The fact that one can have such a problem-independent toolkit is in part what makes the proofs-to-algorithms method so broadly useful.
SoS triangle inequality
We will start with a fairly weak SoS triangle inequality, that will suffice for our needs.
Much more sophisticated versions of this inequality are possible which allow various norms and do not lose the factor we do here.
Fact: SoS triangle inequality.
Let be indeterminates. Let be a power of . Then
To prove the SoS triangle inequality we will want the following basic observation about composability of SoS proofs. (This is really a special case of a more general composibility result.)
Proposition: squaring SoS proofs.
Suppose and are sums of squares. Then .
Proof of proposition.
By hypothesis, is a sum of squares polynomial. Now,
so it is a product of sum of squares polynomials, and hence itself a sum of squares.
Proof of SoS triangle inequality.
We start with the case . In this case, we have . We claim that , since the polynomial is a square. Hence, we find
Now to prove the general case, we proceed by induction. We may suppose
By the proposition, this implies . Now we can apply the base case again to for and to complete the argument.
SoS Holder’s inequality
Holder’s inequality poses a quandary for SoS proofs, because of the non-integral exponents in most -norms (hence such norms do not naturally correspond to polynomials). Consequently, there are many SoS versions of Holder’s inequality in the literature, choosing various ways to handle this non-integrality. The version we present here will be most useful for our mixtures of Gaussians proof. We will address the non-integral powers issue by imposing polynomial inequalities requiring that some of the underlying variables be Boolean.
Since the proof of this SoS Holder’s inequality proceeds via a similar induction to the one we used for the SoS triangle inequality we just proved, we defer it to the end of this post.
SoS Holder’s inequality.
Let be indeterminates and let be the collection of equations
Note that the only solutions to are .
Let be polynomials of degree at most . Let be a power of . Then
SoS Boolean Inequalities
We will also want one more SoS inequality for our proof of Fact 1. In linear programming relaxations of Boolean problems, it is common to replace an integrality constraint with the linear inequalities . SoS can derive the latter inequalities.
Fact: SoS Boolean Inequalities
Proof of Fact.
For the inequality , just use the axiom . For the inequality , write
Proof of Fact 2
Without further ado, we will prove Fact 2 by lifting the proof of Fact 1 into the SoS proof system. Though slightly notationally cumbersome, the proof follows that of Fact 1 nearly line by line—the reader is encouraged to compare the two proofs.
Proof of Fact 2.
We write out what we want to bound in terms of , then apply Holder’s inequality and the triangle inequality.
We deploy our SoS Holder’s inequality to obtain
Next we can use our equations to conclude that in fact
The polynomial is a sum of squares, as is via our SoS triangle inequality; applying this with and we obtain
We can add the sum of squares to obtain
Using the equations , which, as in the SoS Boolean inequalities fact can be used to prove , we obtain
Finally using and , we get
Last of all, using to simplify the term ,
The fact follows by rearranging. QED.
SoS proof of Holder’s inequality
The last thing we haven’t proved is our SoS Holder’s inequality. We will need an SoS Cauchy-Schwarz inequality to prove it.
Let be indeterminates. Then
Proof of SoS Cauchy-Schwarz.
It is not hard to check that
which is a sum of squares. QED.
Proof of SoS Holder’s inequality.
We start with the case . Using SoS Cauchy-Schwarz, we obtain
This follows from our SoS Cauchy-Schwarz inequality by substituting for and for ; we proved a fact about a sum of squares polynomial in which implies a corresponding fact about a sum of squares in variables and . The latter in turn is a sum of squares in .
To finish the first half of the case, we just need to replace with on the right-hand side. By adding the polynomials via the equations , we obtain
To establish the second inequality for the base case, we start by again adding multiples of to get
Then the inequality follows from Cauchy-Schwarz and again adding some multiples of .
Now it’s time for the induction step. We can assume
By again adding multiples of , we obtain
Now both sides are sums of squares. So, by squaring, we find
The proof is finished by applying Cauchy-Schwarz to the last term and cleaning up by adding multiples of as necessary. QED.
In the next post, we will use Fact 2 to deduce an SoS version of Lemma 1 (from part 1). Subsequently, we will finish designing an algorithm for one-dimensional clustering, proving Theorem 1 (from part 1) in the one-dimensional case. Then we will get high dimensional.
Suspicious readers may note that our original Gaussian mixture model assumed that the population means are -separated. Because we will draw enough samples that the empirical mean of each cluster is very close to the true (population) mean, this difference can be ignored for now and is easy to handle when we put everything together to analyze our final algorithm.
To make them look even more alike of course we could have introduced notations like , but for concreteness we are keeping the variables at least a little explicit.