*This is part 3 of a continuing series on clustering, Gaussian mixtures, and Sum of Squares (SoS) proofs. If you have not read them yet, I recommend starting with Part 1 and Part 2. Also, if you find errors please mention them in the comments (or otherwise get in touch with me) and I will fix them ASAP.*

## The Story So Far

Let’s have a brief recap. We are designing an algorithm to cluster samples from Gaussian mixture models on . Our plan is to do this by turning a simple *identifiability proof *into an algorithm. For us, “simple” means that the proof is captured by the low degree Sum of Squares (SoS) proof system.

We have so far addressed only the case (which will remain true in this post). In part 1 we designed our identifiability proof, not yet trying to formally capture it in SoS. The proof was simple in the sense that it used only the triangle inequality and Holder’s inequality. In part 2 we defined SoS proofs formally, and stated and proved an SoS version of one of the key facts in the identifiability proof (Fact 2).

In this post we are going to finish up our SoS identifiability proof. In the next post, we will see how to transform the identifiability proof into an algorithm.

## Setting

We recall our setting formally. Although our eventual goal is to cluster samples sampled from a mixture of Gaussians, we decided to remember only a few properties of such a collection of samples, which will hold with high probability.

The properties are:

(1) They break up into clusters of equal size, , such that for some , each cluster obeys the empirical moment bound,

where is the empirical mean of the cluster , and

(2) Those means are separated: .

The main statement of cluster identifiability was Lemma 1, which we restate for convenience here.

Lemma 1.Let . Let be a partition of into pieces of size such that for each , the collection of numbers obeys the following moment bound:where is the average and is some number in . Let be such that for every . Suppose is large enough that .

Let have size and be such that obey the same moment-boundedness property:

for the same , where is the mean . Then there exists an such that

for some universal constant .

Our main goal in this post is to state and prove an SoS version of Lemma 1. We have already proved the following Fact 2, an SoS analogue of Fact 1 which we used to prove Lemma 1.

Fact 2.Let . Let have ; let be its mean. Let be a power of . Suppose satisfiesLet be indeterminates. Let be the following set of equations and inequalities.

Then

## Remaining obstacles to an SoS version of Lemma 1

We are going to face a couple of problems.

(1) The statement and proof of Lemma 1 are not sufficiently symmetric for our purposes — it is hard to phrase things like “there exists a cluster such that…” as statements directly about polynomials. We will handle this by giving more symmetric version of Lemma 1, with a more symmetric proof.

(2) Our proof of Lemma 1 uses the conclusion of Fact 1 in the form

whereas Fact 2 concludes something slightly different:

The difference in question is that the polynomials in Fact 2 are degree , and appears on both sides of the inequality. If we were not worried about SoS proofs, we could just cancel terms in the second inequality and take -th roots to obtain the first, but these operations are not necessarily allowed by the SoS proof system.

One route to handling this would be to state and prove a version of Lemma 1 which concerns only degree . This is probably possible but definitely inconvenient. Instead we will exhibit a common approach to situations where it would be useful to cancel terms and take roots but the SoS proof system doesn’t quite allow it: we will work simultaneously with SoS proofs and with their dual objects, *pseudodistributions*.

We will tackle issues (1) and (2) in turn, starting with the (a)symmetry issue.

## Lemma 1 reformulated: maintaining symmetry

We pause here to record an alternative version of Lemma 1, with an alternative proof. This second version is conceptually the same as the one we gave in part 1, but it avoids breaking the symmetry among the clusters , whereas this was done at the very beginning of the first proof, by choosing the ordering of the clusters by . Maintaining this symmetry requires a slight reformulation of the proof, but will eventually make it easier to phrase the proof in the Sum of Squares proof system. In this proof we will also avoid the assumption , however, we will pay a factor of rather than in the final bound.

Alternative version of Lemma 1.

Let .

Let be a partition of into pieces of size such that for each , the collection of numbers obeys the following moment bound:where is the average and is some number in . Let be such that for every .

Let have size and be such that obey the same moment-boundedness property:

.

for the same , where . Then

We remark on the conclusion of this alternative version of Lemma 1. Notice that are nonnegative numbers which sum to . The conclusion of the lemma is that for . Since the sum of their squares is at least , one obtains

matching the conclusion of our first version of Lemma 1 up to an extra factor of .

*Proof of alternative version of Lemma 1.*

Let again have size with mean and -th moment bound . Since partition ,

We will endeavor to bound for every pair . Since ,

Certainly and similarly for , so this is at most

Using Fact 1, this in turn is at most . So, we obtained

for every .

Putting this together with our first bound on , we get

QED.

Now that we have resolved the asymmetry issue in our earlier version of Lemma 1, it is time to move on to pseudodistributions, the dual objects of SoS proofs, so that we can tackle the last remaining hurdles to proving an SoS version of Lemma 1.

## Pseudodistributions and duality

Pseudodistributions are the convex duals of SoS proofs. As with SoS proofs, there are several expositions covering elementary definitions and results in detail (e.g. the lecture notes of Barak and Steurer, here and here). We will define what we need to keep the tutorial self-contained but refer the reader elsewhere for further discussion. Here we follow the exposition in those lecture notes.

As usual, let be some indeterminates. For a finitely-supported function and a function , define

If defines a probability distribution, then is the operator sending a function to its expectation under .

A finitely-supported is a degree pseudodistribution if

(1)

(2) for every polynomial of degree at most .

When is clear from context, we usually suppress it and write . Furthermore, if is an operator and for some pseudodistribution , we often abuse terminology and call a pseudoexpectation.

If is a family of polynomial inequalities and is a degree pseudodistribution, we say satisfies if for every and such that one has

We are not going to rehash the basic duality theory of SoS proofs and pseudodistributions here, but we will need the following basic fact, which is easy to prove from the definitions.

Fact: weak soundness of SoS proofs.

Suppose and that is a degree pseudodistribution which satisfies . Then for every SoS polynomial , if $\deg h + \ell \leq d$ then .

We call this “weak soundness” because somewhat stronger statements are available, which more readily allow several SoS proofs to be composed. See Barak and Steurer’s notes for more.

The following fact exemplifies what we mean in the claim that pseudodistributions help make up for the inflexibility of SoS proofs to cancel terms in inequalities.

Fact: pseudoexpectation Cauchy-Schwarz.

Let be a degree pseudoexpectation on indeterminates . Let and be polynomials of degree at most . ThenAs a consequence, if has degree and is a power of , by induction

*Proof of pseudoexpectation Cauchy-Schwarz.*

For variety, we will do this proof in the language of matrices rather than linear operators. Let be the matrix indexed by monomials among of degree at most , with entries . If is a polynomial of degree at most , we can think of as a vector indexed by monomials (whose entries are the coefficients of ) such that . Hence,

QED.

We will want a second, similar fact.

Fact: pseudoexpectation Holder’s.

Let be a degree sum of squares polynomial, , and a degree pseudoexpectation. Then

The proof of pseudoexpectation Holder’s is similar to several we have already seen; it can be found as Lemma A.4 in this paper by Barak, Kelner, and Steurer.

## Lemma 2: an SoS version of Lemma 1

We are ready to state and prove our SoS version of Lemma 1. The reader is encouraged to compare the statement of Lemma 2 to the alternative version of Lemma 1. The proof will be almost identical to the proof of the alternative version of Lemma 1.

Lemma 2.

Let . Let be a partition of into pieces of size such that for each , the collection of numbers obeys the following moment bound:where is the average and is a power of in . Let be such that for every .

Let be indeterminates. Let be the following set of equations and inequalities.

As before is the polynomial . Thinking of the variables as defining a set via its indicator, let be the formal expression

Let be a degree pseudoexpectation which satisfies . Then

*Proof of Lemma 2.*

We will endeavor to bound from above for every . Since we want to use the degree polynomials in Fact 2, we get started with

by (repeated) pseudoexpectation Cauchy-Schwarz.

Since and are -separated, i.e. , we also have

where the indeterminate is and we have only used the SoS triangle inequality. Hence,

Applying Fact 2 and soundness to the right-hand side, we get

Now using that and hence and similarly for , we get

By pseudoexpectation Cauchy-Schwarz

which, combined with the preceding, rearranges to

By pseudoexpectation Holder’s,

All together, we got

Now we no longer have to worry about SoS proofs; we can just cancel the terms on either side of the inequality to get

Putting this together with

finishes the proof. QED.

EDIT: fixed incorrect proof of pseudoexpectation cauchy-schwarz.

EDIT: fixed incorrect statement of soundness of SoS proofs. Thanks to David Steurer for pointing out the error.