This is part 3 of a continuing series on clustering, Gaussian mixtures, and Sum of Squares (SoS) proofs. If you have not read them yet, I recommend starting with Part 1 and Part 2. Also, if you find errors please mention them in the comments (or otherwise get in touch with me) and I will fix them ASAP.
The Story So Far
Let’s have a brief recap. We are designing an algorithm to cluster samples from Gaussian mixture models on . Our plan is to do this by turning a simple identifiability proof into an algorithm. For us, “simple” means that the proof is captured by the low degree Sum of Squares (SoS) proof system.
We have so far addressed only the case (which will remain true in this post). In part 1 we designed our identifiability proof, not yet trying to formally capture it in SoS. The proof was simple in the sense that it used only the triangle inequality and Holder’s inequality. In part 2 we defined SoS proofs formally, and stated and proved an SoS version of one of the key facts in the identifiability proof (Fact 2).
In this post we are going to finish up our SoS identifiability proof. In the next post, we will see how to transform the identifiability proof into an algorithm.
Setting
We recall our setting formally. Although our eventual goal is to cluster samples sampled from a mixture of
Gaussians, we decided to remember only a few properties of such a collection of samples, which will hold with high probability.
The properties are:
(1) They break up into clusters of equal size,
, such that for some
, each cluster obeys the empirical moment bound,
where is the empirical mean of the cluster
, and
(2) Those means are separated: .
The main statement of cluster identifiability was Lemma 1, which we restate for convenience here.
Lemma 1. Let
. Let
be a partition of
into
pieces of size
such that for each
, the collection of numbers
obeys the following moment bound:
where
is the average
and
is some number in
. Let
be such that
for every
. Suppose
is large enough that
.
Let
have size
and be such that
obey the same moment-boundedness property:
for the same
, where
is the mean
. Then there exists an
such that
for some universal constant
.
Our main goal in this post is to state and prove an SoS version of Lemma 1. We have already proved the following Fact 2, an SoS analogue of Fact 1 which we used to prove Lemma 1.
Fact 2. Let
. Let
have
; let
be its mean. Let
be a power of
. Suppose
satisfies
Let
be indeterminates. Let
be the following set of equations and inequalities.
Then
Remaining obstacles to an SoS version of Lemma 1
We are going to face a couple of problems.
(1) The statement and proof of Lemma 1 are not sufficiently symmetric for our purposes — it is hard to phrase things like “there exists a cluster such that…” as statements directly about polynomials. We will handle this by giving more symmetric version of Lemma 1, with a more symmetric proof.
(2) Our proof of Lemma 1 uses the conclusion of Fact 1 in the form
whereas Fact 2 concludes something slightly different:
The difference in question is that the polynomials in Fact 2 are degree , and
appears on both sides of the inequality. If we were not worried about SoS proofs, we could just cancel terms in the second inequality and take
-th roots to obtain the first, but these operations are not necessarily allowed by the SoS proof system.
One route to handling this would be to state and prove a version of Lemma 1 which concerns only degree . This is probably possible but definitely inconvenient. Instead we will exhibit a common approach to situations where it would be useful to cancel terms and take roots but the SoS proof system doesn’t quite allow it: we will work simultaneously with SoS proofs and with their dual objects, pseudodistributions.
We will tackle issues (1) and (2) in turn, starting with the (a)symmetry issue.
Lemma 1 reformulated: maintaining symmetry
We pause here to record an alternative version of Lemma 1, with an alternative proof. This second version is conceptually the same as the one we gave in part 1, but it avoids breaking the symmetry among the clusters , whereas this was done at the very beginning of the first proof, by choosing the ordering of the clusters by
. Maintaining this symmetry requires a slight reformulation of the proof, but will eventually make it easier to phrase the proof in the Sum of Squares proof system. In this proof we will also avoid the assumption
, however, we will pay a factor of
rather than
in the final bound.
Alternative version of Lemma 1.
Let.
Letbe a partition of
into
pieces of size
such that for each
, the collection of numbers
obeys the following moment bound:
where
is the average
and
is some number in
. Let
be such that
for every
.
Let
have size
and be such that
obey the same moment-boundedness property:
.
for the same
, where
. Then
We remark on the conclusion of this alternative version of Lemma 1. Notice that are nonnegative numbers which sum to
. The conclusion of the lemma is that
for
. Since the sum of their squares is at least
, one obtains
matching the conclusion of our first version of Lemma 1 up to an extra factor of .
Proof of alternative version of Lemma 1.
Let again have size
with mean
and
-th moment bound
. Since
partition
,
We will endeavor to bound for every pair
. Since
,
Certainly and similarly for
, so this is at most
Using Fact 1, this in turn is at most . So, we obtained
for every .
Putting this together with our first bound on , we get
QED.
Now that we have resolved the asymmetry issue in our earlier version of Lemma 1, it is time to move on to pseudodistributions, the dual objects of SoS proofs, so that we can tackle the last remaining hurdles to proving an SoS version of Lemma 1.
Pseudodistributions and duality
Pseudodistributions are the convex duals of SoS proofs. As with SoS proofs, there are several expositions covering elementary definitions and results in detail (e.g. the lecture notes of Barak and Steurer, here and here). We will define what we need to keep the tutorial self-contained but refer the reader elsewhere for further discussion. Here we follow the exposition in those lecture notes.
As usual, let be some indeterminates. For a finitely-supported function
and a function
, define
If defines a probability distribution, then
is the operator sending a function to its expectation under
.
A finitely-supported is a degree
pseudodistribution if
(1)
(2) for every polynomial
of degree at most
.
When is clear from context, we usually suppress it and write
. Furthermore, if
is an operator and
for some pseudodistribution
, we often abuse terminology and call
a pseudoexpectation.
If is a family of polynomial inequalities and
is a degree
pseudodistribution, we say
satisfies
if for every
and
such that
one has
We are not going to rehash the basic duality theory of SoS proofs and pseudodistributions here, but we will need the following basic fact, which is easy to prove from the definitions.
Fact: weak soundness of SoS proofs.
Supposeand that
is a degree
pseudodistribution which satisfies
. Then for every SoS polynomial
, if $\deg h + \ell \leq d$ then
.
We call this “weak soundness” because somewhat stronger statements are available, which more readily allow several SoS proofs to be composed. See Barak and Steurer’s notes for more.
The following fact exemplifies what we mean in the claim that pseudodistributions help make up for the inflexibility of SoS proofs to cancel terms in inequalities.
Fact: pseudoexpectation Cauchy-Schwarz.
Letbe a degree
pseudoexpectation on indeterminates
. Let
and
be polynomials of degree at most
. Then
As a consequence, if
has degree
and
is a power of
, by induction
Proof of pseudoexpectation Cauchy-Schwarz.
For variety, we will do this proof in the language of matrices rather than linear operators. Let be the matrix indexed by monomials among
of degree at most
, with entries
. If
is a polynomial of degree at most
, we can think of
as a vector indexed by monomials (whose entries are the coefficients of
) such that
. Hence,
QED.
We will want a second, similar fact.
Fact: pseudoexpectation Holder’s.
Letbe a degree
sum of squares polynomial,
, and
a degree
pseudoexpectation. Then
The proof of pseudoexpectation Holder’s is similar to several we have already seen; it can be found as Lemma A.4 in this paper by Barak, Kelner, and Steurer.
Lemma 2: an SoS version of Lemma 1
We are ready to state and prove our SoS version of Lemma 1. The reader is encouraged to compare the statement of Lemma 2 to the alternative version of Lemma 1. The proof will be almost identical to the proof of the alternative version of Lemma 1.
Lemma 2.
Let. Let
be a partition of
into
pieces of size
such that for each
, the collection of numbers
obeys the following moment bound:
where
is the average
and
is a power of
in
. Let
be such that
for every
.
Let
be indeterminates. Let
be the following set of equations and inequalities.
As before
is the polynomial
. Thinking of the variables
as defining a set
via its
indicator, let
be the formal expression
Let
be a degree
pseudoexpectation which satisfies
. Then
Proof of Lemma 2.
We will endeavor to bound from above for every
. Since we want to use the degree
polynomials in Fact 2, we get started with
by (repeated) pseudoexpectation Cauchy-Schwarz.
Since and
are
-separated, i.e.
, we also have
where the indeterminate is and we have only used the SoS triangle inequality. Hence,
Applying Fact 2 and soundness to the right-hand side, we get
Now using that and hence
and similarly for
, we get
By pseudoexpectation Cauchy-Schwarz
which, combined with the preceding, rearranges to
By pseudoexpectation Holder’s,
All together, we got
Now we no longer have to worry about SoS proofs; we can just cancel the terms on either side of the inequality to get
Putting this together with
finishes the proof. QED.
EDIT: fixed incorrect proof of pseudoexpectation cauchy-schwarz.
EDIT: fixed incorrect statement of soundness of SoS proofs. Thanks to David Steurer for pointing out the error.