This is part 2 of a series on clustering, Gaussian mixtures, and Sum of Squares (SoS) proofs. If you have not read it yet, I recommend starting with Part 1. Also, if you find errors please mention them in the comments (or otherwise get in touch with me) and I will fix them ASAP.
Welcome back.
In the last post, we introduced Gaussian mixture models and the clustering problem for Gaussian mixtures. We described identifiability proofs for unsupervised learning problems. Then we set ourselves some goals:
- Design a simple identifiability proof for clustering in Gaussian mixtures, saying that if
are (enough) samples from a
-dimensional mixture of
Gaussians, the ground-truth clustering of the
‘s by which Gaussian they were drawn from is identifiable from the samples.
- Formalize the simplicity of that identifiability proof by showing that it is captured by a formal proof system of restricted power: the Sum of Squares (SoS) proof system.
- Guided by our SoS identifiability proof, design an algorithm for clustering in Gaussian mixtures. (And hence prove Theorem 1 from last time.)
In the last post, we accomplished task (1) in -dimensional case. In this post we will get started on task (2), again in the
-dimensional case.
Recalling from last time, in our identifiability proof we remembered only two things things about our samples :
(1) They break up into clusters of equal size,
, such that for some
, each cluster obeys the empirical moment bound,
where is the empirical mean of the cluster
, and
(2) Those means are separated: .1
The key tool we used in our identifiability proof was Fact 1, which we restate here for convenience.
Fact 1. Let
have
. Let
denote a uniform sample from
and similarly for
. Let
and
. Suppose
satisfy the
-th moment bound
Then
We are going to give a Sum of Squares proof of this fact. Or rather: we are going to state a very similar fact, which concerns inequalities among low-degree polynomials, and give a Sum of Squares proof of that.
We are going to do things in a slightly unusual order, delaying definition of the SoS proof system till we have something concrete in mind to prove in it.
First, because SoS is a proof system to reason about inequalities among low-degree polynomials, we are going to formulate Fact 2, which is like Fact 1 except it will be explicitly about low-degree polynomials. Proving Fact 2 will be our goal.
Second, we will define the SoS proof system.
Finally, we will prove this Fact 2 in the low-degree SoS proof system.
Fact 2: an SoS version of Fact 1
Fact 1 concerns two subsets of
. When we used Fact 1 to prove Lemma 1 in part 1, one of those sets
was one of the ground-truth clusters among
, and one of them was a “candidate” cluster
— Lemma 1 showed that the candidate cluster
must in fact have been close to one of the true clusters
.
We will design a system of polynomial equations whose solutions are in correspondence with candidate clusters . This is probably unavoidably foreign at first. We will offer some attempts at demystifying remarks later on, but for now we will forge ahead.
Let . Let
be some indeterminates; they will be the variables in our polynomials. We are going to think of them as
indicators of a subset
which is a candidate cluster.
First, let’s enforce that is the indicator vector of a set of size
. Consider the equations
.
Any solution to these equations over is a
indicator of a subset of
of size
.
The second hypothesis Fact 1 places on a candidate cluster is the -th moment bound. Let’s enforce that with a polynomial inequality. First, we need some notation for the empirical mean
of
. Denote by
the polynomial
.
Often we drop the and just write
. And now consider the inequality:
.
Belaboring the point somewhat: any solution over to the equations and inequalities we have described would correspond to a subset of
of which obeys the
-th empirical moment bound. (Here we assume that
is even, so that
.)
Although we don’t have all the definitions we need in place yet, we will go ahead and state Fact 2. We introduce some suggestive notation. If , we define the polynomial
.
Often we just write .
Fact 2. Let
. Let
have
; let
be its mean. Let
be a power of
. Suppose
satisfies
Let
be indeterminates. Let
be the following set of equations and inequalities.
Then
We have not yet defined the notation . This notation means that that there is a degree
SoS proof of the inequality on the right-hand side using the axioms
— we will define this momentarily.
The purpose of stating Fact 2 now was just to convince the reader that there is a plausible version of Fact 1 which may be stated entirely as inequalities among polynomials. The reader is encouraged to compare the hypotheses and conclusions of Facts 1 and 2 (ignoring this for now).2 There are two main differences:
(a) The inequality in the conclusion of Fact 2 is raised to the -th power, as compared to the conclusion of Fact 1.
(b) The inequality in the conclusion of Fact 2 seems to have extra factors of on both sides.
Point (a) is needed just to make both sides of the inequality into polynomials in ; otherwise there would be fractional powers. The need for (b) is more subtle, and we will not be able to fully understand it for a while. For now, what we can say is: the inequality
would be true for any which solves
, but that might not have an SoS proof.
One last difference is the factor on the right-hand side of the conclusion. This is probably not inherent; it arises because we will use an easy-to-prove but lossy version of the triangle inequality in our SoS proof. In any case, it is dwarfed by the term
, so we are not too worried about it.
Enough dancing around these SoS proofs — in order to stop with the hand-waiving, we need to set up our proof system.
Sum of Squares Proofs
We cannot go any further without defining the formal proof system we will work in. Since for now we are sticking with a one-dimensional setting, things can be little simpler than for the proof system we will need when become high-dimensional, but developing the proof system still incurs a little notational burden. That is life.
The Sum of Squares proof system, henceforth “SoS”, is a formal proof system for reasoning about systems of polynomial equations and inequalities over the real numbers. At its heart is the simple fact that if is a polynomial in indeterminates
with real coefficients, then
for all
.
Plenty of elementary expositions on SoS proofs are available (see e.g. SoS on the hypercube, SoS on general domains, and the first several sections of this paper by Ryan O’Donnell and Yuan Zhou.) We will define as much as we need to keep this tutorial self-contained but for expanded discussion we refer the reader to those resources and references therein.
Let be some indeterminates. Let
be some polynomial inequalities in those indeterminates. If
is some other polynomial with real coefficients, it may be the case that for any
satisfying
, also
; we would say that
implies
.
The key concept for us will be that implies
with a sum of squares proof. We say that
SoS-proves
if there exist polynomials
for
such that
and the polynomials are sums of squares—i.e. each of them has the form
for some polynomials
. Notice that if this equation holds, for any
which satisfies the inequalities
the right-hand side must be nonnegative, so
is also nonnegative.
The polynomials form an SoS proof that
implies
. If
are all at most
, we say that the proof has degree
, and we write
.
When we often just write
.
We commonly use a few shorthand notations.
- We write
, by which we mean
.
- We include polynomial equations such as
in
, by which we mean that
contains both
and
.
- We write
, by which we mean that both
and
.
Although the definition suggests something static — there is a fixed collection of polynomials forming an SoS proof — in practice we treat SoS proofs as dynamic objects, building them line by line much as we would any other proof. We are going to see an example of this very soon when we prove Fact 2.
It is time to begin to make good on the promise from the last section that we would get substantial mileage out of proving identifiability of mixtures of Gaussians using only simple inequalities. While it will take us several sections to completely make good on our promise, we can begin by giving SoS versions of the triangle and Holder’s inequalities we used in our identifiability proof. We will prove one of these now to give the reader a sense of how such arguments go; since there is usually not much novelty in SoS proofs of such basic inequalities we will defer others till later.
We would like to emphasize that the following SoS proofs themselves have nothing to do with mixtures of Gaussians; instead they are part of a growing problem-independent toolkit of basic inequalities useful in designing SoS proofs of more interesting mathematical statements. The fact that one can have such a problem-independent toolkit is in part what makes the proofs-to-algorithms method so broadly useful.
SoS triangle inequality
We will start with a fairly weak SoS triangle inequality, that will suffice for our needs.
Much more sophisticated versions of this inequality are possible which allow various norms and do not lose the factor we do here.
Fact: SoS triangle inequality.
Letbe indeterminates. Let
be a power of
. Then
To prove the SoS triangle inequality we will want the following basic observation about composability of SoS proofs. (This is really a special case of a more general composibility result.)
Proposition: squaring SoS proofs.
Supposeand
are sums of squares. Then
.
Proof of proposition.
By hypothesis, is a sum of squares polynomial. Now,
so it is a product of sum of squares polynomials, and hence itself a sum of squares.
QED.
Proof of SoS triangle inequality.
We start with the case . In this case, we have
. We claim that
, since the polynomial
is a square. Hence, we find
Now to prove the general case, we proceed by induction. We may suppose
By the proposition, this implies . Now we can apply the base case again to
for
and
to complete the argument.
QED.
SoS Holder’s inequality
Holder’s inequality poses a quandary for SoS proofs, because of the non-integral exponents in most -norms (hence such norms do not naturally correspond to polynomials). Consequently, there are many SoS versions of Holder’s inequality in the literature, choosing various ways to handle this non-integrality. The version we present here will be most useful for our mixtures of Gaussians proof. We will address the non-integral powers issue by imposing polynomial inequalities requiring that some of the underlying variables be Boolean.
Since the proof of this SoS Holder’s inequality proceeds via a similar induction to the one we used for the SoS triangle inequality we just proved, we defer it to the end of this post.
SoS Holder’s inequality.
Letbe indeterminates and let
be the collection of equations
Note that the only solutions to
are
.
Let
be polynomials of degree at most
. Let
be a power of
. Then
and
SoS Boolean Inequalities
We will also want one more SoS inequality for our proof of Fact 1. In linear programming relaxations of Boolean problems, it is common to replace an integrality constraint with the linear inequalities
. SoS can derive the latter inequalities.
Fact: SoS Boolean Inequalities
.
Proof of Fact.
For the inequality , just use the axiom
. For the inequality
, write
.
QED.
Proof of Fact 2
Without further ado, we will prove Fact 2 by lifting the proof of Fact 1 into the SoS proof system. Though slightly notationally cumbersome, the proof follows that of Fact 1 nearly line by line—the reader is encouraged to compare the two proofs.
Proof of Fact 2.
We write out what we want to bound in terms of , then apply Holder’s inequality and the triangle inequality.
We deploy our SoS Holder’s inequality to obtain
Next we can use our equations to conclude that in fact
The polynomial is a sum of squares, as is
via our SoS triangle inequality; applying this with
and
we obtain
We can add the sum of squares to obtain
Using the equations , which, as in the SoS Boolean inequalities fact can be used to prove
, we obtain
Finally using and
, we get
Last of all, using to simplify the term
,
The fact follows by rearranging. QED.
SoS proof of Holder’s inequality
The last thing we haven’t proved is our SoS Holder’s inequality. We will need an SoS Cauchy-Schwarz inequality to prove it.
SoS Cauchy-Schwarz.
Letbe indeterminates. Then
Proof of SoS Cauchy-Schwarz.
It is not hard to check that
which is a sum of squares. QED.
Proof of SoS Holder’s inequality.
We start with the case . Using SoS Cauchy-Schwarz, we obtain
This follows from our SoS Cauchy-Schwarz inequality by substituting for
and
for
; we proved a fact about a sum of squares polynomial in
which implies a corresponding fact about a sum of squares in variables
and
. The latter in turn is a sum of squares in
.
To finish the first half of the case, we just need to replace
with
on the right-hand side. By adding the polynomials
via the equations
, we obtain
To establish the second inequality for the base case, we start by again adding multiples of
to get
Then the inequality follows from Cauchy-Schwarz and again adding some multiples of .
Now it’s time for the induction step. We can assume
By again adding multiples of , we obtain
Now both sides are sums of squares. So, by squaring, we find
The proof is finished by applying Cauchy-Schwarz to the last term and cleaning up by adding multiples of as necessary. QED.
Looking Ahead
In the next post, we will use Fact 2 to deduce an SoS version of Lemma 1 (from part 1). Subsequently, we will finish designing an algorithm for one-dimensional clustering, proving Theorem 1 (from part 1) in the one-dimensional case. Then we will get high dimensional.
Footnotes
-
Suspicious readers may note that our original Gaussian mixture model assumed that the population means are
-separated. Because we will draw enough samples that the empirical mean of each cluster is very close to the true (population) mean, this difference can be ignored for now and is easy to handle when we put everything together to analyze our final algorithm.
-
To make them look even more alike of course we could have introduced notations like
, but for concreteness we are keeping the
variables at least a little explicit.
EDITS circa 10am western: English spelling fix (thanks to Gautam Kamath).
Enjoying the series!
In the proof of the SoS triangle inequality, are you sure about the sentence “Now we can apply the base case again”? What about the missing 2 factor? Of course, if the statement of the fact is changed to include the factor of 2^{2t-1}, this little wrinkle goes away.
Great catch! I will update the statement of the fact.
I think it should be correct now — actually isn’t 2^(t-1) enough?
That’s right. And it nicely matches the base case (t=2) too.