This is part 2 of a series on clustering, Gaussian mixtures, and Sum of Squares (SoS) proofs. If you have not read it yet, I recommend starting with Part 1. Also, if you find errors please mention them in the comments (or otherwise get in touch with me) and I will fix them ASAP.
Welcome back.
In the last post, we introduced Gaussian mixture models and the clustering problem for Gaussian mixtures. We described identifiability proofs for unsupervised learning problems. Then we set ourselves some goals:
- Design a simple identifiability proof for clustering in Gaussian mixtures, saying that if $X_1, \ldots, X_n$ are (enough) samples from a $d$-dimensional mixture of $k$ Gaussians, the ground-truth clustering of the $X_i$'s by which Gaussian they were drawn from is identifiable from the samples.
- Formalize the simplicity of that identifiability proof by showing that it is captured by a formal proof system of restricted power: the Sum of Squares (SoS) proof system.
- Guided by our SoS identifiability proof, design an algorithm for clustering in Gaussian mixtures. (And hence prove Theorem 1 from last time.)
In the last post, we accomplished task (1) in the one-dimensional case. In this post we will get started on task (2), again in the one-dimensional case.
Recalling from last time: in our identifiability proof we remembered only two things about our samples $X_1, \ldots, X_n$:
(1) They break up into clusters $S_1, \ldots, S_k$ of equal size, $N = n/k$, such that for some $t \in \mathbb{N}$, each cluster obeys the empirical moment bound,

$$\frac 1 N \sum_{i \in S_j} (X_i - \mu_j)^t \leq 2 \cdot t^{t/2},$$

where $\mu_j$ is the empirical mean of the cluster $S_j$, and
(2) Those means are separated: $|\mu_j - \mu_{j'}| \geq \Delta$ for every $j \neq j'$.1
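Before any SoS machinery, the moment bound itself is easy to spot-check numerically. Here is a small sketch (our own illustration, not from the original series; the helper names are ours) verifying that a cluster of Gaussian draws satisfies the bound $2 \cdot t^{t/2}$:

```python
import random

def empirical_centered_moment(xs, t):
    """(1/N) * sum_i (x_i - mean)^t for a finite sample."""
    n = len(xs)
    mu = sum(xs) / n
    return sum((x - mu) ** t for x in xs) / n

def satisfies_moment_bound(xs, t):
    """Check the empirical moment bound (1/N) sum (x_i - mean)^t <= 2 * t^(t/2)."""
    return empirical_centered_moment(xs, t) <= 2 * t ** (t / 2)

random.seed(0)
t = 4
cluster = [random.gauss(5.0, 1.0) for _ in range(10000)]
# The 4th central moment of a unit-variance Gaussian is 3, well below 2 * 4^2 = 32.
print(satisfies_moment_bound(cluster, t))
```

For $t = 4$ the bound is $2 \cdot 4^2 = 32$, while a unit-variance Gaussian cluster has empirical fourth central moment close to $3$, so the check passes with lots of room to spare.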
The key tool we used in our identifiability proof was Fact 1, which we restate here for convenience.
Fact 1. Let $S, T \subseteq \mathbb{R}$ have $|S| = |T| = N$. Let $X_S$ denote a uniform sample from $S$ and similarly for $X_T$. Let $\mu_S = \mathbb{E} X_S$ and $\mu_T = \mathbb{E} X_T$. Suppose $X_S, X_T$ satisfy the $t$-th moment bound

$$\mathbb{E} (X_S - \mu_S)^t \leq 2 \cdot t^{t/2} \qquad \text{and} \qquad \mathbb{E} (X_T - \mu_T)^t \leq 2 \cdot t^{t/2}.$$
Then
$$|\mu_S - \mu_T| \leq 4 \cdot t^{1/2} \cdot \left( \frac{N}{|S \cap T|} \right)^{1/t}.$$
We are going to give a Sum of Squares proof of this fact. Or rather: we are going to state a very similar fact, which concerns inequalities among low-degree polynomials, and give a Sum of Squares proof of that.
We are going to do things in a slightly unusual order, delaying definition of the SoS proof system till we have something concrete in mind to prove in it.
First, because SoS is a proof system to reason about inequalities among low-degree polynomials, we are going to formulate Fact 2, which is like Fact 1 except it will be explicitly about low-degree polynomials. Proving Fact 2 will be our goal.
Second, we will define the SoS proof system.
Finally, we will prove Fact 2 in the low-degree SoS proof system.
Fact 2: an SoS version of Fact 1
Fact 1 concerns two subsets $S, T$ of $\mathbb{R}$. When we used Fact 1 to prove Lemma 1 in part 1, one of those sets was one of the ground-truth clusters among $X_1, \ldots, X_n$, and one of them was a “candidate” cluster $T$ — Lemma 1 showed that the candidate cluster $T$ must in fact have been close to one of the true clusters $S_j$.
We will design a system of polynomial equations whose solutions are in correspondence with candidate clusters $T$. This is probably unavoidably foreign at first. We will offer some attempts at demystifying remarks later on, but for now we will forge ahead.
Let $N = n/k$. Let $w = (w_1, \ldots, w_n)$ be some indeterminates; they will be the variables in our polynomials. We are going to think of them as $0/1$ indicators of a subset $T \subseteq [n]$ which is a candidate cluster.
First, let's enforce that $w$ is the indicator vector of a set of size $N$. Consider the equations

$$w_i^2 = w_i \text{ for all } i \in [n], \qquad \sum_{i \in [n]} w_i = N.$$

Any solution to these equations over $\mathbb{R}^n$ is a $0/1$ indicator of a subset of $[n]$ of size $N$.
The second hypothesis Fact 1 places on a candidate cluster is the $t$-th moment bound. Let's enforce that with a polynomial inequality. First, we need some notation for the empirical mean $\mu(w)$ of the candidate cluster indicated by $w$. Denote by $\mu(w)$ the polynomial

$$\mu(w) = \frac 1 N \sum_{i \in [n]} w_i X_i.$$

Often we drop the $w$ and just write $\mu$. And now consider the inequality:

$$\frac 1 N \sum_{i \in [n]} w_i \cdot (X_i - \mu)^t \leq 2 \cdot t^{t/2}.$$
Belaboring the point somewhat: any solution over $\mathbb{R}^n$ to the equations and inequalities we have described would correspond to a subset of $[n]$ of size $N$ which obeys the $t$-th empirical moment bound. (Here we assume that $t$ is even, so that $(X_i - \mu)^t \geq 0$.)
Although we don’t have all the definitions we need in place yet, we will go ahead and state Fact 2. We introduce some suggestive notation. If $S \subseteq [n]$, we define the polynomial

$$|T \cap S|(w) = \sum_{i \in S} w_i,$$

where $T$ is the candidate cluster indicated by $w$. Often we just write $|T \cap S|$.
Fact 2. Let $X_1, \ldots, X_n \in \mathbb{R}$. Let $S \subseteq [n]$ have $|S| = N$; let $\mu_S = \frac 1 N \sum_{i \in S} X_i$ be its mean. Let $t$ be a power of $2$. Suppose $S$ satisfies

$$\frac 1 N \sum_{i \in S} (X_i - \mu_S)^t \leq 2 \cdot t^{t/2}.$$
Let $w = (w_1, \ldots, w_n)$ be indeterminates. Let $\mathcal{A}$ be the following set of equations and inequalities.
$$w_i^2 = w_i \text{ for } i \in [n]$$

$$\sum_{i \in [n]} w_i = N$$

$$\frac 1 N \sum_{i \in [n]} w_i \cdot (X_i - \mu)^t \leq 2 \cdot t^{t/2}.$$
Then

$$\mathcal{A} \vdash_{O(t)} \left( \frac 1 N \sum_{i \in S} w_i \right)^t \cdot (\mu - \mu_S)^t \leq 4 \cdot 2^t \cdot t^{t/2} \cdot \left( \frac 1 N \sum_{i \in S} w_i \right)^{t-1}.$$
We have not yet defined the notation $\mathcal{A} \vdash_{O(t)}$. This notation means that there is a degree-$O(t)$ SoS proof of the inequality on the right-hand side using the axioms $\mathcal{A}$ — we will define this momentarily.
The purpose of stating Fact 2 now was just to convince the reader that there is a plausible version of Fact 1 which may be stated entirely as inequalities among polynomials. The reader is encouraged to compare the hypotheses and conclusions of Facts 1 and 2 (ignoring this $\vdash_{O(t)}$ for now).2 There are two main differences:
(a) The inequality in the conclusion of Fact 2 is raised to the $t$-th power, as compared to the conclusion of Fact 1.
(b) The inequality in the conclusion of Fact 2 seems to have extra factors of $\frac 1 N \sum_{i \in S} w_i$ on both sides.
Point (a) is needed just to make both sides of the inequality into polynomials in $w$; otherwise there would be fractional powers. The need for (b) is more subtle, and we will not be able to fully understand it for a while. For now, what we can say is: the inequality

$$\frac 1 N \sum_{i \in S} w_i \cdot (\mu - \mu_S)^t \leq 4 \cdot 2^t \cdot t^{t/2}$$

would be true for any $w$ which solves $\mathcal{A}$, but that might not have an SoS proof.
One last difference is the factor $2^t$ on the right-hand side of the conclusion. This is probably not inherent; it arises because we will use an easy-to-prove but lossy version of the triangle inequality in our SoS proof. In any case, it is dwarfed by the term $t^{t/2}$, so we are not too worried about it.
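To make the system $\mathcal{A}$ concrete, here is a small numeric sketch (our own, with made-up helper names, and a pointwise check rather than anything SoS) confirming that the indicator of a well-behaved cluster satisfies all three axiom families, while a badly mixed candidate violates the moment bound:

```python
import random

def axioms_satisfied(w, X, N, t):
    """Check a concrete point w against the axioms of A:
    w_i^2 = w_i, sum_i w_i = N, and (1/N) sum_i w_i (X_i - mu(w))^t <= 2 t^(t/2),
    where mu(w) = (1/N) sum_i w_i X_i."""
    if any(wi ** 2 != wi for wi in w) or sum(w) != N:
        return False
    mu = sum(wi * xi for wi, xi in zip(w, X)) / N
    moment = sum(wi * (xi - mu) ** t for wi, xi in zip(w, X)) / N
    return moment <= 2 * t ** (t / 2)

random.seed(4)
N, t = 500, 4
# Two well-separated unit-variance clusters.
X = [random.gauss(0.0, 1.0) for _ in range(N)] + [random.gauss(50.0, 1.0) for _ in range(N)]
w_true = [1] * N + [0] * N  # indicator of the first true cluster
w_mixed = [1] * (N // 2) + [0] * N + [1] * (N - N // 2)  # half from each cluster

print(axioms_satisfied(w_true, X, N, t))   # satisfies A
print(axioms_satisfied(w_mixed, X, N, t))  # violates the moment bound
```

The mixed candidate has the right size and is Boolean, but its empirical $t$-th moment is enormous because it straddles two far-apart clusters; this is exactly the kind of solution the moment-bound axiom rules out.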
Enough dancing around these SoS proofs — in order to stop with the hand-waving, we need to set up our proof system.
Sum of Squares Proofs
We cannot go any further without defining the formal proof system we will work in. Since for now we are sticking with a one-dimensional setting, things can be a little simpler than for the proof system we will need when the $X_i$'s become high-dimensional, but developing the proof system still incurs a little notational burden. That is life.
The Sum of Squares proof system, henceforth “SoS”, is a formal proof system for reasoning about systems of polynomial equations and inequalities over the real numbers. At its heart is the simple fact that if $p(x)$ is a polynomial in indeterminates $x = (x_1, \ldots, x_n)$ with real coefficients, then $p(x)^2 \geq 0$ for all $x \in \mathbb{R}^n$.
Plenty of elementary expositions on SoS proofs are available (see e.g. SoS on the hypercube, SoS on general domains, and the first several sections of this paper by Ryan O’Donnell and Yuan Zhou). We will define as much as we need to keep this tutorial self-contained, but for expanded discussion we refer the reader to those resources and the references therein.
Let $x = (x_1, \ldots, x_n)$ be some indeterminates. Let $\mathcal{P} = \{ p_1(x) \geq 0, \ldots, p_m(x) \geq 0 \}$ be some polynomial inequalities in those indeterminates. If $r(x)$ is some other polynomial with real coefficients, it may be the case that for any $x \in \mathbb{R}^n$ satisfying $\mathcal{P}$, also $r(x) \geq 0$; we would say that $\mathcal{P}$ implies $r(x) \geq 0$.
The key concept for us will be that $\mathcal{P}$ implies $r(x) \geq 0$ with a sum of squares proof. We say that $\mathcal{P}$ SoS-proves $r(x) \geq 0$ if there exist polynomials $b_S(x)$ for $S \subseteq [m]$ such that

$$r(x) = \sum_{S \subseteq [m]} b_S(x) \prod_{i \in S} p_i(x)$$

and the polynomials $b_S(x)$ are sums of squares—i.e. each of them has the form $b_S(x) = \sum_j s_j(x)^2$ for some polynomials $s_j$. Notice that if this equation holds, for any $x$ which satisfies the inequalities $\mathcal{P}$ the right-hand side must be nonnegative, so $r(x)$ is also nonnegative.
The polynomials $\{ b_S(x) \}_{S \subseteq [m]}$ form an SoS proof that $\mathcal{P}$ implies $r(x) \geq 0$. If the degrees of the polynomials $b_S(x) \prod_{i \in S} p_i(x)$ are all at most $d$, we say that the proof has degree $d$, and we write

$$\mathcal{P} \vdash_d r(x) \geq 0.$$

When $\mathcal{P} = \emptyset$ we often just write $\vdash_d r(x) \geq 0$.
We commonly use a few shorthand notations.
- We write $\mathcal{P} \vdash_d r(x) \leq r'(x)$, by which we mean $\mathcal{P} \vdash_d r'(x) - r(x) \geq 0$.
- We include polynomial equations such as $p(x) = 0$ in $\mathcal{P}$, by which we mean that $\mathcal{P}$ contains both $p(x) \geq 0$ and $-p(x) \geq 0$.
- We write $\mathcal{P} \vdash_d r(x) = r'(x)$, by which we mean that both $\mathcal{P} \vdash_d r(x) \leq r'(x)$ and $\mathcal{P} \vdash_d r(x) \geq r'(x)$.
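The decomposition $r(x) = \sum_{S} b_S(x) \prod_{i \in S} p_i(x)$ can be illustrated with a toy certificate (our own example, not one used in this series): from the single axiom $x - 1 \geq 0$ one can SoS-prove $x^2 - 1 \geq 0$ via the identity $x^2 - 1 = (x-1)^2 + 2 \cdot (x-1)$, taking $b_{\emptyset} = (x-1)^2$ and $b_{\{1\}} = 2$, both sums of squares:

```python
# Toy SoS certificate: from the axiom p(x) = x - 1 >= 0, derive r(x) = x^2 - 1 >= 0.
# Here b_empty = (x - 1)^2 and b_{1} = 2 are both sums of squares, and
#     x^2 - 1 = (x - 1)^2 + 2 * (x - 1).

def r(x):
    return x * x - 1

def certificate(x):
    return (x - 1) ** 2 + 2 * (x - 1)

# The identity r(x) = certificate(x) holds for every real x, axiom or no axiom:
assert all(abs(r(x) - certificate(x)) < 1e-9 for x in [-3.0, -1.0, 0.0, 0.5, 1.0, 2.0, 7.5])

# Whenever the axiom x - 1 >= 0 holds, both terms are nonnegative, hence r(x) >= 0:
print(all(r(x) >= 0 for x in [1.0, 1.5, 2.0, 10.0]))
```

Note the degree bookkeeping: $b_{\emptyset}$ has degree $2$ and $b_{\{1\}} \cdot p_1$ has degree $1$, so this is a certificate that $\{x - 1 \geq 0\} \vdash_2 x^2 - 1 \geq 0$.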
Although the definition suggests something static — there is a fixed collection of polynomials $\{ b_S(x) \}$ forming an SoS proof — in practice we treat SoS proofs as dynamic objects, building them line by line much as we would any other proof. We are going to see an example of this very soon when we prove Fact 2.
It is time to begin to make good on the promise from the last section that we would get substantial mileage out of proving identifiability of mixtures of Gaussians using only simple inequalities. While it will take us several sections to completely make good on our promise, we can begin by giving SoS versions of the triangle and Holder’s inequalities we used in our identifiability proof. We will prove one of these now to give the reader a sense of how such arguments go; since there is usually not much novelty in SoS proofs of such basic inequalities we will defer others till later.
We would like to emphasize that the following SoS proofs themselves have nothing to do with mixtures of Gaussians; instead they are part of a growing problem-independent toolkit of basic inequalities useful in designing SoS proofs of more interesting mathematical statements. The fact that one can have such a problem-independent toolkit is in part what makes the proofs-to-algorithms method so broadly useful.
SoS triangle inequality
We will start with a fairly weak SoS triangle inequality, that will suffice for our needs.
Much more sophisticated versions of this inequality are possible which allow various norms and do not lose the factor $2^t$ we do here.
Fact: SoS triangle inequality.
Let $x, y$ be indeterminates. Let $t$ be a power of $2$. Then

$$\vdash_t (x+y)^t \leq 2^t \cdot (x^t + y^t).$$
To prove the SoS triangle inequality we will want the following basic observation about composability of SoS proofs. (This is really a special case of a more general composability result.)
Proposition: squaring SoS proofs.
Suppose $\vdash_d a(x) \leq b(x)$ and $a(x), b(x)$ are sums of squares. Then $\vdash_{2d} a(x)^2 \leq b(x)^2$.
Proof of proposition.
By hypothesis, $b(x) - a(x)$ is a sum of squares polynomial. Now,

$$b(x)^2 - a(x)^2 = (b(x) - a(x)) \cdot (b(x) + a(x)),$$

so it is a product of sum of squares polynomials, and hence itself a sum of squares.
QED.
Proof of SoS triangle inequality.
We start with the case $t = 2$. In this case, we have $(x+y)^2 = x^2 + 2xy + y^2$. We claim that $\vdash_2 2xy \leq x^2 + y^2$, since the polynomial $x^2 + y^2 - 2xy = (x-y)^2$ is a square. Hence, we find

$$\vdash_2 (x+y)^2 \leq 2 \cdot (x^2 + y^2).$$

Now to prove the general case, we proceed by induction, maintaining the slightly sharper bound $2^{t-1}$ in place of $2^t$. We may suppose

$$\vdash_{t/2} (x+y)^{t/2} \leq 2^{t/2 - 1} \cdot (x^{t/2} + y^{t/2}).$$

By the proposition, this implies $\vdash_t (x+y)^t \leq 2^{t-2} \cdot (x^{t/2} + y^{t/2})^2$. Now we can apply the base case again to $x^{t/2}$ for $x$ and $y^{t/2}$ for $y$, obtaining $\vdash_t (x+y)^t \leq 2^{t-1} \cdot (x^t + y^t) \leq 2^t \cdot (x^t + y^t)$, which completes the argument.
QED.
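The inequality (and the base-case identity behind it) can be spot-checked numerically; this is just a pointwise check of the polynomial inequality, not of the SoS certificate itself:

```python
import random

def lhs(x, y, t):
    return (x + y) ** t

def rhs(x, y, t):
    return 2 ** t * (x ** t + y ** t)

random.seed(1)
points = [(random.uniform(-10, 10), random.uniform(-10, 10)) for _ in range(1000)]

# Base-case identity behind t = 2: x^2 + y^2 - 2xy = (x - y)^2.
assert all(abs((x ** 2 + y ** 2 - 2 * x * y) - (x - y) ** 2) < 1e-6 for x, y in points)

# (x + y)^t <= 2^t (x^t + y^t) for powers of two t, at every sampled point.
print(all(lhs(x, y, t) <= rhs(x, y, t) for x, y in points for t in (2, 4, 8)))
```

Since $t$ is even, both sides are nonnegative everywhere, which is why negative sample points are fine to include.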
SoS Holder’s inequality
Holder’s inequality poses a quandary for SoS proofs, because of the non-integral exponents in most $\ell_p$-norms (hence such norms do not naturally correspond to polynomials). Consequently, there are many SoS versions of Holder’s inequality in the literature, choosing various ways to handle this non-integrality. The version we present here will be most useful for our mixtures of Gaussians proof. We will address the non-integral powers issue by imposing polynomial inequalities requiring that some of the underlying variables be Boolean.
Since the proof of this SoS Holder’s inequality proceeds via a similar induction to the one we used for the SoS triangle inequality we just proved, we defer it to the end of this post.
SoS Holder’s inequality.
Let $w = (w_1, \ldots, w_n)$ be indeterminates and let $\mathcal{W}$ be the collection of equations

$$\mathcal{W} = \{ w_i^2 - w_i = 0 \, : \, i \in [n] \}.$$

Note that the only solutions to $\mathcal{W}$ are $w \in \{0,1\}^n$.
Let $p_1(w), \ldots, p_n(w)$ be polynomials of degree at most $\ell$. Let $t$ be a power of $2$. Then
$$\mathcal{W} \vdash_{O(t\ell)} \left( \sum_{i \in [n]} w_i \cdot p_i(w) \right)^t \leq \left( \sum_{i \in [n]} w_i \right)^{t-1} \cdot \sum_{i \in [n]} p_i(w)^t$$
and
$$\mathcal{W} \vdash_{O(t\ell)} \left( \sum_{i \in [n]} w_i \cdot p_i(w) \right)^t \leq \left( \sum_{i \in [n]} w_i \right)^{t-1} \cdot \sum_{i \in [n]} w_i \cdot p_i(w)^t.$$
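Pointwise (i.e. for an actual Boolean $w$, rather than as an SoS statement), the second inequality is just a power-mean bound over the subset that $w$ indicates, which we can check numerically (a sketch of ours, not from the post):

```python
import random

def holder_check(w, p, t):
    """Pointwise check of (sum w_i p_i)^t <= (sum w_i)^(t-1) * sum w_i p_i^t
    for a 0/1 vector w and real values p_i, with t even."""
    left = sum(wi * pi for wi, pi in zip(w, p)) ** t
    right = sum(w) ** (t - 1) * sum(wi * pi ** t for wi, pi in zip(w, p))
    return left <= right

random.seed(3)
checks = []
for _ in range(200):
    w = [random.randint(0, 1) for _ in range(10)]
    p = [random.uniform(-3.0, 3.0) for _ in range(10)]
    checks.append(holder_check(w, p, 4))
print(all(checks))
```

If $w$ indicates a set of size $m$, the inequality reads $(\sum_{i} p_i)^t \leq m^{t-1} \sum_i p_i^t$ over that set, which is the usual Holder/power-mean inequality for even $t$.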
SoS Boolean Inequalities
We will also want one more SoS inequality for our proof of Fact 2. In linear programming relaxations of Boolean problems, it is common to replace an integrality constraint $x \in \{0,1\}$ with the linear inequalities $0 \leq x \leq 1$. SoS can derive the latter inequalities.
Fact: SoS Boolean Inequalities
$$\{ w^2 - w = 0 \} \vdash_2 0 \leq w \leq 1.$$
Proof of Fact.
For the inequality $w \geq 0$, just use the axiom to write $w = w^2$, a square. For the inequality $w \leq 1$, write

$$1 - w = (1-w)^2 + (w - w^2),$$

a square plus a multiple of the axiom $w^2 - w = 0$.
QED.
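The identity in this proof can be verified directly (a small check of ours):

```python
def one_minus_w_certificate(w):
    """Certificate for w <= 1 from the axiom w^2 - w = 0:
    1 - w = (1 - w)^2 + (w - w^2), a square plus a multiple of the axiom."""
    return (1 - w) ** 2 + (w - w ** 2)

# The identity holds for every real w, not just Boolean w:
print(all(abs((1 - w) - one_minus_w_certificate(w)) < 1e-9
          for w in (-2.0, 0.0, 0.3, 1.0, 5.0)))
```

At a point where the axiom $w^2 = w$ actually holds, the second term vanishes and $1 - w$ is exhibited as the square $(1-w)^2$, hence nonnegative.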
Proof of Fact 2
Without further ado, we will prove Fact 2 by lifting the proof of Fact 1 into the SoS proof system. Though slightly notationally cumbersome, the proof follows that of Fact 1 nearly line by line—the reader is encouraged to compare the two proofs.
Proof of Fact 2.
We write out what we want to bound in terms of the indeterminates $w$, then apply Holder’s inequality and the triangle inequality.
$$\left( \sum_{i \in S} w_i \right)^t \cdot (\mu - \mu_S)^t = \left( \sum_{i \in S} w_i \left[ (\mu - X_i) - (\mu_S - X_i) \right] \right)^t.$$
We deploy our SoS Holder’s inequality to obtain
$$\mathcal{A} \vdash_{O(t)} \left( \sum_{i \in S} w_i \left[ (\mu - X_i) - (\mu_S - X_i) \right] \right)^t \leq \left( \sum_{i \in S} w_i \right)^{t-1} \cdot \sum_{i \in S} w_i \left[ (\mu - X_i) - (\mu_S - X_i) \right]^t.$$
Next we can use our equations $w_i^2 = w_i$ to conclude that in fact

$$\mathcal{A} \vdash_{O(t)} \left( \sum_{i \in S} w_i \left[ (\mu - X_i) - (\mu_S - X_i) \right] \right)^t \leq \left( \sum_{i \in S} w_i^2 \right)^{t-1} \cdot \sum_{i \in S} w_i^2 \left[ (\mu - X_i) - (\mu_S - X_i) \right]^t.$$
The polynomial $\left( \sum_{i \in S} w_i^2 \right)^{t-1}$ is a sum of squares, as is $2^t \cdot (a^t + b^t) - (a - b)^t$ (in indeterminates $a, b$) via our SoS triangle inequality; applying this with $a = \mu - X_i$ and $b = \mu_S - X_i$ we obtain
$$\mathcal{A} \vdash_{O(t)} \left( \sum_{i \in S} w_i \left[ (\mu - X_i) - (\mu_S - X_i) \right] \right)^t \leq 2^t \left( \sum_{i \in S} w_i^2 \right)^{t-1} \cdot \sum_{i \in S} w_i^2 \left[ (\mu - X_i)^t + (\mu_S - X_i)^t \right].$$
We can add the sum of squares $2^t \left( \sum_{i \in S} w_i^2 \right)^{t-1} \cdot \sum_{i \notin S} w_i^2 \left[ (\mu - X_i)^t + (\mu_S - X_i)^t \right]$ to obtain
$$\mathcal{A} \vdash_{O(t)} \left( \sum_{i \in S} w_i \left[ (\mu - X_i) - (\mu_S - X_i) \right] \right)^t \leq 2^t \left( \sum_{i \in S} w_i^2 \right)^{t-1} \cdot \sum_{i \in [n]} w_i^2 (\mu - X_i)^t + w_i^2 (\mu_S - X_i)^t.$$
Using the equations $w_i^2 = w_i$, which, as in the SoS Boolean inequalities fact, can be used to prove $w_i^2 \leq 1$, we obtain
$$\mathcal{A} \vdash_{O(t)} \left( \sum_{i \in S} w_i \left[ (\mu - X_i) - (\mu_S - X_i) \right] \right)^t \leq 2^t \left( \sum_{i \in S} w_i^2 \right)^{t-1} \cdot \sum_{i \in [n]} w_i (\mu - X_i)^t + (\mu_S - X_i)^t.$$
Finally, using the axiom $\frac 1 N \sum_{i \in [n]} w_i \cdot (X_i - \mu)^t \leq 2 \cdot t^{t/2}$ and the hypothesized moment bound for the cluster $S$, we get

$$\mathcal{A} \vdash_{O(t)} \left( \sum_{i \in S} w_i \left[ (\mu - X_i) - (\mu_S - X_i) \right] \right)^t \leq 2^t \left( \sum_{i \in S} w_i^2 \right)^{t-1} \cdot 4N \cdot t^{t/2}.$$
Last of all, using the equations $w_i^2 = w_i$ to simplify the term $\left( \sum_{i \in S} w_i^2 \right)^{t-1}$,

$$\mathcal{A} \vdash_{O(t)} \left( \sum_{i \in S} w_i \left[ (\mu - X_i) - (\mu_S - X_i) \right] \right)^t \leq 4 \cdot 2^t \cdot t^{t/2} \cdot N \cdot \left( \sum_{i \in S} w_i \right)^{t-1}.$$
The fact follows by rearranging. QED.
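For an actual Boolean $w$, the Holder-plus-triangle portion of the chain above can be sanity-checked pointwise (our own sketch; this checks real-number inequalities, not SoS certificates):

```python
import random

def clustering_bound_holds(w, X, S, t):
    """Pointwise check (Boolean w, t even) of the chained bound
    (sum_{i in S} w_i)^t (mu - mu_S)^t
      <= 2^t (sum_{i in S} w_i)^(t-1) * sum_{i in S} w_i ((mu - X_i)^t + (mu_S - X_i)^t)."""
    mu = sum(wi * xi for wi, xi in zip(w, X)) / max(sum(w), 1)
    muS = sum(X[i] for i in S) / len(S)
    A = sum(w[i] for i in S)
    left = A ** t * (mu - muS) ** t
    right = 2 ** t * A ** (t - 1) * sum(
        w[i] * ((mu - X[i]) ** t + (muS - X[i]) ** t) for i in S)
    return left <= right

random.seed(5)
n, t = 40, 4
X = [random.uniform(-2.0, 2.0) for _ in range(n)]
S = list(range(20))  # a fixed "true cluster" of sample indices
print(all(clustering_bound_holds([random.randint(0, 1) for _ in range(n)], X, S, t)
          for _ in range(100)))
```

Pointwise this bound holds for any reals $\mu, \mu_S$ and any Boolean $w$: each term $((\mu - X_i) - (\mu_S - X_i))^t$ is at most $2^{t-1} ((\mu - X_i)^t + (\mu_S - X_i)^t)$ for even $t$, and Holder supplies the $(\sum_{i \in S} w_i)^{t-1}$ factor.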
SoS proof of Holder’s inequality
The last thing we haven’t proved is our SoS Holder’s inequality. We will need an SoS Cauchy-Schwarz inequality to prove it.
SoS Cauchy-Schwarz.
Let $x_1, \ldots, x_n, y_1, \ldots, y_n$ be indeterminates. Then
$$\vdash_2 \left( \sum_{i \in [n]} x_i y_i \right)^2 \leq \left( \sum_{i \in [n]} x_i^2 \right) \left( \sum_{i \in [n]} y_i^2 \right).$$
Proof of SoS Cauchy-Schwarz.
It is not hard to check that

$$\left( \sum_{i \in [n]} x_i^2 \right) \left( \sum_{i \in [n]} y_i^2 \right) - \left( \sum_{i \in [n]} x_i y_i \right)^2 = \frac 1 2 \sum_{i,j \in [n]} (x_i y_j - x_j y_i)^2,$$

which is a sum of squares. QED.
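This Lagrange identity (note the factor $\frac 1 2$, since the sum runs over all ordered pairs $i, j$) is easy to verify numerically (a quick check of ours):

```python
import random

def lagrange_gap(xs, ys):
    """Return the Cauchy-Schwarz gap and its sum-of-squares form:
    (sum x_i^2)(sum y_i^2) - (sum x_i y_i)^2
      = (1/2) * sum over all ordered pairs (i, j) of (x_i y_j - x_j y_i)^2."""
    n = len(xs)
    gap = (sum(x * x for x in xs) * sum(y * y for y in ys)
           - sum(x * y for x, y in zip(xs, ys)) ** 2)
    sos = 0.5 * sum((xs[i] * ys[j] - xs[j] * ys[i]) ** 2
                    for i in range(n) for j in range(n))
    return gap, sos

random.seed(2)
xs = [random.uniform(-5.0, 5.0) for _ in range(8)]
ys = [random.uniform(-5.0, 5.0) for _ in range(8)]
gap, sos = lagrange_gap(xs, ys)
print(abs(gap - sos) < 1e-6 and gap >= 0)
```

Writing the gap as a sum of squares is exactly what makes this a degree-2 SoS proof of Cauchy-Schwarz.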
Proof of SoS Holder’s inequality.
We start with the case $t = 2$. Using SoS Cauchy-Schwarz, we obtain

$$\vdash_{2\ell + 2} \left( \sum_{i \in [n]} w_i \cdot p_i(w) \right)^2 \leq \left( \sum_{i \in [n]} w_i^2 \right) \cdot \left( \sum_{i \in [n]} p_i(w)^2 \right).$$

This follows from our SoS Cauchy-Schwarz inequality by substituting $w_i$ for $x_i$ and $p_i(w)$ for $y_i$; we proved a fact about a sum of squares polynomial in $x, y$, which implies a corresponding fact about a sum of squares in the variables $w_i$ and the polynomials $p_i(w)$. The latter in turn is a sum of squares in $w$.
To finish the first half of the $t = 2$ case, we just need to replace $\sum_{i \in [n]} w_i^2$ with $\sum_{i \in [n]} w_i$ on the right-hand side. By adding the polynomials $(w_i - w_i^2) \cdot \sum_{j \in [n]} p_j(w)^2$ via the equations $w_i^2 - w_i = 0$, we obtain

$$\mathcal{W} \vdash_{2\ell + 2} \left( \sum_{i \in [n]} w_i \cdot p_i(w) \right)^2 \leq \left( \sum_{i \in [n]} w_i \right) \cdot \left( \sum_{i \in [n]} p_i(w)^2 \right).$$
To establish the second inequality for the $t = 2$ base case, we start by again adding multiples of the polynomials $w_i^2 - w_i$ to get

$$\mathcal{W} \vdash_{2\ell + 2} \left( \sum_{i \in [n]} w_i \cdot p_i(w) \right)^2 = \left( \sum_{i \in [n]} w_i^2 \cdot p_i(w) \right)^2.$$

Then the inequality follows from Cauchy-Schwarz and again adding some multiples of $w_i^2 - w_i$.
Now it’s time for the induction step. We can assume

$$\mathcal{W} \vdash_{O(t/2 \cdot \ell)} \left( \sum_{i \in [n]} w_i \cdot p_i(w) \right)^{t/2} \leq \left( \sum_{i \in [n]} w_i \right)^{t/2-1} \cdot \left( \sum_{i \in [n]} w_i \cdot p_i(w)^{t/2} \right).$$

By again adding multiples of $w_i^2 - w_i$, we obtain

$$\mathcal{W} \vdash_{O(t/2 \cdot \ell)} \left( \sum_{i \in [n]} w_i \cdot p_i(w) \right)^{t/2} \leq \left( \sum_{i \in [n]} w_i^2 \right)^{t/2-1} \cdot \left( \sum_{i \in [n]} w_i^2 \cdot p_i(w)^{t/2} \right).$$

Now both sides are sums of squares. So, by squaring, we find

$$\mathcal{W} \vdash_{O(t \cdot \ell)} \left( \sum_{i \in [n]} w_i \cdot p_i(w) \right)^t \leq \left( \sum_{i \in [n]} w_i^2 \right)^{t-2} \cdot \left( \sum_{i \in [n]} w_i^2 \cdot p_i(w)^{t/2} \right)^2.$$

The proof is finished by applying Cauchy-Schwarz to the last term and cleaning up by adding multiples of $w_i^2 - w_i$ as necessary. QED.
Looking Ahead
In the next post, we will use Fact 2 to deduce an SoS version of Lemma 1 (from part 1). Subsequently, we will finish designing an algorithm for one-dimensional clustering, proving Theorem 1 (from part 1) in the one-dimensional case. Then we will get high dimensional.
-
Suspicious readers may note that our original Gaussian mixture model assumed that the population means are $\Delta$-separated. Because we will draw enough samples that the empirical mean of each cluster is very close to the true (population) mean, this difference can be ignored for now and is easy to handle when we put everything together to analyze our final algorithm.
-
To make them look even more alike, of course, we could have introduced notation like $\mu_T$ for the polynomial $\mu(w)$, but for concreteness we are keeping the $w$ variables at least a little explicit.