# Clustering and Sum of Squares Proofs, Part 2

This is part 2 of a series on clustering, Gaussian mixtures, and Sum of Squares (SoS) proofs. If you have not read it yet, I recommend starting with Part 1. Also, if you find errors please mention them in the comments (or otherwise get in touch with me) and I will fix them ASAP.

Welcome back.

In the last post, we introduced Gaussian mixture models and the clustering problem for Gaussian mixtures. We described identifiability proofs for unsupervised learning problems. Then we set ourselves some goals:

1. Design a simple identifiability proof for clustering in Gaussian mixtures, saying that if $X_1,\ldots,X_n$ are (enough) samples from a $d$-dimensional mixture of $k$ Gaussians, the ground-truth clustering of the $X_i$‘s by which Gaussian they were drawn from is identifiable from the samples.
2. Formalize the simplicity of that identifiability proof by showing that it is captured by a formal proof system of restricted power: the Sum of Squares (SoS) proof system.
3. Guided by our SoS identifiability proof, design an algorithm for clustering in Gaussian mixtures. (And hence prove Theorem 1 from last time.)

In the last post, we accomplished task (1) in $1$-dimensional case. In this post we will get started on task (2), again in the $1$-dimensional case.

Recalling from last time, in our identifiability proof we remembered only two things things about our samples $X_1,\ldots,X_n$:

(1) They break up into clusters $S_1,\ldots,S_k$ of equal size, $|S_i| = n/k = N$, such that for some $t \in \mathbb{N}$, each cluster obeys the empirical moment bound,

$\mathbb{E}_{j \sim S_i} |X_j - \mu_i|^t \leq 2 \cdot t^{t/2}$

where $\mu_i = \mathbb{E}_{j \sim S_i} X_j$ is the empirical mean of the cluster $S_i$, and

(2) Those means are separated: $|\mu_i - \mu_j| \geq \Delta$.1

The key tool we used in our identifiability proof was Fact 1, which we restate here for convenience.

Fact 1. Let $S,S' \subseteq \mathbb{R}$ have $|S| = |S'| = N$. Let $X$ denote a uniform sample from $S$ and similarly for $X'$. Let $\mu = \mathbb{E} X$ and $\mu' = \mathbb{E} X'$. Suppose $X,X'$ satisfy the $t$-th moment bound

$\mathbb{E} |X - \mu|^t \leq 2 \cdot t^{t/2} \text{ and } \mathbb{E}|X' - \mu'|^t \leq 2 \cdot t^{t/2}.$

Then

$|\mu - \mu'| \leq 4 \sqrt t \cdot \left ( \frac{|S \cap S'|}{N} \right )^{-1/t}.$

We are going to give a Sum of Squares proof of this fact. Or rather: we are going to state a very similar fact, which concerns inequalities among low-degree polynomials, and give a Sum of Squares proof of that.

We are going to do things in a slightly unusual order, delaying definition of the SoS proof system till we have something concrete in mind to prove in it.

First, because SoS is a proof system to reason about inequalities among low-degree polynomials, we are going to formulate Fact 2, which is like Fact 1 except it will be explicitly about low-degree polynomials. Proving Fact 2 will be our goal.

Second, we will define the SoS proof system.

Finally, we will prove this Fact 2 in the low-degree SoS proof system.

## Fact 2: an SoS version of Fact 1

Fact 1 concerns two subsets $S,S'$ of $\mathbb{R}$. When we used Fact 1 to prove Lemma 1 in part 1, one of those sets $S$ was one of the ground-truth clusters among $S_1,\ldots,S_k$, and one of them was a “candidate” cluster $S' \subseteq X_1,\ldots,X_n$ — Lemma 1 showed that the candidate cluster $S'$ must in fact have been close to one of the true clusters $S_1,\ldots,S_k$.

We will design a system of polynomial equations whose solutions are in correspondence with candidate clusters $S'$. This is probably unavoidably foreign at first. We will offer some attempts at demystifying remarks later on, but for now we will forge ahead.

Let $X_1,\ldots,X_n \in \mathbb{R}$. Let $w_1,\ldots,w_n$ be some indeterminates; they will be the variables in our polynomials. We are going to think of them as $0/1$ indicators of a subset $T \subseteq [n]$ which is a candidate cluster.

First, let’s enforce that $w$ is the indicator vector of a set of size $N = n/k$. Consider the equations

$w_i^2 = w_i \text{ for all } i \in [n] \text{ and } \sum_{i \in [n]} w_i = N$.

Any solution to these equations over $\mathbb{R}$ is a $0/1$ indicator of a subset of $[n]$ of size $N$.

The second hypothesis Fact 1 places on a candidate cluster is the $t$-th moment bound. Let’s enforce that with a polynomial inequality. First, we need some notation for the empirical mean $\mu$ of $T$. Denote by $\mu(w)$ the polynomial

$\mu(w) = \frac 1 N \sum_{i \in [n]} w_i X_i$.

Often we drop the $w$ and just write $\mu$. And now consider the inequality:

$\frac 1 N \sum_{i \in [n]} w_i (X_i - \mu)^t \leq 2 \cdot t^{t/2}$.

Belaboring the point somewhat: any solution over $\mathbb{R}$ to the equations and inequalities we have described would correspond to a subset of $[n]$ of which obeys the $t$-th empirical moment bound. (Here we assume that $t$ is even, so that $|X_i - \mu|^t = (X_i - \mu)^t$.)

Although we don’t have all the definitions we need in place yet, we will go ahead and state Fact 2. We introduce some suggestive notation. If $S \subseteq [n]$, we define the polynomial

$|S \cap T|(w) = \sum_{i \in S} w_i$.

Often we just write $|S \cap T|$.

Fact 2.  Let $X_1,\ldots,X_n \in \mathbb{R}$. Let $S \subseteq [n]$ have $|S| = N$; let $\mu_S = \mathbb{E}_{i \sim S} X_i$ be its mean. Let $t$ be a power of $2$. Suppose $S$ satisfies

$\mathbb{E}_{i \sim S} |X_i - \mu_S|^t \leq 2 \cdot t^{t/2}.$

Let $w_1,\ldots,w_n$ be indeterminates. Let $\mathcal{A}$ be the following set of equations and inequalities.

$w_i^2 = w_i \text{ for } i \in [n]$
$\sum_{i \in [n]} w_i = N$
$\frac 1 N \sum_{i \in [n]} w_i \cdot (X_i - \mu)^t \leq 2 \cdot t^{t/2}.$

Then

$\mathcal{A} \vdash_{O(t)} \left ( \frac{|S \cap T|}{N} \right )^t \cdot (\mu - \mu_S)^t \leq 2^{O(t)} \cdot t^{t/2} \cdot \left( \frac {|S \cap T|} N \right ) ^{t-1}.$

We have not yet defined the notation $\mathcal{A} \vdash_{O(t)} \ldots$. This notation means that that there is a degree $O(t)$ SoS proof of the inequality on the right-hand side using the axioms $\mathcal{A}$ — we will define this momentarily.

The purpose of stating Fact 2 now was just to convince the reader that there is a plausible version of Fact 1 which may be stated entirely as inequalities among polynomials. The reader is encouraged to compare the hypotheses and conclusions of Facts 1 and 2 (ignoring this $\mathcal{A} \vdash \ldots$ for now).2 There are two main differences:

(a) The inequality in the conclusion of Fact 2 is raised to the $t$-th power, as compared to the conclusion of Fact 1.

(b) The inequality in the conclusion of Fact 2 seems to have extra factors of $\frac{|S \cap T|}{N}$ on both sides.

Point (a) is needed just to make both sides of the inequality into polynomials in $w$; otherwise there would be fractional powers. The need for (b) is more subtle, and we will not be able to fully understand it for a while. For now, what we can say is: the inequality

$\left ( \frac{|S \cap T|}{N} \right ) \cdot (\mu - \mu_S)^t \leq 2^{O(t)} \cdot t^{t/2}.$

would be true for any $w$ which solves $\mathcal{A}$, but that might not have an SoS proof.

One last difference is the factor $2^{O(t)}$ on the right-hand side of the conclusion. This is probably not inherent; it arises because we will use an easy-to-prove but lossy version of the triangle inequality in our SoS proof. In any case, it is dwarfed by the term $t^{t/2}$, so we are not too worried about it.

Enough dancing around these SoS proofs — in order to stop with the hand-waiving, we need to set up our proof system.

## Sum of Squares Proofs

We cannot go any further without defining the formal proof system we will work in. Since for now we are sticking with a one-dimensional setting, things can be little simpler than for the proof system we will need when $X_1,\ldots,X_n$ become high-dimensional, but developing the proof system still incurs a little notational burden. That is life.

The Sum of Squares proof system, henceforth “SoS”, is a formal proof system for reasoning about systems of polynomial equations and inequalities over the real numbers. At its heart is the simple fact that if $p(x)$ is a polynomial in indeterminates $x$ with real coefficients, then $p(x)^2 \geq 0$ for all $x \in \mathbb{R}$.

Plenty of elementary expositions on SoS proofs are available (see e.g. SoS on the hypercube, SoS on general domains, and the first several sections of this paper by Ryan O’Donnell and Yuan Zhou.) We will define as much as we need to keep this tutorial self-contained but for expanded discussion we refer the reader to those resources and references therein.

Let $x_1,\ldots,x_n$ be some indeterminates. Let $\mathcal{P} = \{ p_1(x) \geq 0,\ldots,p_m(x) \geq 0\}$ be some polynomial inequalities in those indeterminates. If $r(x)$ is some other polynomial with real coefficients, it may be the case that for any $x$ satisfying $\mathcal{P}$, also $r(x) \geq 0$; we would say that $\mathcal{P}$ implies $r(x) \geq 0$.

The key concept for us will be that $\mathcal{P}$ implies $r(x) \geq 0$ with a sum of squares proof. We say that $\mathcal{P}$ SoS-proves $r(x) \geq 0$ if there exist polynomials $b_S(x)$ for $S \subseteq [m]$ such that

$r(x) = \sum_{S \subseteq [m]} b_S(x) \prod_{i \in S} p_i(x)$

and the polynomials $\{b_S\}$ are sums of squares—i.e. each of them has the form $\sum_{j} u_j(x)^2$ for some polynomials $u_j(x)$. Notice that if this equation holds, for any $x \in \mathbb{R}^n$ which satisfies the inequalities $\mathcal{P}$ the right-hand side must be nonnegative, so $r(x)$ is also nonnegative.

The polynomials $b_S$ form an SoS proof that $\mathcal{P}$ implies $r(x) \geq 0$. If $\deg \left [ b_S(x) \prod_{i \in S} p_i(x) \right ]$ are all at most $d \in \mathbb{N}$, we say that the proof has degree $d$, and we write

$\mathcal{P} \vdash_d r(x) \geq 0$.

When $\mathcal{P} = \emptyset$ we often just write $\vdash_d r(x) \geq 0$.

We commonly use a few shorthand notations.

• We write $\mathcal{P} \vdash r(x) \geq r'(x)$, by which we mean $\mathcal{P} \vdash r(x) - r'(x) \geq 0$.
• We include polynomial equations such as $x_i^2 = x_i$ in $\mathcal{P}$, by which we mean that $\mathcal{P}$ contains both $x_i^2 \leq x_i$ and $x_i^2 \geq x_i$.
• We write $\mathcal{P} \vdash r(x) = r'(x)$, by which we mean that both $\mathcal{P} \vdash r(x) \geq r'(x)$ and $\mathcal{P} \vdash r'(x) \geq r(x)$.

Although the definition suggests something static — there is a fixed collection of polynomials $b_S$ forming an SoS proof — in practice we treat SoS proofs as dynamic objects, building them line by line much as we would any other proof. We are going to see an example of this very soon when we prove Fact 2.

It is time to begin to make good on the promise from the last section that we would get substantial mileage out of proving identifiability of mixtures of Gaussians using only simple inequalities. While it will take us several sections to completely make good on our promise, we can begin by giving SoS versions of the triangle and Holder’s inequalities we used in our identifiability proof. We will prove one of these now to give the reader a sense of how such arguments go; since there is usually not much novelty in SoS proofs of such basic inequalities we will defer others till later.

We would like to emphasize that the following SoS proofs themselves have nothing to do with mixtures of Gaussians; instead they are part of a growing problem-independent toolkit of basic inequalities useful in designing SoS proofs of more interesting mathematical statements. The fact that one can have such a problem-independent toolkit is in part what makes the proofs-to-algorithms method so broadly useful.

### SoS triangle inequality

We will start with a fairly weak SoS triangle inequality, that will suffice for our needs.
Much more sophisticated versions of this inequality are possible which allow various norms and do not lose the factor $2^t$ we do here.

Fact: SoS triangle inequality.
Let $x,y$ be indeterminates. Let $t$ be a power of $2$. Then

$\vdash_t (a+b)^t \leq 2^{t-1} (a^t + b^t).$

To prove the SoS triangle inequality we will want the following basic observation about composability of SoS proofs. (This is really a special case of a more general composibility result.)

Proposition: squaring SoS proofs.
Suppose $\vdash_d p(x) \leq q(x)$ and $p,q$ are sums of squares. Then $\vdash_{2d} p(x)^2 \leq q(x)^2$.

Proof of proposition.
By hypothesis, $q(x) - p(x)$ is a sum of squares polynomial. Now,

$q(x)^2 - p(x)^2 = (q(x) - p(x))(q(x) + p(x))$

so it is a product of sum of squares polynomials, and hence itself a sum of squares.
QED.

Proof of SoS triangle inequality.
We start with the case $t=2$. In this case, we have $(a+b)^2 = a^2 + 2ab + b^2$. We claim that $\vdash_2 2ab \leq a^2 + b^2$, since the polynomial $a^2 + b^2 - 2ab = (a-b)^2$ is a square. Hence, we find

$\vdash_t (a+b)^2 = a^2 + 2ab + b^2 \leq 2(a^2 + b^2).$

Now to prove the general case, we proceed by induction. We may suppose

$\vdash_{t/2} (a+b)^{t/2 - 1} \leq 2^{t/2} (a^{t/2} + b^{t/2}).$

By the proposition, this implies $\vdash_t (a+b)^t \leq 2^{t-2} (a^{t/2} + b^{t/2})^2$. Now we can apply the base case again to $(a'+b')^2$ for $a' = a^{t/2}$ and $b' = b^{t/2}$ to complete the argument.
QED.

### SoS Holder’s inequality

Holder’s inequality poses a quandary for SoS proofs, because of the non-integral exponents in most $p$-norms (hence such norms do not naturally correspond to polynomials). Consequently, there are many SoS versions of Holder’s inequality in the literature, choosing various ways to handle this non-integrality. The version we present here will be most useful for our mixtures of Gaussians proof. We will address the non-integral powers issue by imposing polynomial inequalities requiring that some of the underlying variables be Boolean.

Since the proof of this SoS Holder’s inequality proceeds via a similar induction to the one we used for the SoS triangle inequality we just proved, we defer it to the end of this post.

SoS Holder’s inequality.
Let $w_1,\ldots,w_n$ be indeterminates and let $\mathcal{W}$ be the collection of equations

$\mathcal{W} = \{ w_i^2 - w_i = 0 \, : i \in [n] \}.$

Note that the only solutions to $\mathcal{W}$ are $\{0,1\}^n$.

Let $p_1(w),\ldots,p_n(w)$ be polynomials of degree at most $\ell$. Let $t$ be a power of $2$. Then

$\mathcal{W} \vdash_{O(t\ell)} \left ( \sum_{i \in [n]} w_i \cdot p_i(w) \right )^t \leq \left ( \sum_{i \in [n]} w_i \right )^{t-1} \cdot \sum_{i \in [n]} p_i(w)^t.$

and

$\mathcal{W} \vdash_{O(t\ell)} \left ( \sum_{i \in [n]} w_i \cdot p_i(w) \right )^t \leq \left ( \sum_{i \in [n]} w_i \right )^{t-1} \cdot \sum_{i \in [n]} w_i \cdot p_i(w)^t.$

### SoS Boolean Inequalities

We will also want one more SoS inequality for our proof of Fact 1. In linear programming relaxations of Boolean problems, it is common to replace an integrality constraint $x \in \{0,1\}$ with the linear inequalities $0 \leq x \leq 1$. SoS can derive the latter inequalities.

Fact: SoS Boolean Inequalities

$x^2 = x \vdash_2 0 \leq x \leq 1$.

Proof of Fact.

For the inequality $x \geq 0$, just use the axiom $x \geq x^2$. For the inequality $x \leq 1$, write

$1 - x = x - x^2 + (1-x)^2$.

QED.

## Proof of Fact 2

Without further ado, we will prove Fact 2 by lifting the proof of Fact 1 into the SoS proof system. Though slightly notationally cumbersome, the proof follows that of Fact 1 nearly line by line—the reader is encouraged to compare the two proofs.

Proof of Fact 2.
We write out what we want to bound in terms of $X_1,\ldots,X_n$, then apply Holder’s inequality and the triangle inequality.

$\left ( \sum_{i \in S} w_i \right )^t \cdot (\mu - \mu_S)^t = \left ( \sum_{i \in S} w_i \left [ (\mu - X_i) - (\mu_S - X_i) \right ] \right )^t.$

We deploy our SoS Holder’s inequality to obtain

$\mathcal{A} \vdash_{O(t)} \left ( \sum_{i \in S} w_i \left [ (\mu - X_i) - (\mu_S - X_i) \right ] \right )^t \leq \left ( \sum_{i \in S} w_i \right )^{t-1} \cdot \sum_{i \in S} w_i \left [ (\mu - X_i) - (\mu_S - X_i) \right ]^t.$

Next we can use our equations $w_i^2 - w_i$ to conclude that in fact

$\mathcal{A} \vdash_{O(t)} \left ( \sum_{i \in S} w_i \left [ (\mu - X_i) - (\mu_S - X_i) \right ] \right )^t \leq \left ( \sum_{i \in S} w_i^2 \right )^{t-1} \cdot \sum_{i \in S} w_i^2 \left [ (\mu - X_i) - (\mu_S - X_i) \right ]^t.$

The polynomial $\left ( \sum_{i \in S} w_i^2 \right)^{t-1}$ is a sum of squares, as is $2^t(a^t + b^t) - (a+b)^t$ via our SoS triangle inequality; applying this with $a = (\mu - X_i)$ and $b = -(\mu_S - X_i)$ we obtain

$\mathcal{A} \vdash_{O(t)} \left ( \sum_{i \in S} w_i \left [ (\mu - X_i) - (\mu_S - X_i) \right ] \right ) ^t$

$\leq 2^t \left ( \sum_{i \in S} w_i^2 \right )^{t-1} \cdot \sum_{i \in S} w_i^2 (\mu - X_i)^t + w_i^2 (\mu_S - X_i)^t.$

We can add the sum of squares $2^t \left ( \sum_{i \in S} w_i^2 \right )^{t-1} \cdot \sum_{i \notin S} w_i^2 (\mu - X_i)^t + w_i^2 (\mu_S - X_i)^t$ to obtain

$\mathcal{A} \vdash_{O(t)} \left ( \sum_{i \in S} w_i \left [ (\mu - X_i) - (\mu_S - X_i) \right ] \right )^t$

$\leq 2^t \left ( \sum_{i \in S} w_i^2 \right )^{t-1} \cdot \sum_{i \in [n]} w_i^2 (\mu - X_i)^t + w_i^2 (\mu_S - X_i)^t.$

Using the equations $w_i^2 - w_i$, which, as in the SoS Boolean inequalities fact can be used to prove $w_i^2 \leq 1$, we obtain

$\mathcal{A} \vdash_{O(t)} \left (\sum_{i \in S} w_i \left [ (\mu - X_i) - (\mu_S - X_i) \right ] \right )^t$

$\leq 2^t \left ( \sum_{i \in S} w_i^2 \right )^{t-1} \cdot \sum_{i \in [n]} w_i (\mu - X_i)^t + (\mu_S - X_i)^t.$

Finally using $\mathbb{E}_{i \in S}(\mu_S - X_i)^t \leq 2 \cdot t^{t/2}$ and $\mathcal{A} \vdash_{O(t)} \sum_{i \in [n]} w_i (X_i - \mu)^t \leq 2 \cdot t^{t/2} \cdot N$, we get

$\mathcal{A} \vdash_{O(t)} \left ( \sum_{i \in S} w_i \left [ (\mu - X_i) - (\mu_S - X_i) \right ] \right )^t$

$\leq 2^{O(t)} \cdot \left ( \sum_{i \in S} w_i^2 \right )^{t-1} \cdot t^{t/2} \cdot N.$

Last of all, using $w_i^2 - w_i = 0$ to simplify the term $\sum_{i \in S} w_i^2$,

$\mathcal{A} \vdash_{O(t)} \left ( \sum_{i \in S} w_i \left [ (\mu - X_i) - (\mu_S - X_i) \right ] \right )^t$

$\leq 2^{O(t)} \cdot \left ( \sum_{i \in S} w_i \right )^{t-1} \cdot N \cdot t^{t/2}.$

The fact follows by rearranging. QED.

## SoS proof of Holder’s inequality

The last thing we haven’t proved is our SoS Holder’s inequality. We will need an SoS Cauchy-Schwarz inequality to prove it.

SoS Cauchy-Schwarz.
Let $x_1,\ldots,x_n,y_1,\ldots,y_n$ be indeterminates. Then

$\vdash_2 \left ( \sum_{i \in [n]} x_i y_i \right )^2 \leq \left ( \sum_{i \in [n]} x_i^2 \right ) \left ( \sum_{i \in [n]} y_i^2 \right ).$

Proof of SoS Cauchy-Schwarz.
It is not hard to check that

$\left ( \sum_{i \in [n]} x_i^2 \right ) \left ( \sum_{i \in [n]} y_i^2 \right ) - \left ( \sum_{i \in [n]} x_i y_i \right )^2 = \sum_{i,j \in [n]} (x_i y_j - x_j y_i)^2$

which is a sum of squares. QED.

Proof of SoS Holder’s inequality.
We start with the case $t=2$. Using SoS Cauchy-Schwarz, we obtain

$\vdash_{2\ell + 2} \left ( \sum_{i \in [n]} w_i \cdot p_i(w) \right)^2 \leq \left ( \sum_{i \in [n]} w_i^2 \right) \cdot \left ( \sum_{i \in [n]} p_i(w)^2 \right).$

This follows from our SoS Cauchy-Schwarz inequality by substituting $w_i$ for $x_i$ and $p_i(w)$ for $y_i$; we proved a fact about a sum of squares polynomial in $x,y$ which implies a corresponding fact about a sum of squares in variables $w_i$ and $p_i(w)$. The latter in turn is a sum of squares in $w$.

To finish the first half of the $t=2$ case, we just need to replace $w_i^2$ with $w_i$ on the right-hand side. By adding the polynomials $(w_i^2 - w_i) \cdot \left ( - \sum_{i \in [n]} p_i(w)^2 \right )$ via the equations $\mathcal{W}$, we obtain

$\mathcal{W} \vdash_{2\ell +2} \left ( \sum_{i \in [n]} w_i \cdot p_i(w) \right )^2 \leq \left ( \sum_{i \in [n]} w_i \right ) \cdot \left ( \sum_{i \in [n]} p_i(w)^2 \right ).$

To establish the second inequality for the $t=2$ base case, we start by again adding multiples of $(w_i^2 - w_i)$ to get

$\mathcal{W} \vdash_{2\ell +2} \left ( \sum_{i \in [n]} w_i \cdot p_i(w) \right )^2 = \left ( \sum_{i \in [n]} w_i^2 \cdot p_i(w) \right )^2.$

Then the inequality follows from Cauchy-Schwarz and again adding some multiples of $(w_i^2 - w_i)$.

Now it’s time for the induction step. We can assume

$\mathcal{W} \vdash_{O(t/2 \cdot \ell)} \left ( \sum_{i \in [n]} w_i \cdot p_i(w) \right )^{t/2} \leq \left ( \sum_{i \in [n]} w_i \right )^{t/2-1} \cdot \left ( \sum_{i \in [n]} w_i p_i(w)^{t/2} \right ).$

By again adding multiples of $w_i^2 - w_i$, we obtain

$\mathcal{W} \vdash_{O(t/2 \cdot \ell)} \left ( \sum_{i \in [n]} w_i \cdot p_i(w) \right )^{t/2} \leq \left ( \sum_{i \in [n]} w_i^2 \right )^{t/2-1} \cdot \left ( \sum_{i \in [n]} w_i^2 p_i(w)^{t/2} \right. )$

Now both sides are sums of squares. So, by squaring, we find

$\mathcal{W} \vdash_{O(t \cdot \ell)} \left ( \sum_{i \in [n]} w_i \cdot p_i(w) \right )^{t} \leq \left ( \sum_{i \in [n]} w_i^2 \right )^{t-2} \cdot \left ( \sum_{i \in [n]} w_i^2 \cdot p_i(w)^{t/2} \right )^2.$

The proof is finished by applying Cauchy-Schwarz to the last term and cleaning up by adding multiples of $w_i^2 - w_i$ as necessary. QED.

In the next post, we will use Fact 2 to deduce an SoS version of Lemma 1 (from part 1). Subsequently, we will finish designing an algorithm for one-dimensional clustering, proving Theorem 1 (from part 1) in the one-dimensional case. Then we will get high dimensional.

#### Footnotes

1. Suspicious readers may note that our original Gaussian mixture model assumed that the population means are $\Delta$-separated. Because we will draw enough samples that the empirical mean of each cluster is very close to the true (population) mean, this difference can be ignored for now and is easy to handle when we put everything together to analyze our final algorithm.

2. To make them look even more alike of course we could have introduced notations like $\mathbb{E}_{i \sim T} (X_i - \mu)^t$, but for concreteness we are keeping the $w$ variables at least a little explicit.

## 10 thoughts on “Clustering and Sum of Squares Proofs, Part 2”

1. samhop says:

EDITS circa 10am western: English spelling fix (thanks to Gautam Kamath).

2. Enjoying the series!

In the proof of the SoS triangle inequality, are you sure about the sentence “Now we can apply the base case again”? What about the missing 2 factor? Of course, if the statement of the fact is changed to include the factor of 2^{2t-1}, this little wrinkle goes away.

1. samhop says:

Great catch! I will update the statement of the fact.

1. samhop says:

I think it should be correct now — actually isn’t 2^(t-1) enough?

2. That’s right. And it nicely matches the base case (t=2) too.