In this series of three posts I want to discuss some recent and old advances in discrepancy theory and their applications to algorithms. Discrepancy minimization is quite a rich and beautiful area as evidenced in these two books. Here I will focus on a specific perspective — that of “Beating the Union Bound” — which while being limiting captures the main issues in the examples and applications we consider. The description below is particularly efficient in hiding the main combinatorial motivations for the questions in discrepancy theory, but hopefully we will makeup for it by other means.

As a preview of what’s to come, in the next post I will discuss Gluskin’s geometric approach to proving the `Six Standard Deviations Suffice’ theorem (Theorem 1 below, but with a constant different from ${6}$) and in the third post we will look at some algorithmic applications and approaches. Much of the content in the posts is based on discussions with Shachar Lovett and Oded Regev, but all mistakes are mine.

We all know what the union bound is. But if you don’t, here’s how the well-known mathematical genius S. Holmes described it: “When you have eliminated the impossible, whatever remains, however improbable, must be the truth”. In non-sleuth terms this says that if you have arbitrary events ${E_1,\ldots,E_m}$ over some probability space, then

$\displaystyle {Pr[E_1 \cup E_2 \cup \cdots \cup E_m] \leq Pr[E_1] + \cdots + Pr[E_m]}$.

In particular, if the latter sum is less than ${1}$, then there exists a sample point where none of the events ${E_1,\ldots,E_m}$ occur.

By now, we have seen several strangely simple and powerful applications of this simple precept – e.g., existence of good Ramsey graphs (Erdős), existence of good error correcting codes (Shannon), existence of good metric embeddings (Johnson-Lindenstrauss). And the union bound and further variants of it are one of the main techniques we have for showing existential results.

However, the union bound is indeed quite naive in many important contexts and is not always enough to get what we want (as nothing seems to be these days). One particularly striking example of this is the Lovász local lemma (LLL). But let us not digress along this (beautiful) path. Here we will discuss how “beating” the union bound can lead to some important results in the context of discrepancy theory.

Discrepancy and Beating the Union Bound: Six Suffice A basic probabilistic inequality which we use day-in and day-out is the Chernoff bound. The special case of interest to us is the following: for any vector ${a = (a_1,\ldots,a_n) \in \{1,-1\}^n}$ and ${t > 0}$,

$\displaystyle Pr_{\epsilon \sim \{1,-1\}^n}\left[\left|\sum_i \epsilon_i a_i\right| > t \sqrt{n}\right] \leq 2 \exp(-t^2/2). \ \ \ \ \ (1)$

In words, the probability that the sum ${\sum_i \epsilon_i a_i}$ falls ${t}$ standard deviations away is at most ${2\exp(-t^2/2)}$. Combining the above with a simple union bound we get the following: For vectors ${a^1,\ldots,a^n \in \{1,-1\}^n}$,

$\displaystyle Pr_{\epsilon \sim \{1,-1\}^n}\left[\exists j \in [n],\; \left|\sum_{i=1}^n \epsilon_i a^j_i\right| > t\sqrt{n}\right] \leq 2 n \exp(-t^2/2). \ \ \ \ \ (2)$

In particular, if we choose ${t = O(\sqrt{\log n})}$, then the right hand side above is less than ${1}$ and we get that there exists ${\epsilon \in \{1,-1\}^n}$ such that for every ${j \in [n]}$, ${|\langle a^j, \epsilon\rangle| \leq O(\sqrt{n \log n})}$. Can we do better? It is important to note that the above argument is actually tight for uniformly random signs and we want to do better by a careful choice of signs. This is exactly what Spencer showed in his seminal “Six Standard Deviatons Suffice” result (1985) (Spencer’s proof as well as all other proofs easily generalize to the case when there are more vectors than the degrees of freedom and we only require each vector to have bounded entries.):

Theorem 1 For all ${n \geq 1}$, vectors ${a^1,\ldots,a^n \in \{1,-1\}^n}$, there exists ${\epsilon \in \{1,-1\}^n}$ such that for every ${j \in [n]}$, ${|\langle a^j,\epsilon \rangle| \leq 6\sqrt{n}}$.

We will see a proof of the theorem in the next post.

Discrepancy and Beating the Union Bound (and some): Paving Conjecture As discussed in this post a few months back, the stunning result (adjective is mine) of Adam Marcus, Daniel Spielman and Nikhil Srivastava proving the paving conjecture (and hence resolving the Kadison-Singer problem) can also be cast in a discrepancy language. Let me repeat this from Nikhil’s post for completeness. Let us look at a specific variant of the ‘Matrix Chernoff Bound’ (see Nikhil’s post for references; here ${\|\;\|}$ denotes the spectral norm of a matrix):

Theorem 2 Given symmetric matrices ${A_1,\ldots,A_m \in \mathbb{R}^{n \times n}}$,

$\displaystyle Pr_{\epsilon \sim \{1,-1\}^m}\left[\left\|\sum_{i=1}^m \epsilon_i A_i\right\| \geq t \left\|\sum_{i=1}^m A_i^2\right\|^{1/2}\right] \leq 2 n \exp(-t^2/2).$

Note that the above is a substantial generalization of Equation 2 which corresponds to the special case when the matrices ${A_j}$‘s are diagonal matrices with entries given by the vectors ${a_j}$. In a very naive but still meaningful way, one can view the factor ${n}$ on the right hand side as again appearing partially because of a union bound.

An implication of the above theorem is that for symmetric matrices ${A_1,\ldots,A_m \in \mathbb{R}^{n \times n}}$, for uniformly random signs ${\epsilon \sim \{1,-1\}^m}$, with high probability

$\displaystyle \left\|\sum_{i=1}^m \epsilon_i A_i\right\| = O(\sqrt{\log n})\left\|\sum_{i=1}^m A_i^2\right\|^{1/2}. \ \ \ \ \ (3)$

Just as for the scalar case, the ${\sqrt{\log n}}$ factor is necessary in general for uniformly random signs. Can we do better? There are two facets to this question both of which seem to be quite important and basic (and perhaps correspondingly hard):

• Question 1: Can the ${\sqrt{\log n}}$ factor be improved by picking the signs carefully instead of uniformly at random?
• Question 2: Can the ${\sqrt{\log n}}$ factor be improved for uniformly random signs under some nice conditions on the matrices ${A_i}$‘s?

Here we will focus on the first question, but let me say a couple of words about the second one. In all known examples showing tightness of the matrix Chernoff bound, the matrices ${A_1,\ldots,A_m}$ seem to have a lot of commutative structure (e.g., diagonal matrices) and intuitively non-commutativity should help us in improving the bound. Perhaps there is a quantitative way of capturing non-commutativity to do better (see the discussion here for one such attempt).

Regarding the first question, Spencer’s theorem gives a positive answer in the special case when ${A_i}$‘s are diagonal matrices with ${\{1,-1\}}$ entries (or more generally, bounded entries). The breakthrough result of Marcus, Spielman and Srivastava amounts to giving an exact characterization of when we can do better if the matrices ${A_i}$‘s are rank one positive semi-definite matrices.

Discrepancy and Beating the Union: Matrix Spencer Theorem? The above discussions prompt the following question:

Conjecture For any symmetric matrices ${A_1,\ldots,A_n \in {\mathbb R}^{n \times n}}$ with ${\|A_i\| \leq 1}$, there exist signs ${\epsilon_1,\ldots,\epsilon_n}$ such that ${\|\sum_i \epsilon_i A_i\| = O(\sqrt{n})}$.

The above conjecture is a strong generalization of Spencer’s theorem which corresponds to the matrices ${A_i}$‘s being diagonal. The earliest reference I am aware of for it is this paper. I can’t think of any concrete applications of the conjecture, but I quite like it because of the simplicity of the statement. Admittedly, I also have a personal bias: my first foray into discrepancy theory (with Shachar Lovett and Oded Regev) was to study this question.

One can view the conjecture as giving a partial answer to Question 1. In the case when ${\|A_i\| \leq 1}$, ${\|\sum_i A_i^2\| \leq \sum_i \|A_i^2\| \leq n}$, so that the right hand side of Equation 3 is ${O(\sqrt{n \log n})}$ . Thus, the above conjecture beats the matrix Chernoff bound by getting rid of the ${\sqrt{\log n}}$ term, but instead introduces a term depending on the “sum-of-norms” of the ${A_i^2}$‘s as opposed to having a dependence on the “norm-of-the-sum” of ${A_i^2}$‘s.

In the next post, I will discuss Gluskin’s proof of Theorem 1, which for now (to me) seems to be the most promising approach for proving the conjecture.

Acknowledgments

Thanks to Shachar Lovett, Oded Regev and Nikhil Srivastava for helpful suggestions, comments, and corrections during the preparation of this post.

February 8, 2014 10:32 pm

Hi,
Isn’t there a typo in the statement of Theorem 2? (namely, $i$ ranges from $1$ to $m$, yet $\epsilon$ has only $n$ coordinates; $\epsilon_i$ can only have $i\in[n]$).

• February 9, 2014 4:12 am

Thanks; corrected the typo.

2. February 9, 2014 2:42 am

Thank you Raghu – looking forward to see the proof.
Should there be A_i^2 inside the norm of the conjecture? Also, is it a “sum of norms” or a “norm of sums”? I have to say that I don’t have good intuition as to when we should expect a bound with the latter form and when we cannot improve on the former.

• February 9, 2014 6:37 pm

The question is probably as interesting with A_i^2 inside the norm, but it probably shouldn’t make much of a difference in terms of its validity.

I don’t have good intuition either, but my guess would be that for sums of independent random matrices, in most cases the stronger ”norm of sums” bound will likely hold. For dependent sums, things may fail quite badly however.

3. February 10, 2014 6:11 pm

Hi Raghu,

I guess I can just wait and see, but I am curious: are you going for Gluskin’s proof (via Minkowski’s theorem), or the Giannopoulos version of the proof?

• February 10, 2014 6:32 pm

To kill the surprise … I’m planning to follow Gluskin’s original proof.

• February 10, 2014 8:27 pm

Looking forward!