Discrepancy and Beating the Union Bound
In this series of three posts I want to discuss some recent and old advances in discrepancy theory and their applications to algorithms. Discrepancy minimization is quite a rich and beautiful area as evidenced in these two books. Here I will focus on a specific perspective — that of “Beating the Union Bound” — which while being limiting captures the main issues in the examples and applications we consider. The description below is particularly efficient in hiding the main combinatorial motivations for the questions in discrepancy theory, but hopefully we will makeup for it by other means.
As a preview of what’s to come, in the next post I will discuss Gluskin’s geometric approach to proving the `Six Standard Deviations Suffice’ theorem (Theorem 1 below, but with a constant different from ) and in the third post we will look at some algorithmic applications and approaches. Much of the content in the posts is based on discussions with Shachar Lovett and Oded Regev, but all mistakes are mine.
We all know what the union bound is. But if you don’t, here’s how the well-known mathematical genius S. Holmes described it: “When you have eliminated the impossible, whatever remains, however improbable, must be the truth”. In non-sleuth terms this says that if you have arbitrary events over some probability space, then
In particular, if the latter sum is less than , then there exists a sample point where none of the events occur.
By now, we have seen several strangely simple and powerful applications of this simple precept – e.g., existence of good Ramsey graphs (Erdős), existence of good error correcting codes (Shannon), existence of good metric embeddings (Johnson-Lindenstrauss). And the union bound and further variants of it are one of the main techniques we have for showing existential results.
However, the union bound is indeed quite naive in many important contexts and is not always enough to get what we want (as nothing seems to be these days). One particularly striking example of this is the Lovász local lemma (LLL). But let us not digress along this (beautiful) path. Here we will discuss how “beating” the union bound can lead to some important results in the context of discrepancy theory.
Discrepancy and Beating the Union Bound: Six Suffice A basic probabilistic inequality which we use day-in and day-out is the Chernoff bound. The special case of interest to us is the following: for any vector and ,
In particular, if we choose , then the right hand side above is less than and we get that there exists such that for every , . Can we do better? It is important to note that the above argument is actually tight for uniformly random signs and we want to do better by a careful choice of signs. This is exactly what Spencer showed in his seminal “Six Standard Deviatons Suffice” result (1985) (Spencer’s proof as well as all other proofs easily generalize to the case when there are more vectors than the degrees of freedom and we only require each vector to have bounded entries.):
We will see a proof of the theorem in the next post.
Discrepancy and Beating the Union Bound (and some): Paving Conjecture As discussed in this post a few months back, the stunning result (adjective is mine) of Adam Marcus, Daniel Spielman and Nikhil Srivastava proving the paving conjecture (and hence resolving the Kadison-Singer problem) can also be cast in a discrepancy language. Let me repeat this from Nikhil’s post for completeness. Let us look at a specific variant of the ‘Matrix Chernoff Bound’ (see Nikhil’s post for references; here denotes the spectral norm of a matrix):
Theorem 2 Given symmetric matrices ,
Note that the above is a substantial generalization of Equation 2 which corresponds to the special case when the matrices ‘s are diagonal matrices with entries given by the vectors . In a very naive but still meaningful way, one can view the factor on the right hand side as again appearing partially because of a union bound.
Just as for the scalar case, the factor is necessary in general for uniformly random signs. Can we do better? There are two facets to this question both of which seem to be quite important and basic (and perhaps correspondingly hard):
- Question 1: Can the factor be improved by picking the signs carefully instead of uniformly at random?
- Question 2: Can the factor be improved for uniformly random signs under some nice conditions on the matrices ‘s?
Here we will focus on the first question, but let me say a couple of words about the second one. In all known examples showing tightness of the matrix Chernoff bound, the matrices seem to have a lot of commutative structure (e.g., diagonal matrices) and intuitively non-commutativity should help us in improving the bound. Perhaps there is a quantitative way of capturing non-commutativity to do better (see the discussion here for one such attempt).
Regarding the first question, Spencer’s theorem gives a positive answer in the special case when ‘s are diagonal matrices with entries (or more generally, bounded entries). The breakthrough result of Marcus, Spielman and Srivastava amounts to giving an exact characterization of when we can do better if the matrices ‘s are rank one positive semi-definite matrices.
Discrepancy and Beating the Union: Matrix Spencer Theorem? The above discussions prompt the following question:
Conjecture For any symmetric matrices with , there exist signs such that .
The above conjecture is a strong generalization of Spencer’s theorem which corresponds to the matrices ‘s being diagonal. The earliest reference I am aware of for it is this paper. I can’t think of any concrete applications of the conjecture, but I quite like it because of the simplicity of the statement. Admittedly, I also have a personal bias: my first foray into discrepancy theory (with Shachar Lovett and Oded Regev) was to study this question.
One can view the conjecture as giving a partial answer to Question 1. In the case when , , so that the right hand side of Equation 3 is . Thus, the above conjecture beats the matrix Chernoff bound by getting rid of the term, but instead introduces a term depending on the “sum-of-norms” of the ‘s as opposed to having a dependence on the “norm-of-the-sum” of ‘s.
In the next post, I will discuss Gluskin’s proof of Theorem 1, which for now (to me) seems to be the most promising approach for proving the conjecture.
Thanks to Shachar Lovett, Oded Regev and Nikhil Srivastava for helpful suggestions, comments, and corrections during the preparation of this post.