The accepted papers list for FOCS 2014 is now posted online.

I am always amazed by the depth and breadth of works in the TCS community, and this FOCS is no exception. Whether you are a physicist interested in the possibility of general “area law” governing entanglement between different parts of systems, a geometer interested in Gromov’s topological notion of expanders, an optimization expert interested in the latest and greatest in interior point methods, a game theorist interested in Karlin’s longstanding conjecture on convergence of fictitious play, a complexity theorist interested in the latest efforts to separate the determinant from the permanent, or simply a dog owner or triangle lover, you will find something of interest in the conference. And of course FOCS is not just a collection of unrelated papers. A quantum computing expert would want to check the paper on topological expanders, as similar concepts have arose in the context of topological approaches to quantum error correction. An optimization expert might want to understand the convergence of “fictitious play” which is a very natural algorithm for solving linear programs, and of course since STOC 2014 we all know  that circuit lower bounds are tightly connected to improving the exponents of algorithms for combinatorial problems. This is just a taste and I could have chosen many other such examples, all strengthening Avi Wigderson’s point why we should all go to talks in areas other than our own.

I was also amazed by the effort reviewers and program committee members have put in the selection process. Conference reviewing sometimes get a bad reputation as being superficial. I did not find this to be the case at all. People have invested an amazing amount of work reading the papers, checking proofs, chasing down references, verifying technical points with the authors and other experts, and generally doing the best job they can to have an informed selection process and assemble the best program we can for the TCS community. I am sure we made mistakes, and the final program, as a product of a committee, is not fully consistent with any particular PC member’s taste, including my own. In particular, there were many submissions that some of us personally found novel and interesting, but were not included in the final program. But I do feel good about the process and believe that while some of our decisions may have been wrong, they were not wrong because we were superficial or lazy or cut corners due to the time pressure. Many times during this process I asked the PC members to go above and beyond what is typically expected, and they have more than risen to this challenge, often making heroic efforts to understand very complex (and sometimes not so greatly written) papers, and trying to get to the bottom of any misunderstanding. I am deeply grateful to them all.

Another instalment on my research-life stories.

—————

The Talmud says: “competition/envy among scholars increases wisdom” (kinat sofrim tarbe chochma). Good or bad, competition is here to stay. Nevertheless, one of the strengths of our community is in its collaborative nature. This is good for science, but in my eyes also makes our life so much better. A recent example is a research project with Guy Rothblum. For a few weeks, we met quite regularly and every meeting went more or less as follows: First, we would go over the solution from the previous meeting and find a bug. Then we would work together on a new and improved solution. This sounds frustrating (and would probably have been frustrating if I worked alone), but instead it was a great joy. We got to solve this problem again and again, and in the process enjoy each other’s creativity and company. Unfortunately, our current solution seems quite robust, so our fun ritual ended.

My best example for turning competition into collaboration is in my long-term collaboration with Salil Vadhan. It started when Ran Raz and I had a modest result on Randomness Extractors (following the breakthrough work of Luca Trevisan). We then learned that Salil had the same result, and already managed to write it down. Salil invited us to join (and I’m sure he was a bit sad to lose his first single-authored paper), on the other hand, Ran and I decided to decline and give up on the result altogether (and I was sad to lose a paper at this early stage of my career). In retrospect, losing that result would have been quite inconsequential, and similarly for Salil. But what did turn out to be extremely significant was what happened next. The three of us started collaborating together, leading to a stronger paper and then an additional collaboration, and before long Salil and I established not only a long-term research collaboration but also a great friendship. The unfortunate accident turned out to be most fortunate after all! Not all collaborations end up so fruitful, but I almost never regretted a collaboration (DBLP gives me 74 coauthors so this is a large sample). I hope that the set of collaborators that regret working with me is equally small.

So let’s all choose collaboration over competition and happily ride into the sunset. Right? Well, not so fast. Collaboration and competition are not mutually exclusive. Turns out, we cannot shut down our egos even when we enter a collaboration. While I strongly believe that the contributions to a collaboration cannot be attributed to any one of the contributors, we all like to feel that we contributed our fair share and that we demonstrated our worth (to others and more importantly to ourselves). An over-competitive collaboration can be destructive, but in moderation it could indeed be that competition among scholars does increase wisdom.

Coming back to the research-life stories project I intend to write a few (three that currently come to mind) more stories of my own, hoping that they will inspire more stories by others.

—————

My first research project progressed very quickly. A few months after I started working with Moni, I found myself writing my first paper (having only read very few papers before), and then preparing my first research talk (having attended very few research talks before). My first talk was at Weizmann, where I managed to utterly confuse some of the most brilliant scientists of our generation. During the talk I apologized and Adi Shamir said something to the effect of “it’s not you it’s us.” After the talk, Oded Goldreich made sure I will not be misled by Adi’s gallantry :-) Indeed, it wasn’t them – it was all me. I learnt some valuable lessons about research talks (for example that the intuitions that lead your research may be irrelevant and even confusing when presenting it). It was also the first of many opportunities for Moni to (try to) teach me to never apologize. Unfortunately, even that Moni is completely correct, the temptation is often just too strong for me to resist.

Soon after, I was getting ready for a trip to my first conference and a few seminar talks I managed to schedule. I was quite terrified. Afraid to mess up the talks, afraid to expose my ignorance and even anxious about the practical aspects of travel (it was only the third time I left Israel, and the first time I did it by myself).

On the night before my early morning flight, I get an email addressed to Moni, to me and to a large collection of dignitaries (for example, David Harel, whose only fault was being a past department chair). The email from Professor X went something like “I saw the abstract of your coming talk at MIT. You claim to give the first construction of Z. This is arguably a big fat lie as Professor Y and I already did it in our paper. The only honorable solution on your part is hara-kiri“. I was horrified. I didn’t know their paper and his claim could easily have been correct (it couldn’t have but I wasn’t fully aware at the time of Moni’s encyclopedic knowledge). It was too late to call Moni and I was afraid he will not see the email before my flight. My deck of slides was printed and my first talk was the morning after I land. Disgrace was imminent!

Fortunately, very soon after, Moni replied. I think that Professor X never answered Moni’s email and years later when he interviewed me for a faculty position he have shown no sign of remembering this incident. Professor Y accepted Moni’s explanation (in a private email if I remember correctly) but still managed to squeeze in a couple of rather aggressive questions during my conference talk. By then I was fortunately prepared and calm (and answered: “are you the first to do Z?” with an innocent “yes”).

My second talk during the trip was in MIT. Between the regulars and the visitors, my audience included half of my reference list. I was in awe. The talk was extremely vibrant, with many questions from the audience and if my memory serves, especially from Leonid Levin. I was ecstatic, and I did not mind at all that Adi and Oded (who were just starting a sabbatical in MIT) were taking on many of the questions directed at me. And why should I mind? Here are so many of my idols vividly debating my work! Who am I to disturb them? I only realized it might have been unusual when Michael Luby (giving a talk the day after) answered the first question he got with something like “unlike yesterday’s speaker, I’d like to talk more than five minutes … .“ Still, over dinner, Oded promoted the idea that it may be better if a talk is given by someone other than the authors. So I felt somewhat vindicated.

Throughout the years I gave many more talks, some praised, some scowled, and some both (sometimes even by the same person). Giving a bad talk can be painful (and when you are young it sometimes feels like the end of your career). I vividly remember how the criticism over my first practice job-talk paralyzing me for almost all of the time I had before the first interview (till Moni gave me a few simple comments that dramatically improved the talk). I am still beating myself for not customizing my ICM (International Congress of Mathematics) talk to the non-cs audience. On the other hand, giving a good talk can be quite empowering. One of the sweetest comments that I remember came from Avi Wigderson who told me after a survey talk on RL vs. L that I left the audience no choice but to work on the problem. Like many other things in life, the more you invest in preparing a talk the more you get out of it. While I enjoy giving talks a lot, it is very hard to recreate the rush that comes from overcoming fear in those early talks of my career. I doubt that I sufficiently appreciated this rush at the time (a Joni Mitchell song comes to mind).

The steering committee of the Conference on Computational Complexity has decided to become independent of IEEE. The Symposium of Computational Geometry is considering leaving ACM for similar reasons.

I completely understand the reasons, and applaud the steering committees in both cases for having a thoughtful, deliberate, and transparent process. Indeed, I have signed the letter of support for CCC. But, I am not happy about this outcome. I think that having our conferences under an umbrella such as ACM or IEEE that unities much of Computer Science is a positive thing, regardless of practical issues such as bank accounts, insurance, hotel deals etc.. (that thankfully I understand very little about) or even issues such as “prestige” and library subscriptions. I am afraid that an administrative isolation of a sub-area might end up contributing to an intellectual isolation as well. For example, while the idea might sound appealing, I think it is good that we don’t typically have “department of theoretical computer science” (let alone a department of computational complexity or computational geometry). It is important for us to interact with other computer scientists, if only so that we can occasionally torture them with an equation-filled colloquium talk :)

As I said in the past, I wish that our professional societies would behave more like their mission statements and less like for-profit publishers, and so conferences would not feel compelled to leave them. Given the hundreds of votes in the CCC and SoCG elections, I can’t help but think that if steering committees and chairs of various SIGs across all Computer Science collaborated together, they could marshal thousands of votes in the ACM elections that would truly change how it operates.

Update: Please see Paul Beame’s comment below for some of the significant differences between ACM and IEEE.

I have recently completed a survey “Codes with local decoding procedures” and will be giving a talk on this matter in August at the ICM. The survey covers two related families of codes with locality, namely, locally decodable codes that are broadly useful in theoretical computer science and local reconstruction codes that have recently been used in data storage applications to provide a more space efficient alternative to RAID. Below is the abstract of the survey:

Error correcting codes allow senders to add redundancy to messages, encoding bit strings representing messages into longer bit strings called codewords, in a way that the message can still be recovered even if a fraction of the codeword bits are corrupted. In certain settings however the receiver might not be interested in recovering all the message, but rather seek to quickly recover just a few coordinates of it. Codes that allow one to recover individual message coordinates extremely fast (locally), from accessing just a small number of carefully chosen coordinates of a corrupted codeword are said to admit a local decoding procedure. Such codes have recently played an important role in several areas of theoretical computer science and have also been used in practice to provide reliability in large distributed storage systems. We survey what is known about these codes.

I have just posted online a new survey “Sum-of-Squares proofs and the quest toward optimal algorithms” co-authored with David Steurer .

The survey discusses two topics I have blogged about before – Khot’s Unique Games Conjecture (UGC) and the Sum-of-Squares (SOS) method – and the connections between them. Both are related to the notion of meta algorithms. While typically we think of an algorithm as tailor-made for a particular problem, there are some generic approaches to design an “algorithmic scheme” or a meta algorithm, that can be applied to a large class of problems. The UGC predicts that a particular such meta-algorithm, which we call in the survey simply the “UGC Meta-Algorithm”, would give in fact optimal approximation guarantees among all efficient algorithms for a large class of problems. This is very exciting from the point of view of complexity theory, not simply because it gives many hardness-of-approximation results in one shot, but because in some sense it gives a complete understanding of what makes problems of certain domains easy or hard.

The SOS method is another meta algorithm that can be applied to the same cases. It is parameterized by a number $\ell$ called its degree. Using a higher degree can give potentially better approximation guarantees, at the expense of longer running time: running the method with degree $\ell$ on input of length $n$ takes $n^{O(\ell)}$ time. The UGC Meta-Algorithm in fact corresponds to the base case (which is degree $\ell=2$) of the SOS method, and so the UGC predicts that in a great many cases, using constant (even even mildly super-constant) degree larger than $2$ will not help get better approximation guarantees. We discuss in the survey a few recent results that, although falling short of refuting the UGC, cast some doubts on this prediction by showing that larger degree can sometimes help in somewhat similar contexts. I will give a talk on the same topic in the mathematical aspects of computer science section  of the 2014 International Congress of Mathematicians to be held in Seoul, South Korea, August 13-21, 2014. As you can see, there will be some great speakers in this session (and ICM in general), and I hope we will see blog posts on some of those surveys as well.

In this post, I am going to give you a simple, self-contained, and fruitful demonstration of a recently introduced proof technique called the method of interlacing families of polynomials, which was also mentioned in an earlier post. This method, which may be seen as an incarnation of the probabilistic method, is relevant in situations when you want to show that at least one matrix from some collection of matrices must have eigenvalues in a certain range. The basic methodology is to write down the characteristic polynomial of each matrix you care about, compute the average of all these polynomials (literally by averaging the coefficients), and then reason about the eigenvalues of the individual matrices by studying the zeros of the average polynomial. The relationship between the average polynomial and the individuals hinges on a phenomenon known as interlacing.

We will use the method to prove the following sharp variant of Bourgain and Tzafriri’s restricted invertibility theorem, which may be seen as a robust, quantitative version of the fact that every rank ${k}$ matrix contains a set of ${k}$ linearly independent columns.

Theorem 1 Suppose ${v_1,\ldots,v_m\in{\mathbb R}^n}$ are vectors with ${\sum_{i=1}^mv_iv_i^T=I_n}$. Then for every ${k there is a subset ${\sigma\subset [m]}$ of size ${k}$ with

$\displaystyle \lambda_k\left(\sum_{i\in\sigma}v_iv_i^T\right)\geq \left(1-\sqrt{\frac{k}{n}}\right)^2\frac{n}{m}.$

That is, any set of vectors ${v_1,\ldots,v_m}$ with variance ${\sum_{i=1}^m \langle x,v_i\rangle^2}$ equal to one in every direction ${\|x\|=1}$ must contain a large subset which is far from being linearly degenerate, in the sense of having large eigenvalues (compared to $(n/m)$, which is the average squared length of the vectors). Such variance one sets go by many other names in different contexts: they are also called isotropic sets, decompositions of the identity, and tight frames. This type of theorem was first proved by Bourgain and Tzafriri in 1987, and later generalized and sharpened to include the form stated here.

The original applications of Theorem 1 and its variants were mainly in Banach space theory and harmonic analysis. More recently, it was used in theoretical CS by Nikolov, Talwar, and Zhang in the contexts of differential privacy and discrepancy minimization. Another connection with TCS was discovered by Joel Tropp, who showed that the set ${\sigma}$ can be found algorithmically using a semidefinite program whose dual is related to the Goemans-Wiliamson SDP for Max-Cut.

In more concrete notation, the theorem says that every rectangular matrix ${B_{n\times m}}$ with ${BB^T=I_n}$ contains an ${n\times k}$ column submatrix ${S\subset [m]}$ with ${\sigma_k^2(B_S)\ge (1-\sqrt{k/n})^2(n/m)}$, where ${\sigma_k}$ is the kth largest singular value. Written this way, we see some similarity with the column subset selection problem in data mining, which seeks to extract a maximally nondegenerate set of representative’ columns from a data matrix. There are also useful generalizations of Theorem 1 for arbitrary rectangular ${B}$.

As I said earlier, the technique is based on studying the roots of averages of polynomials. In general, averaging polynomials coefficient-wise can do unpredictable things to the roots. For instance, the average of ${(x-1)(x-2)}$ and ${(x-3)(x-4)}$, which are both real-rooted quadratics, is ${x^2-5x+7}$, which has complex roots ${2.5\pm\sqrt{3}i}$. Even when the roots of the average are real, there is in general no simple relationship between the roots of two polynomials and the roots of their average.

The main insight is that there are nonetheless many situations where averaging the coefficients of polynomials also has the effect of averaging each of the roots individually, and that it is possible to identify and exploit these situations. To speak about this concretely, we will need to give the roots names. There is no canonical way to do this for arbitrary polynomials, whose roots are just sets of points in the complex plane. However, when all the roots are real there is a natural ordering given by the real line; we will use this ordering to label the roots of a real-rooted polynomial ${p(x)=\prod_{i=1}^n (x-\alpha_i)}$ in descending order ${\alpha_1\ge\ldots\ge\alpha_n}$.

Interlacing

We will use the following classical notion to characterize precisely the good situations mentioned above.

Definition 2 (Interlacing) Let ${f}$ be a degree ${n}$ polynomial with all real roots ${\{\alpha_i\}}$, and let ${g}$ be degree ${n}$ or ${n-1}$ with all real roots ${\{\beta_i\}}$ (ignoring ${\beta_n}$ in the degree ${n-1}$ case). We say that ${g}$ interlaces ${f}$ if their roots alternate, i.e.,

$\displaystyle \beta_n\le\alpha_n\le \beta_{n-1}\le \ldots \beta_1\le \alpha_1.$

Following Fisk, we denote this as ${g\longrightarrow f}$, to indicate that the largest root belongs to ${f}$.

If there is a single ${g}$ which interlaces a family of polynomials ${f_1,\ldots,f_m}$, we say that they have a common interlacing.

It is an easy exercise that ${f_1,\ldots,f_m}$ of degree ${n}$ have a common interlacing iff there are closed intervals ${I_n\le I_{n-1}\le\ldots I_1}$ (where ${\le}$ means to the left of) such that the ${i}$th roots of all the ${f_j}$ are contained in ${I_i}$. It is also easy to see that a set of polynomials has a common interlacing iff every pair of them has a common interlacing (this may be viewed as Helly’s theorem on the real line).

We now state the main theorem about averaging polynomials with common interlacings.

Theorem 3 Suppose ${f_1,\ldots,f_m}$ are real-rooted of degree ${n}$ with positive leading coefficients. Let ${\lambda_k(f_j)}$ denote the ${k^{th}}$ largest root of ${f_j}$ and let ${\mu}$ be any distribution on ${[m]}$. If ${f_1,\ldots,f_m}$ have a common interlacing, then for all ${k=1,\ldots,n}$

$\displaystyle \min_j \lambda_k(f_j)\le \lambda_k(\mathop{\mathbb E}_{j\sim \mu} f_j)\le \max_j \lambda_k(f_j).$

The proof of this theorem is a three line exercise. Since it is the crucial fact upon which the entire technique relies, I encourage you to find this proof for yourself (Hint: Apply the intermediate value theorem inside each interval ${I_i}$.) You can also look at the picture below, which shows what happens for two cubic polynomials with a quadratic common interlacing.

One of the nicest features of common interlacings is that their existence is equivalent to certain real-rootedness statements. Often, this characterization gives us a systematic way to argue that common interlacings exist, rather than having to rely on cleverness and pull them out of thin air. The following seems to have been discovered a number of times (for instance, Fell or Chudnovsky & Seymour); the proof of it included below assumes that the roots of a polynomial are continuous functions of its coefficients (which may be shown using elementary complex analysis).

Theorem 4 If ${f_1,\ldots,f_m}$ are degree ${n}$ polynomials and all of their convex combinations ${\sum_{i=1}^m \mu_if_i}$ have real roots, then they have a common interlacing.

Proof: Since common interlacing is a pairwise condition, it suffices to handle the case of two polynomials ${f_0}$ and ${f_1}$. Let

$\displaystyle f_t := (1-t)f_0+tf_1$

with ${t\in [0,1]}$. Assume without loss of generality that ${f_0}$ and ${f_1}$ have no common roots (if they do, divide them out and put them back in at the end). As ${t}$ varies from ${0}$ to ${1}$, the roots of ${f_t}$ define ${n}$ continuous curves in the complex plane ${C_1,\ldots,C_n}$, each beginning at a root of ${f_0}$ and ending at a root of ${f_1}$. By our assumption the curves must all lie in the real line. Observe that no curve can cross a root of either ${f_0}$ or ${f_1}$ in the middle: if ${f_t(r)=0}$ for some ${t \in (0,1)}$ and ${f_0(r)=0}$, then immediately we also have ${f_t(r)=tf_1(r)=0}$, contradicting the no common roots assumption. Thus, each curve defines a closed interval containing exactly one root of ${f_0}$ and one root of ${f_1}$, and these intervals do not overlap except possibly at their endpoints, establishing the existence of a common interlacing. $\Box$

Characteristic Polynomials and Rank One Updates

A very natural and relevant example of interlacing polynomials comes from matrices. Recall that the characteristic polynomial of a matrix ${A}$ is given by

$\displaystyle \chi(A)(x):=\det(xI-A)$

and that its roots are the eigenvalues of ${A}$. The following classical fact tells us that rank one updates create interlacing.

Lemma 5 (Cauchy’s Interlacing Theorem) If ${A}$ is a symmetric matrix and ${v}$ is a vector then

$\displaystyle \chi(A)\longrightarrow \chi(A+vv^T).$

Proof: There are many ways to prove this, and it is a nice exercise. One way which I particularly like, and which will be relevant for the rest of this post, is to observe that

$\displaystyle \begin{array}{rcl} \det(xI-A-vv^T) &= \det(xI-A)\det(I-vv^T(xI-A)^{-1})\qquad (*) \\&=\det(xI-A)(1-v^T(xI-A)^{-1}v) \\&=\chi(A)(x)\left(1-\sum_{i=1}^n\frac{\langle v,u_i\rangle^2}{x-\lambda_i}\right),\end{array}$

where ${u_i}$ and ${\lambda_i}$ are the eigenvectors and eigenvalues of ${A}$. Interlacing then follows by inspecting the poles and zeros of the rational function on the right hand side. $\Box$

We are now in a position to do something nontrivial. Suppose ${A}$ is a symmetric ${n\times n}$ real matrix and ${v_1,\ldots,v_m}$ are some vectors in ${{\mathbb R}^n}$. Cauchy’s theorem tells us that the polynomials

$\displaystyle \chi(A+v_1v_1^T),\chi(A+v_2v_2^T),\ldots,\chi(A+v_mv_m^T)$

have a common interlacing, namely ${\chi(A)}$. Thus, Theorem 3 implies that for every ${k}$, there exists a ${j}$ so that the ${k}$th largest eigenvalue of ${A+v_jv_j^T}$ is at least the ${k}$th largest root of the average polynomial

$\displaystyle \mathop{\mathbb E}_j \chi(A+v_jv_j^T).$

We can compute this polynomial using the calculation ${(*)}$ as follows:

$\displaystyle \mathop{\mathbb E} \chi(A+v_jv_j^T) = \chi(A)\left(1-\sum_{i=1}^n\frac{ \mathop{\mathbb E} \langle v_j,u_i\rangle^2}{x-\lambda_i}\right).$

In general, this polynomial depends on the squared inner products ${\langle v_j,u_i\rangle^2}$. When ${\sum_{i=1}^m v_iv_i^T=I}$, however, we have ${\mathop{\mathbb E}_j \langle v_j,u_i\rangle^2=1/m}$ for all ${u_i}$, and:

$\displaystyle \mathop{\mathbb E}_j \chi(A+v_jv_j^T) = \chi(A)(x)\left(1-\sum_{i=1}^n\frac{ 1/m}{x-\lambda_i}\right)=\chi(A)(x)-(1/m)\frac{\partial}{\partial x}\chi(A)(x).\qquad (**)$

That is, adding a random rank one matrix in the isotropic case corresponds to subtracting off a multiple of the derivative from the characteristic polynomial. Note that there is no dependence on the vectors ${v_j}$ in this expression, and it has forgotten’ all of the eigenvectors ${u_i}$. This is where the gain is: we have reduced a high-dimensional linear algebra problem (of finding a ${v_j}$ for which ${A+v_jv_j^T}$ has certain eigenvalues, which may be difficult when the matrices involved do not commute) to a univariate calculus / analysis problem (given a polynomial, figure out what subtracting the derivative does to its roots). Moreover, the latter problem is amenable to a completely different set of tools than the original eigenvalue problem.

As a sanity check, if we apply the above deduction to ${A=0}$, we find that any isotropic set ${\sum_{i=1}^m v_iv_i^T=I}$ must contain a ${j}$ such that ${\lambda_1(v_jv_j^T)}$ is at least the largest root of

$\displaystyle \chi(0)(x)-(1/m)\chi(0)'(x)=x^n-(n/m)x^{n-1},$

which is just ${n/m}$. This makes sense since ${\lambda_1(v_jv_j^T)=\|v_j\|^2}$, and the average squared length of the vectors is indeed ${n/m}$ since ${\mathrm{trace}(\sum_i v_iv_i^T)=n}$.

Differential Operators and Induction

The real power of the method comes from being able to apply it inductively to a sum of many independent random ${v_j}$‘s at once, rather than just once. In this case, establishing the necessary common interlacings requires a combination of Theorem 4 and Cauchy’s theorem. A central role is played by the differential operators ${(I-(1/m)\frac{\partial}{\partial x})}$ seen above, which I will henceforth denote as ${(I-(1/m)D)}$. The proof relies on the following key properties of these operators:

Lemma 6 (Properties of Differential Operators)

1. If ${ \mathbf{X} }$ is a random vector with ${\mathop{\mathbb E} \mathbf{X} \mathbf{X} ^T = cI}$ then

$\displaystyle \mathop{\mathbb E} \chi(A+ \mathbf{X} \mathbf{X} ^T) = (I-cD)\chi(A).$

2. If ${f}$ has real roots then so does ${(I-cD)f}$.
3. If ${f_1,\ldots,f_m}$ have a common interlacing, then so do ${(I-cD)f_1,\ldots, (1-cD)f_m}$.

Proof: Part (1) was essentially shown in ${(**)}$. Part (2) follows by applying ${(*)}$ to the matrix ${A}$ with diagonal entries equal to the roots of ${f}$, and plugging in ${v=\sqrt{c}\cdot(1,1,\ldots,1)^T}$, so that ${f=\chi[A]}$ and ${(1-cD)f=\chi[A+vv^T]}$.

For part (3), Theorem 3 tells us that all convex combinations ${\sum_{i=1}^m\mu_if_i}$ have real roots. By part (2) it follows that all

$\displaystyle (1-cD)\sum_{i=1}^m\mu_if_i = \sum_{i=1}^m \mu_i (1-cD)f_i$

also have real roots. By Theorem 4, this means that the ${(1-cD)f_i}$ must have a common interlacing.$\Box$

We are now ready to perform the main induction which will give us the proof of Theorem 1.

Lemma 7 Let ${ \mathbf{X} }$ be uniformly chosen from ${\{v_i\}_{i\le m}}$ so that ${\mathop{\mathbb E} \mathbf{X} \mathbf{X} ^T=(1/m)I}$, and let ${ \mathbf{X} _1,\ldots, \mathbf{X} _k}$ be i.i.d. copies of ${ \mathbf{X} }$. Then there exists a choice of indices ${j_1,\ldots j_k}$ satisfying

$\displaystyle \lambda_k \left(\chi(\sum_{i=1}^k v_{j_i}v_{j_i}^T)\right) \ge \lambda_k \left(\mathop{\mathbb E} \chi(\sum_{i=1}^k \mathbf{X} _i \mathbf{X} _i^T)\right).$

Proof: For any partial assignment ${j_1,\ldots,j_\ell}$ of the indices, consider the conditional expectation’ polynomial:

$\displaystyle q_{j_1,\ldots,j_\ell}(x) := \mathop{\mathbb E}_{ \mathbf{X} _{\ell+1},\ldots, \mathbf{X} _k} \chi\left(\sum_{i=1}^\ell v_{j_i}v_{j_i}^T + \sum_{i=\ell+1}^k \mathbf{X} _i \mathbf{X} _i^T\right).$

We will show that there exists a ${j_{\ell+1}\in [m]}$ such that

$\displaystyle \lambda_k(q_{j_1,\ldots,j_{\ell+1}})\ge \lambda_k(q_{j_1,\ldots,j_\ell}),\ \ \ \ \ (1)$

which by induction will complete the proof. Consider the matrix

$\displaystyle A = \sum_{i=1}^\ell v_{j_i}v_{j_i}^T,$

By Cauchy’s interlacing theorem ${\chi(A)}$ interlaces ${\chi(A+v_{j_{\ell+1}}v_{j_{\ell+1}}^T)}$ for every ${j_{\ell+1}\in [m]}$. Lemma 6 tells us ${(1-(1/m)D)}$ operators preserve common interlacing, so the polynomials

$\displaystyle (1-(1/m)D)^{k-(\ell+1)}\chi(A+v_{j_{\ell+1}}v_{j_{\ell+1}}^T) = q_{j_1,\ldots,j_\ell,j_{\ell+1}}(x)$

(by applying Lemma 6 ${k-(\ell+1)}$ times) must also have a common interlacing. Thus, some ${j_{\ell+1}\in [m]}$ must satisfy (1), as desired. $\Box$

Bounding the Roots: Laguerre Polynomials

To finish the proof of Theorem 1, it suffices by Lemma 7 to prove a lower bound on the ${k}$th largest root of the expected polynomial ${\mathop{\mathbb E} \chi\left(\sum_{i=1}^k \mathbf{X} _i \mathbf{X} _i^T\right)}$. By applying Lemma 6 ${k}$ times to ${\chi(0)=x^n}$, we find that

$\displaystyle \mathop{\mathbb E} \chi\left(m\cdot \sum_{i=1}^k \mathbf{X} _i \mathbf{X} _i^T\right) = (1-D)^kx^n =: p_k(x).\ \ \ \ \ (2)$

This looks like a nice polynomial, and we are free to use any method we like to bound its roots.

The easiest way is to observe that

$\displaystyle p_k(x)=x^{n-k}\L_k^{(n-k)}(x),$

where ${\L_k^{(n-k)}(x)}$ is a degree ${k}$ associated Laguerre polynomial. These are a classical family of orthogonal polynomials and a lot is known about the locations of their roots; in particular, there is the following estimate due to Krasikov.

Lemma 8 (Roots of Laguerre Polynomials) The roots of the associated Laguerre polynomial

$\displaystyle \L_k^{(n-k)}(x):= (1-D)^{n}x^k\ \ \ \ \ (3)$

are contained in the interval ${[n(1-\sqrt{k/n})^2,n(1+\sqrt{k/n})^2].}$

It follows by Lemma 8 that ${\lambda_k(p_k)\ge \lambda_k(\L_k^{(n-k)})\ge n(1-\sqrt{k/n})^2}$, which immediately yields Theorem 1 by Lemma 7 and (2).

If you think that appealing to Laguerre polynomials was magical, it is also possible to prove the bound (3) from scratch in less than a page using the barrier function’ argument from this paper, which is also intimately related to the formulas ${(*)}$ and ${(**)}$.

Conclusion

The argument given here is a special case of a more general principle: that expected characteristic polynomials of certain random matrices can be expressed in terms of differential operators, which can then be used to establish the existence of the necessary common interlacings as well as to analyze the roots of the expected polynomials themselves. In the isotropic case of Bourgain-Tzafriri presented here, all of these objects can be chosen to be univariate polynomials. Morally, this is because the covariance matrices of all of the random vectors involved are multiples of the identity (which trivially commute with each other), and all of the characteristic polynomials involved are simple univariate linear transformations of each other (such a ${(I-cD)}$). The above argument can also be made to work in the non-isotropic case, yielding improvements over previously known bounds. This is the subject of a paper in preparation, Interlacing Families III, with Adam Marcus and Dan Spielman.

On the other hand, the proofs of Kadison-Singer and existence of Ramanujan graphs involve analyzing sums of independent rank one matrices which come from non-identically distributed distributions whose covariance matrices do not commute. At a high level, this is what creates the need to consider multivariate characteristic polynomials and differential operators, which are then analyzed using techniques from the theory of real stable polynomials.

Acknowledgments

Everything in this post is joint work with Adam Marcus and Dan Spielman. Thanks to Raghu Meka for helpful comments in the preparation of this post.

UPDATE: In response to Olaf’s comment below, here is how to see that the bound in the theorem is sharp.

The tight example is provided by random matrices. Let ${G}$ be an ${n\times m}$ random matrix with i.i.d. ${N(0,1/m)}$ Gaussian random entries, so that ${\mathbb{E} A=\mathbb{E} GG^T = I_n}$. Then ${A}$ is called a Wishart matrix and its spectrum is very well-understood. In particular, we will use the following two facts:

(1) If $m,n\rightarrow\infty$ with ${m/n\rightarrow\infty}$ then the ratio

$\displaystyle \lambda_{1}(A)/\lambda_n(A)\rightarrow 1$

almost surely. Thus, if we take the ${v_i}$ to be the columns of such a ${G}$ then

$\displaystyle \sum_{i=1}^m v_iv_i^T=I(1+o(1))$

almost surely.

(2)If ${m,n\rightarrow\infty}$ with ${n/m=a}$ fixed, the spectrum of ${A}$ converges to a known distribution called the Marchenko-Pastur law, which is supported on the interval ${[(1-\sqrt{a})^2,(1+\sqrt{a})^2]}$. The eigenvalues are extremely (better than exponentially) unlikely to be supported on any interval smaller than this; in particular for every ${\epsilon>0}$ we have

$\displaystyle \mathbb{P} [\lambda_{n}(A)>(1-\sqrt{a})^2+\epsilon]<\exp(-c(\epsilon)n^2)$

for sufficiently large ${n}$. This is established in Hiai-Petz  (Theorem 8, thanks to S. Szarek for this crucial reference.)

Fact (2) implies that every ${n\times k}$ submatrix ${S}$ of ${G}$, $k < n$, which is also Gaussian, has

$\displaystyle \lambda_k(SS^T)<\left((1-\sqrt{a})^2+\epsilon\right)\frac{n}{m}\quad (+)$

with probability $1-{\exp(-ck^2)}$ for sufficiently large ${n}$ (keeping ${k/n}$ fixed). There are ${\binom{m}{k}=\exp(c'k\log(m/k))}$ such submatrices ${S}$, so if we set $m=n^{3/2}$ (say) then $k^2 >> k\log(m/k)$ for any $k=\Omega(n)$, and by a union bound $(+)$ holds simultaneously for all $n \times k$ submatrices of G, showing that the guarantee of the theorem cannot be improved for sufficiently large $n$.

It would be nice to have an argument for finite $n$, which would require a non-asymptotic version of the Hiai-Petz bound.