STOC 2016 just ended and it included many great results with one highlight of course being Laci Babai’s quasipolynomial time algorithm for graph isomorphism. But today I wanted to mention another paper that I found quite interesting and reminded me of the famous Tolstoy quote that

Happy families are all alike; every unhappy family is unhappy in its own way.

I am talking about the work Reed-Muller Codes Achieve Capacity on Erasure Channels by Shrinivas Kudekar, Santhosh Kumar, Marco Mondelli, Henry D. Pfister, Eren Sasoglu and Rudiger Urbanke. We are used to thinking of some error correcting codes as being “better” than others in the sense that they have fewer decoding errors. But it turns out that in some sense all codes of a given rate have the same average number of errors. The only difference is that “bad” codes (such as the repetition code) have a fairly “smooth” error profile, in the sense that the probability of decoding success decays essentially like a low degree polynomial in the fraction of erasures, while for “good” codes the decay is like a step function: one can decode with probability $1$ when the erasure fraction is smaller than some threshold $\tau$, but the success probability quickly drops to half (no better than a random guess) once the erasure fraction passes $\tau$.

Specifically, if $C\subseteq GF(2)^n$ is a linear code of dimension $k$ and $p \in [0,1]$, we let $Y$ be the random variable over $(GF(2)\cup\{\star\})^n$ obtained by sampling a random codeword $X$ in $C$ and erasing (i.e., replacing with $\star$) each coordinate independently with probability $p$. We then define $h_C(p)$ to be the average over all $i$ of the conditional entropy of $X_i$ given $Y_{-i}=(Y_1,\ldots,Y_{i-1},Y_{i+1},\ldots,Y_n)$. Note that for a linear code, the coordinate $X_i$ is either completely determined by $Y_{-i}$ or it is a completely uniform bit, and hence $h_C(p)$ can be thought of as the expected fraction of coordinates that we won’t be able to decode with probability better than half from the remaining coordinates, each of which is revealed independently with probability $1-p$.

One formalization of this notion that all codes have the same average number of errors is known as the Area Law for EXIT functions, which states that for every code $C$ of dimension $k$ and length $n$, the integral $\int_0^1 h_C(p) dp$ equals $k/n$: it depends only on the rate of $C$ and on nothing else. For example, if $C$ is the simple “repetition code” where we simply repeat every symbol $n/k$ times, then the probability that we can’t decode a coordinate from the remaining ones (in which case its entropy is one) is exactly $p^{n/k-1}$, the probability that all the other copies of its symbol are erased. Hence in this case we can easily compute the integral $\int_0^1 h_C(p)dp = \int_0^1 p^{n/k-1}dp = k/n$, which is exactly the rate of the code. A code is said to be capacity achieving if there is some function $\epsilon=\epsilon(n)$ that goes to zero with $n$ such that $h_C(p)<\epsilon$ whenever $p< 1-k/n-\epsilon$. The area law immediately implies that in this case $h_C(p)$ must be close to one when $p>1-k/n$ (since otherwise the total integral would be smaller than $k/n$), and hence a code is capacity achieving if and only if the function $h_C(p)$ has a threshold behavior. (See figure below.)
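As a quick sanity check (my own sketch, not from the paper), one can estimate $h_C(p)$ for the repetition code by Monte Carlo and compare the area under the curve to the rate $k/n$:

```python
import random

random.seed(0)

def h_rep(n, k, p, trials=20000):
    # For the repetition code, a coordinate is undecodable exactly
    # when all n/k - 1 other copies of its symbol are erased,
    # each independently with probability p.
    r = n // k
    hits = sum(all(random.random() < p for _ in range(r - 1))
               for _ in range(trials))
    return hits / trials

n, k = 12, 3  # rate 1/4; each symbol repeated 4 times
# Left Riemann sum of h_C over [0,1]; the area law predicts k/n.
area = sum(h_rep(n, k, i / 100) for i in range(100)) / 100
print(abs(area - k / n) < 0.01)  # True
```

Here the estimate matches the analytic value $\int_0^1 p^{3}dp = 1/4$ up to discretization and sampling noise.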

The paper above uses this observation to show that the Reed-Muller code is capacity achieving for the binary erasure channel. The only property they use is the symmetry of this code, which means that we might as well have defined $h_C(p)$ with respect to some fixed coordinate (e.g., the first one). In this case, using linearity, we can see that for every erasure pattern $\alpha$ on the coordinates $2,\ldots, n$, the entropy of $X_1$ given $Y_2,\ldots,Y_n$ is a Boolean monotone function of $\alpha$. (Booleanity follows because in a linear subspace the entropy of the remaining coordinate is either zero or one; monotonicity follows because in the erasure channel erasing more coordinates cannot help you decode.) One can then use the results of Friedgut or Friedgut-Kalai, which show that monotone Boolean functions with sufficient symmetry have a sharp threshold, to establish the required threshold behavior of $h_C$. (The Reed-Muller code has the additional stronger property of double transitivity, which allows one to deduce that when the fraction of erasures is smaller than the capacity, one can decode not just most coordinates but all coordinates with high probability.)

How do you prove this area law? The idea is simple. Because of linearity, we can think of the following setting: suppose we have the all-zero codeword, permute its coordinates randomly, and reveal the first $t=(1-p)n$ of them. Then the probability that the $(t+1)^{st}$ coordinate is determined to be zero as well is $1-h_C(p)$. Another way to say this is that if we permute the columns of the $k\times n$ generator matrix $A$ of $C$ randomly, then the probability that the $(t+1)^{st}$ column is linearly independent of the first $t$ columns is $h_C(1-t/n)$. In other words, if we keep track of the rank of the first $t$ columns, then at step $t$ the probability that the rank increases by one is $h_C(1-t/n)$. But since the rank of all $n$ columns is $k$, the expected number of rank increases is exactly $k$, and hence $\sum_{t=0}^{n-1} h_C(1-t/n) = k$. Since the left-hand side is (up to negligible terms) $n \int_0^1 h_C(p)dp$, we get $\int_0^1 h_C(p)dp = k/n$, which is what we wanted to prove. QED
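The counting argument at the heart of this proof is easy to verify experimentally. The sketch below (my own code; the xor-basis helper is a standard trick for incremental rank over GF(2)) permutes the columns of a random full-rank generator matrix and checks that the rank jumps at exactly $k$ of the $n$ steps, no matter the ordering:

```python
import random

random.seed(0)

def insert(basis, v):
    # GF(2) "xor basis": elements kept sorted descending with distinct
    # leading bits; returns True iff v was independent of the basis.
    for b in basis:
        v = min(v, v ^ b)
    if v:
        basis.append(v)
        basis.sort(reverse=True)
        return True
    return False

def rank_jumps(cols):
    # Number of steps at which the rank of the prefix increases.
    basis = []
    return sum(insert(basis, c) for c in cols)

k, n = 4, 12
# Random k x n generator matrix, each column encoded as a k-bit
# integer; resample until the matrix has full rank k.
while True:
    cols = [random.randrange(2 ** k) for _ in range(n)]
    if rank_jumps(cols) == k:
        break

# Discrete area law: every ordering of the columns sees exactly
# k rank jumps, since the total rank is k regardless of order.
for _ in range(50):
    random.shuffle(cols)
    assert rank_jumps(cols) == k
print("every ordering has exactly", k, "rank jumps")
```

Averaging the indicator of a jump at step $t$ over random orderings is what yields $h_C(1-t/n)$ in the proof above.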

p.s. Thanks to Yuval Wigderson, whose senior thesis is a good source for these questions.

On October 5-8 (right before FOCS 2016) there will be a workshop in Princeton in honor of Avi Wigderson’s 60th birthday. Avi is one of the most productive, generous and collaborative researchers in our community (see mosaic below of his collaborators). So, it’s not surprising that we were able to get a great lineup of speakers for what promises to be a fantastic workshop on Computer Science, Mathematics, and their interactions.

Attendance is free but registration is required. Also there are funds for travel support for students for which you should apply before August 1st.

Confirmed speakers are:

• Scott Aaronson – MIT
• Dorit Aharonov – Hebrew University of Jerusalem
• Noga Alon – Tel Aviv University
• Zeev Dvir – Princeton University
• Oded Goldreich – Weizmann Institute of Science
• Shafi Goldwasser – MIT and Weizmann Institute of Science
• Russell Impagliazzo – University of California, San Diego
• Gil Kalai – Hebrew University of Jerusalem
• Richard Karp – University of California, Berkeley
• Nati Linial – Hebrew University of Jerusalem
• Richard Lipton – Georgia Institute of Technology
• László Lovász – Eötvös Loránd University
• Alex Lubotzky – Hebrew University of Jerusalem
• Silvio Micali – MIT
• Noam Nisan – Hebrew University of Jerusalem
• Toniann Pitassi – University of Toronto
• Alexander Razborov – University of Chicago
• Omer Reingold – Samsung Research America
• Michael Saks – Rutgers University
• Ronen Shaltiel – University of Haifa
• Madhu Sudan – Harvard University
• Eyal Wigderson – Hebrew University of Jerusalem

Today we witnessed an exciting moment for privacy, CS theory, and many friends and contributors of this blog. The definition of differential privacy, first articulated in a TCC paper just 10 short years ago, became a top-level feature of iOS, announced today at the Apple keynote address. Check this out for yourself:

You may be as intrigued as I am by the bold claim by Prof. Aaron Roth, who said that

Incorporating differential privacy broadly into Apple’s technology is visionary, and positions Apple as the clear privacy leader among technology companies today.

Learning more about the underlying technology would benefit the research community and assure the public of the validity of these statements. (We, at Research at Google, are trying to adhere to the highest standards of transparency by releasing Chrome’s front-end and back-end for differentially private telemetry.)

I am confident this moment will come. For now, our heartfelt congratulations to everyone, inside and outside Apple, whose work made today’s announcement possible!

By Boaz Barak and Omer Reingold

Yesterday Hillary Clinton became the first woman to be (presumptively) nominated for president by a major party. But in the eyes of many, the Republican Party was first to make history this election season by breaking the “qualifications ceiling” (or perhaps floor) in their own (presumptive) nomination.

Though already predicted in 2000 by the Simpsons, the possibility of a Trump presidency has rattled enough people that even mostly technical bloggers such as Terry Tao and Scott Aaronson have felt compelled to voice their opinions.

We too have been itching for a while to weigh in, share our opinions, and use every tool at our disposal for that, including this blog. We certainly think it’s very appropriate for scientists to be involved citizens and speak up about their views. But though we debated it, we felt that, this being a group (technical) blog, it’s best not to wade into politics (as long as it doesn’t directly touch on issues related to computer science, such as the Apple vs. FBI case). Hence we will refrain from future postings about the presidential election. For full disclosure, both of us personally support Hillary Clinton and have been donating to her campaign.

The last few weeks have seen amazing results in additive combinatorics, where following a breakthrough by Croot, Lev and Pach, several longstanding open questions have been resolved using short simple proofs. I haven’t been following this progress, but fortunately Bobby Kleinberg gave an excellent talk yesterday in our reading group about some of these works and their relations to TCS questions, such as approaches for fast matrix multiplication. Since the proofs are short and these results have also been extensively blogged about, what follows is mainly for my own benefit.

Among other things, Bobby showed the proof of the following result, that demonstrates much of those ideas:

Theorem: (Ellenberg and Gijswijt, building on Croot-Lev-Pach) There exists an absolute constant $\epsilon>0$ such that for every $A \subseteq {\mathbb{F}}_3^n$, if $|A|> (3-\epsilon)^n$ then $A$ contains a 3-term arithmetic progression.

To put this in perspective, until a few weeks ago the best bounds were of the form $3^n/n^{1+\epsilon}$ and were shown using fairly complicated proofs, and it was very reasonable to conjecture that a bound of the form $3^{n-o(n)}$ is the best one can do. Indeed, an old construction of Behrend shows that this is the case in other groups, such as the integers modulo some large $N$, or ${\mathbb{F}}_p^n$ where $p$ is some large value depending on $n$. The proof generalizes to ${\mathbb{F}}_q^n$ for every constant prime $q$ (and to composite-order cyclic groups as well).

The proof is extremely simple. It seems to me that it can be summarized in two observations:

• Due to the algebraic structure of the problem, one can in some sense “interpolate” a $3$-a.p. free set $A$ with a polynomial whose degree is half as large as one would otherwise expect.

• Due to concentration of measure phenomena in finite fields, this factor-of-two savings makes a huge difference. There are values $d$ for which polynomials of degree $d$ span all but an exponentially small fraction of the space of functions from ${\mathbb{F}}_3^n$ to ${\mathbb{F}}$, while polynomials of degree $d/2$ span only an exponentially small fraction of it.

Let’s now show the proof. Assume towards a contradiction that $A\subseteq {\mathbb{F}}_3^n$ satisfies $|A|\geq (3-\epsilon)^n$ (where $\epsilon$ can be any sufficiently small constant; $\epsilon=0.1$ will do) but there do not exist three distinct points $a,b,c \in A$ that form a $3$-a.p. (i.e., such that $c-a = b-c$ or, equivalently, $a+b = 2c$).

Let $L(d)$ be the number of $n$-variate monomials over ${\mathbb{F}}_3$ in which each variable has individual degree at most $2$ (higher individual degrees can be ignored since $x^3=x$ over ${\mathbb{F}}_3$) and the total degree is at most $d$. Note that there are $3^n$ possible monomials where each individual degree is at most two, their total degrees range from $0$ to $2n$, and by concentration of measure most of them have degree roughly $n$. Indeed, using the Chernoff bound we can see that if $\epsilon>0$ is a sufficiently small constant, we can pick some $\delta>0$ such that if $d = (2-\delta)n$ then $L(d) \geq 3^n - 0.1(3-\epsilon)^n$ but $L(d/2) \leq 0.1(3-\epsilon)^n$ (to get optimal results, one sets $d$ to be roughly $4n/3$ and derives $\epsilon$ from this value).
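This counting claim can be checked exactly in a few lines (my own sketch, not from the paper): $L(d)$ is a partial sum of the coefficients of $(1+x+x^2)^n$, which a short dynamic program computes.

```python
def monomial_counts(n):
    # coeffs[t] = number of n-variate monomials with individual
    # degrees in {0, 1, 2} and total degree exactly t, i.e. the
    # coefficients of the polynomial (1 + x + x^2)^n.
    coeffs = [1]
    for _ in range(n):
        new = [0] * (len(coeffs) + 2)
        for t, c in enumerate(coeffs):
            new[t] += c      # this variable gets degree 0
            new[t + 1] += c  # ... degree 1
            new[t + 2] += c  # ... degree 2
        coeffs = new
    return coeffs

def L(n, d):
    # number of monomials of total degree at most d
    return sum(monomial_counts(n)[: d + 1])

n = 150
d = 4 * n // 3  # the near-optimal degree choice mentioned above
total = 3 ** n
print(L(n, d) / total > 0.999)      # degree <= 4n/3: almost all monomials
print(L(n, d // 2) / total < 1e-4)  # degree <= 2n/3: a tiny fraction
```

Both checks print True, illustrating the sharp contrast between degree $d$ and degree $d/2$ that the argument exploits.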

Now, if we choose $d$ in this manner, then we can find a polynomial $P:{\mathbb{F}}_3^n\rightarrow{\mathbb{F}}$ of degree at most $d$ that vanishes on $\overline{A} = {\mathbb{F}}_3^n \setminus A$ but is nonzero on at least $|A|/2$ points. Indeed, finding such a polynomial amounts to solving a system of $3^n - |A|/2$ linear equations in $L(d)\geq 3^n - 0.1|A|$ variables.[1] Define the $|A|\times |A|$ matrix $M$ by $M_{a,b}= P((a+b)/2)$ (note that division by $2$ makes sense over ${\mathbb{F}}_3$, since $2^{-1}=2$). Since the assumption that $|A| \geq (3-\epsilon)^n$ implies that $L(d/2) \leq 0.1|A|$, the theorem follows immediately from the following two claims:

Claim 1: ${\mathrm{rank}}(M) \geq |A|/2$.

Claim 2: ${\mathrm{rank}}(M) \leq 2L(d/2)$.

Claim 1 is fairly immediate. Since $A$ is $3$-a.p. free, for every $a\neq b$ in $A$ the point $(a+b)/2$ is not in $A$, and hence $M$ is zero on all off-diagonal entries. On the other hand, by the way we chose $P$, $M$ has at least $|A|/2$ nonzero entries on the diagonal.

For Claim 2, we expand $P((a+b)/2)$ as a polynomial of degree $d$ in the two variables $a$ and $b$, and write $M=M'+M''$ where $M'$ corresponds to the part of this polynomial where the degree in $a$ is at most $d/2$ and $M''$ corresponds to the part where the degree in $a$ is larger and hence the degree in $b$ is at most $d/2$. We claim that both ${\mathrm{rank}}(M')$ and ${\mathrm{rank}}(M'')$ are at most $L(d/2)$. Indeed, we can write $M'_{a,b}$ as $\sum_{\alpha} C_{\alpha} a^{\alpha} P_\alpha(b)$ for some coefficients $C_\alpha\in {\mathbb{F}}_3$ and polynomials $P_\alpha$, where $\alpha$ indexes the $L(d/2)$ monomials in $a$ of degree at most $d/2$. But this shows that $M'$ is a sum of at most $L(d/2)$ rank one matrices and hence ${\mathrm{rank}}(M')\leq L(d/2)$. The same reasoning shows that ${\mathrm{rank}}(M'') \leq L(d/2)$ thus completing the proof of Claim 2 and the theorem itself.

[1] More formally, we can argue that the set of degree $d$ polynomials that vanish on $\overline{A}$ has dimension at least $L(d)-|\overline{A}| \geq 0.9|A|$, and hence it contains a polynomial that is nonzero on at least this number of points.
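The theorem is asymptotic, but for tiny dimensions one can find the largest $3$-a.p. free set by brute force. The sketch below (my own code) recovers the classical fact that the largest such set in ${\mathbb{F}}_3^2$ has size $4$:

```python
from itertools import combinations, product

points = list(product(range(3), repeat=2))  # all 9 points of F_3^2

def ap_free(S):
    # Over F_3 we have 2c = -c, so distinct a, b, c form a 3-a.p.
    # (a + b = 2c) exactly when a + b + c = 0 coordinatewise.
    S = set(S)
    for a, b in combinations(S, 2):
        c = tuple((-(x + y)) % 3 for x, y in zip(a, b))
        if c in S:  # c != a and c != b automatically when a != b
            return False
    return True

# largest 3-a.p.-free subset of F_3^2, by exhaustive search
best = max(size for size in range(1, 10)
           if any(ap_free(S) for S in combinations(points, size)))
print(best)  # prints 4
```

(For example, $\{(0,0),(0,1),(1,0),(1,1)\}$ is $3$-a.p. free, while every $5$-point subset contains an affine line.)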


Northwestern University held a workshop on semidefinite programming hierarchies and sums of squares. Videos of the talks by Prasad Raghavendra, David Steurer and myself are available from the link above. The content-to-unicorns ratio in Prasad and David’s talks is much higher ☺

Tomorrow, Wednesday May 25, is the international Towel Day in honor of Douglas Adams, author of the five-book trilogy “The Hitchhiker’s Guide to the Galaxy”. In his books (and his prior 1978 radio series) Adams gave a nice illustration of computational complexity and non-uniform computation in his story about the “Deep Thought” computer, which took 7.5 million years to answer the ultimate question of life, the universe, and everything, though today Google’s calculator can do this in 0.66 seconds.

If you want to celebrate towel day in Cambridge, bring your towel to Tselil Schramm’s talk on refuting random constraint satisfaction problems using Sums of Squares in MIT’s algorithms and complexity seminar (4pm, room 32-G575).