STOC 2016 just ended and it included many great results with one highlight of course being Laci Babai’s quasipolynomial time algorithm for graph isomorphism. But today I wanted to mention another paper that I found quite interesting and reminded me of the famous Tolstoy quote that
Happy families are all alike; every unhappy family is unhappy in its own way.
I am talking about the work Reed-Muller Codes Achieve Capacity on Erasure Channels by Shrinivas Kudekar, Santhosh Kumar, Marco Mondelli, Henry D. Pfister, Eren Sasoglu and Rudiger Urbanke. We are used to thinking of some error correcting codes as being “better” than others in the sense that they have fewer decoding errors. But it turns out that in some sense all codes of a given rate have the same average number of errors. The only difference is that “bad” codes (such as the repetition code) have a fairly “smooth” error profile, in the sense that the probability of decoding success decays essentially like a low degree polynomial in the fraction of erasures, while for “good” codes the decay is like a step function: one can succeed with probability $1-o(1)$ when the fraction of erasures is smaller than some threshold $\tau$, but this probability quickly decays to one half once the fraction of erasures passes $\tau$.
Specifically, if $C \subseteq \{0,1\}^n$ is a linear code of dimension $k$ and $p \in [0,1]$, we let $Y$ be the random variable over $(\{0,1\}\cup\{\perp\})^n$ that is obtained by sampling a random codeword $x$ in $C$ and erasing (i.e., replacing it with $\perp$) every coordinate $x_i$ independently with probability $p$. Then we define $H(p)$ to be the average over all $i \in [n]$ of the conditional entropy of $x_i$ given $Y_{-i}$ (that is, given all the coordinates of $Y$ other than the $i^{th}$ one). Note that for linear codes, the coordinate $x_i$ is either completely fixed by $Y_{-i}$ or it is a completely uniform bit, and hence $H(p)$ can be thought of as the expected fraction of coordinates that we won't be able to decode with probability better than half from a $(1-p)n$-sized random subset of the remaining coordinates.
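To make the definition concrete, here is a minimal brute-force sketch of $H(p)$ in Python. It is my own illustration rather than anything from the paper: the helper names, the integer-bitmask representation of the generator columns, and the choice of the $[7,3]$ simplex code as a running example are all assumptions of the sketch.

```python
# A brute-force sketch (my illustration, not from the paper) of the function
# H(p): columns of a generator matrix are stored as k-bit integers, and a
# coordinate x_i has conditional entropy 1 exactly when column i lies outside
# the GF(2)-span of the unerased columns among the rest.
from itertools import combinations


def gf2_rank(vectors):
    """Rank over GF(2) of integer bit-vectors, via standard xor-elimination."""
    basis = {}  # leading bit -> basis vector with that leading bit
    for v in vectors:
        while v:
            lead = v.bit_length() - 1
            if lead in basis:
                v ^= basis[lead]  # cancel the leading bit and keep reducing
            else:
                basis[lead] = v   # v is independent of everything seen so far
                break
    return len(basis)


def H(G, p):
    """Average over i of H(x_i | Y_{-i}) at erasure probability p."""
    n = len(G)
    total = 0.0
    for i in range(n):
        rest = [j for j in range(n) if j != i]
        for m in range(n):  # m = number of unerased coordinates among the rest
            for U in combinations(rest, m):
                cols = [G[j] for j in U]
                if gf2_rank(cols + [G[i]]) > gf2_rank(cols):  # x_i undetermined
                    total += (1 - p) ** m * p ** (n - 1 - m)
    return total / n


# Example: the [7,3] simplex code, whose generator matrix has all seven
# nonzero vectors of GF(2)^3 as its columns.
simplex = list(range(1, 8))
for p in (0.0, 0.25, 0.5, 0.75, 1.0):
    print(f"p = {p:.2f}   H(p) = {H(simplex, p):.4f}")
```

As a quick sanity check, $H(0)=0$ (no erasures, every coordinate is determined) and $H(1)=1$ (everything erased, each coordinate is a uniform bit).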
One formalization of this notion that all codes have the same average number of errors is known as the Area Law for EXIT functions, which states that for every code of dimension $k$ and length $n$, the integral $\int_0^1 H(p)\,dp$ is a fixed constant that depends only on the rate $k/n$ and not on any other structure of $C$. In particular, note that if $C$ is the simple “repetition code” where we simply repeat every symbol $n/k$ times, then the probability that we can't decode some coordinate from the remaining ones (in which case the entropy is one) is exactly $p^{n/k-1}$, where $p$ is the erasure probability. Hence in this case we can easily compute the integral $\int_0^1 H(p)\,dp = \int_0^1 p^{n/k-1}\,dp = k/n$, which is simply the rate of the code. In particular this tells us that the average entropy is always equal to the rate of the code. A code is said to be capacity achieving if there is some function $\epsilon = \epsilon(n)$ that goes to zero with $n$ such that $H(p) \le \epsilon$ whenever $p \le 1 - k/n - \epsilon$. The area law immediately implies that in this case it must be that $H(p)$ is close to one when $p \ge 1 - k/n + \epsilon$ (since otherwise the total integral would be smaller than $k/n$), and hence a code is capacity achieving if and only if the function $p \mapsto H(p)$ has a threshold behavior, jumping from near zero to near one around $p = 1 - k/n$. (See figure below).
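Continuing the toy sketch above (again my own verification, not the paper's), one can check the area law exactly rather than numerically, by integrating $H(p)$ term by term with the identity $\int_0^1 p^a(1-p)^b\,dp = \frac{a!\,b!}{(a+b+1)!}$. The two example codes below are assumptions of the sketch.

```python
# Exact check of the area law: integrate H(p) term by term using
# \int_0^1 p^a (1-p)^b dp = a! b! / (a+b+1)!, with rational arithmetic.
from fractions import Fraction
from itertools import combinations
from math import factorial


def gf2_rank(vectors):
    """Rank over GF(2) of integer bit-vectors (standard xor-elimination)."""
    basis = {}
    for v in vectors:
        while v:
            lead = v.bit_length() - 1
            if lead in basis:
                v ^= basis[lead]
            else:
                basis[lead] = v
                break
    return len(basis)


def area(G):
    """Exact value of \int_0^1 H(p) dp for a code with generator columns G."""
    n = len(G)
    total = Fraction(0)
    for i in range(n):
        rest = [j for j in range(n) if j != i]
        for m in range(n):
            # weight of each unerased set U of size m: \int p^(n-1-m)(1-p)^m dp
            w = Fraction(factorial(n - 1 - m) * factorial(m), factorial(n))
            for U in combinations(rest, m):
                cols = [G[j] for j in U]
                if gf2_rank(cols + [G[i]]) > gf2_rank(cols):  # x_i undetermined
                    total += w
    return total / n


simplex = list(range(1, 8))         # [7,3] simplex code: all nonzero 3-bit columns
repetition = [1, 1, 1, 1, 1, 1, 1]  # [7,1] repetition code
print(area(simplex), area(repetition))  # both equal their rates: 3/7 and 1/7
```

The two codes have very different threshold behavior, yet the integrals come out to exactly $3/7$ and $1/7$, their respective rates, as the area law predicts.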
The paper above uses this observation to show that the Reed-Muller code is capacity achieving for the binary erasure channel. The only property they use is the symmetry of this code (its automorphism group acts transitively on the coordinates), which means that for this code we might as well have defined $H(p)$ with respect to some fixed coordinate (e.g., the first one). In this case, using linearity, we can see that for every erasure pattern $E$ on the coordinates $\{2,\ldots,n\}$, the entropy of $x_1$ given the unerased coordinates is a Boolean monotone function of $E$. (Booleanity follows because in a linear subspace the entropy of the remaining coordinate is either zero or one; monotonicity follows because in the erasure channel erasing more coordinates cannot help you decode.) One can then use the papers of Friedgut or Friedgut-Kalai to establish that such a symmetric monotone Boolean function must have a sharp threshold, which is exactly the property needed above. (The Reed-Muller code has an additional stronger property of double transitivity, which allows one to deduce that one can decode not just most coordinates but all coordinates with high probability when the fraction of errors is smaller than the capacity.)
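The Booleanity and monotonicity claims are easy to sanity-check by brute force on a small code. Here is a sketch of such a check (mine, with the toy simplex code standing in for Reed-Muller, and the function name `undetermined` an invention of the sketch):

```python
# Check on a toy code that f(E) = "is x_1 undetermined when the set E of the
# other coordinates is erased?" is a monotone Boolean function of E: adding
# one more erasure never turns an undecodable coordinate into a decodable one.
from itertools import combinations


def gf2_rank(vectors):
    basis = {}
    for v in vectors:
        while v:
            lead = v.bit_length() - 1
            if lead in basis:
                v ^= basis[lead]
            else:
                basis[lead] = v
                break
    return len(basis)


def undetermined(G, erased):
    """f(E): True iff column 0 lies outside the span of the unerased rest."""
    cols = [G[j] for j in range(1, len(G)) if j not in erased]
    return gf2_rank(cols + [G[0]]) > gf2_rank(cols)


G = list(range(1, 8))  # [7,3] simplex code; "x_1" is the column at index 0
patterns = [set(E) for m in range(7) for E in combinations(range(1, 7), m)]
for E in patterns:
    for j in set(range(1, 7)) - E:
        # monotone step: f(E) <= f(E + one more erasure)
        assert undetermined(G, E) <= undetermined(G, E | {j})
print(f"f is Boolean and monotone across all {len(patterns)} erasure patterns")
```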
How do you prove this area law? The idea is simple. Because of linearity, we can think of the following setting: suppose we have the all-zero codeword, we permute its coordinates randomly, and reveal the first $i-1$ of them. Then the probability that the $i^{th}$ coordinate is determined to be zero as well is exactly one minus the corresponding conditional entropy, and averaging over the position $i$ recovers the integral $\int_0^1 H(p)\,dp$ (this uses the identity $\int_0^1 p^{s}(1-p)^{n-1-s}\,dp = \tfrac{1}{n\binom{n-1}{s}}$, which says that erasing coordinates with a uniformly random erasure probability is the same as revealing a random-length prefix of a random ordering). Another way to say it is that if we permute the columns of the $k \times n$ generating matrix $A$ of $C$ randomly, then the probability that the $i^{th}$ column is linearly independent from the first $i-1$ columns is exactly the expected entropy of the $i^{th}$ revealed coordinate given the previous ones. In other words, if we keep track of the rank of the first $i$ columns, then at step $i$ the probability that the rank will increase by one is exactly this entropy. But since we know that the rank of all $n$ columns is $k$, the rank must increase at exactly $k$ of the $n$ steps, and hence the average of these entropies is $k/n$, i.e., $\int_0^1 H(p)\,dp = k/n$, which is what we wanted to prove. QED
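This counting argument is short enough to watch in action. The sketch below (again mine, using the same toy simplex code) shuffles the columns of a generator matrix and records the positions where the rank of the prefix jumps; the number of jumps is always exactly $k$, no matter the permutation.

```python
# Demonstration of the counting argument: permute the columns of a generator
# matrix at random; the number of positions at which the running rank of the
# prefix jumps is always exactly k, whatever the permutation.
import random


def rank_jump_positions(columns):
    """0-indexed positions at which the rank of the column prefix grows."""
    basis, jumps = {}, []
    for i, v in enumerate(columns):
        while v:
            lead = v.bit_length() - 1
            if lead in basis:
                v ^= basis[lead]
            else:
                basis[lead] = v
                jumps.append(i)  # column i is independent of the prefix
                break
    return jumps


G = list(range(1, 8))  # columns of the [7,3] simplex code: k = 3, n = 7
for _ in range(5):
    cols = G[:]
    random.shuffle(cols)
    jumps = rank_jump_positions(cols)
    print(cols, "-> rank jumps at positions", jumps, "; count =", len(jumps))
# The count is always 3 = k; averaging the jump indicators over the n positions
# and over all permutations gives k/n, which is exactly the integral of H(p).
```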
p.s. Thanks to Yuval Wigderson, whose senior thesis is a good source for these questions.