Hastad‘s Switching Lemma is one of the gems of computational complexity. The switching lemma analyzes the effect of random restrictions on circuits, which are constant depth circuits with AND, OR and NOT gates of unbounded fanin. It says that such restrictions tends to simplify
circuits in surprising ways. An immediate consequence of the switching lemma is strong lower bounds for
circuits. It also has important applications in Fourier analysis, computational learning, pseudorandomness and more. Hastad’s lemma builds on earlier work due to Ajtai, Furst-Saxe-Sipser and Yao.
The crux of the matter is to analyze the effect of a random restriction on small-width DNF formulae. For this we need some notation. Let be a Boolean function on
input variables.
- A random restriction with parameter
is one where we leave a variable alive with probability
and set it randomly to
with probability
. We use
to denote the restricted function.
- We use
to denote the depth of the shallowest decision tree that computes
and
to denote the width of the smallest width DNF formula that computes it.
It is easy to see that , or rather that having a small depth decision tree implies having a small-width DNF representation. There is no converse to this statement: it is not hard to construct examples of functions where
is exponentially smaller than
, indeed the Tribes functions is such an example.
Hastad’s switching lemma tells us that although a small-width DNF might not have a small decision tree, a random restriction
of
almost certainly does. The precise statement is the following:
Hastad’s Switching Lemma: Let be a width
DNF. Let
denote the function obtained by applying a random restriction with probability
to it. Then
.
Lets try and digest this statement. Think of as a parameter between
and
. Let us pick the restrcition paramter
so that the RHS becomes
.
We are keeping variables alive with probability , which is not too high, but we can reasonably expect about
variables to remain alive. Nevertheless, the switching lemma tells us that the resulting function has decision tree depth
with probability
. Now a decision tree of depth
is a
-junta. So only a miniscule number of live variables continue to matter.
One could do this all day, nearly every setting of relevant parameters gives something fairly surprising. But perhaps the most surprising of all is the irrelevant parameter: the number of terms
(which is so irrelevant that we had not given it a symbol so far), or equivalently the number of variables. Surely
and/or
should matter in the statement somewhere? Any reasonable person trying to prove this statement would probably first try a union bound over all terms and run smack into
. As a colleague once said, the realization that
is irrelevant is mind-expanding.
By now there have been several alternate proofs of the switching lemma, I particularly like the exposition here due to Neil Thapen.
As scientists, when faced with magic, we try to find rationalizations for it. One possible rationalization for the absence of the number of variables/terms in the switching lemma could be that small-width DNFs are not too complex to start with. Sure they are not as simple as juntas, but they are not far being juntas either. A statement of this form in fact follows from Friedgut‘s junta theorem. Friedgut’s theorem tells us that any function with average sensitivity is close to a
-junta. We can apply Friedgut’s theorem in this context since it is not hard to show that a width-
DNF formula has average sensitivity bounded by
. A similar but weaker statement can be derived from the work of Ajtai-Wigderson and Trevisan, who show how to approximate small width DNFs by decision trees.
Some recent work with Raghu Meka and Omer Reingold takes a step towards converting this inutition into a proof of the switching lemma. We show that any width DNF is
-close to a width
DNF that has no more than
terms. Moreover, we prove a stronger approximation guarantee called sandwiching approximation which is often useful in pseudorandomness (see paper for details).
This statement is incomparable with the earlier consequence derived from Friedgut’s theorem, the dependence on is worse, but the dependence on
is better, and the approximators are themselves width
DNF formulas. Which strongly suggests a single best-of-both-worlds kind of statement, we pose this as a conjecture in our paper:
Conjecture: Any width DNF is
-close to a width
DNF that has no more than
terms.
If true, this says that any width DNF is well-approximated by a (
width) DNF with only
terms. If one can actually show sandwiching approximations, this would lead to a “union bound over terms” style proof of the switching lemma. The best lower bound we know comes from the Majority function on
variables, which shows that one needs at least
terms.
Parikshit