By Boaz Barak and Jarosław Błasiok
[Jarek gave a great informal exposition of this paper on Friday, and I asked him to make it into a blog post. My only knowledge of the paper is from Jarek’s explanations: my main contribution to this post was to delete some details and insert some inaccuracies –Boaz.]
A recent exciting paper of Ran Raz and Avishay Tal solved a longstanding problem in quantum complexity theory: the existence of an oracle under which the class is not contained in the polynomial hierarchy. The proof is elegant and actually not extremely complicated, and in this blog post we discuss the main idea behind it. For some of the motivation and background of this problem, see the posts of Scott Aaronson and Lance Fortnow. As is actually often the case in such relativization result, the proof does not have much to do with either
or the polynomial hierarchy. Rather at its heart is a new lower bound against the perennial “victim” of circuit lower bounds: the class
of constant depth circuits with AND, OR and NOT gates.
Rather than thinking about a single choice of an oracle, it turns out that it’s better to think of a distribution over oracles, and (for notational convenience) as a distribution over pairs of oracles
.
They show that:
- There is a quantum algorithm that can make one quantum query to each of these oracles, and using this distinguish with advantage at least
between the case that
and
are sampled from
, and the case that
and
are sampled from the uniform distribution. (Set
for concreteness.)
- For every constant depth polynomial size circuit
on
inputs (where
), the probability that
outputs
on
sampled from
differs by at most
from the probability it outputs
on
sampled from the uniform distribution.
Together 1 and 2 imply the main results by standard arguments.
The distribution 
The distribution they use is (a variant of) the “Forrelation” distribution proposed for this task by Aaronson. Namely, they choose
to be
correlated with the Fourier transform of
.
Specifically however, it is better to think about this distribution as follows:
- Sample
as correlated Gaussian random variables over
with the covariance matrix
where
is the
Hadamard matrix defined as
for every
. Another way to think about this is that
is a standard normal, and
. Note that for any
,
.
- Then let
be the distribution over
that has the expectation
. That is, for
, we choose
as a
random variable with expectation
and
as a
random variable with expectation
. If
or
are larger than
and each coordinate is a standard Normal then we truncate them appropriately, but because we set
this will only happen with probability
which we can treat as negligible and ignore this event from now on.
Distinguishing
from uniform is easy for quantum algorithms
The heart of the proof is in the classical part, and so we will merely sketch why this problem is easy for quantum algorithms. A Quantum Algorithm that gets query access to and
, which we think of as functions mapping
to
can obtain the state
The algorithm can then perform the Quantum Fourier Transform (in this case applying Hadamard to its qubits ) to change this to the state
But in our case, will be
correlated with
and so this can be shown to have the form
where is some state on the last
qubits.
If we now perform the Hadamard operation on the first qubit and measure it, we will be about more likely to see a zero than to see a 1. On the other hand it is not hard to show that if the input was uniform then we will see zero and one with equal probability. The advantage of
can be amplified by making more queries or considering “tensored” forms of the same oracles.
is pseudorandom for AC0 circuits
We now come to the heart of the proof: showing that it is hard to distinguish from the uniform distribution for
circuits. One of the challenges of obtaining such an oracle separation is that the standard approach of showing such a result, which is to show that
fools low degree polynomials, is not available to us. Indeed, our quantum algorithm itself is a degree 2 polyonomial in
and
!
At a high level, the proof shows that AC0 circuits can be approximated (to a certain extent) by very special kind of low degree polynomials – ones that are sparse in the sense that the L1 norm of their coefficients is not much larger than the L2 norm.
Anyway, we are getting ahead of ourselves. Let be computable by a polynomial size constant depth
circuit. Our goal is to show the following result:
Main Theorem:
We will identify such a function with its multilinear extention, and hence think of
as the polynomial
. Note that in this notation,
.
We start with the following simple observation: for every multilinear map , and
, if we sample from a product distribution
where the expectation of each coordinate matches the coordinate in
, then
. This means that for the purposes of analyzing the main theorem, we could just as well imagine that
is sampled from the normal multivariate distribution
over
with correlation matrix
.
Hence the result will follow from the following result:
Main Lemma Let be any multivariate Gaussian distribution over
such that
and
for
. Then
is
pseudorandom w.r.t. multilinear polynomials computable by polynomial sized circuits of constant depth.
The notation hides factors poly-logarithmic in
. In our case, the distribution
satisfies
, and
and so we get that
(and hence our original
) is
pseudorandom, completing the proof of the main theorem.
Note that the Main Lemma is of interest in its own right – it says that for a multivariate Gaussian to fool AC0, it is enough that the second moments sufficiently decay as long as the diagonal entries are moderately bounded away from 1. (However, for multivariate Gaussians, a bound on the second moments also implies decay of higher moments by Isserlis’ theorem.)
Proof of the Main Lemma
The Main Lemma is proved by a “Hybrid Argument”. (Related to arguments used by Chattopadhyay, Hatami, Hosseini and Lovett for constructing pseudorandom generators.) We will set . Since a sum of Gaussians is a Gaussian, for every multivariate Gaussian
, we can write
where is identically distributed to
.
We write . To prove the Main Lemma we need to show that
for every constant-depth poly-size computable multilinear
. To do so we will establish the following inequality:
for every . (Writing
for the
vector.)
We can obtain the main lemma by summing up over all
.
The main tool we will use to establish is the following structural result on AC0, which is again of independent interest:
L1 bound theorem: Let be a function computable by an
circuit of quasipolynomial size and constant depth, and let
and
. Then the function
defined as
satisfies that
for every
.
For those bounds on Fourier coefficients were proven by Avishay Tal. It turns out that general case can be essentially reduced to this case by a clever random restriction.
The L1 bound theorem implies , by writing
for
sampled from
. (The event
holds with high probability. It can be shown that conditioning on this event changes the relevant expectations in
by at most an additive term that is proportional to the failure probability, which is smaller than
.)
We can then bound the lefthand side of , by
the moment for vanishes. In the moments arising from
, the term
makes the contribution negligible, even when we sum it over
terms — by Isserlis’ theorem we mentioned above we can bound
. The only interesting term is when
where we get exactly
as desired. (As an aside, Isserlis’ Theorem gives a way to compute the moments of any multivariate Gaussian, and has a beautiful two line proof using Taylor expansion of
in the formal variables
.)
Proof of the L1 bound theorem
We now show the proof of the L1 bound theorem. We say that function is a restriction of
, if
where
.
Claim: For every fixed and multilinear
there exist a random
which is restriction of
, such that
Indeed, for given , take a variable
, such that on each coordinate independently we have
with probability
,
with probability
, and
otherwise. Note that
. Since
is multilinear, and
is independent across coordinates, we have
where depending on realization of
is a corresponding restriction of
— it fixes coordinates where
accordingly.
Now the L1 bound theorem follows by linearity of Fourier transform: for specific we have
. Every realization of
as a restriction of an
circuit is again an
circuit (both size and depth can only decrease), hence we can apply the bounds on Fourier coeffiecients of
circuits to
, namely
. Plugging those together we get
which is what we needed to prove.