By Boaz Barak and Jarosław Błasiok

*[Jarek gave a great informal exposition of this paper on Friday, and I asked him to make it into a blog post. My only knowledge of the paper is from Jarek’s explanations: my main contribution to this post was to delete some details and insert some inaccuracies –Boaz.]*

A recent exciting paper of Ran Raz and Avishay Tal solved a longstanding problem in quantum complexity theory: the existence of an oracle under which the class is not contained in the polynomial hierarchy. The proof is elegant and actually not extremely complicated, and in this blog post we discuss the main idea behind it. For some of the motivation and background of this problem, see the posts of Scott Aaronson and Lance Fortnow. As is actually often the case in such relativization result, the proof does not have much to do with either or the polynomial hierarchy. Rather at its heart is a new lower bound against the perennial “victim” of circuit lower bounds: the class of constant depth circuits with AND, OR and NOT gates.

Rather than thinking about a single choice of an oracle, it turns out that it’s better to think of a *distribution* over oracles, and (for notational convenience) as a distribution over *pairs* of oracles .

They show that:

- There is a quantum algorithm that can make one quantum query to each of these oracles, and using this distinguish with advantage at least between the case that and are sampled from , and the case that and are sampled from the uniform distribution. (Set for concreteness.)
- For every constant depth polynomial size circuit on inputs (where ), the probability that outputs on sampled from differs by at most from the probability it outputs on sampled from the uniform distribution.

Together 1 and 2 imply the main results by standard arguments.

## The distribution

The distribution they use is (a variant of) the “Forrelation” distribution proposed for this task by Aaronson. Namely, they choose to be correlated with the Fourier transform of .

Specifically however, it is better to think about this distribution as follows:

- Sample as correlated Gaussian random variables over with the covariance matrix where is the Hadamard matrix defined as for every . Another way to think about this is that is a standard normal, and . Note that for any , .
- Then let be the distribution over that has the expectation . That is, for , we choose as a random variable with expectation and as a random variable with expectation . If or are larger than and each coordinate is a standard Normal then we truncate them appropriately, but because we set this will only happen with probability which we can treat as negligible and ignore this event from now on.

## Distinguishing from uniform is easy for quantum algorithms

The heart of the proof is in the classical part, and so we will merely sketch why this problem is easy for quantum algorithms. A Quantum Algorithm that gets query access to and , which we think of as functions mapping to can obtain the state

The algorithm can then perform the Quantum Fourier Transform (in this case applying Hadamard to its qubits ) to change this to the state

But in our case, will be correlated with and so this can be shown to have the form

where is some state on the last qubits.

If we now perform the Hadamard operation on the first qubit and measure it, we will be about more likely to see a zero than to see a 1. On the other hand it is not hard to show that if the input was uniform then we will see zero and one with equal probability. The advantage of can be amplified by making more queries or considering “tensored” forms of the same oracles.

## is pseudorandom for AC0 circuits

We now come to the heart of the proof: showing that it is *hard* to distinguish from the uniform distribution for circuits. One of the challenges of obtaining such an oracle separation is that the standard approach of showing such a result, which is to show that fools low degree polynomials, is not available to us. Indeed, our quantum algorithm itself is a degree 2 polyonomial in and !

At a high level, the proof shows that AC0 circuits can be approximated (to a certain extent) by very special kind of low degree polynomials – ones that are *sparse* in the sense that the L1 norm of their coefficients is not much larger than the L2 norm.

Anyway, we are getting ahead of ourselves. Let be computable by a polynomial size constant depth circuit. Our goal is to show the following result:

**Main Theorem:**

We will identify such a function with its *multilinear* extention, and hence think of as the polynomial . Note that in this notation, .

We start with the following simple observation: for every multilinear map , and , if we sample from a product distribution where the expectation of each coordinate matches the coordinate in , then . This means that for the purposes of analyzing the main theorem, we could just as well imagine that is sampled from the normal multivariate distribution over with correlation matrix .

Hence the result will follow from the following result:

**Main Lemma** Let be any multivariate Gaussian distribution over such that and for . Then is pseudorandom w.r.t. multilinear polynomials computable by polynomial sized circuits of constant depth.

The notation hides factors poly-logarithmic in . In our case, the distribution satisfies , and and so we get that (and hence our original ) is pseudorandom, completing the proof of the main theorem.

Note that the Main Lemma is of interest in its own right – it says that for a multivariate Gaussian to fool AC0, it is enough that the second moments sufficiently decay as long as the diagonal entries are moderately bounded away from 1. (However, for multivariate Gaussians, a bound on the second moments also implies decay of higher moments by Isserlis’ theorem.)

### Proof of the Main Lemma

The Main Lemma is proved by a “Hybrid Argument”. (Related to arguments used by Chattopadhyay, Hatami, Hosseini and Lovett for constructing pseudorandom generators.) We will set . Since a sum of Gaussians is a Gaussian, for every multivariate Gaussian , we can write

where is identically distributed to .

We write . To prove the Main Lemma we need to show that for every constant-depth poly-size computable multilinear . To do so we will establish the following inequality:

for every . (Writing for the vector.)

We can obtain the main lemma by summing up over all .

The main tool we will use to establish is the following structural result on AC0, which is again of independent interest:

**L1 bound theorem:** Let be a function computable by an circuit of quasipolynomial size and constant depth, and let and . Then the function defined as satisfies that

for every .

For those bounds on Fourier coefficients were proven by Avishay Tal. It turns out that general case can be essentially reduced to this case by a clever random restriction.

The L1 bound theorem implies , by writing for sampled from . (The event holds with high probability. It can be shown that conditioning on this event changes the relevant expectations in by at most an additive term that is proportional to the failure probability, which is smaller than .)

We can then bound the lefthand side of , by

the moment for vanishes. In the moments arising from , the term makes the contribution negligible, even when we sum it over terms — by Isserlis’ theorem we mentioned above we can bound . The only interesting term is when where we get exactly as desired. (As an aside, Isserlis’ Theorem gives a way to compute the moments of any multivariate Gaussian, and has a beautiful two line proof using Taylor expansion of in the formal variables .)

### Proof of the L1 bound theorem

We now show the proof of the L1 bound theorem. We say that function is a restriction of , if where .

**Claim:** For every fixed and multilinear there exist a random which is restriction of , such that

Indeed, for given , take a variable , such that on each coordinate independently we have with probability , with probability , and otherwise. Note that . Since is multilinear, and is independent across coordinates, we have

where depending on realization of is a corresponding restriction of — it fixes coordinates where accordingly.

Now the L1 bound theorem follows by linearity of Fourier transform: for specific we have . Every realization of as a restriction of an circuit is again an circuit (both size and depth can only decrease), hence we can apply the bounds on Fourier coeffiecients of circuits to , namely . Plugging those together we get

which is what we needed to prove.