Guest post by Abhishek Anand and Noah Miller from the physics and computation seminar.

In 2013, Harlow and Hayden drew an unexpected connection between theoretical computer science and theoretical physics as they proposed a potential resolution to the famous black hole Firewall paradox using computational complexity arguments. This blog post attempts to lay out the Firewall paradox and other peculiar (at first) properties associated with black holes that make them such intriguing objects to study. This post is inspired by Scott Aaronson’s [1] and Daniel Harlow’s [2] excellent notes on the same topic. The notes accompanying this post provides a thorough and self-contained introduction to theoretical physics from a CS perspective. Furthermore, for a quick and intuitive summary of the Firewall paradox and it’s link to computational complexity, refer to this blog post by Professor Barak last summer.

## Black holes and conservative radicalism

Black holes are fascinating objects. Very briefly, they are regions of spacetime where the matter-energy density is so high and hence, where the gravitational effects are so strong that no particle (not even light!) can escape from it. More specifically, we define a particular distance called the “Schwarzschild radius” and anything that enters within the Schwarzschild radius, (also known as the “event horizon,”) cannot ever escape from the black hole. General relativity predicts that this particle is bound to hit the “singularity,” where spacetime curvature becomes infinite. In the truest sense of the word, they represent the “edge cases” of our Universe. Hence, perhaps, it is fitting that physicists believe that through thought experiments at these edges cases, they can investigate the true behavior of the laws that govern our Universe.

Once you know that such an object exists, many questions arise: what would it look it from the outside? Could we already be within the event horizon of a future black hole? How much information does it store? Would something special be happening at the Schwarzschild radius? How would the singularity manifest physically?

The journey of trying to answer these questions can aptly be described by the term “radical conservatism.” This is a phrase that has become quite popular in the physics community. A “radical conservative” would be someone that tries to modify as few laws of physics as possible (that’s the conservative part) and through their dogmatic refusal to modify these laws and go wherever their reasoning leads (that’s the radical part) is able to derive amazing things. We radically use the given system of beliefs to lead to certain conclusions (sometimes paradoxes!) and then conservatively update the system of beliefs to resolve the created paradox and iterate. We shall go through a few such cycles and end at the Firewall paradox. Let’s begin with the first problem: how much information does a black hole store?

## Entropy of a black hole

A black hole is a physical object. Hence, it could be able to store some information. But how much? In other words, what should the entropy of a black hole be? There are two simple ways of looking at this problem:

• 0: The no-hair theorem postulates that an outside observer can measure a small number of quantities which completely characterize the black hole. There’s the mass of the black hole, which is its most important quantity. Interestingly, if the star was spinning before it collapsed, the black hole will also have some angular momentum, and its equator will bulge out a bit. Hence, the black hole is also characterized by an angular momentum vector. Also, if the object had some net charge, the black hole would also have that net charge. This means that if two black holes were created due to a book and a pizza, respectively, with the same mass, charge and angular momentum, there would settle down to the “same” black hole with no observable difference. If an outside observer knows these quantities, they will now know everything about the black hole. So, in this view, we should expect for the entropy of a black hole to be 0.
• Unbounded: But maybe that’s not entirely fair. After all, the contents of the star should somehow be contained in the singularity, hidden behind the horizon. As we saw above, all of the specific details of the star from before the collapse do not have any effect on the properties of the resulting black hole. The only stuff that matters it the total mass, total angular momentum, and the total charge. That leaves an infinite number of possible objects that could all have produced the same black hole: a pizza or a book or a PlayStation and so on. So actually, perhaps, we should expect the entropy of a black hole to be unbounded.

The first answer troubled Jacob Bekenstein. He was a firm believer in the Second Law of Thermodynamics: the total entropy of an isolated system can never decrease over time. However, if the entropy of a black hole is 0, it provides with a way to reduce the entropy of any system: just dump objects with non-zero entropy into the black hole.

Bekenstein drew connections between the area of the black hole and its entropy. For example, the way in which a black hole’s area could only increase (according to classical general relativity) seemed reminiscent of entropy. Moreover, when two black holes merge, the area of the final black hole will always exceed the sum of the areas of the two original black holes This is surprising as for two spheres, the area/radius of the merged sphere, is always less than the sum of the areas/radii of two individual spheres:

$(r_1^3 + r_2^3)^{\frac{1}{3}} < r_1 + r_2$

Most things we’re used to, like a box of gas, have an entropy that scales linearly with its volume. However, black holes are not like most things. He predicted that entropy of a black hole should be proportional to its area, A and not its volume. We now believe that Bekenstein was right and it turns out that the entropy of the black hole can be written as:

$S=\frac{kA}{4l^2_p}$

where $k$ is Boltzmann constant and $l_p$ is the Planck-length, a length scale where physicists believe quantum mechanics breaks down and a quantum theory of gravity will be required. Interestingly, it seems as though the entropy of the black hole is (one-fourth times) the number of Planck-length-sized squares it would take to tile the horizon area. (Perhaps, the microstates of the black hole are “stored” on the horizon?) Using “natural units” where we set all constants to 1, we can write this as

$S=\frac{A}{4}$

which is very pretty. Even though this number of not infinite, it is very large. Here are some numerical estimates from [2]. The entropy of the universe (minus all the black holes) mostly comes from cosmic microwave background radiation and is about $10^{87}$ in some units. Meanwhile, in the same units, the entropy of a solar mass black hole is $10^{78}$. The entropy of our sun, as it is now, is a much smaller $10^{60}$. The entropy of the supermassive black hole in the center of our galaxy is $10^{88}$, larger than the rest of the universe combined (minus black holes). The entropy of any of the largest known supermassive black holes would be $10^{96}$. Hence, there is a simple “argument” which suggests that black holes are the most efficient information storage devices in the universe: if you wanted to store a lot of information in a region smaller than a black hole horizon, it would probably have to be so dense that it would just be a black hole anyway.

However, this resolution to “maintain” the second law of thermodynamics leads to a radical conclusion: if a black hole has non-zero entropy, it must have a non-zero temperature and hence, must emit thermal radiation. This troubled Hawking.

## Hawking radiation and black hole evaporation

Hawking did a semi-classical computation looking at energy fluctuations near the horizon and actually found that black holes do radiate! They emit energy in the form of very low-energy particles. This is a unique feature of what happens to black holes when you take quantum field theory into account and is very surprising. However, the Hawking radiation from any actually existing black hole is far too weak to have been detected experimentally.

One simplified way to understand the Hawking radiation is by thinking about highly coupled modes (think “particles”) being formed continuously near the horizon. As this formation must conserve the conservation of energy, one of these particles has negative energy and one of the particles has the same energy but with a positive sign and hence, they are maximally entangled (if you know the energy of one of the particles, you know the energy of the other one): we will be referring to this as short-range entanglement. The one with negative energy falls into the black hole while the one with positive energy comes out as Hawking radiation. The maximally-entangled state of the modes looks like:

$\sum_{\mathbf{k}} f(\mathbf{k}) |\mathbf{k}\rangle_{\rm in} |\mathbf{k}\rangle_{\rm out}$

Here is a cartoon that represents the process:

Because energetic particles are leaving the black hole and negative energy particles are adding to it, the black hole itself will actually shrink, which would never happen classically! And, eventually a black-hole will disappear. In fact, the time of evaporation of the black hole scales polynomially in the radius of the black hole, as $R^3$. The black holes that we know about are simply too big and would be shrinking too slowly. A stellar-mass black hole would take $10^{67}$ years to disappear from Hawking radiation.

However, the fact that black holes disappear does not play nicely with another core belief in physics: reversibility.

## Unitary evolution and thermal radiation

A core tenet of quantum mechanics is unitary evolution: every operation that happens to a quantum state must be reversible (invertible). That is: if we know the final state and the set and order of operations performed, we should be able to invert the operations and get back the initial state. No information is lost. However, something weird happens with an evaporating black hole. First, let us quickly review pure and mixed quantum states. A pure state is a quantum state that can be described by a single ket vector while a mixed state represents a classical (probabilistic) mixture of pure states and can be expressed using density matrices. For example, in both, the pure state $|\psi\langle = |1\rangle + |0\rangle$ and mixed state $\rho = |1\rangle\langle1| + |0\rangle\langle0|$ would one measure $1$ half the time and $0$ 50% half the time. However, in the later one would not observe any quantum effects (think interference patterns of the double-slit experiment).

People outside of the black hole will not be able to measure the objects (quantum degrees of freedom) that are inside the black hole. They will only be able to perform measurements on a subset of the information: the one available outside of the event horizon. So, the state they would measure would be a mixed state. A simple example to explain what this means is that if the state of the particles near the horizon is:

$|\Psi\rangle_{init} = \frac{|0\rangle_A|0\rangle_B +|1\rangle_A|1\rangle_B} {\sqrt{2}}$

tracing over the qubit A leaves us with the state and density matrix:

$|\Psi\rangle_{obs} = \frac{|0\rangle_{B}\langle0| + |1\rangle_{B}\langle1|}{2}$,

$\rho_{obs} = \frac{1}{2} \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}$

which is a classical mixed state (50% of times results in 1 and 50% of times results in 0). The non-diagonal entries of the density matrix encode the “quantum inference” of the quantum state. Here, are they are, in some sense we have lost the “quantum” aspect of the information.

In fact, Hawking went and traced over the field degrees of freedom that were hidden behind the event horizon, and found something surprising: the mixed state was thermal! It acted “as if” it is being emitted by some object with temperature “T” which does not depend on what formed the black hole and solely depends on the mass of the black hole. Now, we have the information paradox:

• Physics perspective: Now, once the black hole evaporates, we are left with this mixed thermal there is no way to precisely reconstruct the initial state that formed the black hole: the black hole has taken away information! Once the black hole is gone, the information of what went into the black hole is gone for good. Nobody living in the post-black-hole universe could figure out exactly what went into the black hole, even if they had full knowledge of the radiation. Another way to derive a contradiction is that the process of black hole evaporation when combined with the disappearance of the black hole, imply that a pure state has evolved into a mixed state, something which is impossible via unitary time evolution! Pure states only become mixed states whenever we decide to perform a partial trace; they never become mixed because of Schrodinger’s equation which governs the evolution of quantum states.
• CS perspective: We live in a world where only invertible functions are allowed. However, we are given this exotic function – the black hole – which seems to be a genuine random one-to-many function. There is no way to determine the input deterministically given the output of the function.

What gives? If the process of black hole evaporation is truly “non-unitary,” it would be a first for physics. We have no way to make sense of quantum mechanics without the assumption of unitary operations and reversibility; hence, it does not seem very conservative to get ride of it.

Physicists don’t know exactly how information is conserved, but they think that if they assume that it does, it will help them figure out something about quantum gravity. Most physicists believe that the process of black hole evaporation should indeed be unitary. The information of what went into the black hole is being released via the radiation in way too subtle for us to currently understand. What does this mean?

• Physics perspective: Somehow, after the black hole is gone, the final state we observe, after tracing over the degrees of freedom taken away by the black hole, is pure and encodes all information about what went inside the black hole. That is: $Tr_{inside} (|\Psi\rangle_{init}\langle\Psi|) = pure$
• CS perspective: Somehow, the exotic black hole function seems random but actually is pseudo-random as well as injective and given the output and enough time, we can decode it and determine the input (think hash functions!).

However, this causes yet another unwanted consequence: the violation of the no-cloning theorem!

## Xeroxing problem and black hole complementarity

The no-cloning theorem simply states that an arbitrary quantum state cannot be copied. In other words, if you have one qubit representing some initial state, no matter what operations you do, you cannot end up with two qubits with the same state you started with. How do our assumptions violate this?

Say you are outside the black hole and send in a qubit with some information (input to the function). You collect the radiation corresponding to the qubit (output of the function) that came out. Now you decode this radiation (output) to determine the state of infalling matter (input). Aha! You have violated the no-cloning theorem as you have two copies of the same state: one inside and one outside the black hole.

So wait, again, what gives?

One possible resolution is to postulate that the inside of the black hole just does not exist. However, that doesn’t seem very conservative. According to Einstein’s theory of relativity, locally speaking, there is nothing particularly special about the horizon: hence, one should be able to cross the horizon and move towards the singularity peacefully.

The crucial observation is that for the person who jumped into the black hole, the outside universe may as well not exist; they can not escape. Extending this further, perhaps, somebody on the outside does not believe the interior of the black hole exists and somebody on the inside does not believe the exterior exists and they are both right. This hypothesis, formulated in the early 1990s, has been given the name of Black Hole Complementarity. The word “complementarity” comes from the fact that two observers give different yet complementary views of the world.

In this view, according to someone on the outside, instead of entering the black hole at some finite time, the infalling observer will instead be stopped at some region very close to the horizon, which is quite hot when you get up close. Then, the Hawking radiation coming off of the horizon will hit the observer on its way out, carrying the information about them which has been plastered on the horizon. So the outside observer, who is free to collect this radiation, should be able to reconstruct all the information about the person who went in. Of course, that person will have burned up near the horizon and will be dead.

And from the infalling observer’s perspective, however, they were able to pass peacefully through the black hole and sail on to the singularity. So from their perspective, they live, while from the outside it looks like they died. However, no contradiction can be reached, because nobody has access to both realities.

But why is that? Couldn’t the outside observer see the infalling observer die and then rocket themselves straight into the black hole themselves to meet the alive person once again before they hit the singularity, thus producing a contradiction?

The core idea is that it must take some time for the infalling observer to “thermalize” (equilibriate) on the horizon: enough time for the infalling observer to reach the singularity and hence become completely inaccessible. Calculations do show this to be true. In fact, we can already sense a taste of complexity theory even in this argument: we are assuming that some process is slower than some other process.

In summary, according to the BHC worldview, the information outside the horizon is redundant with the information inside the horizon.

But, in 2012, a new paradox, the Firewall paradox, was introduced by AMPS [3]. This paradox seems to be immune to BHC: the paradox exists even if we assume everything we have discussed till now. The physics principle we violate, in this case, is the monogamy of entanglement.

## Monogamy of entanglement and Page time

Before we state the Firewall paradox, we must introduce two key concepts.

### Monogamy of entanglement

Monogamy of entanglement is a statement about the maximum entanglement a particle can share with other particles. More precisely, if two particles A and B are maximally entangled with each other, they cannot be at all entanglement with a third particle C. Two maximally entangled particles have saturated both of their “entanglement quotas\”. In order for them to have correlations with other particles, they must decrease their entanglement with each other.

Monogamy of entanglement can be understood as a static version of the no-cloning theorem. Here is a short proof sketch of why polygamy of entanglement implies the violation of no-cloning theorem.

Let’s take a short detour to explain quantum teleportation:

Say you have three particles A, B, and C with A and B maximally entangled (Bell pair), and C is an arbitrary quantum state:

$|\Phi^+\rangle_{AB} = \frac{1}{\sqrt{2}} (|0\rangle_A \otimes |0\rangle_{B} + |1\rangle_A \otimes |1\rangle_{B})$

$|\psi\rangle_C = \alpha |0\rangle_C + \beta|1\rangle_C$

We can write their total state as:

$|\psi \rangle_{C}\otimes |\Phi ^{+}\rangle_{AB}=(\alpha |0\rangle_{C}+\beta |1\rangle_{C})\otimes {\frac {1}{\sqrt {2}}}(|0\rangle_{A}\otimes |0\rangle_{B}+|1\rangle_{A}\otimes |1\rangle_{B})$

Re-arranging and pairing A and C, the state simplifies to:

$|\psi \rangle {C}\otimes \ |\Phi ^{+}\rangle {AB}\ = \frac {1}{2}{\Big \lbrack }\ |\Phi ^{+}\rangle {AC}\otimes (\alpha |0\rangle {B}+\beta |1\rangle {B})\ +\ |\Phi ^{-}\rangle {AC}\otimes (\alpha |0\rangle {B}-\beta |1\rangle {B})\ +\ |\Psi ^{+}\rangle {AC}\otimes (\beta |0\rangle {B}+\alpha |1\rangle {B})\ +\ |\Psi ^{-}\rangle {AC}\otimes (\beta |0\rangle {B}-\alpha |1\rangle {B}){\Big \rbrack }$

which means that if one does a Bell pair measurement on A and C, based on the measurement outcome, we know exactly which state B is projected to and by using rotations can make the state of B equal to the initial state of C. Hence, we teleported quantum information from C to B.

Now, assume that A was maximally entangled to both B and D. Then by doing the same procedure, we could teleport quantum information from C to both B and D and hence, violate the no-cloning theorem!

### Page time

Named after Don Page, the “Page time” refers to the time when the black hole has emitted enough of its energy in the form of Hawking radiation that its entropy has (approximately) halved. Now the question is, what’s so special about the Page time?

First note that the rank of the density matrix is closely related to its purity (or mixedness). For example, a completely mixed state is the diagonal matrix:

$\rho_{obs} = \frac{1}{4} \begin{pmatrix} 1 & 0 & 0 & 0\\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix}$

which has maximal rank ($2^{2}$). Furthermore, a completely pure state $|\Psi\rangle \langle\Psi|$ can always be represented as (if we just change the basis and make the first column/row represent $|\Psi\rangle$):

$\rho_{obs} = \begin{pmatrix} 1 & 0 & 0 & 0\\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \end{pmatrix}$

which has rank 1.

Imagine we have watched a black hole form and begin emitting Hawking radiation. Say we start collecting this radiation. The density matrix of the radiation will have the form:

$\rho_{obs} = \sum_{i=1}^{2^{n-k}} p_i |\Psi\rangle_{i}\langle\Psi|$

where $n$ is the total number of qubits in our initial state, $k$ is the number of qubits outside (in form of radiation), and $p_i$ is the probability of each state. We are simply tracing over the degrees of freedom inside the black hole (as there are $n-k$ degrees inside the black hole, dimensionality of this space is $2^{n-k}$).

Don Page proposed the following graph of what he thought entanglement entropy of this density matrix should look like. It is fittingly called the “Page curve.”

• If $k<\frac{n}{2}$, rank($\rho$) = $2^{k}$, as there are enough terms in the sum to get a maximally ranked matrix. And hence, we get maximally mixed states. In the beginning, the radiation we collect at early times will still remain heavily entangled with the degrees of freedom near the black hole, and as such the state will look mixed to us because we can not yet observe all the complicated entanglement. As more and more information leaves the black hole in the form of Hawking radiation, we are “tracing out” fewer and fewer of the near-horizon degrees of freedom. The dimension of our density matrix grows bigger and bigger, and because the outgoing radiation is still so entangled with the near-horizon degrees of freedom, the density matrix will still have off-diagonal terms which are essentially zero. Hence, the state entropy increases linearly.
• But if $k>\frac{n}{2}$, by the same argument, rank($\rho$) = $2^{n-k} < 2^{k}$. Hence, the density matrix becomes more and more pure. Once the black hole’s entropy has reduced by half, the dimension of the Hilbert space we are tracing out finally becomes smaller than the dimension of the Hilbert space we are not tracing out. The off-diagonal terms spring into our density matrix, growing in size and number as the black hole continues to shrink. Finally, once the black hole is gone, we can easily see that all the resulting radiation is in a pure state.

The entanglement entropy of the outgoing radiation finally starts decreasing, as we are finally able to start seeing entanglements between all this seemingly random radiation we have painstakingly collected. Some people like to say that if one could calculate the Page curve from first principles, the information paradox would be solved. Now we are ready to state the firewall paradox.

Say Alice collects all the Hawking radiation coming out of a black hole. At maybe, about $1.5$ times the Page time, Alice is now able to see significant entanglement in all the radiation she has collected. Alice then dives into the black hole and sees an outgoing Hawking mode escaping. Given the Page curve, we know that knowing this outgoing mode must decrease the entropy of our observed mixed state. In other words, it must make our observed density matrix purer. And hence, be entangled with the particles we have already collected.

(Another way to think about this: let’s say that a random quantum circuit at the horizon scrambles the information in a non-trivial yet injective way in order for radiation particles to encode the information regarding what went inside the black hole. The output qubits of the circuit must be highly entangled due to the random circuit.)

However, given our discussion on Hawking radiation about short-range entanglement, the outgoing mode must be maximally entangled with an in-falling partner mode. This contradicts monogamy of entanglement! The outgoing mode cannot be entangled both with the radiation Alice has already collected and also maximally entangled with the nearby infalling mode!

So, to summarize, what did we do? We started with the existence of black holes and through our game of conservative radicalism, modified how physics works around them in order to make sure the following dear Physics principles are not violated by these special objects:

• Second Law of Thermodynamics
• Objects with entropy emit thermal radiation
• Unitary evolution and reversibility
• No-cloning theorem
• Monogamy of entanglement

And finally, ended with the Firewall paradox.

So, for the last time in this blog post, what gives?

• Firewall solution: The first solution to the paradox is the existence of a firewall at the horizon. The only way to not have the short-range entanglement discussed is if there is very high energy density at the horizon. However, this violates the “no-drama” theorem and Einstein’s equivalence principle of general relativity which states that locally there should be nothing special about the horizon. If firewalls did exist, an actual wall of fire could randomly appear out of nowhere in front of us right now if a future black hole would have its horizon near us. Hence, this solution is not very popular.
• Remnant solution: One possible resolution would be that the black hole never “poofs” but some quantum gravity effect we do not yet understand stabilizes it instead, allowing for some Planck-sized object to stick around? Such an object would be called a “remnant.” The so-called “remnant solution” to the information paradox is not a very popular one. People don’t like the idea of a very tiny, low-mass object holding an absurdly large amount of information.
• No unitary evolution: Perhaps, black holes are special objects which actually lose information! This would mean that black hole physics (the quantum theory of gravity) would be considerably different compared to quantum field theory.
• Computational complexity solution?: Can anyone ever observe this violation? And if not, does that resolve the paradox? This will be covered in our next blog post by Parth and Chi-Ning.

## References

1. Scott Aaronson. The complexity of quantum states and transformations: from quantum money to black holes.arXiv preprintarXiv:1607.05256, 2016.
2. Daniel Harlow. Jerusalem lectures on black holes and quantum information. Reviews of Modern Physics, 88(1):015002, 2016.
3. Ahmed Almheiri, Donald Marolf, Joseph Polchinski, and JamesSully. Black holes: complementarity or firewalls? Journal of HighEnergy Physics, 2013(2):62, 2013.

(This post from the lecture by Yueqi Sheng)

In this post, we will talk about detecting phase transitions using
Approximate-Message-Passing (AMP), which is an extension of
Belief-Propagation to “dense” models. We will also discuss the Replica
Symmetric trick, which is a heuristic method of analyzing phase
transitions. We focus on the Rademacher spiked Wigner model (defined
below), and show how both these methods yield the same phrase transition
in this setting.

The Rademacher spiked Wigner model (RSW) is the following. We are given
observations $Y = \frac{\lambda}{n}xx^T + \frac{1}{\sqrt{n}}W$ where
$x \in \{\pm 1\}^n$ (sampled uniformly) is the true signal and $W$ is a
Gaussian-Orthogonal-Ensemble (GOE) matrix:
$W_{i, j} \sim \mathbb{N}(0, 1)$ for $i \neq j$ and
$W_{i, i} \sim \mathbb{N}(0, 2)$. Here $\lambda$ is the signal to noise
ratio. The goal is to approximately recover $x$.

The question here is: how small can $\lambda$ be such that it is
impossible to recover anything reasonably correlated with the
ground-truth $x$? And what do the approximate-message-passing algorithm

To answer the first question, one can think of the task here is to
distinguish $Y \sim \frac{\lambda}{n}xx^T + \frac{1}{\sqrt{n}}W$ vs
$Y \sim W$. One approach to distinguishing these distributions is to
look at the spectrum of the observation matrix $Y$. (In fact, it turns
out that this is an asymptotically optimal distinguisher [1]). The spectrum of $Y$ behaves as ([2]):

• When $\lambda \leq 1$, the empirical distribution of eigenvalues in
spiked model still follows the semicircle law, with the top
eigenvalues $\approx 2$

• When $\lambda > 1$, we start to see an eigenvalue $> 2$ in the
planted model.

# Approximate message passing

This section approximately follows the exposition in [3].

First, note that in the Rademacher spiked Wigner model, the posterior
distribution of the signal $\sigma$ conditioned on the observation $Y$
is: $\Pr[\sigma | Y] \propto \Pr[Y | \sigma] \propto \prod_{i \neq j} \exp(\lambda Y_{i, j} \sigma_i \sigma_j /2 )$ This
defines a graphical-model (or “factor-graph”), over which we can perform
Belief-Propogation to infer the posterior distribution of $\sigma$.
However, in this case the factor-graph is dense (the distribution is a
product of potentials $\exp(\lambda Y_{i, j} \sigma_i\sigma_j)$ for all
pairs of $i, j$).

In the previous blog post, we saw belief propagation works great when the underlying interaction
graph is sparse. Intuitively, this is because $G$ is locally tree like,
which allows us to assume each messages are independent random
variables. In dense model, this no longer holds. One can think of dense
model as each node receive a weak signal from all its neighbors.

In the dense model setting, a class of algorithms called Approximate
message passing (AMP) is proposed as an alternative of BP. We will
define AMP for RWM in terms of its state evolution.

## State evolution of AMP for Rademacher spiked Wigner model

Recall that in BP, we wish to infer the posterior distributon of
$\sigma$, and the messages we pass between nodes correspond to marginal
probability distribution over values on nodes. In our setting, since the
distributions are over $\{\pm 1\}$, we can represent distributions by
their expected values. Let $m^t_{u \to v} \in [-1, 1]$ denote the
message from $u$ to $v$ at time $t$. That is, $m_{u \to v}$ corresponds
to the expected value ${{\mathbb{E}}}[\sigma_u]$.

To derive the BP update rules, we want to compute the expectation
${{\mathbb{E}}}[\sigma_v]$ of a node $v$, given the
messages ${{\mathbb{E}}}[\sigma_u]$ for $u \neq v$. We can
do this using the posterior distribution of the RWM, $\Pr[\sigma | Y]$,
which we computed above.
$\displaystyle \Pr[\sigma_v = 1 | Y, \{\sigma_u\}_{u \neq v}] = \frac{ \prod_u \exp(\lambda Y_{u, v} \sigma_u) - \prod_u \exp(-\lambda Y_{u, v} \sigma_u) }{ \prod_u \exp(\lambda Y_{u, v} \sigma_u) + \prod_u \exp(-\lambda Y_{u, v} \sigma_u) }$

And similarly for $\Pr[\sigma_v = -1 | Y, \{\sigma_u\}_{u \neq v}]$.
From the above, we can take expectations over $\sigma_u$, and express
${{\mathbb{E}}}[\sigma_v]$ in terms of
$\{{{\mathbb{E}}}[\sigma_u]\}_{u \neq v}$. Doing this (and
using the heuristic assumption that the distribution of $\sigma$ is a
product distribution), we find that the BP state update can be written
as:
$m^{t}_{u \to v} = f(\sum_{w \neq v}f^{-1}(A_{w, u} m^{t - 1}_{w \to u}))$
where the interaction matrix $A_{w, u} = \lambda Y_{w, u}$, and
$f(x) = tanh(x) = \frac{\exp(x) - \exp(-x)}{\exp(x) + \exp(x)}$.

Now, Taylor expanding $f^{-1}$ around $0$, we find
$m^{t}_{u \to v} = f\left( (\sum_{w \neq v} A_{w, u} m^{t - 1}_{w \to u}) + O(1/\sqrt{n}) \right)$
since the terms $A_{w, u}$ are of order $O(1/\sqrt{n})$.

At this point, we could try dropping the “non-backtracking” condition
$w \neq v$ from the above sum (since the node $v$ contributes at most
$O(1/\sqrt{n})$ to the sum anyway), to get the state update:
$m^{t}_{u} = f\left( \sum_{w} A_{w, u} m^{t - 1}_{w}) \right)$ (note the messages no longer
depend on receiver – so we write $m_u$ in place of $m_{u \to v}$).
However, this simplification turns out not to work for estimating the
signal. The problem is that the “backtracking” terms which we added
amplify over two iterations.

In AMP, we simply perform the above procedure, except we add a
correction term to account for the backtracking issue above. Given $u$,
for all $v$, the AMP update is:
$m^{t}_{u \to v} = m^{t}_u = f(\sum_{w}A_{w, u} m^{t - 1}_{w}) + [\text{some correction term}]$

The correction term corresponds to error introduced by the backtracking
terms. Suppose everything is good until step $t - 2$. We will examine
the influence of backtracking term to a node $v$ through length 2 loops.
At time $t - 1$, $v$ exert $Y_{v, u}m^{t - 2}_v$ additional influence to
each of it’s neighbor $u$. At time $t$, $v$ receive roughly
$Y_{u, v}^2m^{t - 2}_v$. Since $Y_{u, v}^2$ has magnitude
$\approx \frac{1}{n}$ and we need to sum over all of $v$’s neighbors,
this error term is to large to ignore. To characterize the exact form of
correction, we simply do a taylor expansion

$m^{t}_v = \sum_{u}f(Y_{u, v}m^{t - 1}_u) = \sum_{u}f(Y_{u, v} \left(\sum_{w}f(Y_{w, u}m^{t - 2}_w) - f(Y_{u, v}m^{t - 2}_w)\right) )\\ \approx \sum_u f(Y_{u, v} m^{t - 1}_u) - Y_{u, v}f'(m^{t - 1}_u)m^{t - 2}_v\\ \approx \sum_u f(Y_{u, v} m^{t - 1}_u) - \frac{1}{n}\sum_{u}f'(m^{t - 1}_u)m^{t - 2}_v$

## State evolution of AMP

In this section we attempt to obtain the phase transition of Rademacher
spiked Wigner model via looking at $m^{\infty}$.

We assume that each message could be written as a sum of signal term and
noise term. $m^t = \mu_t x + \sigma_t g$ where
$g \sim \mathbb{N}(0, I)$. To the dynamics of AMP (and find its phase
transition), we need to look at how the signal $\mu_t$ and noise
$\sigma_t$ evolves with $t$.

We do the following simplification: ignore the correction term and
assume each time we obtain an independent noise $g$.

$m^{t} = Yf(m^{t - 1}) = (\frac{\lambda}{n}x^Tx + \frac{1}{\sqrt{n}}W)f(m^{t - 1}) = \frac{\lambda}{n} < f(m^{t - 1}), x > x + \frac{1}{\sqrt{n}} Wf(m^{t - 1})$

Here, we see that $\mu_t = \frac{\lambda}{n}< f(m^{t - 1}), x>$
and $\sigma_t = \frac{1}{\sqrt{n}}Wf(m^{t - 1})$.

Note that $\mu_{t}$ is essentially proportional to overlap between
ground truth and current belief
, since the function $f$ keeps the
magnitude of the current beliefs bounded.

$\frac{\lambda}{n} = \frac{\lambda}{n} \approx\lambda {{\mathbb{E}}}_{X \sim unif(\pm 1), G\sim \mathbb{N}(0, 1)}[X f(\mu_{t - 1}X + \sigma_{t - 1}G)] = \lambda {{\mathbb{E}}}_G[f(\mu_{t - 1} + \sigma_{t - 1}G)]$

For the noise term, each coordinate of $\sigma_t$ is a gaussian random
variable with $0$ mean and variance

$\frac{1}{n} \sum_v f(m^{t - 1})_v^2 \approx {{\mathbb{E}}}_{X, G}[f(\mu_{t - 1}X + \sigma_{t - 1}G)^2] = {{\mathbb{E}}}_{G}[f(\mu_{t - 1} + \sigma_{t - 1}G)^2]$

It was shown in [4] that we can introduce a new
parameter $\gamma_t$ s.t.
$\gamma_t = \lambda^2 {{\mathbb{E}}}[f(\gamma_{t - 1} + \sqrt{\gamma_{t - 1}}G)]$
As $t \to \infty$, turns out $\mu_t = \frac{\gamma_t}{\lambda}$ and
$\sigma_t^2 = \frac{\sigma_t}{\lambda^2}$. To study the behavior of
$m^t$ as $t \to \infty$, it is enough to track the evolution of
$\gamma_t$.

This heuristic analysis of AMP actually gives a phase transition at
$\lambda = 1$ (in fact, the analysis of AMP can be done rigorously as in [5]):

• For $\lambda < 1$: If $\gamma_t \approx 0$, $|\gamma_t + \sqrt{\gamma_t}G| < 1$ w.h.p., thus we have $\gamma_{t + 1} \approx \lambda^2 (\gamma_t) < \gamma_t$. Taking $t \to \infty$, we have $\gamma_{\infty} = 0$, which means there AMP solution has no overlap with the ground truth.

• For $\lambda > 1$: In this case, AMP’s solution has some correlation with the ground truth.

(Figure from [6])

# Replica symmetry trick

Another way of obtaining the phase transition is via a non-rigorous
analytic method called the replica method. Although non-rigorous, this
method from statistical physics has been used to predict the fixed point
of many message passing algorithms and has the advantage of being easy
to simulate. In our case, we will see that we obtain the same phase
transition temperature as AMP above. The method is non-rigorous due to
several assumptions made during the computation.

## Outline of replica method

Recall that we are interested in minizing the free energy of a given
system $f(\beta, Y) = \frac{1}{\beta n} \log Z(\beta, Y)$ where $Z$ is
the partition function as before:
$Z(\beta, Y) = \sum_{x \in \{\pm 1\}^n} exp(-\beta H(Y, x))$ and
$H(Y, x) = - = -xYx^T = -\sum_{i, j} Y_{i, j}x_ix_j$.

In replica method, $Y$ is not fixed but a random variable. The
assumption is that as $n \to \infty$, free energy doesn’t vary with $Y$
too much, so we will look at the mean of $f_Y$ to approximate free
energy of the system.

$f(\beta) = \lim_{n \to \infty}\frac{1}{\beta n}{{\mathbb{E}}}_{Y}[\log Z(\beta, Y)]$

$f(\beta)$ is called the free energy density and the goal now is to
compute the free energy density as a function of only $\beta$ , the
temperature of the system.

The replica method is first proposed as a simplification of the
computation of $f(\beta)$

It is a generally hard problem to compute $f(\beta)$ in a clear way. A
naive attempt of approximate $f(\beta)$ is to simply pull the log out
$g(\beta) = \frac{1}{\beta n}\log {{\mathbb{E}}}_Y[Z(\beta, Y)]$
Unfortunately $g(\beta)$ and $f(\beta)$ are quite different quantities,
at least when temperature is low. Intuitively, $f(\beta)$ is looking at
system with a fixed $Y$ while in $g(\beta)$, $x$ and $Y$ are allowed to
fluctuate together. When the temperature is high, $Y$ doesn’t play a big
roll in system thus they could be close. However, when temperature is
low, there could be a problems. Let $\beta \to \infty$,
$f(\beta) \approx \int_Y (\beta x_Y Y x_Y)\mu(Y) dY$,
$g(\beta) \approx \log \int_Y exp(\beta x_J Y x_Y)\mu(Y)dY \approx \beta x^* Yx^*$.

While ${{\mathbb{E}}}_X[\log(f(X))]$ is hard to compute,
${{\mathbb{E}}}[f(X)^r]$ is a much easier quantity. The
replica trick starts from rewriting $f(\beta)$ with moments of $Z$:
Recall that $x^r \approx 1 + r \log x$ for $r \approx 0$ and
$\ln(1 + x)\approx x$, using this we can rewrite $f(x)$ in the following
way:

Claim 1. Let $f_r(\beta) = \frac{1}{r \beta n}\ln[{{\mathbb{E}}}_Y[Z(\beta, Y)^r]]$
Then, $f(\beta) = \lim_{r \to 0}f_r(\beta)$

The idea of replica method is quite simple

• Define a function $f(r, \beta)$ for $r \in \mathbb{Z}_+$ s.t. $f(r, \beta) = f_r(\beta)$ for all such $r$.

• Extend $f(r, \beta)$ analytically to all $r \in {{\mathbb{R}}}_+$ and take the limit of $r \to 0$.

The second step may sound crazy, but for some unexplained reason, it has
been surprisingly effective at making correct predictions.

The term replica comes from the way used to compute
${{\mathbb{E}}}[Z^r]$ in Claim 1. We expand the $r$-th moment
in terms of $r$ replicas of the system

$Z(\beta, Y)^r = (\sum_x exp(-\beta H(Y, x)))^r = \sum_{x^1, \cdots, x^r} \Pi_{k = 1}^r exp(-\beta H(Y, x^i))$

## For Rademacher spiked Wigner model

In this section, we will see how one can apply the replica trick to
obtain phase transition in the Rademacher spiked Wigner model. Recall
that given a hidden $a \in \{\pm 1\}^n$, the observable
$Y = \frac{\lambda}{n}a^Ta + \frac{1}{\sqrt n} W$ where
$W_{i, j} \sim \mathcal{N}(0, 1)$ and $W_{i, i} \sim \mathcal{N}(0, 2)$.
We are interested in finding the smallest $\lambda$ where we can still
recover a solution with some correlation to the ground truth $a$. Note
that $\{W_{i, i}\}$ is not so important here as $x_i^2$ doesn’t carry
any information in this case.

Given by the posterior ${{\mathbb{P}}}[x|Y]$, the system we
set up corresponding to Rademacher spiked Wigner model is the following:

• the system consists of $n$ particles and the interactions between
each particle are give by $Y$

• the signal to noise ratio $\lambda$ as the inverse temperature
$\beta$.

Following the steps above, we begin by computing
$f(r, \beta) = \frac{1}{r\beta n}\ln{{\mathbb{E}}}_Y[Z^r]$
for $r \in \mathbb{Z}_+$: Denote $X^k = (x^k)^Tx^k$ where $x^k$ is the
$k$th replica of the system.

${{\mathbb{E}}}_Y[Z^r] = \int_Y \sum_{x^1, \cdots, x^r} exp(\beta \sum_k \mu(Y) dY\\ = \int_Y \sum_{x^1, \cdots, x^r} exp(\beta ) \mu(Y) dY$

We then simplify the above expression with a technical claim.

Claim 2. Let $Y = A + \frac{1}{\sqrt{n}}W$ where $A$ is a fixed matrix and
$W$ is the GOE matrix defined as above. Then,
$\int_Y exp(\beta) \mu(Y) dY = exp(\frac{\beta^2}{n}{{\|{X}\|_{F}}}^2 + \frac{\beta}{2} )$
for some constant $C$ depending on distribution of $Y$.

Denote $X = \sum_k X^k$. Apply Claim 2 with
$A = \frac{\beta}{n}a^Ta$, we have
${{\mathbb{E}}}_Y[Z^r] = \sum_{x^1, \cdots, x^r} exp(\frac{\beta^2}{n}{{\|{X}\|_{F}}}^2 + \frac{\beta^2}{2n} )$
To understand the term inside exponent better, we can rewrite the inner
sum in terms of overlap between replicas:

${{\|{X}\|_{F}}}^2 = \sum_{i, j}X_{i, j}^2 = \sum_{i, j}(\sum_{k = 1}^r x^k_ix^k_j)^2 =\sum_{i, j}(\sum_{k = 1}^r x^k_ix^k_j)(\sum_{l = 1}^r x^l_ix^l_j)\\ = \sum_{k, l} (\sum_{i = 1}^n x^k_ix^{l}_i)^2 = \sum_{k, l} ^2$

where the last equality follows from rearranging and switch the inner
and outer summations.

Using a similar trick, we can view the other term as

$ = \sum_{i, j}\sum_{k = 1}^rx^k_ix^k_ja_ia_j = \sum_{k = 1}^r (\sum_{i = 1}^n a_ix^k_i)^2 = \sum_{k}^2$

Note that $Q_{k, l} = $ represents overlaps between the
$k$ and $l$th replicas and $Q_k = $ represents the
overlaps between the $k$th replica and the ground truth vector.

In the end, we get for any integer $r$, (Equation 1):

$\displaystyle f(r, \beta) = \frac{1}{r\beta n}\ln(\sum_{x^1, \cdots, x^r} exp(\frac{\beta^2}{n}\sum_{k, l}Q_{k, l}^2 + \frac{\beta^2}{2n}\sum_k Q_k^2)) \label{e:1}\\ = \frac{1}{r\beta n} \ln(\sum_{Q}\nu_{x^k}(Q)exp(\frac{\beta^2}{n}\sum_{k, l}Q_{k, l}^2 + \frac{\beta^2}{2n}\sum_k Q_k^2))$

Our goal becomes to approximate this quantity. Intuitively, if we think
of $Q_{k, l}$ as indices on a $(r + 1) \times (r + 1)$ matrices, $Q$,
with $Q(i,i) = 1$, then $Q$ is the average of $n$ i.i.d matrices. So we
expect $Q_{j, k} \in [\pm \frac{1}{n}]$ for $j \neq k$ w.h.p. In the
remaining part, We find the correct $Q$ via rewriting Equation 1.

Observe that by introducing a new variable $Z_{k, l}$ for $k \neq l$ and
using the property of gaussian intergal (Equation 4):

$\label{e:4} exp(\frac{\beta^2}{n}Q_{k, l}^2) = \sqrt{\frac{n}{4\pi}}\int_{Z_{k, l}} exp(-\frac{n}{4}Z_{k, l}^2 + \beta Q_{k, l}Z_{k, l})dZ_k$

$\exp(\frac{\beta^2}{2n}Q_k^2) = \sqrt{\frac{1}{8\pi n}}\int_{Z_k}exp(-(2n)Z_k^2 + 2\beta Q_{k}Z_k)dZ_k$
Replace each $exp(\frac{\beta^2}{n}Q_{k, l}^2)$ by a such integral, we
have (Equation 2):

$\begin{gathered} {{\mathbb{E}}}[Z^r] = \sum_{x^1, \cdots, x^r} exp(\frac{\beta^2}{n}\sum_{k, l}Q_{k, l}^2 + \frac{\beta^2}{2n}\sum_k Q_k^2) \label{e:2}\\ = C\sum_{x^1, \cdots, x^r} \exp(\beta^2 n)\int_{Z_{k, l}}exp(-\frac{n}{4}\sum_{k \neq l}Z_{k, l}^2 - \frac{n}{2}\sum_k Z_k^2 + \beta \sum_{k \neq l}Y_{k, l}Q_{k, l} + 2\beta\sum_kZ_k Q_k) dZ \\ =C\exp(\beta^n) \int_{Y_{k, l}}exp(-\frac{n}{4}\sum_{k \neq l}Y_{k, l}^2 - \frac{n}{2}\sum_k Z_k^2 + \ln(\sum_{x_1,\cdots, x_r}exp(\beta \sum_{k\neq l}Y_{k, l}Q_{k, l} + 2\beta\sum_kY_k Q_k)) dY \label{e:2}\end{gathered}$

where $C$ is the constant given by introducing gaussian intergals.

To compute the integral in (Equation 2), we need to cheat a little bit and take
$n \to \infty$ before letting $r \to 0$. Note that free energy density
is defined as
$f(\beta) = \lim_{n \to \infty}\lim_{r \to 0}\frac{1}{r\beta n}\ln {{\mathbb{E}}}_Y[Z(\beta, Y)^r]$
This is the second assumption made in the replica method and it is
commonly believed that switching the order is okay here. Physically,
this is plausible because we believe intrinsic physical quantities
should not depend on the system size.

Now the Laplace method tells us when $n \to \infty$, the integral in (Equation 2) is dominated by the max of the exponent.

Theorem 1 (Laplace Method). Let $h(x): {{\mathbb{R}}}^n \to {{\mathbb{R}}}$then

$\int e^{nh(x)} \approx e^{nh(x^*)}(\frac{2\pi}{n})^{\frac{d}{2}}\frac{1}{\sqrt{det(H)}}$

where $x^* = argmax_x \{h(x)\}$ and $H$ is the Hessian of $h$ evaluated at the point $x^*$.

Fix a pair of $k, l$ and apply Laplace method with
$h(Z_{k, l}) = -\frac{1}{2}\sum_{0 \leq k < l \leq r}Z_{k, l}^2 + \frac{1}{n}\ln(\sum_{x_1,\cdots, x_r}exp(\beta \sum_{k \neq l}Z_{k, l}Q_{k, l} + 2\beta\sum_kZ_k Q_k))$
what’s left to do is to find the critical point of $h$. Taking the
derivatives gives
$-Y_{k, l} + \frac{A(Z_{k, l})\beta Q_{k, l}}{n A(Z_{k, l})} = 0$
where
$A(Z_{k, l}) = \sum_{x_1,\cdots, x_r}exp(\beta \sum_{k \neq l}Z_{k, l}Q_{k, l} + \beta\sum_kY_k Q_k)$.

We now need to find a saddle point of $h$ where the hessian is PSD. To
do that, we choose to assume the order of the replicas does not matter,
which is refer to as the replica symmetry case. 1 One simplest form
of $Y$ is the following: $\forall k, l > 0$, $Z_{k, l} = y$ and
$Z_{k} = y$ for some $y$. This also implies that $Q_{k, l} = q$ for some
$q$ and $y =\frac{\beta}{n} q$

Plug this back in to Equation 2 gives: (Equation 3)

$\label{e:3} {{\mathbb{E}}}[Z^r] = C\exp(\beta n)\exp(-\frac{n}{2}(\frac{r^2 - r}{2})y^2 - \frac{n^2}{2} + \ln(\sum_{x^i}\exp(y\beta\sum_{k \neq l}Q_{k, l} + 2y\beta \sum_k Q_k))$

To obtain $f(r, \beta)$, we only need to deal with the last term in
(Equation 3) as $r \to 0$. Using the fact that $Q_{k, l} = y$ for all
$k, l$ and using the same trick of introducing new gaussain integral as
in (Equation 4) we have
$\lim_{r \to 0}\frac{1}{r}\ln(\sum_{x^i}\exp(y\beta\sum_{k \neq l}Q_{k, l} + n\beta \sum_k Q_k)) = -\beta + {{\mathbb{E}}}_{z \sim \mathcal{N}(0, 1)}[\log(2cosh(y\beta + \sqrt{y\beta}z))]$

Using the fact that we want the solution to minimizes free energy,
taking the derivative of the current $f$ w.r.t. $y$ gives
$\frac{y}{\beta} = n{{\mathbb{E}}}_z[tanh(y\beta + \sqrt{y\beta}z)]$
which matches the fixed point of AMP. Plug in $q$ and $y$ will give us
$f(\beta)$. The curve of $f(\beta)$ looks like the Figure below, where
the solid line is the curve of $f(\beta)$ with the given $q$ and the
dotted line is the curve given by setting all variables $0$.

References

[1] Amelia Perry, Alexander S Wein, Afonso S Bandeira, and Ankur Moitra. Optimality and sub-optimality of pca for spiked random matrices and synchronization.
arXiv preprint arXiv:1609.05573, 2016.
[2] D. Feral and S. Pech e. The Largest Eigenvalue of Rank One Deformation of Large Wigner Matrices. Communications in Mathematical Physics, 272:185–228, May 2007.
[3] Afonso S Bandeira, Amelia Perry, and Alexander S Wein. Notes on computational-to-statistical gaps: predictions using statistical physics. arXiv preprint arXiv:1803.11132, 2018.
[4] Yash Deshpande, Emmanuel Abbe, and Andrea Montanari. Asymptotic mutual information for thebinary stochastic block model. In
Information Theory (ISIT), 2016 IEEE International Symposium on, pages 185–189. IEEE, 2016.
[5] Adel Javanmard and Andrea Montanari. State evolution for general approximate message passing algorithms, with applications to spatial coupling. Information and Inference: A Journal of the IMA, 2(2):115–144, 2013.
[6] A. Perry, A. S. Wein, and A. S. Bandeira. Statistical limits of spiked tensor models.
ArXiv e-prints, December 2016.

1. Turns out for this problem, replica symmetry is the only case. We
will not talk about replica symmetry breaking here, which
intuitively means we partition replicas into groups and re-curse.

There’s a lot of discussion and (possibly well-deserved) hype nowadays about quantum computation and its potential for computation at speeds we simply can’t reach with the classical computers we’re used to today. The excitement about this has been building for years, even decades, but it’s only very recently that we’ve really been approaching a solid proof that quantum computers do have an advantage over classical computers.

What’s rather tricky about showing such a result is that, rather than a direct argument about the capability of quantum computers, what we really need to demonstrate is the incapability of classical computers to achieve tasks that can be done with quantum computers.

One of the major leaps forward in demonstrating quantum supremacy was taken by Terhal and DiVincenzo in their 2008 paper “Adaptive quantum computation, constant depth quantum circuits and arthur-merlin games“. Their approach was to appeal to a complexity-theoretic argument: they gave evidence that there exists a certain class of quantum circuits that cannot be simulated classically by proving that if a classical simulation existed, certain complexity classes strongly believed to be distinct would collapse to the same class. While this doesn’t quite provide a proof of quantum supremacy – since the statement about the distinction between complexity classes upon which it hinges is not a proven fact – because the complexity statement appears overwhelmingly likely to be true, so too does the proposed existence of non-classically-simulatable quantum circuits. The Terhal and DiVincenzo paper is a complex and highly technical one, but in this post I hope to explain a little bit and give some intuition for the major points.

Now, let’s start at the beginning. What is a quantum circuit? I’m going to go ahead and assume you already know what a classical circuit is – the extension to a quantum circuit is rather straightforward: it’s a circuit in which all gates are quantum gates, where a quantum gate can be thought of as a classical gate whose output is, rather than a deterministic function of the inputs, instead a probability distribution over all possible outputs given the size of the inputs. For example, given two single-bit inputs, a classical AND gate outputs 0 or 1 deterministically given the inputs. A quantum AND gate on the analogous single-qubit inputs would output 0 with some probability $p$ and 1 with some probability $1-p$. Similarly, a classical AND gate on two 4-bit inputs outputs the bitwise AND, while the quantum analog has associated with it a 4-qubit output: some probability distribution over all 4-bit binary strings. A priori there is no particular string that is the “output” of the computation by the quantum gate; it’s only after taking a quantum measurement of the output that we get an actual string that we can think of as the outcome of the computation done by the gate. The actual string we “observe” upon taking the measurement follows the probability distribution computed by the gate on its inputs. In this way, a quantum circuit can then be thought of as producing, via a sequence of probabilistic classical gates (i.e., quantum gates) some probability distribution over possible outputs given the input lengths. It’s not hard to see that in this way, we can compose circuits: suppose we have a quantum circuit $c_1$ and another quantum circuit $c_2$. Let $c_1$ have an input of $n$ qubits and an output of $m$ qubits; suppose we measure $k$ of the output qubits of $c_1$ – then we can feed the remaining $m-k$ unmeasured qubits as inputs into $c_2$(assuming that those $m-k$ qubits do indeed constitute a valid input to $c_2$).

Consider, then, the following sort of quantum circuit: it’s a composition of $N$ quantum circuits, such that after each $i$-th circuit we take a measurement some of its output qubits (so that the remaining unmeasured qubits become inputs to the $(i+1)$-th circuit), and then the structure of the $(i+1)$-th circuit is dependent on this measurement. That is, it’s as though, given a quantum circuit, we’re checking every so often at intermediate layers over the course of the circuit’s computation what the value of some of the variables are (leaving the rest to keep going along through the circuit to undergo more computational processing), and based on what we measure is the current computed value, the remainder of the circuit “adapts” in a way determined by that measurement. Aptly enough, this is called an “adaptive circuit”. But since the “downstream” structure of the circuit depends on the outcomes of all the measurements made “upstream”, each adaptive circuit actually comprises a family of circuits, each of which is specified by the sequence of intermediate measurement outcomes. That is, we can alternatively characterize an adaptive circuit as a set of ordinary quantum circuits that is parameterized by a list of measurement outcomes. Terhal and DiVincenzo call this way of viewing an adaptive circuit, as a family of circuits parametrized by a sequence of measurement values, a “non-adaptive circuit” – since we replace the idea that the circuit “adapts” to intermediate measurements with the idea that there are just many regular circuits, one for each possible sequence of measurements. It’s this non-adaptive circuit concept that’ll be our main object of study going forward.

## Simulating quantum circuits

Now, the result we wanted to demonstrate about quantum circuits had to do with their efficient simulatability by classical circuits – and so we should establish some notion of what we mean when we talk about an “efficient simulation”.

Terhal and DiVincenzo offer the following notion of a classical simulation – which in their paper they call an “efficient density computation”: consider a quantum circuit with some output of length $n$ qubits. Recall that to actually obtain an output value, we need to take a measurement of the circuit output – imagine doing this in disjoint subsets of cubits at a time. That is, we can break up the $n$ qubits into $k$ disjoint subsets and consider the entire output measurement as a process of taking $k$ measurements, subset by subset. An efficient density computation exists if there’s a classical procedure for computing, in time polynomial in the width and depth of the quantum circuit, the conditional probability distribution over the set of possible measurement outcomes of a particular subset of qubits, given any subset of the $k-1$ other measurement outcomes. Intuitively, this is a good notion of what a classical simulation should consist of, or at least what data it should contain, since if you know the conditional probabilities given any (possibly empty) subset of the other measurements, you can just flip coins for the outputs according to the conditional probabilities as a way of actually exhibiting a “working” simulation.

It’s with this notion of simulation, along with our concept of an adaptive quantum circuit as a family of regular circuits parameterized by a sequence of intermediate measurement outcomes, we may now arrive at the main result of Terhal and DiVincenzo’s paper. Recall that what we wanted to show from the very beginning is that there exists some quantum circuit that can’t be simulated classically. The argument for this proceeds like a proof by contradiction: suppose the contrary, and that all quantum circuits can be simulated classically. We want to show that we can find, then, a quantum circuit which, if it were possible to be simulated classically (as per our assumption), we’d wind up with some strange consequences that we believe are false, leading us to conclude that those circuits probably can’t be simulated classically.

Thus, we shall now exhibit such a quantum circuit whose classical simulatability leads (as far as we believe) to a contradiction. Consider a special case of adaptive quantum circuits, considered as a parameterized family of regular circuits, in which the circuit’s output distribution is independent of the intermediate measurement outcomes; that is, the case in which the entire family of circuits corresponding to an adaptive circuit is logically the same – that is, is the same logical circuit on input qubits independent of intermediate measurements. I’d like to point out, just for clarification’s sake, the subtlety here, which makes this consideration non-redundant, and not simply a reduction of an adaptive quantum circuit (again, thought of as a family) to a single fixed circuit (i.e., a family of one): the situation in which the family is reduced to a single fixed circuit occurs when the structure of the circuit is independent of the intermediate measurement outcomes. If the structure were independent of the measurements, then no matter what we observed in the measurements, we’d get the same circuit – hence a trivial family of one. What we’re considering instead is the case in which the structure of the circuit is still dependent on the intermediate measurements (and so the circuit is still adaptive), but where the distribution over the possible outputs of the circuit is identical no matter what the intermediate measurements are. In this case, the circuit can still be considered as a parameterized and in general non-trivial family of circuits, but for which each member produces the same distribution over outputs – hence, a family of potentially structurally different circuits, but which are logically identical.

Suppose there’s some set of such circuits that’s universal – that is, that’s sufficient to implement all polynomial-time quantum computations. (This is a reasonable assumption to make, since there do in fact exist universal quantum gate sets. But now if a simulation of the kind we defined (an efficient density computation) existed for every circuit in this set, then we could calculate the outcome probability of any polynomial-depth quantum circuit, since any polynomial-depth quantum circuit could be realized as some composition of circuits in this universal set (and in particular as a composition of particular family members of each adaptive circuit in the universal set), and an efficient density computation, as we mentioned above, precisely gives us a way to compute the output distribution.

But now here is where our believed contradiction lies:

Theorem: Suppose there exists a universal set of adaptive quantum circuits whose output distributions are independent of intermediate measurements. If there is an efficient density computation for each family member of each adaptive circuit in this universal set, then for the polynomial hierarchy PH we have PH = BPP = BQP.

The proof goes something like this: if we can do our desired efficient density computations (as we assumed, for the sake of contradiction, we could for all quantum circuits), this is equivalent to being able to determine the acceptance probability of a quantum computation, which was shown in the paper “Determining Acceptance Possibility for a Quantum Computation is Hard for the Polynomial Hierarchy” by Fenner, Green, Homer and Pruim to be equivalent to the class $\text{coC=P}$. Thus, we have that $\text{coC=P} \subseteq \text{BPP}$. But it’s known that $\text{PH} \subseteq \text{BPP}^{\text{coC=P}}$ and so $\text{PH} \subseteq \text{BPP}^{\text{BPP}} = \text{BPP}$. That is, we have $\text{PH} = \text{BPP} = \text{BQP}$, and so the polynomial hierarchy would collapse to $\Sigma^P_2$ since $\text{BPP} \subseteq \Sigma^P_2$ (for more on these more obscure complexity classes, see here). Again, this is our “contradiction”: while it hasn’t been quite proven, it is widely believed, with strong supporting evidence, that the polynomial hierarchy does not collapse as would be the case if all quantum circuits were classically simulatable. Thus this provides a strong argument that not all quantum circuits are classically simulatable, which was precisely what we were looking to demonstrate.

## Conclusion

Terhal and DiVincenzo actually go even further and show that there is a certain class of constant-depth quantum circuits that are unlikely to be simulatable by classical circuits – this, indeed, seems to provide even stronger evidence for quantum supremacy. This argument, which is somewhat more complex, uses the idea of teleportation and focuses on a particular class of circuits implementable by a certain restricted set of quantum gates. If you’re interested, I highly recommend reading their paper, where this is explained.

• Terhal, Barbara and David DiVincenzo. “Adptive quantum computation, constant depth quantum circuits and arthur-merlin games.” Quantum Info. Comput. 4, 2 (2004); 134-145.
• Bravyil, Sergey, David Gosset, and Robert König. “Quantum advantage with shallow circuits.” Science 362, 6412 (2018); 308-311.
• Harrow, Aram Wettroth and Ashley Montanaro. “Quantum computational supremacy.” Nature 549 (2017): 203-209.
• Boixo, Sergio et. al. “Characterizing quantum supremacy in near-term devices.” Nature Physics 14 (2018); 595-600.

## References

\begin{enumerate}

• Terhal, Barbara and David DiVincenzo. “Adptive quantum computation, constant depth quantum circuits and arthur-merlin games.” Quantum Info. Comput. 4, 2 (2004); 134-145.
• Fenner, Stephen et. al. “Determining Acceptance Possibility for a Quantum Computation is Hard for the Polynomial Hierarchy.” https://arxiv.org/abs/quant-ph/9812056. (1998).
• Bravyil, Sergey, David Gosset, and Robert König. “Quantum advantage with shallow circuits.” Science 362, 6412 (2018); 308-311.

This is the season that people are applying for postdoc positions. Unlike student and faculty hiring, which each have a fairly fixed schedule, postdoc availability can change from time to time, with new opportunities opening up all the time. So, I encourage everyone looking for a postdoc position to periodically check out https://cstheory-jobs.org/ .

(For example, a new ad was just posted on Wednesday by Venkat Guruswami and Pravesh Kothari )

tags:

[Guest post by Noah Miller – a Harvard Physics Ph.D student that took our seminar. Noah’s webpage contains wonderful and extensive notes that can be of interest to computer scientists. –Boaz]

(The following blog post serves as an introduction to the following notes:)

Black Holes, Hawking Radiation, and the Firewall

There are many different types of “theoretical physicists.” There are theoretical astrophysicists, theoretical condensed matter physicists, and even theoretical biophysicists. However, the general public seems to be most interested in the exploits of what you might call “theoretical high energy theorists.” (Think Stephen Hawking.)

The holy grail for theoretical high energy physicists (who represent only a small fraction of all physicists) would be to find a theory of quantum gravity. As it stands now, physicists have two theories of nature: quantum field theory (or, more specifically, the “Standard Model”) and Einstein’s theory of general relativity. Quantum field theory describes elementary particles, like electrons, photons, quarks, gluons, etc. General relativity describes the force of gravity, which is really just a consequence of the curvature of spacetimes.

Sometimes people like to say that quantum field theory describes “small stuff” like particles, while general relativity describes “big stuff” like planets and stars. This is maybe not the best way to think about it, though, because planets and stars are ultimately just made out of a bunch of quantum particles.

Theoretical physicists are unhappy with having two theories of nature. In order to describe phenomena that depend on both quantum field theory and general relativity, the two theories must be combined in an “ad hoc” way. A so-called “theory of everything,” another name for the currently unknown theory of “quantum gravity,” would hypothetically be able to describe all the phenomena we know about. Just so we’re all on the same page, they don’t even have a fuly worked out hypothesis. (“String theory,” a popular candidate, is still not even a complete “theory” in the normal sense of the word, although it could become one eventually.)

So what should these high energy theoretical physicists do if they want to discover what this theory of quantum gravity is? For the time being, nobody can think up an experiment that would be sensitive to quantum gravitation effects which is feasible with current technology. We are limited to so-called “thought experiments.”

This brings us to Hawking radiation. In the 1970’s, Stephen Hawking considered what would happen to a black hole once quantum field theory was properly taken into account. (Of course, this involved a bit of “ad hoc” reasoning, as mentioned previously.) Hawking found that, much to everybody’s surprise, the black hole evaporated, realeasing energy in the form of “Hawking radiation” (mostly low energy photons). More strangely, this radiation comes out exactly in the spectrum you would expect from something “hot.” For example, imagine heating a piece of metal. At low temperatures, it emits low energy photons invisible to the human eye. Once it gets hotter, it glows red, then yellow, then perhaps eventually blue. The spectrum of light emitted follows a very specific pattern. Amazingly, Hawking found that the radiation which black holes emit follow the exact same pattern. By analogy, they have a temperature too!

This is more profound than you might realize. This is because things which have an temperature should also have an “entropy.” You see, there are two notions of “states” in physics: “microstates” and “macrostates.” A microstate gives you the complete physical information of what comprises a physical system. For example, imagine you have a box of gas, which contains many particles moving in a seemingly random manner. A “microstate” of this box would be a list of all the positions and momenta of every last particle in that box. This would be impossible to measure in pratice. A “macrostate,” on the other hand, is a set of microstates. You may not know what the exact microstate your box of gas is in, but you can measure macroscopic quantities (like the total internal energy, volume, particle number) and consider the set of all possible microstates with those measured quantities.

The “entropy” of a macrostate is the logarithm of the number of possible microstates. If black holes truly are thermodynamic systems with some entropy, that means there should be some “hidden variables” or microstates to the black hole that we currently don’t understand. Perhaps if we understood the microstates of the black hole, we would be much closer to understanding quantum gravity!

However, Hawking also discovered something else. Because the black hole is radiating out energy, its mass will actually decrease as time goes on. Eventually, it should disappear entirely. This means that the information of what went into the black hole will be lost forever.

Physicists did not like this, however, because it seemed to them that the information of what went into the black hole should never be lost. Many physicists believe that the information of what went into the black hole should somehow be contained in the outgoing Hawking radiation, although they do not currently understand how. According to Hawking’s original calculation, the Hawking radiation only depends on a parameters of the black hole (like its mass) and have nothing to do with the many finer details on what went in, the exact “microstate” of what went in.

However, physicists eventually realized a problem with the idea that the black hole releases its information in the form of outgoing Hawking radiation. The problem has to do with quantum mechanics. In quantum mechanics, it is impossible to clone a qubit. That means that if you threw a qubit in a black hole and then waited for it to eventually come out in the form of Hawking radiation, then the qubit could no longer be “inside” the black hole. However, if Einstein is to be believed, you should also be able to jump into the black hole and see the qubit on the inside. This seems to imply that the qubit is cloned, as it is present on both the inside and outside of the black hole.

Physicists eventually came up with a strange fix called “Black Hole Complementarity” (BHC). According to BHC, according to people outside the black hole, the interior does not exist. Also, according to people who have entered the black hole, the outside ceases to exist. Both descriptions of the world are “correct” because once someone has entered the black hole, they will be unable to escape and compare notes with the person on the outside.

Of course, it must be emphasized that BHC remains highly hypothetical. People have been trying to poke holes in it for a long time. The largest hole is the so called “Firewall Paradox,” first proposed in 2012. Essentially, the Firewall paradox tries to show that the paradigm of BHC is self-contradictory. In fact, it was able to use basic quantum mechanics to show that, under some reasonable assumptions, the interior of the black hole truly doesn’t exist, and that anyone who tries to enter would be fried at an extremely hot “firewall!” Now, I don’t think most physicists actually believe that black holes really have a firewall (although this might depend on what day of the week you ask them). The interesting thing about the Firewall paradox is that it derives a seemingly crazy result from seemingly harmless starting suppositions. So these suppositions would have to be tweaked in the theory of quantum gravity order to get rid of the firewall.

This is all to say that all this thinking about black holes really might help physicists figure out something about quantum gravity. (Then again, who can really say for sure.)

If you would like to know more about the Firewall paradox, I suggest you read my notes, pasted at the top of this post!

The goal of the notes was to write an introduction to the Black Hole Information Paradox and Firewall Paradox that could be read by computer scientists with no physics background.

The structure of the notes goes as follows:

1. Special relativity
2. General relativity
3. Quantum Field Theory (in which I ambitiously tell you what QFT actually is!)
4. Statistical Mechanics (this is the best part)

Because the information paradox touches on all areas of physics, I thought it was necessary to take “zero background, infinite intelligence” approach, introducing the all the necessary branches of physics (GR, QFT, Stat Mech) in order to understand what the deal with Hawking radiation really is, and why physicists think it is so important. I think it is safe to say that if you read these notes, you’ll learn a non-trivial amount of physics.

Nilin Abrahamsen nilin@mit.edu

Daniel Alabi alabid@g.harvard.edu

Mitali Bafna mitalibafna@g.harvard.edu

Emil Khabiboulline ekhabiboulline@g.harvard.edu

Juspreet Sandhu jus065@g.harvard.edu

Two-prover one-round (2P-1R) games have been the subject of intensive study in classical complexity theory and quantum information theory. In a 2P-1R game, a verifier sends questions privately to each of two collaborating provers , who then aim to respond with a compatible pair of answers without communicating with each other. Sharing quantum entanglement allows the provers to improve their strategy without any communication, illustrating an apparent paradox of the quantum postulates. These notes aim to give an introduction to the role of entanglement in nonlocal games, as they are called in the quantum literature. We see how nonlocal games have rich connections within computer science and quantum physics, giving rise to theorems ranging from hardness of approximation to the resource theory of entanglement.

## Introduction

In these notes we discuss 2-prover 1-round games and the classical complexity of approximating the value of such games in the setting where the provers can share entanglement. That is, given the description of a game, we ask how hard it is to estimate the winning probability of the best winning strategy of the entangled provers. Let us first formally define games and its relation to the label cover problem. We write ${ [n]=\{1,\ldots,n\}}$.

#### Definition (Label cover)

A label cover instance ${ I=(S,T,\Sigma,\Pi)}$ consists of variable sets ${ S=\{s_i\}_{i\in[n]},T=\{t_j\}_{j\in[n]}}$, alphabet set ${ \Sigma}$, and a collection ${ \Pi}$ of constraints of the form ${ t_j=f_{i,j}(s_i)}$. Given an assignment (or coloring) ${ c : S\cup T \rightarrow \Sigma}$ we define its value to be ${ \omega(c)=\mathbb P_{f_{ij}\sim\Pi}\big(c(t_j)=f_{ij}(c(s_i))\big)}$. Define the value of ${ I}$ to be the maximum over all possible assignments, i.e. ${ \omega(I) = \max_{c} \omega(c)}$. Many familiar computational problems can be formulated as a label cover, such as ${ \textsc{3SAT}}$, ${ \textsc{3Lin}}$, and ${ \textsc{MaxCut}}$.

#### Definition (2-prover 1-round game)

Let ${ I = (S,T,\Sigma,\Pi)}$ be a label cover instance. We can then associate the following two-prover one-round game ${ G(I)}$ with ${ I}$. Let ${ P_1,P_2}$ be two provers who cannot communicate, and let ${ V}$ be the verifier. Given the label cover instance ${ I}$, the verifier uniformly samples a constraint ${ f_{i,j} \in \Pi}$ and sends ${ s_i}$ to ${ P_1}$ and ${ t_j}$ to ${ P_2}$. The provers then reply with ${ a \in \Sigma}$ and ${ b\in\Sigma}$ respectively to ${ V}$. Finally, ${ V}$ outputs ${ 1}$ if and only if ${ b = f_{i,j}(a)}$. Any coloring of the label cover instance corresponds to a deterministic strategy for the corresponding game. Therefore, with an optimal strategy the provers win the game associated to label cover instance ${ I}$ with probability ${ \omega(I)}$. That is, the value of the game equals that of the label cover instance. However, this is with the assumption that provers can only use deterministic strategies or convex combinations of these (that is, using shared randomness). If the provers share an entangled quantum state, then the provers (who still cannot communicate) can enjoy correlations that allow them to win with a higher probability than classically. In the quantum literature, these 2P-1R games are known as nonlocal games referring to the fact that the correlations arise without signaling between the provers. We are concerned with the complexity of approximating the winning probability of this strategy. We refer to the optimal winning probability within some class (classical or entangled) of strategies as the classical and quantum value of the game, respectively, and we use the terms quantum strategy and entangled strategy interchangeably. Fixing different constraint families in the label cover game changes the complexity of finding the (classical and entangled) values of the game. We will show that approximating the entangled value of XOR games, or more generally unique games (to be defined later on), is possible in polynomial time. This is remarkable because a famous conjecture known as the unique games conjecture says that approximating the classical value of unique games is NP-hard. In contrast, we will see that for unrestricted edge constraints, it is NP-hard to approximate the entangled value of a nonlocal game. Thus, hardness of approximation of the game’s value, established by the celebrated PCP theorem , still applies in the presence of entanglement. In the quantum world, we have new complexity classes such as QMA (which can be regarded as “quantum NP”), so one may conjecture whether approximating the entangled value of a general game is QMA-hard (the games formulation of the quantum PCP conjecture ). We will indicate progress in this direction but will explicitly demonstrate the NP-hardness result. Entanglement is often regarded as an expensive resource in quantum information because it is difficult to produce and maintain. Hence, even if sharing entanglement can improve the success probability of winning a game, the resource consumption may be costly. We will conclude by discussing lower bounds on the number of shared entangled bits required to achieve the optimal value of a game.

## Notation and quantum postulates

Let us first establish notation and define what is meant by an entangled strategy. In keeping with physics notation we write a column vector as ${ |{v}\rangle \in\mathbb C^d}$ and its conjugate-transpose (a row vector) as ${ \langle{v}| }$ . More generally the conjugate-transpose (Hermitian conjugate) of a matrix ${ A}$ is written ${ A^\dag}$. Then ${ \langle{v}| w\rangle}$ is the inner product of two vectors (a scalar) and ${ |{v}\rangle \langle{w}| }$ the outer product (a rank-1 matrix). A matrix ${ A}$ is said to be Hermitian if ${ A^\dag=A}$. A matrix ${ A}$ is positive semidefinite , written ${ A\succeq 0}$, if ${ A=B^\dag B}$ for some matrix ${ B}$. We write the identity matrix as ${ {\mathbb{I}}}$, denote by ${ Herm(\mathbb{C}^d)}$ the set of ${ d}$-by-${ d}$ Hermitian matrices.

### Observables, states, and entanglement

In a quantum theory the observables are Hermitian operators ${ A\in Herm(\mathbb C^d)}$ on a vector space ${ \mathbb C^d}$. It then makes sense to say that a state is a functional ${ \varphi}$ on the set of observables. That is, to specify the state of a physical system means giving a (expected) value for each observable. It turns out states are linear functionals of the observables, and such functionals can be written ${ \varphi(A)=\langle A,\rho\rangle}$ for some ${ d}$-by-${ d}$ matrix ${ \rho}$. We call ${ \rho}$ the density matrix and require moreover that ${ \rho}$ is positive semidefinite and has trace ${ 1}$. Every density matrix is a convex combination of rank-one projections ${ |\psi\rangle\langle\psi|}$ known as pure states. The unit vectors ${ |\psi\rangle\in\mathbb C^d}$ are also themselves known as pure states. If the state of one particle is described by a vector in ${ \mathbb C^d}$ (referring here to pure states), then two particles are described by a vector in the tensor product ${ \mathbb C^d\otimes\mathbb C^d}$. The two particles are entangled if their state is not in the form of a pure tensor ${ |\phi\rangle\otimes|\psi\rangle}$. We also write product states as ${ |\phi\rangle|\psi\rangle}$, omitting the tensor symbol.

### Quantum measurements

A quantum measurement can be described in terms of a projection-valued measure (PVM). Definition 1 (PVM) A projection-valued measure on vector space ${ \mathbb C^d}$ (where the quantum states live) is a list of projection matrices ${ A_1,\ldots,A_k}$ on ${ \mathbb C^d}$ such that ${ A_iA_j=0}$ for ${ i\neq j}$ and ${ \sum_i A_i={\mathbb{I}}}$. The PVM describes a measurement which, on state ${ |\psi\rangle\in\mathbb C^d}$ outputs measurement outcome ${ i}$ with probability ${ \langle\psi| A_i|\psi\rangle}$. The quantum state after obtaining outcome ${ i}$ is ${ \frac1{\|A_i|\psi\rangle\|}A_i|\psi\rangle}$. When the projections are rank-one projections ${ A_i= |{\beta_i}\rangle \langle{\beta_i}| }$ we say that we measure in the basis ${ \{ |{\beta_i}\rangle \}_i}$. In this case the probability of outcome ${ i}$ in state ${ |\psi\rangle}$ is ${ |\langle\beta_i |{\psi}\rangle |^2}$, and the post-measurement state is simply ${ |{\beta_i}\rangle}$ . Applying the measurement ${ (A_i)_i}$ on the left half of a two-particle state ${ |{\Psi}\rangle \in\mathbb C^d\otimes\mathbb C^d}$ means applying the PVM ${ (A_i\otimes {\mathbb{I}})_{i}}$ on the two-particle state.

### Quantum strategies for nonlocal games

We now introduce the notion of a quantum strategy for a nonlocal game. Each prover holds a particle, say with state space ${ \mathbb C^d}$, and Alices particle may be entangled with Bob’s. The global state is ${ |{\phi}\rangle _{AB}\in\mathbb C^d\otimes\mathbb C^d}$. Each player receives a question from the verifier and then chooses a measurement (a PVM) depending on the question. The player applies the measurement to their own particle and responds to the verifier with their measurement outcome. Hence for Alice we specify ${ n}$ PVM’s ${ A^s}$ where ${ s\in S}$ is a question, and each PVM is a list ${ A^s_{a=1},\ldots,A^s_k}$. By definition 1, given questions ${ s,t}$ the probability that Alice outputs ${ a}$ and Bob outputs ${ b}$ is

$\displaystyle P(a,b|s,t)=\|({\mathbb{I}}\otimes B^t_b)(A^s_a\otimes {\mathbb{I}}) |{\phi}\rangle _{AB}\|^2= \langle{\phi}| A^s_a\otimes B^t_b|\phi\rangle, \ \ \ \ \ (1)$

where we have used that squaring a projection leaves it unchanged.

## Quantum strategies beat classical ones

For any 2P-1R game ${ G}$, let ${ \omega(G)}$ be the maximum probability — over the players’ classical strategies — that the verifier accepts and ${ \omega^*(G)}$ the maximum probability that the verifier accepts when the provers use qubits such that player 1’s qubits are entangled with those of player 2. The game of Clauser, Horne, Shimony, and Holt (CHSH) has the property that the provers can increase their chance of winning by sharing an entangled pair of qubits, even when no messages are exchanged between the players. We show that there’s a characterization of the CHSH game’s value ${ \omega^*(G)=\cos^2(\frac{\pi}{8}) = \frac{1}{2}+\frac{\sqrt{2}}{4}\approx 0.85}$ which is better than the classical value ${ \omega(G)\leq \frac{3}{4}}$. Let us first define XOR games, of which the CHSH game is a special case.

#### Definition (XOR game)

An XOR game is a 2-player classical game (the questions and answers are classical bits) where:
1. Questions ${ (s, t)\in\{0, 1, \ldots, n-1\}^2}$ are asked according to some distribution ${ \Pi}$ (e.g. uniform).
2. Answers ${ a, b\in\{0, 1\}}$ are provided by players (call them Alice and Bob).
3. The verifier computes a predicate ${ V(a, b|s, t) = f_{s, t}(a\oplus b)}$ used to decide acceptance/rejection.

#### Definition (CHSH Game)

An XOR game with ${ n=2}$ where ${ s, t\in\{0, 1\}}$ are independent random bits and ${ V(a, b | s, t)=1 \Longleftrightarrow a\oplus b = s\wedge t}$. To win the CHSH game, Alice and Bob need to output bits ${ a}$ (from Alice) and ${ b}$ (from Bob) that disagree if ${ s=t=1}$ and agree otherwise. If Alice and Bob are classical then they can do no better than by always outputting ${ 0}$, say, in which case they win in the three out of four cases when one of the questions is ${ 0}$. Equivalently, ${ \omega(G)=\frac34}$ where ${ G}$ is the CHSH game. This is the content of Bell’s inequality :

#### Lemma (Bell’s Inequality)

For any two functions ${ g, h: \{0, 1\}\rightarrow\{0, 1\}}$, we have ${ {\mathop{\mathbb{P}}}_{x, y\in\{0, 1\}}\left[g(x)\oplus h(y) = x\wedge y\right] \leq \frac{3}{4} }$ where ${ x}$ and ${ y}$ are independent uniformly random bits.
Proof.
The probability of any event is a multiple of ${ 1/4}$ so it suffices to show that ${ {\mathop{\mathbb{P}}}_{x, y\in\{0, 1\}}\left[g(x)\oplus h(y) = x\wedge y\right]\neq1}$. So assume for contradiction that ${ g(x)\oplus h(y) = x\wedge y}$ for all pairs ${ x,y}$. Then we have that ${ g(0)\oplus h(0) = 0}$ and ${ g(0)\oplus h(1) = 0}$ which implies that ${ h(0)=h(1)}$. But then ${ 0=g(1)\oplus h(0) =g(1)\oplus h(1) = 1}$ which is a contraction.

### The strategy

The entangled strategy for the CHSH game requires that Alice and Bob each hold a qubit, so that their two qubits together are described by a vector in ${ \mathbb C^2\otimes\mathbb C^2}$. The two qubits together are in the state ${ |{\phi}\rangle = \frac{1}{\sqrt{2}}\left( |{0}\rangle _A |{0}\rangle _B + |{1}\rangle _A |{1}\rangle _B\right), }$ forming what is known as an EPR (Einstein-Podolsky-Rosen) pair. Now for ${ \theta\in[-\pi, \pi]}$ define ${ |{\beta_0(\theta)}\rangle = \cos\theta |{0}\rangle + \sin\theta |{1}\rangle }$ and ${ |{\beta_1(\theta)}\rangle = -\sin\theta |{0}\rangle + \cos\theta |{1}\rangle }$ . Now we describe a (quantum) strategy with winning probability ${ \cos^2(\frac{\pi}{8})}$. In each case Alice and Bob respond with their measurement outcome, where subscripts of the measurement basis vectors correspond to the answer to be sent back to the verifier.
• If ${ s=0}$, Alice measures in basis ${ \{ |{\beta_0(0)}\rangle , |{\beta_1(0)}\rangle \} = \{ |{0}\rangle , |{1}\rangle \}}$. If ${ s=1}$, Alice measures in ${ \{ |{\beta_0(\frac{\pi}{4})}\rangle , |{\beta_1(\frac{\pi}{4})}\rangle \}}$. Alice answers bit ${ a = 0}$ if outcome is ${ \beta_0(\cdot)}$ and answers ${ a = 1}$ otherwise.
• If ${ t=0}$, Bob measures in basis ${ \{ |{\beta_0(\frac{\pi}{8})}\rangle , |{\beta_1(\frac{\pi}{8})}\rangle\}}$. If ${ t=1}$, Bob measures in ${ \{ |{\beta_0(-\frac{\pi}{8})}\rangle , |{\beta_1(-\frac{\pi}{8})}\rangle \}}$.
• Each player responds with their respective measurement outcome.
Lemma 2 Alice and Bob win the CHSH game with probability ${ \cos^2(\frac{\pi}{8})}$.
Proof.
We will show that for each pair of questions ${ s,t}$ the pair of answers ${ a,b}$ is correct with probability ${ \cos^2(\pi/8)}$. We can split the pairs of questions into the two cases ${ s\wedge t=0}$ and ${ s\wedge t=1}$:
• (${ s\wedge t=0}$) The three cases ${ (s,t)=(0,0),(0,1),(1,0)}$ are all analogous: in each case Alice an Bob must output the same answer, and in each case Bob’s measurement basis is almost the same as Alice’s except rotated by a small angle ${ \pm\pi/8}$. Of the three above cases we consider the one where ${ s,t=0}$ and check that indeed the two measurement outcomes agree with probability ${ \cos^2(\pi/8)}$: When Alice measures her qubit and obtains some bit ${ a}$, the shared pair ${ |{\beta}\rangle}$ collapses to ${ |{a}\rangle _A |{a}\rangle _B}$. Indeed, since the question was ${ s=0}$, Alice measures her qubit in the basis ${ \{|0\rangle,|1\rangle\}}$. This means that Alice applies the measurement ${ \{|0\rangle\langle 0|\otimes {\mathbb{I}},|1\rangle\langle 1|\otimes {\mathbb{I}}\}}$ on the global state. The post-measurement state is the normalization of ${ ( |{a}\rangle \langle{a}| \otimes {\mathbb{I}}) |{\phi}\rangle _{AB}=\frac{1}{\sqrt2} |{a}\rangle \overbrace{\langle a |{0}\rangle }^{\delta_{a,0}}\otimes | 0\rangle + |{a}\rangle \overbrace{\langle a |{1}\rangle }^{\delta_{a,1}}\otimes |{1}\rangle =\frac {1}{\sqrt2} |{a}\rangle _A |{a}\rangle _B }$ because ${ \langle a |{a'}\rangle}$ can be viewed as a Kronecker delta of ${ a}$ and ${ a'}$. In particular, Bob is now in the pure state ${ |{a}\rangle}$ . Because Bob received question ${ t=0}$ he measures in the basis ${ \{ |{\beta_b(\pi/8)}\rangle \}_b}$ Therefore his probability of correctly outputting ${ b=a}$ is ${ {\mathop{\mathbb{P}}}[\text{Bob gets outcome }a] = |\langle\beta_a(\tfrac{\pi}{8}) |{a}\rangle |^2 = \cos^2\left(\frac{\pi}{8}\right) }$
• (${ s\wedge t=1}$) Now consider the case ${ s=1,t=1}$ where Alice and Bob are supposed to give different answers. Alice measures in basis consisting of ${ |{\beta_0(\pi/4)}\rangle =\frac{|0\rangle+|1\rangle}{\sqrt2}}$ and ${ |{\beta_1(\pi/4)}\rangle =\frac{-|0\rangle+|1\rangle}{\sqrt2}}$. If Alice gets outcome ${ a}$ then the post-measurement global state is ${ |{\beta_a(\frac\pi4)}\rangle |{\beta_a(\frac\pi4)}\rangle}$ . Therefore when Bob applies the measurement in basis ${ \{ |{\beta_a(-\frac\pi8)}\rangle \}_a}$ he mistakenly outputs ${ a}$ only with probability ${ |\langle\beta_a(\frac\pi4) |{\beta_a(-\frac\pi8)}\rangle |^2=\sin^2(\frac\pi8)=1-\cos^2(\frac\pi8)}$.
Lemma 2 implies a lower bound on the value of the CHSH game.

#### Corollary

${ \omega^*(G)\ge\cos^2(\frac\pi8)}$
It turns out that this lower bound is sharp, that is, the strategy just described is optimal.

#### Lemma

The value of the CHSH game using a quantum strategy is at most ${ \frac{1}{2} + \frac{\sqrt{2}}{4} = \cos^2\frac{\pi}{8}\approx 0.85}$. Proof. We can describe the quantum strategy of Alice and Bob in an XOR game by (i) a shared quantum state ${ |{\phi}\rangle _{AB}\in\mathbb{C}^d\otimes\mathbb{C}^d}$ (note that for the CHSH game, ${ d=2}$); (ii) measurements ${ \{A^0_s, A^1_s\}}$ for every question ${ s}$ sent to Alice; (iii) measurements ${ \{B^0_t, B^1_t\}}$ for every question ${ t}$ sent to Bob. The probability of answering ${ (a, b)}$ given questions ${ (s, t)}$ is ${ \langle{\phi}| A^a_s\otimes B^b_t |{\phi}\rangle}$ . Now let us write ${ A_s=A^0_s - A^1_s}$ and ${ B_t=B^0_t - B^1_t}$ so that for any ${ a, b\in\{0, 1\}}$, we can write ${ A_s^a = \frac{{\mathbb{I}} + (-1)^aA_s}{2}, B_t^b = \frac{{\mathbb{I}} + (-1)^bB_t}{2} }$ Note that since the possible outcomes here are finite, ${ A_s, B_t}$ are Hermitian and we may assume have bounded norm of 1. Furthermore, we assume that ${ A_s, B_t}$ are observables so that ${ A_s^2 = B_t^2 = {\mathbb{I}}}$ . Now denoting ${ f_{s, t}(a\oplus b)}$ as the XOR predicate to be computed, we can write the quantum game value as $\omega^*(G) = \sup_{ |{\phi}\rangle , \{A_s^a\}, \{B_t^b\}} {\mathop{\mathbb{E}}}_{(s, t)\sim\Pi}\sum_{a, b}f_{s, t}(a\oplus b) \langle{\phi}| A^a_s\otimes B^b_t |{\phi}\rangle \\ = \sup_{ |{\phi}\rangle , \{A_s\}, \{B_t\}} {\mathop{\mathbb{E}}}_{(s, t)\sim\Pi}\sum_{a, b}f_{s, t}(a\oplus b) \langle{\phi}| \frac{{\mathbb{I}} + (-1)^aA_s}{2}\otimes \frac{{\mathbb{I}} + (-1)^bB_t}{2} |{\phi}\rangle \\ = {\mathop{\mathbb{E}}}_{(s, t)\sim\Pi}\sum_{a, b}\frac{f_{s, t}(a\oplus b)}{4} + \sup_{ |{\phi}\rangle , \{A_s\}, \{B_t\}} {\mathop{\mathbb{E}}}_{(s, t)\sim\Pi}\sum_{a, b}f_{s, t}(a\oplus b) \langle{\phi}| \frac{{\mathbb{I}} + (-1)^aA_s}{2}\otimes \frac{{\mathbb{I}} + (-1)^bB_t}{2} |{\phi}\rangle \\ = {\mathop{\mathbb{E}}}_{(s, t)\sim\Pi}\sum_{a, b}\frac{f_{s, t}(a\oplus b)}{4} + \sup_{ |{\phi}\rangle , \{A_s\}, \{B_t\}} {\mathop{\mathbb{E}}}_{(s, t)\sim\Pi}\sum_{a, b}\frac{f_{s, t}(a\oplus b)(-1)^{ab}}{4} \langle{\phi}| A_s\otimes B_t |{\phi}\rangle \\ = {\mathop{\mathbb{E}}}_{(s, t)\sim\Pi}\sum_{a, b}\frac{f_{s, t}(a\oplus b)}{4} + \sup_{ |{\phi}\rangle , \{A_s\}, \{B_t\}} {\mathop{\mathbb{E}}}_{(s, t)\sim\Pi}\frac{f_{s, t}(0) - f_{s,t}(1)}{2} \langle{\phi}| A_s\otimes B_t |{\phi}\rangle \\$ where the summation ${ \sum_{a, b\in\{0, 1\}}\left(\cdot\right)}$ has been evaluated in the last line. Now note that the first term is independent of the quantum strategy and as a result equals the value of the uniformly random strategy which is 1/2. So we proceed to focus on the second term. Note that for CHSH ${ f_{s, t}(0) - f_{s,t}(1) = (-1)^{st}}$ simplifying the second term to ${ \frac{1}{8}( \langle{\phi}| A_0\otimes B_0 |{\phi}\rangle + \langle{\phi}| A_1\otimes B_0 |{\phi}\rangle + \langle{\phi}| A_0\otimes B_1 |{\phi}\rangle - \langle{\phi}| A_1\otimes B_1 |{\phi}\rangle ). }$ Next, we invoke Tsirelson’s Theorem (See Theorem 3) to bound this second term as $= \sup_{{\textbf{x}}_s, {\textbf{y}}_t}\frac{1}{8}({\textbf{x}}_0\cdot {\textbf{y}}_0 + {\textbf{x}}_0\cdot {\textbf{y}}_1 + {\textbf{x}}_1\cdot {\textbf{y}}_0 - {\textbf{x}}_1\cdot {\textbf{y}}_1) \\ \leq \sup_{{\textbf{x}}_s, {\textbf{y}}_t}\frac{1}{8}(\|{\textbf{x}}_0\|\|{\textbf{y}}_0 + {\textbf{y}}_1\| + \|{\textbf{x}}_1\|\|{\textbf{y}}_0 - {\textbf{y}}_1\|) \\ \leq \sup_{{\textbf{x}}_s, {\textbf{y}}_t}\frac{\sqrt{2}}{8}\sqrt{2\|{\textbf{y}}_0\|^2 + 2\|{\textbf{y}}_1\|^2} \leq \frac{\sqrt{2}}{4}$ where we have used Cauchy-Schwartz and the concavity of the ${ \sqrt{\cdot}}$ function. This completes our proof showing the exact characterization of the value (${ =\cos^2(\frac{\pi}{8})=\frac{1}{2}+\frac{\sqrt{2}}{4}}$) of the CHSH game using a quantum strategy. This proof is an adaptation of the one in [12].
Theorem 3 (Tsirelson’s Theorem [1]) For any ${ n\times n}$ matrix ${ C = (C_{s, t})}$, the following are equivalent:
1. There exist ${ d\in\mathbb{N}, |{\phi}\rangle \in\mathbb{C}^{d}\otimes\mathbb{C}^{d}, A_s, B_t\in\text{Herm}(\mathbb{C}^d), A_s^2 = B_t^2 = I}$ such that for any ${ s, t\in[n]}$ ${ C_{s, t} = \langle{\phi}| A_s\otimes B_t |{\phi}\rangle}$ . Further this would imply that ${ d\leq 2^{\lceil\frac{n+2}{2}\rceil}}$;
2. There exist real unit vectors ${ { {\textbf{x}}}_s, { {\textbf{y}}}_t\in{\mathbb{R}}^{n+2}}$ for ${ s, t\in [n]}$ such that ${ C_{s, t} = { {\textbf{x}}}_s\cdot { {\textbf{y}}}_t}$;

## Entangled unique games are easy

The CHSH game provides the first example that the entangled value ${ \omega^*(G)}$ of a nonlocal game can exceed the classical value ${ \omega(G)}$. XOR-games like the CHSH game are the special case corresponding to alphabet size ${ k=2}$ of the class of unique games :

#### Definition (Unique Games)

A 2-prover 1-round game is called a unique game if its constraints are of the form ${ b=\pi_{ij}(a)}$ where ${ \pi_{ij}}$ is a permutation of the alphabet ${ \Sigma}$ for each edge ${ i\sim j}$. The famous unique games conjecture (UGC) by Khot says that ${ \omega(G)}$ is NP-hard to approximate for unique games. Surprisingly, Kempe et al. showed that a natural semidefinite relaxation for unique games yields an approximation to the entangled value ${ \omega^*(G)}$ which can be computed in polynomial time. In other words the UGC is false for entangled provers, in contrast to the classical case where the conjecture is open.
Theorem 4 There is an efficient (classical) algorithm which takes a description of a nonlocal game ${ G}$ as its input and outputs ${ \hat\omega(G)}$ such that ${ 1-6(1-\hat\omega(G))\le\omega^*(G)\le\hat\omega(G) }$ Put differently, if ${ \omega^*(G)=1-\varepsilon^*}$ and ${ \hat\omega(G)=1-\hat\varepsilon}$, then ${ \varepsilon^*\in[\hat\varepsilon,6\hat\varepsilon] }$
The algorithm of theorem 4 proceeds by relaxing the set of quantum strategies to a larger convex set ${ \mathcal S}$ of pseudo-strategies and maximizing over ${ \mathcal S}$ instead of actual strategies, a much easier task. In approximation theory one often encounters a collection of hypothetical moments not arising from a distribution, known as a pseudo-distribution. In contrast, our pseudo-strategies are actual conditional probability distributions on answers (conditional on the questions). What makes ${ \mathcal S}$ a set of “pseudo”-strategies rather than actual strategies is that they may enjoy correlations which cannot be achieved without communication.

### Convex relaxation of quantum strategies

We will define ${ \mathcal S}$ to be a class of conditional probability distributions ${ \tilde{P}(a,b|s,t)}$ on answers given questions. We will require that the pseudo-strategies satisfy a positive semidefinite constraint when arranged in matrix form. In particular this matrix has to be symmetric, so we symmetrize the conditional probability ${ \tilde{P}(a,b|s,t)}$ by allowing each of ${ s}$ and ${ t}$ to be either a question for Alice or for Bob. That is, we extend the domain of definition for ${ \tilde{P}(a,b|s,t)}$ from ${ \Sigma^2\times S\times T}$ to ${ \Sigma^2\times (S\cup T)^2}$ where ${ S}$ and ${ T}$ are the question sets. So each question ${ s}$ and ${ t}$ and answer ${ a}$ and ${ b}$ can be either for Alice or Bob — we indicate this by changing notation from ${ (s,t)\in S\times T}$ to ${ q,q'\in S\cup T}$ and for the answers replacing ${ a,b}$ by ${ c,c'}$.
Definition 5 (Block-matrix form) Given a function ${ \tilde{P}=\tilde{P}(\cdot\cdot|\cdot\cdot)}$ defined on ${ (S\cup T)^2\times \Sigma^2}$ with ${ |S|=|T|=n}$ and ${ |\Sigma|=k}$, define a ${ 2nk}$-by-${ 2nk}$ matrix ${ M^{\tilde{P}}}$ whose rows are indexed by pairs ${ (q,c)\in(S\cup T)\times\Sigma}$ and columns by pairs ${ (q',c')\in (S\cup T)\times\Sigma}$, and whose entries are ${ M^{\tilde{P}}_{(q,c),(q',c')}=\tilde{P}(c,c'|q,q') }$ In other words ${ M^{\tilde{P}}}$ consists of ${ k}$-by-${ k}$ blocks where the block ${ M^{\tilde{P}}_{q,q'}}$ at position ${ q,q'}$ contains the entries ${ \tilde{P}(\cdot\cdot|q,q')}$.
Definition 5 is simply a convenient change of notation and we identify ${ \tilde{P}}$ with ${ M^{\tilde{P}}}$, using either notation depending on the context.
Definition 6 (Pseudo-strategies) Let ${ S}$ and ${ T}$ be the question sets for Alice and Bob, respectively. We say that ${ \tilde{P}=\tilde{P}(\cdot\cdot|\cdot\cdot):\Sigma^2\times (S\cup T)^2\rightarrow[0,1]}$ (or its matrix form ${ M^{\tilde{P}}}$) is a pseudo-strategy if:
1. ${ M^{\tilde{P}}}$ is positive semidefinite.
2. For any pair of questions ${ q,q'\in S\cup T}$, ${ \sum_{c,c'=1}^k \tilde{P}(c,c'|q,q')=1}$.
3. The blocks ${ M^{\tilde{P}}_{q,q'}}$ on the diagonal are themselves diagonal ${ k}$-by-${ k}$ matrices.
Define the winning probability or value of a pseudo-strategy as: ${ \omega^{\tilde{P}}(G)=\mathbb E_{(s,t)\sim \Pi} \sum_{a,b} \tilde{P}(a,b|s,t)V(a,b|s,t) }$ The algorithm outputs the maximum winning probability: ${ \hat\omega(G)=\max_{\tilde{P}\in\mathcal S}\omega^{\tilde{P}}(G) }$ over pseudo-strategies ${ \tilde{P}\in \mathcal S}$. This maximum is efficiently computable using standard semidefinite programming algorithms. As we will see, actual quantum strategies are in ${ \mathcal S}$ which immediately implies ${ \hat\omega(G)\ge\omega^*(G)}$. It then remains to show that the optimal pseudo-strategy can be approximated by an actual entangled strategy, thus bounding the gap from ${ \omega^*(G)}$ to ${ \hat\omega(G)}$.

### Quantum strategies are in ${ \mathcal S}$

Let us establish that ${ \mathcal S}$ is indeed a relaxation of the class of quantum strategies, that is, it contains the quantum strategies. So suppose we are given a quantum strategy. By equation 1 the probability of answers ${ a}$ and ${ b}$ given questions ${ s}$ and ${ t}$ is of the form ${ \langle\phi|A^s_a\otimes B^t_b |\phi\rangle=\langle\phi|(A^s_a\otimes {\mathbb{I}})({\mathbb{I}}\otimes B^t_b )|\phi\rangle }$ for some PVM’s ${ A^s}$ and ${ B^t}$ and some ${ |\phi\rangle\in\mathbb C^d\otimes\mathbb C^d}$. This conditional probability distibution is not immediately in the form of a pseudo-strategy because we cannot evaluate it on pairs of Alice-questions or pairs of Bob-questions. We therefore have to extend it, as follows: Place all the column vectors ${ A^s_a\otimes {\mathbb{I}}|\phi\rangle}$, ${ (s,a)\in S\times \Sigma}$ side by side, and then append the vectors ${ I\otimes B^t_b|\phi\rangle}$, resulting in a ${ d^2}$-by-${ 2nk}$ matrix ${ R}$. We then define ${ {P}(\cdot\cdot|\cdot\cdot):\Sigma^2\times(S\cup T)^2\rightarrow[0,1]}$ through its matrix form (see the comment below definition 5):

$\displaystyle M^{P}:=R^\dag R=\begin{pmatrix} (\langle\phi|A^s_a A^{s'}_{a'}\otimes {\mathbb{I}} |\phi\rangle)_{(s,a),(s',a')} (\langle\phi|A^s_a\otimes B^t_b |\phi\rangle)_{(s,a),(t,b)} \\\\ (\langle\phi|B^t_b\otimes A^s_a |\phi\rangle)_{(t,b),(s,a)} (\langle\phi|{\mathbb{I}}\otimes B^t_bB^{t'}_{b'} |\phi\rangle)_{(t,b),(t',b')} \end{pmatrix} \ \ \ \ \ (2)$

Lemma 7 ${ M^{P}}$ defined in 2 is a pseudo-strategy, that is, ${ M^{P}\in\mathcal S}$.
Proof.
We verify the conditions in definition 6. Condition 1 (${ M^{P}\succeq0}$) holds because it is of the form ${ R^\dag R}$. Condition 2 (Each block ${ M^{P}_{q,q'}}$ sums to ${ 1}$) holds because PVM’s sum to the identity. Condition 3 (Diagonal blocks ${ \tilde M=M^{P}_{qq}}$ are diagonal) holds because the projections in the PVM ${ A^q}$ are mutually orthogonal, hence ${ \langle{\phi}| A^q_cA^q_{c'} |{\phi}\rangle =0}$ if ${ c\neq c'}$.
Lemma 7 means ${ \hat\omega(G)=\max_{{P}\in \mathcal S}\omega^{P}(G)}$ is an (efficiently computable) upper bound for ${ \omega^*(G)}$:

#### Corollary

${ \hat\omega\ge \omega^*(G)}$.
To finish the proof of theorem 4 we need to show that any pseudo-strategy ${ \tilde{P}}$ can be rounded to an actual quantum strategy with answer probabilities ${ {P}}$ such that

$\displaystyle 1-\omega^{P}\le 6(1-\omega^{\tilde P}) \ \ \ \ \ (3)$

Applying this rounding to the optimal pseudo-strategy implies that ${ 1-\omega^*\le 6(1-\hat\omega) }$ or ${ \omega^*\ge 1-6(1-\hat\omega)}$, which will finish the proof of theorem 4.
Proof.
[Proof of theorem 4] Let ${ \tilde{P}\in\mathcal S}$ be a pseudo-strategy. We construct a quantum strategy ${ {P}}$ approximating ${ \tilde{P}}$. Since ${ M^{\tilde{P}}}$ is positive semidefinite we can write ${ \tilde M=R^\dag R }$ for some matrix ${ R\in \mathbb C^{r\times 2nk}}$ and ${ r\leq 2nk}$. Now let us define ${ 2nk}$ vectors ${ |\tilde u^s_a\rangle}$ and ${ |\tilde v^t_b\rangle}$ in ${ \mathbb C^{r}}$, and let ${ |{u^s_a}\rangle}$ and ${ |{v^t_b}\rangle}$ be the same vectors normalized. The strategy is constructed as follows. Alice and Bob share the maximally entangled state ${ |\phi\rangle=\frac1{\sqrt r}\sum_{i=1}^r|i\rangle|i\rangle\in\mathbb C^r\otimes\mathbb C^r }$ Before deciding on Alice and Bob’s PVM’s ${ A^s}$ and ${ B^t}$ let us see what this choice of shared state means for the conditional distribution on answers (see equation (1)).

$\langle\phi| A^s_a\otimes B^t_b|\phi\rangle =\frac1r\sum_{i,j=1}^r\langle i | A^s_a| j\rangle\langle i | B^t_b| j\rangle =\frac1r A^s_a\cdot B^t_b=\frac1r\langle A^s_a,\overline{B^t_b}\rangle$, (*)

where the bar represents entrywise complex conjugation, ${ \cdot}$ is the entrywise dot product of matrices, and ${ \langle\:,\rangle}$ the entrywise complex inner product (Hilbert-Schmidt inner product). We now choose the measurements. Given question ${ s}$, Alice measures in the PVM ${ A^s=(A^s_a)_{a=0}^k}$ with ${ A^s_a= |{u^s_a}\rangle \langle{u^s_a}| \text{ for }a=1,\ldots,k\text{, and }A^s_0={\mathbb{I}}-\sum_{i=1}^k |{u^s_a}\rangle \langle{u^s_a}| }$ Similarly, Bob on question ${ t}$ applies the PVM ${ B^t}$ with ${ B^t_b= |{v^t_b}\rangle \langle{v^t_b}| \text{ for }b=1,\ldots,k\text{, and }B^t_0={\mathbb{I}}-\sum_{i=1}^k |{v^t_b}\rangle \langle{v^t_b}| }$ The condition 3 in definition 6 ensures that for any question ${ s}$, the vectors ${ |{u^s_1}\rangle ,\ldots, |{u^s_1}\rangle}$ are orthogonal so that this is a valid PVM. The measurement outcome “$0$” is interpreted as “fail”, and upon getting this outcome the player attempts the measurement again on their share of a fresh copy of ${ |\phi\rangle_{AB}}$. This means that the strategy requires many copies of the entangled state to be shared before the game starts. It also leads to the complication of ensuring that with high probability the players measure the same number of times before outputting their measurement, so that the outputs come from measuring the same entangled state. By (*), at a given round of measurements the conditional distribution of answers is given by ${ \langle\phi|A^s_a\otimes B^t_b|\phi\rangle=\frac1r\Big\langle |{u^s_a}\rangle \langle{ u^s_a}| \:,\: |{{v^t_b}\rangle } \langle{ {v^t_b}| }\Big\rangle=\frac1r|\langle {u^s_a}|v^t_b\rangle|^2=\frac1{r|\tilde u^s_a|^2|\tilde v^t_b|^2}\Big(M^{\tilde{P}}_{(s,a),(t,b)}\Big)^2, }$ We wish to relate the LHS to ${ M^{\tilde{P}}_{(s,a),(t,b)}}$, so to handle the factor ${ \frac1r}$ each prover performs repeated measurements, each time on a fresh copy of ${ |\phi\rangle_{AB}}$, until getting an outcome ${ \neq0}$. Moreover, to handle the factor ${ \frac1{|u^s_a|^2|v^t_b|^2}}$, each prover consults public randomness and accepts the answer ${ a\in[k]}$ with probability ${ |u^a_s|^2}$ and ${ |v^b_t|^2}$ respectively, or rejects and start over depending on the public randomness. Under a few simplifying conditions (more precisely, assuming that the game is uniform meaning that an optimal strategy exists where the marginal distribution on each prover’s answers is uniform), we can let ${ M^{\tilde{P}}_{(s,a),(t,b)}\le 1/k}$ for all ${ s,a,t,b}$, and one can ensure that the conditional probabilities ${ {P}}$ of the final answers satisfy

$\displaystyle 1/k-P(a,b|s,t)\le 3\big(1/k-k(M^{\tilde P}_{(s,a),(t,b)})^2\big),\ \ \ \ \ (4)$ At this stage it is important that we are dealing with a unique game . Indeed, by (4) we have for every ${ s}$ and ${ t}$, $1-\sum_{a=1}^k P(a,\pi_{st}(a)|s,t)=\sum_a \Big(\frac{1}{k}-P(a,\pi_{st}(a)|s,t) \Big) \\ \leq 3\sum_{b=\pi_{st}(a)} \Big(\frac{1}{k}-k(M^{\tilde{P}}_{(s,a),(t,b)})^2 \Big) \\ \leq 6\sum_{b=\pi_{st}(a)}\big(\frac{1}{k}-M^{\tilde{P}}_{(s,a),(t,b)}\big)=6\Big(1-\sum_{b=\pi_{st}(a)} M^{\tilde{P}}_{(s,a),(t,b)}\Big)$ where the last inequality follows from concavity. Taking the expectation over ${ s}$ and ${ t}$ implies the bound (3), thus concluding the proof of theorem 4.

## General games are hard

We just saw that a specific class of games becomes easy in the presence of shared entanglement, in that semidefinite programming allows the entangled value ${ \omega^*}$ to be approximated to within exponential precision in polynomial time. Does this phenomenon hold more generally, so that the value of entangled games can always be efficiently approximated? We answer in the negative, by constructing a game where ${ \omega^*}$ is NP-hard to approximate to within inverse-polynomial factors. The complexity for 2P-1R entangled games can be strengthened to constant-factor NP-hardness, putting it on par with the classical PCP theorem. This result is used to prove (with some conditions) the games formulation of the quantum PCP theorem, which states that the entangled value of general games is QMA-hard to approximate within a constant factor.

### Formulation of game

Given any instance ${ \phi}$ of a ${ k}$-CSP (constraint satisfaction problem, where ${ k}$ is the number of literals), we can define a clause-vs-variable game ${ G_\phi}$ (see clause-vs-variable figure):
1. The referee (verifier) randomly sends a clause to Alice (first prover) and a variable to Bob (second prover).
2. Alice and Bob reply with assignments.
3. The referee accepts if Alice’s assignment satisfies the clause and Bob’s answer is consistent with Alice’s.
To show hardness of approximation, we need to go beyond the usual 2-player construction. In particular, in our game [3] one of the players receives an extra dummy question (see subfigure (a)). Mathematically, the result is very similar to introducing another player and having the referee play the 2-player game with two players chosen randomly [4] (see subfigure (b)). In either variation, the quantum phenomenon of monogamy of entanglement , imposing that only two parties can be maximally entangled to one another, is key to establishing hardness. The players do not know where to use their entanglement, which prevents them from coordinating as well as they could in the standard game.

### NP-hardness of approximating the entangled value

To prove hardness, we rely on several results from classical complexity theory.
Theorem 8 ([6]) Given an instance of 1-in-3 3SAT (a CSP), it is NP-hard to distinguish whether it is satisfiable or no assignments satisfy more than a constant fraction of clauses.
Theorem 9 ([2]) For a PCP game ${G}$ (emulating the CSP) and its oracularization ${G'}$ (transformation to a 2P-1R game),

$\displaystyle \omega(G)\leq \omega(G')\leq 1-\frac{1-\omega(G)}{3}\,. \ \ \ \ \ (5)$

Theorem 8 establishes the CSP variant of the classical PCP theorem: distinguishing between ${ \omega(\phi)=1}$ and ${ \omega(\phi)\leq 1/2}$ is NP-hard for some ${ k}$-CSP. Here, ${ \omega(\phi)}$ denotes the maximum fraction of clauses that are simultaneously satisfiable. Theorem 9 relates the general game ${ G}$ obtained from the CSP to a two-player one-round game ${ G'}$, in terms of the value (probability of winning) the game. The first inequality, equivalently saying ${ \omega(\phi)\leq \omega(G_\phi)}$, is achieved since the players can answer the questions in the game ${ G_\phi}$ to satisfy the clauses in ${ \phi}$. These theorems together imply that ${ \omega(G_\phi)}$ is NP-hard to approximate to within constant factors. Allowing the two players to share entanglement can increase the game value to ${ \omega^*(G_\phi)\geq \omega(G_\phi)}$. Classical results do not necessarily carry over, but exploiting monogamy of entanglement allows us to limit the power of entangled strategies. One can show the following lemma, which is weaker than what we have classically.
Lemma 10 ([3]) There exists a constant ${c>0}$ such that for a CSP ${\phi}$,

$\displaystyle \omega^*(G_\phi)\leq 1 - \frac{c(1-\omega(\phi))^2}{n^2}\,, \ \ \ \ \ (6)$

where ${n}$ is the number of variables. Combining Theorem 9 and Lemma 10, we have ${ \omega(\phi)\leq \omega^*(G_\phi)\leq 1 - \frac{c(1-\omega(\phi))^2}{n^2}\,. }$ Using Theorem 8, approximating ${ \omega^*(G_\phi)}$ is NP-hard to within inverse polynomial factors. Proving Lemma 10 takes some work in keeping track of approximations. For simplicity, we will show a less quantitative statement and indicate where the approximations come in.
Proposition 11 (adapted from [13]) ${\phi}$ is satisfiable iff ${\omega^*(G_\phi)=1}$.

### Proof of Proposition 11

The forward direction is straightforward: If ${ \phi}$ is satisfiable, then there exists a perfect winning strategy where the questions are answered according to the satisfying assignment. For the reverse direction, suppose there exists a strategy that succeeds with probability 1, specified by a shared entangled state ${ |{\psi}\rangle \in\mathbb{C}^d\otimes\mathbb{C}^d}$ and measurements ${ (A^j_{a_1,\ldots,a_k})_{a_1,\ldots,a_k}}$ for Alice and ${ (B^{t,l}_{b,b'})_{b,b'}}$ for Bob, where the questions ${ j\in[m]}$, ${ t,l\in[k]}$ and the answers ${ a,b}$ are from the CSP’s alphabet. Since one of the questions/answers for Bob corresponds to a dummy variable that is irrelevant to the game, trace over the dummy variable to define a new measurement operator ${ \tilde{B}^t_b=\frac{1}{n}\sum_{l,b'} B^{t,l}_{b,b'}}$. We can introduce a distribution on assignments to the ${ n}$ relevant variables,

$\displaystyle p(a_1,\ldots,a_n)=\lVert \mathbb{I} \otimes \tilde{B}^1_{a_1}\cdots\tilde{B}^n_{a_n} |\psi\rangle \rVert^2\,. \ \ \ \ \ (7)$

If we show that the distribution for assignments ${a_{i_1},\ldots,a_{i_k}}$ on variables ${x_{i_1},\ldots,x_{i_k}}$ in any clause ${C_j}$ is

$\displaystyle p(a_{i_1},\ldots,a_{i_k})= \lVert A^j_{a_{i_1},\ldots,a_{i_k}} \otimes \mathbb{I} |\psi\rangle \rVert^2\,, \ \ \ \ \ (8)$

then, since the players win with certainty, ${\phi}$ has a satisfying assignment. To transform Eq. 7 to Eq. 8, we need a relation between the ${A}$ and ${\tilde{B}}$ measurement operators and a way to commute the ${\tilde{B}}$ operators. The success probability of the players’ strategy is expressed as ${ P=\frac{1}{m}\sum_{j\in[m]}\frac{1}{k}\sum_{i\in C_j} \sum_{(a_1,\ldots,a_k)\vdash C_j} \langle{\psi}| A^j_{a_1,\ldots,a_k}\otimes \tilde{B}^i_{a_i} |{\psi}\rangle \,, }$ where ${ i}$ is the index of one of the ${ k}$ variables on which ${ C_j}$ acts, and ${ (a_1,\ldots,a_k)\vdash C_j}$ indicates that the assignment satisfies the clause. By positivity and summation to identity of the measurement operators ${ A}$ and ${ \tilde{B}}$, each term is at most 1; for our hypothesis ${ P=1}$, each has to be 1. Hence, using orthogonality of the vectors ${ {\mathbb{I}}\otimes\tilde{B}^i_{a_i} |{\psi}\rangle}$ for different ${ a_i}$, we have

$\displaystyle \sum_{\substack{(a_{i_1},\ldots,a_{i_k})\vdash C_j \\ a_i=b}} A^j_{a_{i_1},\ldots,a_{i_k}}\otimes \mathbb{I} |\psi\rangle = \mathbb{I} \otimes \tilde{B}^i_{a_i}|\psi\rangle\,, \ \ \ \ \ (9)$

for any ${ j\in[m]}$ and ${ i\in C_j}$. We now demonstrate that two different ${ \tilde{B}^{t_1}_{b_1}}$, ${ \tilde{B}^{t_2}_{b_2}}$ commute, so that Bob can match any satisfied clause/variable. ${\mathbb{I}} \otimes \tilde{B}^{t_1}_{b_1} \tilde{B}^{t_2}_{b_2} |{\psi}\rangle = {\mathbb{I}} \otimes (\frac{1}{n}\sum_{l_1,b_1'} B^{t_1,l_1}_{b_1,b_1'}) (\frac{1}{n}\sum_{l_2,b_2'} B^{t_2,l_2}_{b_2,b_2'}) |{\psi}\rangle \\ = {\mathbb{I}} \otimes (\frac{1}{n}\sum_{t_2,b_1'} B^{t_1,t_2}_{b_1,b_1'}) (\frac{1}{n}\sum_{t_1,b_2'} B^{t_2,t_1}_{b_2,b_2'}) |{\psi}\rangle \\ = {\mathbb{I}} \otimes \frac{1}{n^2}\sum_{t_1,t_2} B^{t_1,t_2}_{b_1,b_2} |{\psi}\rangle \\ = {\mathbb{I}} \otimes \frac{1}{n^2}\sum_{t_1,t_2} B^{t_2,t_1}_{b_2,b_1} |{\psi}\rangle \\ = {\mathbb{I}} \otimes \tilde{B}^{t_2}_{b_2} \tilde{B}^{t_1}_{b_1} |{\psi}\rangle$ In the second line, we used (9) to relate the measurements. The third line follows by the orthogonality of ${ B^{t_1,t_2}_{b_1,b_2}}$ for different ${ b_1,b_2}$. For the fourth equation, we simply swap ${ t_1,t_2}$ since the questions are indistinguishable to Bob. Thus, we can see how the dummy variable comes into play. If we had assumed ${ P=1-\epsilon}$ and kept track of approximations, we would find

$\displaystyle \frac{1}{n^2}\sum_{t_1,b_1}\sum_{t_2,b_2}\lVert \mathbb{I} \otimes (\tilde{B}^{t_1}_{b_1} \tilde{B}^{t_2}_{b_2} - \tilde{B}^{t_2}_{b_2} \tilde{B}^{t_1}_{b_1}) |\psi\rangle \rVert^2 = O(\epsilon)\,. \ \ \ \ \ (10)$

This approximate commutativity results in the hardness of approximation holding only to within inverse poly${ (n)}$ factors. Now we are ready to transform Eq. 7 to Eq. 8 to conclude the proof. $p(a_1,\ldots,a_n)=\lVert {\mathbb{I}} \otimes \tilde{B}^1_{a_1} \ldots \tilde{B}^n_{a_n} |{\psi}\rangle \rVert^2 \\ =\lVert {\mathbb{I}} \otimes \tilde{B}^{i_1}_{a_{i_1}} \ldots \tilde{B}^{i_k}_{a_{i_k}} |{\psi}\rangle \rVert^2 \\ = \lVert A^j_{a_{i_1},\ldots,a_{i_k}} \otimes {\mathbb{I}} |{\psi}\rangle \rVert^2 \\ =p(a_{i_1},\ldots,a_{i_k})\,.$ In the second line, we used (10) to commute the measurement operators, along with their properties of orthogonality and summation to identity. For the third equality, we used (9) to relate Bob’s measurements to Alice’s, along with orthogonality of ${ A^j_{a_{i_1,\ldots,i_k}}}$ for different ${ a_{i_1,\ldots,i_k}}$.

### Constant-factor NP-hardness

The weakness in the above two-player game carries over from the original three-player variant. Thus, to achieve constant-factor NP-hardness of approximation, we could start with a different multiplayer game. Vidick [11] establishes the soundness of the “plane-vs-point” low-degree test (checking that the restriction of a low-degree polynomial to a plane matches its value at some point) in the presence of shared entanglement. Soundness , in the eponymous probabilistically checkable proof (PCP) formulation of the PCP theorem, refers to the verifier accepting a wrong proof with some bounded probability; bounding with a constant maps to constant-factor hardness of approximation. Here, soundness comes from a strong bound on error accumulation, similar to our approximate commutativity, but relies on the players’ Hilbert space being decomposable into three parts (i.e., there being three players). The particular game is constructed by combining the low-degree test with the 3-SAT test (encoding satisfying assignments in a low-degree polynomial), which can be reduced to the three-player QUADEQ test (testing satisfiability of a system of quadratic equations in binary variables, which is NP-complete). By the strong soundness result, the entangled value is NP-hard to approximate to within constant factors. Natarajan et al. [7] show that soundness holds even for two players, using a semidefinite program. They then construct a two-player game in a way similar to what we demonstrated.

### Constant-factor QMA-hardness

The above can be thought of as the games formulation of the classical PCP theorem holding under shared entanglement. A true quantum PCP theorem states that the entangled value of general games is QMA-hard to approximate to within constant factors. Natarajan et al. [8] establish such a theorem, but under randomized reductions. This requirement stems from the lack of a sufficiently strong QMA-hardness result for local Hamiltonians (the quantum analog of CSPs). The soundness of the two-player low-degree test above is one instrumental component in the proof.

## How much entanglement is needed?

We now focus on the question of quantifying exactly how much entanglement is needed to play XOR games optimally. As we shall see, the answer depends on the size of the question sets posed to Alice & Bob in the game. The previous bound given by Tsirelson [10] (see table below) is tight for certain families of games, but is not tight for other families of games (such as a generalization of the CHSH game). The reason for this discrepancy is closely tied in with the the properties of the representation of the Observables that form the Optimal Strategy (${ |{\psi}\rangle , \{A_{i}\}_{i \in S}, \{B_{j}\}_{j \in T}}$). Slofstra [9] shows that if the Observables constitute a Clifford Algebra (that is, the solutions are pair-wise anti-commutative), then the strategy is minimally entangled (uses the least number of entangled bits) iff the strategy is a unique solution to the SDP rounding problem. As a trivial corollary, if the SDP rounding problem does not have a unique solution (and a correspondingly unique strategy), then there exists a Non-Clifford optimal strategy that uses (atleast) ${ |T|}$ bits of entanglement less than the Clifford strategy. Slofstra further states that minimally entangled ${ \epsilon}$-optimal strategies may be constructed for XOR games where the optimal strategies have ‘stable’ representations. For the purposes of this post, we will analyze the exact result and merely state the approximate result.

### Main Results

#### EXACT

For the exact realm, the table below summarizes Slofstra and Tsirelson’s main results.
Person Strategy Bound(entangled bits)
Slofstra (Possibly) Non-Clifford ${ \log_{2}(N)}$
Tsirelson Clifford ${ \left \lfloor{\frac{r}{2}}\right \rfloor}$
Here, ${ r}$ is the largest integer such that ${ \binom{r + 1}{2} < |S| + |T|}$ and corresponds to the maximum-rank of an extremal point in the quantum correlation matrix corresponding to an optimal strategy. ${ N}$ is the minimum dimension of the representations of the Operators (${ \{B_{j}\}}$).

#### APPROXIMATE

In the approximate realm, the minimum entanglement dimension of the representation of the Operators from an ${ \epsilon}$-Optimal Strategy is: min(${ \mathcal{O}(\epsilon^{\frac{-1}{12}}), 2^{\left \lfloor{\frac{r}{2}}\right \rfloor}}$). As we shall see, Slofstra’s theorem allows us to recover Tsirelson’s bound easily by using a fact from Representation Theory about the irreducible representations of Clifford Algebras, but stands as a more general lower bound for solutions that aren’t Clifford.

### Marginals and Solution Algebras

We’ll begin by introducing 3 key ideas: i) Degeneracy ${ \leftrightarrow}$ Non-Degeneracy ii) Existence of Marginals iii) Solution Algebra Once these ideas are defined and their notions made clear, we will be in a position to state the main result and sketch a proof for it.

#### Definition (Marginal Strategy)

Given an Optimal Quantum Strategy (${ |{\psi}\rangle , \{A_{i}\}_{i \in S}, \{B_{j}\}_{j \in T}}$), a marginal constitutes ${ \{B_{j}\}_{j \in T}}$, and the partial trace of ${ \psi}$ with respect to ${ H_{A}}$ (${ \rho_{B}}$). It is also possible to dualize the definition for obtaining ${ \{A_{i}\}_{i \in S}}$ and ${ \rho_{A}}$. We now define the notion of degeneracy, which is critical when proving the main theorem. The main point to drive home is that a degenerate optimal quantum strategy can be reduced to a unique, non-degenerate optimal quantum strategy.

#### Definition (Degenerate Quantum Strategy)

A quantum strategy (${ |{\psi}\rangle , \{A_{i}\}_{i \in S}, \{B_{j}\}_{j \in T}}$) is said to be degenerate if ${ \exists (P \in H_A)}$, ${ (Q \in H_B)}$ such that: i) ${ P}$ commutes with all ${ A_i}$ and ${ (P \otimes {\mathbb{I}}) |{\psi}\rangle = |{\psi}\rangle}$ ii) ${ Q}$ commutes with all ${ B_j}$ and ${ ({\mathbb{I}} \otimes Q) |{\psi}\rangle = |{\psi}\rangle}$ Since we can efficiently construct for any degenerate Optimal Quantum Strategy a unique, non-degenerate Optimal Quantum Strategy, we will now assume WLOG that every Optimal Quantum Strategy is non-degenerate (and unique). We now define the (unique) existence of marginal biases, which correspond to constants for the rows of the quantum correlation matrix (which is a generalization of the classical pay-off). An equivalent statement can be made for columns (${ d_{j}}$) by dualizing the existence of row marginals. These constants can be thought of as representing the (expected) optimum-payoff possible for a set of operator choices by one player, given that the other player’s choice is fixed. Intuitively, this can be seen as “collapsing” the quantum correlation matrix into a column, by summing over the rows (or collapsing into a row, by summing over the columns).
Lemma 12 (Existence of Marginals) For all ${m \times n}$ XOR games G, ${\exists \{c_{i} \geq 0 \hspace{1mm} | \hspace{1mm} i \in |S|\}}$, such that, if ${\{u_{i}\}_{i \in |S|}, \{v_{j}\}_{j \in |T|}}$ form an ${\epsilon}$-optimal vector strategy where ${\epsilon \leq \frac{1}{4(m+n)}}$,

$\displaystyle \|\sum_{j\in|T|}G_{ij}v_{j} - c_{i}u_{i}\| \leq \sqrt{10}(m + n)^{\frac{1}{4}}\epsilon^{\frac{1}{4}}, \forall i \ \ \ \ \ (11)$

If ${ \epsilon = 0}$ and our strategy is perfectly optimal, we recover an exact estimation of the marginal biases:

$\displaystyle \sum_{j\in|T|}G_{ij}v_{j} = c_{i}u_{i}, \forall i \ \ \ \ \ (12)$

The proof for the above lemma provided by Slofstra relies on using techniques to analyze the structure of the SDP program that pertains to quantum marginals. In particular, conducting trace analysis on SDP matrices that correspond to using the game matrix as off-diagonal elements leads us to the construction of the desired marginal biases. It is also critical to note that a dual statement allows us to recover the column biases ${ d_{j} \geq 0}$:

$\displaystyle \sum_{i\in[|S|]}G_{ij}u_{i} = d_{j}v_{j}, \forall j \ \ \ \ \ (13)$

We now move on to defining the notion of a solution algebra.
Definition 13 (Solution Algebra) A solution algebra ${\mathcal{A}}$ consists of self-adjoint (Hermitian) operators ${X_{j}}$ that satisfy the following predicates:

$\displaystyle X_{j}^{2} = \mathbb{I}, \forall 1 \leq j \leq n \ \ \ \ \ (14)$

$\displaystyle (\sum_{j\in[|T|]}G_{ij}X_{j})^{2} = (c_{i})^{2}\cdot\mathbb{I}, \forall i \ \ \ \ \ (15)$

The definition above merely enforces the property that our unknown marginal operators be Hermitian (14) and that they respect the optimal marginal biases (or payoffs) (15) we saw in (11), so that they correspond to being constructed from an optimal vector strategy. These unknown operators will be mapped to operators that are the marginal strategy corresponding to the optimal quantum strategy. This is at the heart of the main theorem we will now state: Theorem 14 (Slofstra, 2010) Given a ${m \times n}$ XOR game G (with no zero rows or columns) and a solution algebra ${\mathcal{A}}$, a collection of Linear Operators ${\{B_{j}\}}$ and density matrix ${\rho}$ are the marginal of an optimal strategy iff the map ${X_{j} \rightarrow B_{j}}$ induces a density-matrix representation of ${\mathcal{A}}$ and ${\rho}$ commutes with ${im(\mathcal{A})}$. Put simply, the theorem states that our unknown self-adjoint operators map to an optimal marginal strategy iff the density matrix (traced from the joint Hilbert-Space) commutes with all the mapped operators (${ \{B_{j}\}}$). The result we desire on the lower bound for the number of entangled bits, given a mapping from these indeterminate operators to the marginal of an optimal strategy, comes from a corollary to (14).
Corollary 15 Given a ${m \times n}$ XOR game G (with no zero rows or columns) and a solution algebra ${\mathcal{A}}$ with minimum dimension ${N}$ among non-zero representations, the strategy for minimum entanglement uses ${\log_{2}(N)}$ entangled bits. The proof for this corollary follows from the eigenspace decomposition of the joint Hilbert Space ${ H}$ in terms of ${ \rho}$, which is preserved by the action of ${ im(\mathcal{A})}$. As a result, each eigenspace decomposes into a finite sum of irreducible representations of ${ \mathcal{A}}$. The minimum entanglement is realized when there is exactly one invariant subspace (with one irreducible representation). The entanglement used by such a representation is ${ \log_{2}(\text{dim}\,H)}$.

### Proof of Theorem 20

The rest of the section is dedicated to sketching a brief (but formal) proof for Theorem (14), and then using a simple fact about the representations of a Clifford Algebra to show how Slofstra’s result subsumes Tsirelson’s bound. For this section, ${ |{\psi}\rangle}$ refers to an arbitrary state in ${ H = H_{A} \otimes H_{B}}$ (the joint Hilbert space). We can write ${ |{\psi}\rangle = \sum_{i} |{i}\rangle \lambda |{i}\rangle}$ over some basis ${ \{{i}\}}$, where ${ \lambda}$ is a linear map. Then, the partial trace of ${ \psi}$ over ${ H_{A}}$ is given by ${ \rho = \lambda\lambda^{*}}$. Let ${ \mathcal{B}_{A}, \mathcal{B}_{B}}$ denote the algebra generated by ${ A_{1},..,A_{m}}$ and ${ B_{1},..,B_{n}}$. Here, the generating elements are the observables of an optimal quantum strategy. To arrive at a proof for the theorem, we will rely on 2 additional lemmas which we will not prove but state.
Lemma 16 Given Hermitian operators ${A}$, ${B \in H_{A}, H_{B}}$,

$\displaystyle \|(A \otimes \mathbb{I} - \mathbb{I} \otimes B)|\psi\rangle\| \leq \|\lambda\overline{A} - B\lambda\|_{F} \ \ \ \ \ (16)$

This allows us to conclude that,

$\displaystyle \|(A \otimes \mathbb{I} - \mathbb{I} \otimes B)|\psi\rangle\| \leq \epsilon \implies \|\rho(B) - B\rho\|_{F} \leq 2\epsilon \ \ \ \ \ (17)$

Lemma 17 The optimal strategy in question is non-degenerate iff

$\displaystyle closure(\mathcal{B}_{B}\lambda H_{A}) = closure(\mathcal{B}_{A}\lambda^{*}H_{B}). \\ \ \ \ \ \ (18)$

As a special case:

$\displaystyle closure(\mathcal{B}_{B}\lambda H_{A}) = H_{B} \leftrightarrow closure(\rho H_{B}) = H_{B} \ \ \ \ \ (19)$

Forward direction : We use the first lemma to prove commutativity of $\rho$ with all $B_{j}$, and we use the second lemma to show that the closure of $\rho$ is ${ H_{B}}$. We first show the forward direction: Suppose we are given an optimal quantum strategy ($|{\psi}\rangle , \{A_{i}\}_{i \in S}, \{B_{j}\}_{j \in T}$) for a game ${ G}$. Then, we fix our optimal vector strategy as:

$\displaystyle u_{i} = (A_{i} \otimes \mathbb{I})|\psi\rangle \ \ \ \ \ (20)$

$\displaystyle v_{j} = (\mathbb{I} \otimes B_{j})|\psi\rangle \ \ \ \ \ (21)$

We can now use Equations (11) and (13) to establish our optimal marginal biases to write a relationship between them and ${ |{\psi}\rangle}$ , and apply Lemma (16) to show commutativity and Lemma (17) to show that ${ im(\mathcal{A})}$ = cyclic(${ B_{j}, \rho}$). Using (13) with (20) and (21), we have:

$\displaystyle d_{j}(\mathbb{I}\otimes B_{j})|\psi\rangle = \sum_{i}G_{ij}(A_{i}\otimes\mathbb{I})|\psi\rangle \ \ \ \ \ (22)$

We can now use (17) with ${ \epsilon = 0}$ on (22) to see that ${ \rho}$ commutes with every ${ B_{j}}$. Additionally, as the terms in (22) constitute linear combinations of ${ A_{i}}$ and ${ B_{j}}$, we can compute the closure of their actions on ${ \lambda H_{A}}$ and ${ \lambda^{*} H_{B}}$, which will be equivalent. Therefore, ${ im(\rho)}$ = ${ H_{B}}$, which follows from the special case of (19). For the dual case, we substitute (20) and (21) into (12):

$\displaystyle c_{i}(A_{i}\otimes\mathbb{I})|\psi\rangle = \sum_{j}G_{ij}(\mathbb{I}\otimes B_{j})|\psi\rangle$

On taking the norm of the above on both sides and using a little algebra, we finally obtain the fact that ${ \{B_{j}\}}$ satisfy predicate (15) making them the representations of ${ X_{j}}$: ${ (\sum_{j}G_{ij}B_{j})^{2} = c_{i}^{2}{\mathbb{I}} }$ This shows that the map from ${ X_{j} \rightarrow B_{j}}$ computes a density matrix representation of ${ \mathcal{A}}$, where ${ \rho}$ commutes with all ${ B_{j}}$.
Backward Direction : The proof for the backward direction is much less involved: If we knew that ${ \{B_{j}\}, \rho}$ constituted the cyclic representation of ${ \mathcal{A}}$ with commutativity (with ${ B_{j}}$), then we can use Lemma (19) to conclude that the image of ${ \rho}$ on ${ H}$ would form a subspace of ${ \lambda H}$. We define: ${ \overline{A}_{i} = \frac{\sum_{j}G_{ij}B_{j}}{c_{i}} }$ allowing us to recover our original marginal biases ${ c_{i}}$ that satisfy (23) and therefore correspond to the optimal strategy. This shows us that ${ \{A_{i}\}}$, ${ \{B_{j}\}}$ would constitute an optimal quantum strategy.
Having proved this theorem, we now obtain Corollary 15, which is the main desired result. To see how it subsumes Tsirelson’s result as a special case, we use a simple fact from Representation Theory:
Lemma 18 For a Clifford Algebra generated by ${X_{1},..,X_{r}}$, there exist one or two irreducible representations of dimension ${2^{\left \lfloor{\frac{r}{2}}\right \rfloor}}$ Plugging Lemma 18 into Corollary 15, we simply recover the fact that the number of entangled bits of a solution algebra that is Clifford is ${ \left \lfloor{\frac{r}{2}}\right \rfloor}$. However, note that being Clifford means an extra constraint: ${ X_{i}X_{j} = -X_{j}X_{i}}$, ${ \forall i, j}$ The constraints on the Solution Algebra (14), (15) given by Slofstra do \textit{not} necessarily mean that the solution is Clifford. In fact, when an optimal quantum strategy with minimal entanglement is Clifford, ${ \{A_{i}\}}$, ${ \{B_{j}\}}$ are constructed from a unique set of ${ \{u_{i}\}}$, ${ \{v_{j}\}}$. To end, we write down a lemma that shows there exist XOR games where the optimal strategy is not unique and for minimal entanglement, a solution generated by a Non-Clifford algebra must be used:

#### Lemma (Existence of XOR games with Non-Clifford optimal strategies)

There exist a family of ${ m \times n}$ XOR games ${ \{G\}}$ that correspond to generalizations of the CHSH games (${CL_{n}}$), such that, the optimal strategy of minimal entanglement is Non-Clifford.

## References

[1] David Avis, Sonoko Moriyama, and Masaki Owari. From bell inequalities to tsirelson’s theorem. IEICE Transactions, 92-A(5):1254–1267, 2009.

[2] Lance Fortnow, John Rompel, and Michael Sipser. On the power of multi-prover interactive protocols. Theoretical Computer Science, 134(2):545 – 557, 1994.

[3] T. Ito, H. Kobayashi, and K. Matsumoto. Oracularization and Two-Prover One-Round Interactive Proofs against Nonlocal Strategies. ArXiv e-prints, October 2008.

[4] J. Kempe, H. Kobayashi, K. Matsumoto, B. Toner, and T. Vidick. Entangled games are hard to approximate. ArXiv e-prints, April 2007.

[5] Julia Kempe, Oded Regev, and Ben Toner. Unique games with entangled provers are easy. SIAM Journal on Computing, 39(7):3207– 3229, 2010.

[6] S. Khanna, M. Sudan, L. Trevisan, and D. Williamson. The approximability of constraint satisfaction problems. SIAM Journal on Computing, 30(6):1863–1920, 2001.

[7] Anand Natarajan and Thomas Vidick. Two-player entangled games are NP-hard. arXiv e-prints, page arXiv:1710.03062, October 2017.

[8] Anand Natarajan and Thomas Vidick. Low-degree testing for quantum states, and a quantum entangled games PCP for QMA. arXiv e-prints, page arXiv:1801.03821, January 2018.

[9] William Slofstra. Lower bounds on the entanglement needed to play xor non-local games. CoRR, abs/1007.2248, 2010.

[10] B.S. Tsirelson. Quantum analogues of the bell inequalities. the case of two spatially separated domains. Journal of Soviet Mathematics, 36(4):557–570, 1987.

[11] Thomas Vidick. Three-player entangled XOR games are NP-hard to approximate. arXiv e-prints, page arXiv:1302.1242, February 2013.

[12] Thomas Vidick. Cs286.2 lecture 15: Tsirelson’s characterization of xor games. Online, December 2014. Lecture Notes.

[13] Thomas Vidick. Cs286.2 lecture 17: Np-hardness of computing $\omega^*(G)$. Online, December 2014. Lecture Notes.

author: Beatrice Nash

Abstract

In this blog post, we give a broad overview of quantum walks and some quantum walks-based algorithms, including traversal of the glued trees graph, search, and element distinctness [3; 7; 1]. Quantum walks can be viewed as a model for quantum computation, providing an advantage over classical and other non-quantum walks based algorithms for certain applications.

# Continuous time quantum walks

We begin our discussion of quantum walks by introducing the quantum analog of the continuous random walk. First, we review the behavior of the classical continuous random walk in order to develop the definition of the continuous quantum walk.

Take a graph $G$ with vertices $V$ and edges $E$. The adjacency matrix $A$ of $G$ is defined as follows:

$A_{i,j} = \begin{cases} 1 \quad &\text{if } (i,j) \in E \\ 0 \quad &\text{otherwise}. \end{cases}$

And the Laplacian $L$ is given by:

$L_{i,j} = \begin{cases} -\text{degree}(i) \quad &\text{if } i = j \\ 1 \quad &\text{if } (i,j) \in E \\ 0 \quad &\text{otherwise}. \end{cases}$

The Laplacian determines the behavior of the classical continuous random walk, which is described by a length $|V|$ vector of probabilities, p(t). The $i$th entry of p(t) represents the probability of being at vertex $i$ at time $t$. p(t) is given by the following differential equation:

\begin{aligned} \frac{\text{d}}{\text{dt}} \text{p}_{i}(\text{t}) = \underset{(i,j) \in E}{\sum} L_{i,j} \text{p}_{j}(\text{t}),\end{aligned}

which gives the solution $\textbf{p}(t) = e^{Lt}\textbf{p}(0)$.

Recalling the Schrödinger equation $i \frac{\text{d}}{\text{dt}} \left|\psi\right>= H \left|\psi\right>$, one can see that by inserting a factor of $i$ on the left hand side of the equation for p(t) above, the Laplacian can be treated as a Hamiltonian. One can see that the Laplacian preserves the normalization of the state of the system. Then, the solution to the differential equation:

\begin{aligned} i \frac{\text{d}}{\text{dt}} \psi_{i}(\text{t}) = \underset{(i,j) \in E}{\sum} L_{i,j} \psi_{j}(\text{t})\end{aligned},

which is $\left|\psi(t)\right> = e^{-iLt} \left|\psi(0)\right>$, determines the behavior of the quantum analog of the continuous random walk defined previously. A general quantum walk does not necessarily have to be defined by the Laplacian; it can be defined by any operator which “respects the structure of the graph,” that is, only allows transitions to between neighboring vertices in the graph or remain stationary [7]. To get a sense of how the behavior of the quantum walk differs from the classical one, we first discuss the example of the continuous time quantum walk on the line, before moving on to the discrete case.

## Continuous time quantum walk on the line

An important example of the continuous time quantum walk is that defined on the infinite line. The eigenstates of the Laplacian operator for the graph representing the infinite line are the momentum states with eigenvalues $2(\text{cos}(p) - 1)$, for $p$ in range $[-\pi,\pi]$. This can be seen by representing the momentum states in terms of the position states and applying the Laplacian operator:

\begin{aligned} \left|p\right> &= \underset{x}{\sum} e^{ipx} \left|x\right> \\ L\left|p\right> &= \underset{x}{\sum} e^{ipx} \left|x+1\right>+ e^{ipx} \left|x-1\right> - 2e^{ipx} \left|x\right> \\ &= \underset{x}{\sum} (e^{ip} + e^{-ip} - 2) e^{ipx} \left|x\right> \\ &= 2(\text{cos}(p) - 1) \left|p\right>.\end{aligned}

Hence the probability distribution at time $t$, $p(x,t) = |\left< x\right| e^{-iLt} \left|\psi(0)\right> | ^{2}$, with initial position $\left|\psi(0)\right> = \left|0\right>$ is given by:

\begin{aligned} |\left< x\right| e^{-iLt} \left|0\right> | ^{2} &= \bigg| \frac{1}{2\pi} \int_{-\pi}^{\pi} e^{-2it(\text{cos}p - 1)} \left< x|p\right> \text{d}p \bigg|^{2} \\ &= \bigg| \frac{1}{2\pi} \int_{-\pi}^{\pi} e^{-2it(\text{cos}p - 1)} e^{ipx} \text{d}p \bigg|^{2} \\ &= | J_{x}(2t) |^{2}.\end{aligned}

While the probability distribution for the classical continuous time
random walk on the same graph approaches, for large $t$, $\frac{1}{\sqrt{4\pi t}} e^{\frac{-x^{2}}{4t}}$, or a Gaussian of width $2\sqrt{t}$. One can see that the quantum walk has its largest peaks at the extrema, with oscillations in between that decrease in amplitude as one approaches the starting position at $x=0$. This is due to the destructive interference between states of different phases that does not occur in the classical case. The probability distribution of the classical walk, on the other hand, has no oscillations and instead a single peak centered at $x=0$, which widens and flattens as $t$ increases.

## Walk on the glued trees graph

A glued tree is a graph obtained by taking two binary trees of equal height and connecting each of the leaves of one of the trees to exactly two leaves of the other tree so that each node that was a leaf in one of the original trees now has degree exactly $3$. An example of such a graph is shown in Figure 2.

The time for the quantum walk on this graph to reach the right root from the left one is exponentially faster than in the classical case. Consider the classical random walk on this graph. While in the left tree, the probability of transitioning to a node in the level one to the right is twice that of transitioning to a node in the level one to the left. However, while in the right tree, the opposite is true. Therefore, one can see that in the middle of the graph, the walk will get lost, as, locally, there is no way to determine which node is part of which tree. It will instead get stuck in the cycles of identical nodes and will have exponentially small probability of reaching the right node.

To construct a continuous time quantum walk on this graph, we consider the graph in terms of columns. One can visualize the columns of Figure 2 as consisting of all the nodes equidistant from the entrance and exit nodes. If each tree is height $n$, then we label the columns $0,1,\text{...},2n,2n+1$, where column $i$ contains the nodes with shortest path of length $i$ from the leftmost root node. We describe the state of each column as a superposition of the states of each node in that column. The number of nodes in column $i$, $N_{i}$, will be $2^{i}$ for $i \in [0,n]$ and $2^{2n+1-i}$ for $i \in [n+1,2n+1]$. Then, we can define the state $\left|\text{col} \; i\right>$ as:

\begin{aligned} \left|\text{col} \; i\right> = \frac{1}{\sqrt{N_{i}}} \underset{j \in \text{col} \; i}{\sum} \left|j\right>.\end{aligned}

The factor of $\frac{1}{\sqrt{N_{i}}}$latex ensures that the state is normalized. Since the adjacency matrix $A$ of the glued tree is Hermitian, then we can treat $A$ as the Hamiltonian of the system determining the behavior of the quantum walk. By acting on this state with the adjacency matrix operator $A$, we get the result (for $i \in [1,n-1]$):

\begin{aligned} A\left|\text{col} \; i\right> &= 2\frac{\sqrt{N_{i-1}}}{\sqrt{N_{i}}} \left|\text{col} \; i-1\right> + \frac{\sqrt{N_{i+1}}}{\sqrt{N_{i}}} \left|\text{col} \; i+1\right> \\ &= \sqrt{2} \left|\text{col} \; i-1\right> + \sqrt{2} \left|\text{col} \; i+1\right>.\end{aligned}

Then for $i \in [n+2,2n]$, we get the same result, because of symmetry.

For $i = n$:

\begin{aligned} A\left|\text{col} \; n\right> = \sqrt{2} \left|\text{col} \;n-1\right> + 2 \left|\text{col} \; n\right>.\end{aligned}

The case of $i = n+1$ is symmetric. One can see that the walk on this graph is equivalent to the quantum walk on the finite line with nodes corresponding to the columns. All of the edges, excluding that between columns $n$ and $n+1$, have weight $\sqrt{2}$. The edge between column $n$ and $n+1$ has weight $2$.

The probability distribution of the quantum walk on this line can be roughly approximated using the infinite line. In the case of the infinite line, the probability distribution can be seen as a wave propagating with speed linear in the time $t$. Thus, in time linear in $n$, the probability that the state is measured at distance $2n+1$ from the starting state is $\frac{1}{\text{poly} \; n}$. In [3] it is shown that the fact that the line is finite and has a single differently weighted edge from the others (that between $n$ and $n+1$) does not change the fact that in polynomial time, the quantum walk will travel from the left root node to the right one, although in this case there is no limiting distribution as the peaks oscillate. This was the first result that gives an exponential speed up over the classical case using quantum walks.

# Discrete time quantum walks

In this section, we will first give an introduction to the discrete quantum walk, including the discrete quantum walk on the line and the Markov chain quantum walk, as defined in [7]. Next, we discuss how Grover search can be viewed as a quantum walk algorithm, which leads us into Ambainis’s quantum-walks based algorithm from [1] for the element distinctness problem, which gives a speed up over classical and other quantum non-walks based algorithms.

The discrete time quantum walk is defined by two operators: the coin flip operator, and the shift operator. The coin flip operator $C$ determines the direction of the walk, while the shift operator $S$ makes the transition to the new state conditioned on the result of the coin flip. The Hilbert space governing the walk is $\mathcal{H} = \mathcal{H}{C} \otimes \mathcal{H}{S}$, where $\mathcal{H}{C}$ corresponds to the space associated with the result of the coin flip operator, and $\mathcal{H}{S}$ corresponds to the locations in the graph on which the walk is defined.

For example, consider the discrete time walk on the infinite line. Since there are two possible directions (left or right), then the Hilbert space associated with the coin flip operator is two dimensional. In the unbiased case, the coin flip is the Hadamard operator,

\begin{aligned} H = \frac{1}{\sqrt{2}} \begin{bmatrix} 1 & 1 \\ 1 & -1 \end{bmatrix},\end{aligned}

and shift operator that produces the transition from state $\left|j\right>$ to $\left|j+1\right>$ or $\left|j-1\right>$,
conditioned on the result of the coin flip, is $S = \left|0\right>\left< 0\right| \otimes \underset{j}{\sum} \left|j+1\right> \left< j\right| + \left|1\right>\left< 1\right| \otimes \underset{j}{\sum} \left|j - 1\right> \left< j\right|$.

Each step of the walk is determined by an application of the unitary
operator $U = S \cdot (H \otimes I)$. If the walk starts at position
$\left|x\right>$, then measuring the state after one application of $U$ gives $\left|x+1\right>$ with probability $\frac{1}{2}$ and $\left|x-1\right>$ with probability $\frac{1}{2}$. This is exactly the same as the case of the classical random walk on the infinite line; the difference between the two walks becomes apparent after a few steps.

For example, the result of the walk starting at state $\left|\psi(0)\right> = \left|0\right>\left|0\right>$ after 4 steps gives:

\begin{aligned} \left|\psi(1)\right> &= \frac{1}{\sqrt{2}}\left( \left|0\right>\left|1\right> + \left|1\right>\left|-1\right> \right) \\ \left|\psi(2)\right> &= \frac{1}{2}\left( \left|0\right>\left|2\right> + \left|1\right>\left|0\right> + \left|0\right>\left|0\right> - \left|1\right>\left|-2\right> \right) \\ \left|\psi(3)\right> &= \frac{1}{2\sqrt{2}}\left(\left|0\right>\left|3\right> + \left|1\right>\left|1\right> + 2\left|0\right>\left|1\right> -\left|0\right>\left|-1\right> + \left|1\right>\left|-3\right> \right) \\ \left|\psi(4)\right> &= \frac{1}{4} (\left|0\right>\left|4\right> + \left|1\right>\left|2\right> + 3\left|0\right>\left|2\right> + \left|1\right>\left|0\right> -\left|0\right>\left|0\right> -\left|1\right>\left|-2\right> +\left|0\right>\left|-2\right>-\left|1\right>\left|-4\right>).\end{aligned}

One can see that the distribution is becoming increasingly skewed
towards the right, while in the classical case the distribution will be
symmetric around the starting position. This is due to the destructive
interference discussed earlier. The distribution after $t = 20$ time
steps is shown in Figure 4.

Now, consider the walk starting at state $\left|\psi(0)\right> = -\left|1\right>\left|0\right>$:

\begin{aligned}\left|\psi(1)\right> &= \frac{1}{\sqrt{2}}\left( -\left|0\right>\left|1\right> + \left|1\right>\left|-1\right> \right) \\ \left|\psi(2)\right> &= \frac{1}{2}\left( -\left|0\right>\left|2\right> - \left|1\right>\left|0\right> + \left|0\right>\left|0\right> - \left|1\right>\left|-2\right> \right) \\ \left|\psi(3)\right> &= \frac{1}{2\sqrt{2}}\left(-\left|0\right>\left|3\right> - \left|1\right>\left|1\right> + 2\left|1\right>\left|-1\right> - \left|0\right>\left|-1\right> + \left|1\right>\left|-3\right> \right) \\ \left|\psi(4)\right> &= \frac{1}{4} (-\left|0\right>\left|4\right> -\left|1\right>\left|2\right> - \left|0\right>\left|2\right> + \left|1\right>\left|0\right> + \left|0\right>\left|0\right> -3\left|1\right>\left|-2\right> + \left|0\right>\left|-2\right> - \left|1\right>\left|-4\right>).\end{aligned}

This distribution given by this walk is the mirror image of the first.
To generate a symmetric distribution, consider the start state $\left|\psi(0)\right> = \frac{1}{\sqrt{2}}(\left|0\right> -i\left|1\right>)\left|0\right>$. The resulting distribution after $t$ steps will be $p(x,t) = \frac{1}{2} p_{0}(x,t) + \frac{1}{2} p_{1}(x,t)$, where $p_{0}(x,t)$ is the probability distribution after $t$ steps resulting from the start state $\psi(0) = \left|0\right>\left|0\right>$ and $p_{1}(x,t)$ is the probability distribution after $t$ steps resulting from the start state $\psi(0) = -\left|1\right>\left|0\right>$. The result will be symmetric, with peaks near the extrema, as we saw in the continuous case.

## Markov chain quantum walk

A reversible, ergodic Markov chain with $n$ states can be represented by a $n \times n$ transition matrix $P$ with $P_{j,i}$ equal to the probability of transitioning from state $i$ to state $j$ and $P = P^{*}$. Then, $p_{0}P$, where $p_{0}$ is an initial probability distribution over the states, gives the distribution after one step.
Since $\sum_{j} P_{i,j} = 1$ for all $i$, $P$ is stochastic and thus preserves normalization.

There are multiple ways to define a discrete quantum walk, depending on the properties of the transition matrix and the graph on which it is defined (overview provided in [4]). Here we look at the quantum walk on a Markov chain as given in [2]. For the quantum walk on this graph, we define state $\left|i\right>\left|j\right>$ as the state that represents currently being at position $j$ and facing in the direction of $i$. Then, we define the state $\left|\psi_{j}\right>$ as a superposition of the states associated with position $j$:

\begin{aligned} \left|\psi_{i}\right> = \underset{j}{\sum} \sqrt{P_{j,i}} \left|i\right>\left|j\right>.\end{aligned}

The unitary operator,

$D = 2 \underset{i}{\sum} \left|\psi_{i}\right>\left< \psi_{i}\right| - I$,

acts as a coin flip for the walk on this graph. Since $P$ is reversible, we can let the shift operator be the unitary $SWAP$ operator:

$SWAP = \underset{i,j}{\sum} \left|i,j\right>\left< j,i\right|$.

A quantum walk can also be defined for a non-reversible Markov chain using a pair of reflection operators (the coin flip operator is an example of a reflection operator). This corresponds to the construction given in [7].

## Search as a quantum walk algorithm

Given a black box function $f$ and a set of inputs $S$ with $|S| = N$, say we want to find whether an input $x \in S$ exists for which $f(x)$ equals some output value. We refer to the set of inputs $M$ for which this is true as marked. Classically, this requires $O(N/|M|)$ queries, for nonempty $M$. Using the Grover search algorithm, this problem requires $O(\sqrt{N/|M|})$ quantum queries. In this section, we give a quantum walks based algorithm that also solves this problem in $O(\sqrt{N/|M|})$ time. If we define a doubly stochastic matrix $P$ with uniform transitions, then we can construct a new transition matrix $P'$ from $P$ as:

$P_{i,j}' = \begin{cases} \frac{1}{N-1} \quad &\text{if } i \neq j \text{ and } i \notin M \\0 \quad &\text{if } i = j \text{ and } i \notin M \\ 1 \quad &\text{if } i = j \text{ and } i \in M \\ 0 \quad &\text{if } i \neq j \text{ and } i \in M. \end{cases}$

Then, when the state of the first register is unmarked, the operator $D$ defined in the previous section acts as a diffusion over its neighbors. When the state in the first register is marked, then $D$ will act as the operator $-I$, and the walk stops, as a marked state has been reached. This requires two queries to the black box function: one to check whether the input is marked, and then another to uncompute. By rearranging the order of the columns in $P$ so that the columns corresponding to the non-marked elements come before the columns corresponding to the marked elements, we get:

\begin{aligned} P' = \begin{pmatrix} P_{0} & 0 \\ P_{1} & I \end{pmatrix},\end{aligned}

where $P_{0}$ gives the transitions between non-marked elements and $P_{1}$ gives the transitions from non-marked to marked elements.

We now look at the hitting time of the classical random walk. Assume
that there is zero probability of starting at a marked vertex. Then, we
can write the starting distribution $p$, where the last $|M|$ elements of $p$, corresponding to the marked elements, are zero, as
$p = \underset{\lambda}{\sum} \alpha_{\lambda} \left|\lambda\right>$, where $\lambda$ are the eigenvalues of $P_{0}$, and $\left|\lambda\right>$ are the corresponding eigenvectors, with the last $|M|$ entries zero. Let $\lambda^{}$ be the principal (largest) eigenvalue. Then, the probability that, after $t$ steps, a marked element has not yet been reached will be $\sum (P_{0}^{t}p)_{i} \leq \lambda^{*t}$. Then, the
probability that a marked element has been reached in that time will be
$\geq 1 - \lambda^{t} \geq 1 - t \lambda^{*}$. Setting
$t = \frac{1}{1-\lambda^{*}}$ gives probability $\Omega(1)$ that a marked element will be reached in that time.

The eigenvalues of $P_{0}$ will be $\frac{N-|M|-1}{N-1}$ and
$\frac{-1}{N-|M|-1}$. Then, the classical hitting time will be:

\begin{aligned} t &= \frac{1}{1-\lambda^{*}} \\ &= \frac{1}{1-\frac{N-|M|-1}{N-1}} \\ &= O\left(\frac{N}{|M|}\right).\end{aligned}

It can be showed that for a walk defined by a Markov chain, the
classical hitting time will be $O(\frac{1}{\delta \epsilon})$, where $\delta = 1 - \lambda^{*}$, the spectral gap, and $\epsilon \leq \frac{|M|}{N}$ [2].

Magniez et al proved in [6] that for a reversible, ergodic
Markov chain, the quantum hitting time for a walk on this chain is
within a factor of the square root of the classical hitting time. Since
the walk on this input acts as a walk on a reversible Markov chain until
a marked element is reached, then this is also true for a walk defined
by our transition matrix $P'$. This arises from the fact that the
spectral gap of the matrix describing the quantum walk corresponding to
stochastic matrix $P$ is quadratically larger than the spectral gap of
the matrix describing the classical random walk corresponding to $P$, the proof of which is given in [2]. Thus, the quantum hitting time
is $O(\sqrt{N/|M|})$, which exactly matches the quantum query complexity of Grover search.

## Element distinctness problem

Now, we describe Ambainis’s algorithm given in [1] for solving
the element distinctness problem in $O(N^{\frac{2}{3}})$ time, which
produces a speed up over the classical algorithm, which requires $O(N)$ queries, and also over other known quantum algorithms that do not make use of quantum walks, which require $O(N^{\frac{3}{4}})$ queries. The element distinctness problem is defined as follows: given a function $f$ on a size $N$ set of inputs

$S=\{x_{1}$,…,$x_{N}\}$,

determine whether there exists a pair $x_{1},\; x_{2} \in S$ for which $f(x_{1}) = f(x_{2})$. As in the search problem defined in the previous section, this is a decision problem; we are not concerned with finding the values of these pairs, only whether at least one exists.

The algorithm is similar to the search algorithm described in the previous section, except we define the walk on a Hamming graph. A Hamming graph $H(N,m)$ is defined as follows: each vertex $i$ corresponds to an $m$-tuple, ($i_{1}$,…,$i_{m}$), where $i_{k} \in S$ for all $k$ and repetition is allowed (that is, $i_{k}$ may equal $i_{j}$ for $k \neq j$), and $m$ is a parameter we will choose. Edges will exist between vertices that differ in exactly one coordinate (order matters in this graph). We describe the state of each vertex as:

$\left|i \right>=| i_{1},i_{2}$,…,$i_{m},f(i_{1})$,…,$f(i_{m})>$

Then, moving along each edge that replaces the $j$th coordinate with $x_{k}$ such that $i_{j} \neq x_{k}$ requires two queries to the black box function to erase $f(i_{j})$ and compute $f(x_{k})$. In the case, the marked vertices will be those that contain some $f(i_{k}) = f(i_{j})$ for $i_{j} \neq i_{k}$. Since the function values are stored in the description of the state, then no additional queries to the black box are required to check if in a marked state.

The transition matrix is given by $P = \frac{1}{m(n-1)} \underset{i \in [1,m]}{\sum} (J - I)^{(i)}$. $J$ is the $n \times n$ all one matrix, and the superscript $i$ denotes the operator acting on the $i$th coordinate. The factor of $\frac{1}{m(n-1)}$ normalizes the degree, since the graph is regular. We can compute the spectral gap of this graph to be $\frac{n}{m(n-1)}$ (for details of this computation, see [2]). Then, noting that that the fraction of marked vertices, $\epsilon$, is
$\geq \frac{m(m-1)(n-2)^{m-2}}{n^{m}}$, classically, the query complexity is $m + O(\frac{1}{\delta \epsilon}) = m + O(\frac{n^{2}}{m})$, where $m$ is the queries required to construct the initial state. Setting the parameters equal to minimize with respect to $m$ gives classical query complexity $O(N)$, as expected.

Then in the quantum case, $m$ queries are still required to set up the state. $O(\frac{n}{\sqrt{m}})$ queries are required to perform the walk until a marked state is reached, by [6]. Setting parameters equal gives $O(N^{\frac{2}{3}})$ queries, as desired.

[1] Ambainis, A. Quantum walk algorithm for element distinctness, SIAM Journal on Computing 37(1):210-239 (2007). arXiv:quant-ph/0311001

[2] Childs, A. Lecture Notes on Quantum Algorithms (2017). https://www.cs.umd.edu/ amchilds/qa/qa.pdf

[3] Childs, A., Farhi, E. Gutmann, S. An example of the difference between
quantum and classical random walks. Journal of Quantum Information
Processing, 1:35, 2002. Also quant-ph/0103020.

[4] Godsil, C., Hanmeng, Z. Discrete-Time Quantum Walks and Graph Structures
(2018). arXiv:1701.04474

[5] Kempe, J. Quantum random walks: an introductory overview, Contemporary
Physics, Vol. 44 (4) (2003) 307:327. arXiv:quant-ph/0303081

[6] Magniez, F., Nayak, A., Richter, P.C. et al. On the hitting times of
quantum versus random walks, Algorithmica (2012) 63:91.
https://doi.org/10.1007/s00453-011-9521-6

[7] Szegedy, M. Quantum Speed-up of Markov Chain Based Algorithms, 45th
Annual IEEE Symposium on Foundations of Computer Science (2004).
https://ieeexplore.ieee.org/abstract/document/1366222

[8] Portugal, R. Quantum Walks and Search Algorithms. Springer, New York, NY (2013).