Physics & Computation Blog Post Round-up

March 30, 2019

In the Fall, Boaz and I co-taught a grad seminar on physics and computation (see here for some of the original press coverage). We were lucky to attract an intrepid group of students from multiple fields, with representatives from computer science, physics, math and biology. As part of the course, we asked our students to give presentations and write expository blog posts on a number of topics at the intersection of computation and physics, including algorithms from statistical physics, quantum area laws, and the firewall paradox. The students worked hard to produce posts that make the physics concepts accessible to a computer science audience, and the result is a nice collection of posts that create a “bridge” from CS to physics.

Here, the aim is to give a (long overdue) table of contents for their posts. Followers of the blog may have noticed a landslide of physics & computation posts around December and January; if you weren’t able to keep up with them all at the time, then here they are, rounded up and organized by topic.

Statistical Physics

Background and intro to phase transitions:

Algorithms from statistical physics: belief propagation and approximate message passing:

Informational and computational phase transitions:

Proving the existence of phase transitions:

 Using convex relaxations to approximate partition functions:

Quantum Computation

Background on Quantum Hamiltonians and tensor network representations:

Area laws:

Quantum algorithms:

Quantum games and quantum PCPs:

Quantum supremacy:

Quantum error correction:

Black Holes and the Firewall Paradox

(See also Boaz’s introductory post)

Background on black holes and the Firewall Paradox:

Firewall Paradox meets Computational Complexity:

Nominate TCS papers for research highlights

March 28, 2019

[Guest post by Aleksander Mądry]

To me, one of the best things about working in theoretical computer
science has always been the exciting rate of progress we make as a community.
On (what appears to be) a regular basis, we produce breakthroughs on
problems that are absolutely fundamental to our field. Problems that
often look impossible to tackle, right up until someone actually tackles
them.

However, as inspiring as all these developments were to me, I also 
always felt that we, as a community, could do more to properly recognize 
and highlight them, both internally and to the outside world. This kind 
of outreach would make it easier for us to capitalize on the 
breakthroughs as well as to accelerate the impact of the underlying 
ideas on the other areas of computer science, and beyond.

Fortunately, this is about to change!

One of the first decisions of our newly (re-)elected SIGACT committee was to create a committee (as committees are wont to do 🙂) whose mission will be to help promote top computer science theory research. This SIGACT Research Highlights Committee – consisting of Boaz Barak, Omer Reingold, Mary Wootters and myself – will, in particular, work to identify results to be recommended for consideration for the CACM Research Highlights section as well as other general-audience research outlets in computer science and other fields.

Of course, to do a proper job here we require your help! To this end, 
the committee solicits two types of nominations:

1) Conference nominations. Each year, the committee will ask the PC
chairs of a broad set of theoretical computer science conferences to
select up to three top papers from these conferences
(based on both their technical merit and their potential
significant interest to non-theory audiences) and forward them to the
committee for consideration.

2) Community nominations. The committee will accept nominations from the members of the community. Each such nomination should summarize the contribution of the nominated (and recently published) paper and also
argue why this paper particularly merits a broader outreach. The 
nomination should be no more than a page in length and can be submitted 
at any time by emailing it to
Self-nominations are discouraged.

To be considered in the upcoming round of our deliberations, we need to 
receive your nomination by April 30.

Looking forward to learning about all the new exciting research that you 
all are doing!

News addicts: Sign up for the CATCS newsletter

March 26, 2019

If, like others following the pace of modern life, you’re the kind of person that needs to get just on time updates on the state of theoretical computer science, consider signing up for the newsletter of CATCS. You can get information about funding opportunities, advocacy efforts, and more. Sure, at the hectic rate of two messages per year, it might flood your inbox, but it is worth it.

Dear Theoretical Computer Scientist,

The Committee for the Advancement of Theoretical Computer Science (CATCS) was established by SIGACT to deal with funding, outreach, and advocacy issues for our community. If you would like to receive our newsletter (no more than twice annually) we encourage you to sign up for the Google group below:!forum/catcs-news
(Navigate to the page above and click “Join group.” You can unsubscribe at any time from the same page.)
Best wishes, CATCS

Submit your failures to CFAIL 2019

March 13, 2019

[Posted at the request of Craig Gentry. I think this is actually a great idea that should be imitated by other sub-areas of TCS as well. –Boaz]

Update: The Conference for Failed Approaches and Insightful Losses in cryptology is celebrating our first failure early: the failure to keep a deadline! We are extending the submission deadline to 11:59 pm EST on April 7 to give everyone more time to work on submissions. In these last weeks leading up to the deadline, you can follow the @cfail2019 twitter account for inspiring jokes about failure to help you reach the finish line!

The CFAIL 2019 program chairs would like to remind you that some failures are good. Failure to attack a cryptosystem? Good! It might be a strong cryptosystem. Failure to publish in Eurocrypt? No problem! You’ll get ’em next time. Failure to cut back on caffeine like you promised in your New Year’s resolution? Good! You must be full of energy!

But failure to submit to CFAIL… lame! 
Work on those submissions, cryptologists! We want to see your failures in all their glory! 
Just a friendly reminder that the CFAIL submission deadline is April 7th.

Black Holes, a Complexity Theory perspective

February 1, 2019

Guest post by Chi-Ning Chou and Parth Mehta from the physics and computation seminar.


The firewall paradox (introduced here) is a bewitching thought experiment that mandates a deeper understanding of our reality. As luck would have it, QFT predictions seem sound, GR calculations appear valid, and semi-classical approximations look reasonable: no one is willing to budge! To save Alice from burning in the miserable firewall, therefore, we must come up with a radically new proposal. This blog post aims to map what seems to be a hard physics dilemma into a Computer Science problem that we can, using the grace of a lazy programmer, show to be hard to solve. In particular, we present an overview of the Harlow-Hayden decoding task and show how it maps the Firewall Paradox to a hard computation on a quantum computer. Along the way, we rigorously define quantum circuit complexity and present Aaronson’s improved proof; we end with the AdS/CFT correspondence and some fascinating homework (open) problems.

Why all the fuss?

Have you ever confessed to yourself that you don’t quite understand Black Hole complementarity? In the past decade or so, physicists realized they did not grasp the concept thoroughly either. The firewall paradox is a natural result of bewildered physicists trying to make sense of reality. Thus far, no physical explanation has won consensus. Nevertheless, Daniel Harlow and Patrick Hayden [HH13] proposed a tempting solution to the firewall paradox using Computational Complexity (CC). Concretely, they showed the following.

\text{A conjecture in CC is true}\Rightarrow\text{Firewalls do not exist}.

We elaborate on this deep connection throughout this post.

Problem Solving: Physics vs. Computer Science

The notion of a “conjecture” has different implications in the two fields. In Physics, a wrong conjecture often delights physicists, since there is more work left to do and a better theory is required to explain the physical phenomenon under study. For complexity theorists, however, if, say, the famous \mathbf{P}\neq\mathbf{NP} conjecture is proved to be false, a few consequences follow. First, the authors of the proof win a million dollars (see the Millennium problems). Second, such a result would break almost all the foundations of computational complexity and cryptography. That is, refuting an (important) conjecture in computational complexity could amount to a real-world catastrophe! Table 1 below gives a short summary.

 | Theoretical Physics | Theoretical Computer Science
Object | Are the mathematical models for our physical world correct? | Is our intuition about the mathematical models we defined correct?
Consequences of disproving | After a few days/months/years, physicists will come up with a new model and try to falsify it. | The belief system of complexity theorists collapses. Some super algorithms might show up and shake the world.
How to prove/disprove | Checking mathematical consistency, doing both thought and empirical experiments. | Using fancy mathematics or designing super algorithms.

Table 1: “Conjecture”, as used in Physics and Computer Science.

We labour above to convince the reader of these differences because the Harlow-Hayden decoding task has vital implications for both Physics and Computer Science. The connections between Black Holes and Computational Complexity can be thought of as a new testbench for physical models.

Reckless review: Quantum Information


In Quantum Computation, gates are unitary operators. Some common gates used in the Quantum Information literature are as follows:

  • Single-qubit: Pauli matrices (i.e., X,Y,Z), phase operator P, Hadamard matrix H.
  • Multi-qubit: CNOT and CZ (two qubits), Toffoli (three qubits).

For more details, please refer to [NC02]. Interestingly enough, single-qubit and two-qubit gates are sufficient to construct any n-qubit gate! A set of gates with this property is said to be universal. For example, \{\text{Toffoli},H,P\} is universal, and so is \{\text{CNOT},G\} for almost every single-qubit operator G. Furthermore, Kitaev and Solovay gave a quantitative version of the universality theorem by showing that to get an \epsilon-approximation in trace norm to an operator on a fixed number of qubits, only O(\log^2(1/\epsilon)) gates are needed. A final remark on unitary operators: an n-qubit operator is actually a matrix of size 2^n by 2^n. Namely, it takes on the order of 2^{2n} complex numbers to describe an n-qubit operator. (Note the difference between n and 2^n.)

Quantum circuits

A quantum circuit \mathcal{C} takes n qubits as input, potentially with m ancilla qubits. The computation is done by interior gates from some universal gate set, e.g., \{\text{Toffoli},H,P\}. The outputs are n qubits, potentially with m qubits of garbage. See Figure 1 for an example: a quantum circuit for the n-qubit Hadamard operator H_n|x\rangle=\frac{1}{\sqrt{2^n}}\sum_{y\in\{0,1\}^n}(-1)^{\langle x,y\rangle}|y\rangle.


Figure 1: A quantum circuit for the n-qubit Hadamard operator.

As with classical circuits, the size of a quantum circuit is defined as the number of interior gates. In Figure 1, for example, the size of the circuit is n.
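To make the n-gate circuit concrete, here is a minimal sketch in plain Python that applies one single-qubit Hadamard gate per qubit to a basis state and compares the result with the closed-form amplitudes (-1)^{\langle x,y\rangle}/\sqrt{2^n}. The function names and the convention that qubit 0 is the leftmost bit of the index are assumptions of this example. Note that the simulation stores all 2^n amplitudes, which is exactly the exponential cost that separates n from 2^n.

```python
import math

def apply_hadamard(state, qubit, n):
    """Apply a single-qubit Hadamard gate to `qubit` of an n-qubit state vector."""
    s = 1 / math.sqrt(2)
    bit = 1 << (n - 1 - qubit)  # assumed convention: qubit 0 is the leftmost bit
    new = list(state)
    for i in range(len(state)):
        if not i & bit:
            a, b = state[i], state[i | bit]
            new[i] = s * (a + b)        # |0> component after H
            new[i | bit] = s * (a - b)  # |1> component after H
    return new

def hadamard_circuit(x, n):
    """Run the size-n circuit (one H per qubit) on the basis state |x>."""
    state = [0.0] * (2 ** n)
    state[x] = 1.0
    for qubit in range(n):  # exactly n interior gates
        state = apply_hadamard(state, qubit, n)
    return state

def hadamard_formula(x, n):
    """Amplitudes (-1)^<x,y> / sqrt(2^n), with <x,y> the bitwise inner product mod 2."""
    norm = math.sqrt(2 ** n)
    return [(-1) ** bin(x & y).count("1") / norm for y in range(2 ** n)]
```

Calling hadamard_circuit(x, n) and hadamard_formula(x, n) for small n should give the same list of amplitudes up to floating-point rounding.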

Quantum circuit complexity: BQP/poly

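Let f:\{0,1\}^n\rightarrow\{0,1\} be a boolean function. Define its quantum circuit complexity as the size of the smallest quantum circuit C such that for any x\in\{0,1\}^n

\Pr[C(x)=f(x)]\geq 2/3,

where C(x) denotes the result of measuring the first output qubit of C on input |x\rangle (with the ancilla qubits initialized to |0\rangle).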

Let \mathbf{BQSIZE}[s(n)] denote the class of boolean functions of quantum circuit complexity at most s(n). The complexity class \mathbf{BQP/poly} is defined as \cup_{c\in\mathbb{N}}\mathbf{BQSIZE}[n^c]. It immediately follows from the definition that \mathbf{P/poly}\subseteq\mathbf{BQP/poly}. As proving lower bounds for \mathbf{P/poly} (i.e., finding an explicit problem that is not in \mathbf{P/poly}) is a long-standing and extremely difficult problem, it is believed to be hard to prove lower bounds against \mathbf{BQP/poly} as well.

Uniform quantum circuit complexity: BQP

As \mathbf{BQP/poly} is too powerful to work with, one might want to define a weaker version of the quantum complexity measure. A natural choice is considering a uniform computational model.

In the classical setting, a uniform computational model is defined using a Turing machine. However, it is not clear how to define the corresponding version, a quantum Turing machine. One way to do so is via uniform circuits, defined as follows. We say a circuit family \mathcal{C}=\{C_n\}_{n\in\mathbb{N}} is \mathbf{P}-uniform if there exists a polynomial time Turing machine such that on input 1^n, it outputs C_n.
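As a small illustration of uniformity, the n-qubit Hadamard circuits form a \mathbf{P}-uniform family: a short classical program, given 1^n, writes down a description of C_n in time polynomial in n. The sketch below assumes a hypothetical gate-list encoding (a list of (gate name, qubit) pairs) chosen only for this example.

```python
def generate_circuit(unary_input):
    """Classical generator for a P-uniform family: on input 1^n, output C_n.

    C_n here is the n-qubit Hadamard circuit, encoded (by assumption) as a
    list of (gate_name, qubit_index) pairs.
    """
    if set(unary_input) - {"1"}:
        raise ValueError("input must be a unary string 1^n")
    n = len(unary_input)
    return [("H", qubit) for qubit in range(n)]

def circuit_size(circuit):
    """Size of a circuit = number of interior gates."""
    return len(circuit)
```

The generator runs in time linear in n, so the same finite program describes every circuit in the family; a non-uniform family is allowed a completely unrelated circuit for each n.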

Let f:\{0,1\}^n\rightarrow\{0,1\} be a boolean function. Define its uniform quantum circuit complexity as the size of the smallest quantum circuit C, taken from a \mathbf{P}-uniform circuit family, such that for any x\in\{0,1\}^n

\Pr[C(x)=f(x)]\geq 2/3.

Let \mathbf{BQTIME}[s(n)] denote the class of boolean functions of uniform quantum circuit complexity at most s(n). The complexity class \mathbf{BQP} is defined as \cup_{c\in\mathbb{N}}\mathbf{BQTIME}[n^c]. It immediately follows from the definition that \mathbf{P}\subseteq\mathbf{BPP}\subseteq\mathbf{BQP}.

Unitary complexity: C(U)

Let U\in\mathbb{C}^{2^n\times2^n} be a unitary matrix. Define C(U) to be the size of the smallest quantum circuit that approximates U to within a small constant error.


This unitary complexity can be thought of as a relaxation of the quantum circuit complexity. The reason is that here a unitary matrix U might not compute a boolean function. Thus, proving a lower bound for \mathbf{BQSIZE} implies a lower bound for unitary complexity while the converse is not clear. Namely, proving a super-polynomial lower bound for the unitary complexity might be an easier task.

However, no non-trivial lower bound (see footnote 1) for the unitary complexity is known, and there is, unfortunately, no formal barrier result explaining why this is difficult to prove.

Warm-up: Gottesman-Knill

We defined quantum circuits above, and we hope you find them exotic – at least start-up investors do. But given how fundamental quantum circuits are to the Harlow-Hayden decoding task, we ask: is it possible to efficiently (classically) simulate a quantum circuit made up of a restricted but non-trivial set of quantum gates? We show below a restricted variant of the popular Gottesman-Knill Theorem:

Theorem (Gottesman-Knill).
1. Given: Clifford circuit \mathcal{C}: |\alpha_1\rangle\otimes \cdots \otimes |\alpha_n\rangle \rightarrow \{|0\rangle,|1\rangle\} made up of gates \{CNOT, P, H\}, where \mathcal{C} is measured on its first output line.
2. Task: Show that it is possible to (classically) efficiently sample the output distribution of \mathcal{C}.


Proof sketch. The probability of measuring 0 on the first output line is \Pr(0) = \langle\psi_0|\mathcal{C}^{\dag}(|0\rangle\langle0|)\mathcal{C}|\psi_0\rangle where |\psi_0\rangle = |\alpha_1\rangle\otimes \cdots \otimes |\alpha_n\rangle. Since the projector can be written as |0\rangle\langle0|= \frac{I + Z}{2}, we get

\Pr(0) = \langle\psi_0|\mathcal{C}^{\dag}(\frac{I + Z}{2})\mathcal{C}|\psi_0\rangle =\frac{1}{2}[1 + \langle\psi_0|\mathcal{C}^{\dag}Z_1\mathcal{C}|\psi_0\rangle]

where Z_1 = Z \otimes I\otimes\cdots \otimes I since we only measure the first output line of \mathcal{C}. At first glance, \langle\psi_0|\mathcal{C}^{\dag}Z_1\mathcal{C}|\psi_0\rangle might look like a monstrous computation to perform since, in general, the operator in the middle is a 2^n\times 2^n matrix, so calculating the inner product would require exponential time classically. However, Clifford gates normalize the Pauli group on n qubits: conjugating a Pauli operator by a Clifford gate yields another Pauli operator, up to sign. Applying this gate by gate, \mathcal{C}^{\dag}Z_1\mathcal{C} = \pm P_1 \otimes \cdots \otimes P_n where each P_i is a Pauli matrix or the identity. It is straightforward to show that these update rules can be computed efficiently. We thus have

\langle\psi_0|\mathcal{C}^{\dag}Z_1\mathcal{C}|\psi_0\rangle = \pm\langle\psi_0|P_1 \otimes \cdots \otimes P_n|\psi_0\rangle = \pm\prod_{i = 1}^{n} \langle\alpha_i|P_i|\alpha_i\rangle

which is a product of n terms. We have thus reduced the (exponentially large) burden of computing with a giant 2^n\times 2^n matrix to computing with n matrices of size 2\times 2, so we can sample the output distribution efficiently.
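The argument above can be turned into a short program. The sketch below is one possible way to do it, not necessarily how the authors would: it walks the circuit backwards, replacing the tracked Pauli operator P by G^{\dag}PG for each gate G, and then multiplies the n single-qubit expectation values. The gate encoding (tuples such as ("H", 0), ("P", 0), ("CNOT", 0, 1)) and all function names are assumptions of this example; the Pauli on each qubit is stored as two bits (x, z) with 10 = X, 11 = Y, 01 = Z, plus a global sign.

```python
def conjugate_back(x, z, sign, gate):
    """Replace the tracked Pauli P by G^dagger P G for one Clifford gate G."""
    name, *qubits = gate
    if name == "H":
        (a,) = qubits
        if x[a] and z[a]:
            sign = -sign              # H Y H = -Y
        x[a], z[a] = z[a], x[a]       # H swaps X and Z
    elif name == "P":
        (a,) = qubits
        if x[a] and not z[a]:
            sign = -sign              # P^dagger X P = -Y
        z[a] ^= x[a]                  # X <-> Y (up to sign), Z unchanged
    elif name == "CNOT":
        a, b = qubits                 # a = control, b = target
        if x[a] and z[b] and (x[b] ^ z[a] ^ 1):
            sign = -sign
        x[b] ^= x[a]                  # X on the control spreads to the target
        z[a] ^= z[b]                  # Z on the target spreads to the control
    else:
        raise ValueError(f"unsupported gate: {name}")
    return sign

def prob_zero(gates, inputs):
    """Probability that measuring output line 0 gives 0.

    gates:  list of gates in the order they are applied.
    inputs: list of n pairs (a, b), the amplitudes of each |alpha_i> = a|0> + b|1>.
    """
    n = len(inputs)
    x, z, sign = [0] * n, [0] * n, 1
    z[0] = 1                          # start from Z on the measured line
    for gate in reversed(gates):      # C^dagger Z_1 C, peeled from the last gate inward
        sign = conjugate_back(x, z, sign, gate)
    expectation = float(sign)
    for (a, b), xi, zi in zip(inputs, x, z):
        overlap = complex(a).conjugate() * complex(b)
        if xi and zi:
            expectation *= 2 * overlap.imag          # <alpha|Y|alpha>
        elif xi:
            expectation *= 2 * overlap.real          # <alpha|X|alpha>
        elif zi:
            expectation *= abs(a) ** 2 - abs(b) ** 2  # <alpha|Z|alpha>
    return (1 + expectation) / 2
```

The work done is proportional to the number of gates plus n, and no vector of length 2^n is ever built. For instance, prob_zero([("H", 0), ("CNOT", 0, 1)], [(1, 0), (1, 0)]) evaluates the first line of a Bell-pair circuit on |00\rangle.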

The Firewall paradox and the Harlow-Hayden decoding task

Physics to CS


Figure 2: A cartoon representing drama (no pun intended) near the Black Hole.

All of the black hole physics covered in the previous blog post leads to the moment (we hope) you have been waiting for: a charming resolution of the firewall paradox. Consider the interior of an old, rusty black hole H that has radiated away more than half of its matter. Let R be the old Hawking radiation, and let B represent the fresh Hawking radiation coming right out of the boundary of the Black Hole. Alice is our canonical Ph.D. student who is brave enough to risk her life for physics. Since H is a giant information scrambler, we expect to find entanglement between R and B with overwhelming probability. We know from QFT that there are Bell pairs straddling the event horizon of the black hole, so B and H should be maximally entangled. But this is a problem because B cannot be maximally entangled with both R and H! The AMPS argument shows that if Alice is able to distill a Bell pair between R and B, then we should see a firewall of photons at the event horizon, thus violating the no-drama postulate. See Figure 2 for more intuition about the setup. (Note that the \cup’s represent Bell pairs, consistent with the 3D-Quon language.) If we take Black Hole complementarity seriously, then we have an answer! If Alice does not distill a Bell pair between R and B, then nothing really happens. However, if Alice does manage to distill the entanglement between R and B, then we witness a firewall. Is this answer not very unsatisfactory? Why should the existence of a firewall depend on Alice’s ability to distill entanglement? What is so special about this decoding task?
The H-H decoding task answers precisely this question. Intuitively, it says that if Alice could distill a Bell pair between R and B, she could also invert a one-way function, a task we believe is very hard to perform! We conjecture that Alice would take exponential time to decode the entanglement, so the Black Hole would disappear long before Alice even makes a dent in the problem! Before we provide an in-depth resolution of the paradox through the H-H decoding task, let us (as good philosophers do) briefly review our assumptions:

  1. The Black Hole H can be modelled by a finite collection of qubits, say n qubits.
  2. Alice is told that the initial state of H is the product state |0\rangle^{\otimes n}.
  3. Black Hole dynamics are assumed to be unitary, so Alice need not worry about some spooky M-theory that may claim to evolve H in a non-unitary fashion.
  4. H is a giant information scrambler, represented by some random circuit \mathcal{C}.
  5. Fresh radiation B is a single qubit, w.l.o.g., since any additional qubits could be made a part of R.
  6. |\psi\rangle_{RBH} is not Haar-Random. Mini-exercise: prove that if |\psi\rangle_{RBH} is Haar-Random, our job becomes easy because the circuit complexity of the H-H decoding task grows exponentially with n, the size of H.
  7. Alice has access only to circuit \mathcal{C} and R, B but not H. Trivial Mini-exercise: prove that if Alice has access to R,B,H, \mathcal{C}, then it is easy to distill the entanglement between R and B.
  8. Alice may be an intellectual Goddess who just knows which unitary to apply, or, more realistically, someone who has exponential time to prepare before the Black Hole forms. Of crucial importance therefore is the circuit complexity of the unitary Alice applies to distill the Bell pair, not so much the process of finding the unitary.

Distilling the B-R Bell pair

Let us jump into the definition of the Harlow-Hayden decoding task.

Definition (Harlow-Hayden decoding task).
Let C be a (polynomial-size) quantum circuit, given as input, with |\psi\rangle_{RBH}=C|0^n\rangle, where R,B,H are three disjoint parts of the n qubits. Furthermore, it is guaranteed that there exists a unitary operator U acting only on the qubits in R such that after applying U, the rightmost qubit of R and the leftmost qubit of B form a Bell pair \frac{|00\rangle+|11\rangle}{\sqrt{2}}. The goal of the Harlow-Hayden decoding task is then to find a quantum circuit for such a U on the qubits in R. See Figure 3.


Figure 3: The Harlow-Hayden decoding task.

A necessary condition for the firewall paradox to make sense is that the Harlow-Hayden decoding task should be easy. If Alice cannot distill the entanglement efficiently, the black hole will evaporate before Alice is ready to witness the firewall!

To refute the firewall paradox, Harlow and Hayden proved the following theorem.

Theorem 1.
If the Harlow-Hayden decoding task can be done in \mathbf{BQP/poly}, then \mathbf{SZK}\subseteq\mathbf{BQP/poly}.

We won’t formally define the complexity class \mathbf{SZK}. However, it is important to know that lattice-based cryptography, a promising quantum-secure crypto framework, is based on the hardness of some problems in \mathbf{SZK}. If \mathbf{SZK}\subseteq\mathbf{BQP/poly}, then all lattice-based cryptosystems can be broken by polynomial-size quantum circuits!

Instead of a proof for Theorem 1, which is more involved, we give a proof for an improvement of the Harlow-Hayden theorem due to Scott Aaronson. (Aaronson also showed that there might not even exist quantum-secure cryptography if the Harlow-Hayden decoding task can be efficiently solved!)

Aaronson’s improvement

In Aaronson’s lecture notes [Aar16], he showed the following improvement on Theorem 1.

Theorem 2.
If the Harlow-Hayden decoding task can be done in \mathbf{BQP/poly}, then quantum-secure injective one-way functions do not exist.

Before formally defining a one way function, it is paramount to understand its impact: modern cryptosystems are built from some variant of a one-way function. Intuitively, primitives that have the one-way property are (i) easy to implement (e.g., encrypt) but (ii) hard to invert (e.g., be attacked). As a result, if there is no quantum-secure injective one-way function, then that is strong evidence that quantum-secure cryptography might not exist.

Now, let us formally define what a quantum-secure injective one-way function is and give a formal proof of Theorem 2.

Definition 1 (Quantum-secure injective one-way function).
A boolean function f:\{0,1\}^n\rightarrow\{0,1\}^m is a quantum-secure injective one-way function if

  • f is injective,
  • f\in\mathbf{BQP/poly}, and
  • for any polynomial time quantum algorithm A,

\Pr_{x\in\{0,1\}^n}\left[f(A(f(x)))=f(x)\right]\leq\frac{1}{\text{poly}(n)}.

Note that since f is injective, the last condition can actually be phrased as x=A(f(x)). Also, the condition should be read as “on input f(x), the quantum algorithm A outputs x”, namely, A inverts f.


Proof of Theorem 2. Suppose the Harlow-Hayden decoding task is in \mathbf{BQP/poly}. We are going to show that for any injective f:\{0,1\}^n\rightarrow\{0,1\}^m computable by some polynomial size quantum circuit, there is a polynomial time quantum algorithm that inverts f. Namely, f is not a quantum-secure injective one-way function. To get an efficient inverting algorithm for f, let us first prepare a special circuit C from f and treat it as an input to the Harlow-Hayden decoding task. The circuit C will simply map |0^{m+2+n}\rangle to the following state

|\psi\rangle_{RBH}=\frac{1}{\sqrt{2^{n+1}}}\sum_{x\in\{0,1\}^n}\left(|x,0^{m-n},0\rangle_R|0\rangle_B+|f(x),1\rangle_R|1\rangle_B\right)|x\rangle_H.

Note that as f has a polynomial size quantum circuit, the circuit C can also be implemented in polynomial size. Next, the easiness of the Harlow-Hayden decoding task guarantees us the existence of a unitary operation U on the qubits in R such that for any x\in\{0,1\}^n

U\left(\frac{|x,0^{m-n},0\rangle_R+|f(x),1\rangle_R}{\sqrt{2}}\right) = |\phi_x\rangle_R\left(\frac{|0\rangle+|1\rangle}{\sqrt{2}}\right)

for some state |\phi_x\rangle_R. By restricting U on the first m qubits, one can get unitary operators V and W such that for all x\in\{0,1\}^n,

V|x,0^{m-n}\rangle=|\phi_x\rangle\text{ and }W|f(x)\rangle=|\phi_x\rangle.

Thus, V^\dagger W inverts f because for any x\in\{0,1\}^n,

V^\dagger W|f(x)\rangle=|x,0^{m-n}\rangle.

Furthermore, as we are guaranteed that the Harlow-Hayden decoding task is in \mathbf{BQP/poly}, U as well as V and W all have polynomial size quantum circuits! Namely, f can be efficiently inverted by a quantum algorithm and thus f is not a quantum-secure injective one-way function.

What’s next?

The Harlow-Hayden theorem as well as Aaronson’s improvement can be interpreted as (strong) evidence that distilling the B-R Bell pair is hard in the worst case (see footnote 2). One might hope for average-case hardness of the Harlow-Hayden decoding task and thus infer that most black holes are difficult to distill. However, even if such average-case hardness results existed, physicists would still remain dissatisfied! The foremost grievance a physicist may have is the lack of a coherent causal framework to model reality. That is, what happens if, with very small but non-zero probability, a black hole is easy to distill? Does that mean that a firewall exists in such a black hole? How can a unifying theory explain such a situation coherently? An ideal theory for theoretical physicists should work for every black hole instead of for most black holes! Second, physicists seem to dislike the abstract, process-theoretic approach undertaken by computer scientists. Here, we have completely ignored the internal dynamics of a black hole, and even a full description of its evolving Hilbert space. Physicists would, for instance, like to see a differential equation that captures the difficulty of distilling a black hole throughout its evolution. Resolutions to the firewall paradox, or efforts towards building a theory of quantum gravity, should be somewhat explicit, in the sense that one can instantiate some (toy) examples from the theory, see how the system evolves, and examine whether this fits real experience of the world. In other words, a theory with a black box (i.e., a complexity conjecture) might not be regarded as a resolution.


  1. What powers would Alice need to ensure that she can efficiently distill the B-R Bell pair? What if we assume \mathbf{P = PSPACE}?
  2. Can we show that the decoding is hard on average, rather than for the worst case?
  3. What are some similar deep connections between black holes and complexity theory?
  4. For people interested in quantum complexity theory, there are many open problems regarding quantum circuit complexity: consider the unitary synthesis problem (see footnote 3) proposed by Scott Aaronson [Aar16].
  5. Another interesting problem is connecting the difficulty of proving quantum circuit lower bounds to other problems in complexity theory, such as classical circuit lower bounds or cryptographic assumptions.


1. Non-trivial here means the unitary matrix U is explicit in the sense that given i,j\in[2^n], one can efficiently compute U_{ij}.
2. Hard in the worst case means that there does not exist an efficient algorithm that works on every input. Another hardness notion is hard on average, by which we mean there does not exist an efficient algorithm that works for most inputs. Showing average-case hardness is in general a more difficult task than proving worst-case hardness.
3. Does the following hold: for any unitary matrix U\in\mathbb{C}^{2^n\times2^n}, there exists a classical oracle A such that C^A(U)=n^{O(1)}, where C^A(U) is the minimum size of a quantum circuit that approximates U with oracle access to A?


[Aar16] Scott Aaronson. The complexity of quantum states and transformations: from quantum money to black holes. arXiv preprint arXiv:1607.05256, 2016.
[HH13] Daniel Harlow and Patrick Hayden. Quantum computation vs. firewalls. Journal of High Energy Physics, 2013(6):85, 2013.
[NC02] Michael A Nielsen and Isaac Chuang. Quantum computation and quantum information, 2002.

Black hole paradoxes: A conservative yet radical journey

January 30, 2019

Guest post by Abhishek Anand and Noah Miller from the physics and computation seminar.

In 2013, Harlow and Hayden drew an unexpected connection between theoretical computer science and theoretical physics as they proposed a potential resolution to the famous black hole Firewall paradox using computational complexity arguments. This blog post attempts to lay out the Firewall paradox and other peculiar (at first) properties associated with black holes that make them such intriguing objects to study. This post is inspired by Scott Aaronson’s [1] and Daniel Harlow’s [2] excellent notes on the same topic. The notes accompanying this post provide a thorough and self-contained introduction to theoretical physics from a CS perspective. Furthermore, for a quick and intuitive summary of the Firewall paradox and its link to computational complexity, refer to this blog post by Professor Barak from last summer.

Black holes and conservative radicalism

Black holes are fascinating objects. Very briefly, they are regions of spacetime where the matter-energy density is so high, and hence the gravitational effects so strong, that no particle (not even light!) can escape from them. More specifically, we define a particular distance called the “Schwarzschild radius”; anything that enters within the Schwarzschild radius (the surface at this radius is also known as the “event horizon”) cannot ever escape from the black hole. General relativity predicts that such a particle is bound to hit the “singularity,” where spacetime curvature becomes infinite. In the truest sense of the word, black holes represent the “edge cases” of our Universe. Hence, perhaps, it is fitting that physicists believe that through thought experiments at these edge cases, they can investigate the true behavior of the laws that govern our Universe.

Once you know that such an object exists, many questions arise: what would it look like from the outside? Could we already be within the event horizon of a future black hole? How much information does it store? Would something special be happening at the Schwarzschild radius? How would the singularity manifest physically?

The journey of trying to answer these questions can aptly be described by the term “radical conservatism.” This is a phrase that has become quite popular in the physics community. A “radical conservative” would be someone that tries to modify as few laws of physics as possible (that’s the conservative part) and through their dogmatic refusal to modify these laws and go wherever their reasoning leads (that’s the radical part) is able to derive amazing things. We radically use the given system of beliefs to lead to certain conclusions (sometimes paradoxes!) and then conservatively update the system of beliefs to resolve the created paradox and iterate. We shall go through a few such cycles and end at the Firewall paradox. Let’s begin with the first problem: how much information does a black hole store?

Entropy of a black hole

A black hole is a physical object. Hence, it should be able to store some information. But how much? In other words, what should the entropy of a black hole be? There are two simple ways of looking at this problem:

  • 0: The no-hair theorem postulates that an outside observer can measure a small number of quantities which completely characterize the black hole. There’s the mass of the black hole, which is its most important quantity. Interestingly, if the star was spinning before it collapsed, the black hole will also have some angular momentum, and its equator will bulge out a bit. Hence, the black hole is also characterized by an angular momentum vector. Also, if the object had some net charge, the black hole would also have that net charge. This means that if two black holes were created from a book and a pizza, respectively, with the same mass, charge and angular momentum, they would settle down to the “same” black hole with no observable difference. If an outside observer knows these quantities, they know everything about the black hole. So, in this view, we should expect the entropy of a black hole to be 0.
  • Unbounded: But maybe that’s not entirely fair. After all, the contents of the star should somehow be contained in the singularity, hidden behind the horizon. As we saw above, all of the specific details of the star from before the collapse do not have any effect on the properties of the resulting black hole. The only stuff that matters is the total mass, total angular momentum, and the total charge. That leaves an infinite number of possible objects that could all have produced the same black hole: a pizza or a book or a PlayStation and so on. So actually, perhaps, we should expect the entropy of a black hole to be unbounded.

The first answer troubled Jacob Bekenstein. He was a firm believer in the Second Law of Thermodynamics: the total entropy of an isolated system can never decrease over time. However, if the entropy of a black hole is 0, it provides a way to reduce the entropy of any system: just dump objects with non-zero entropy into the black hole.

Bekenstein drew connections between the area of the black hole and its entropy. For example, the way in which a black hole’s area could only increase (according to classical general relativity) seemed reminiscent of entropy. Moreover, when two black holes merge, the area of the final black hole will always exceed the sum of the areas of the two original black holes. This is surprising because, for two ordinary spheres that merge while conserving volume, the radius (and likewise the surface area) of the merged sphere is always less than the sum of the radii (areas) of the two individual spheres:

(r_1^3 + r_2^3)^{\frac{1}{3}} < r_1 + r_2
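To make the contrast concrete, here is a minimal numerical sketch. It assumes units with G = c = 1 (so the Schwarzschild radius is r_s = 2M), and it idealizes the merger by assuming that no mass-energy is radiated away; the function names are made up for illustration.

```python
import math

def merged_sphere_radius(r1, r2):
    """Radius of one sphere holding the combined volume of two spheres."""
    return (r1**3 + r2**3) ** (1.0 / 3.0)

def horizon_area(mass):
    """Schwarzschild horizon area in units G = c = 1, where r_s = 2 * mass."""
    return 4.0 * math.pi * (2.0 * mass) ** 2

# Ordinary spheres: the merged radius is smaller than the sum of the radii.
r1, r2 = 1.0, 2.0
print(merged_sphere_radius(r1, r2), "<", r1 + r2)

# Black holes (idealized, no energy radiated): area scales like mass squared,
# so the merged horizon area is larger than the sum of the original areas.
m1, m2 = 1.0, 2.0
print(horizon_area(m1 + m2), ">", horizon_area(m1) + horizon_area(m2))
```

The point of the comparison is only the scaling: radius grows like the cube root of volume for ordinary matter, while horizon area grows like the square of the mass.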

Most things we’re used to, like a box of gas, have an entropy that scales linearly with their volume. However, black holes are not like most things. Bekenstein predicted that the entropy of a black hole should be proportional to its area A, and not its volume. We now believe that Bekenstein was right, and it turns out that the entropy of the black hole can be written as:

S = \frac{k A}{4 l_p^2}

where k is the Boltzmann constant and l_p is the Planck length, a length scale where physicists believe quantum mechanics breaks down and a quantum theory of gravity will be required. Interestingly, it seems as though the entropy of the black hole is (one-fourth times) the number of Planck-length-sized squares it would take to tile the horizon area. (Perhaps, the microstates of the black hole are “stored” on the horizon?) Using “natural units” where we set all constants to 1, we can write this as

S = \frac{A}{4}

which is very pretty. Even though this number is not infinite, it is very large. Here are some numerical estimates from [2]. The entropy of the universe (minus all the black holes) mostly comes from cosmic microwave background radiation and is about 10^{87} in some units. Meanwhile, in the same units, the entropy of a solar mass black hole is 10^{78}. The entropy of our sun, as it is now, is a much smaller 10^{60}. The entropy of the supermassive black hole in the center of our galaxy is 10^{88}, larger than the rest of the universe combined (minus black holes). The entropy of any of the largest known supermassive black holes would be 10^{96}. Hence, there is a simple “argument” which suggests that black holes are the most efficient information storage devices in the universe: if you wanted to store a lot of information in a region smaller than a black hole horizon, it would probably have to be so dense that it would just be a black hole anyway.
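As a rough sanity check on these magnitudes, the area law can be evaluated directly. The sketch below assumes a non-rotating, uncharged (Schwarzschild) black hole, uses rounded SI constants, and reports the entropy in units of Boltzmann's constant; the estimates quoted from [2] are only order-of-magnitude figures "in some units", so agreement within about a factor of ten is all one should expect.

```python
import math

# Rounded SI values (assumed here for illustration).
G = 6.674e-11      # gravitational constant, m^3 kg^-1 s^-2
C = 2.998e8        # speed of light, m/s
HBAR = 1.055e-34   # reduced Planck constant, J s
M_SUN = 1.989e30   # solar mass, kg

def bh_entropy_over_k(mass_kg):
    """Entropy divided by Boltzmann's constant: S / k = A / (4 l_p^2)."""
    r_s = 2.0 * G * mass_kg / C**2      # Schwarzschild radius
    area = 4.0 * math.pi * r_s**2       # horizon area
    planck_area = HBAR * G / C**3       # l_p^2
    return area / (4.0 * planck_area)

s_sun = bh_entropy_over_k(M_SUN)
print(f"S/k for a solar-mass black hole: {s_sun:.1e}")
```

Since the area scales like the mass squared, doubling the mass quadruples the entropy, which is why supermassive black holes dominate the entropy budget so heavily.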

However, this resolution to “maintain” the second law of thermodynamics leads to a radical conclusion: if a black hole has non-zero entropy, it must have a non-zero temperature and hence, must emit thermal radiation. This troubled Hawking.

Hawking radiation and black hole evaporation

Hawking did a semi-classical computation looking at energy fluctuations near the horizon and actually found that black holes do radiate! They emit energy in the form of very low-energy particles. This is a unique feature of what happens to black holes when you take quantum field theory into account and is very surprising. However, the Hawking radiation from any actually existing black hole is far too weak to have been detected experimentally.

One simplified way to understand the Hawking radiation is by thinking about highly coupled modes (think “particles”) being formed continuously near the horizon. As this formation must conserve energy, one of these particles has negative energy and the other has the same energy but with a positive sign, and hence they are maximally entangled (if you know the energy of one of the particles, you know the energy of the other one): we will be referring to this as short-range entanglement. The one with negative energy falls into the black hole while the one with positive energy comes out as Hawking radiation. The maximally-entangled state of the modes looks like:

\sum_{\mathbf{k}} f(\mathbf{k}) |\mathbf{k}\rangle_{\rm in} |\mathbf{k}\rangle_{\rm out}

Here is a cartoon that represents the process:

Because positive-energy particles are leaving the black hole and negative-energy particles are adding to it, the black hole itself will actually shrink, which would never happen classically! Eventually, a black hole will disappear. In fact, the evaporation time of the black hole scales polynomially in its radius, as R^3. The black holes that we know about are simply too big and would be shrinking too slowly for us to notice. A stellar-mass black hole would take 10^{67} years to disappear from Hawking radiation.
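For a feel of the numbers, here is a small sketch using the standard textbook estimate t = 5120 \pi G^2 M^3 / (\hbar c^4) for a Schwarzschild black hole. That formula is brought in here as an assumption (it is not derived in this post, and it only counts massless radiation), but since R is proportional to M it has the same R^3 scaling.

```python
import math

# Rounded SI values (assumed here for illustration).
G = 6.674e-11             # m^3 kg^-1 s^-2
C = 2.998e8               # m/s
HBAR = 1.055e-34          # J s
M_SUN = 1.989e30          # kg
SECONDS_PER_YEAR = 3.156e7

def evaporation_time_years(mass_kg):
    """Textbook estimate of the Hawking evaporation time, in years."""
    seconds = 5120.0 * math.pi * G**2 * mass_kg**3 / (HBAR * C**4)
    return seconds / SECONDS_PER_YEAR

t_sun = evaporation_time_years(M_SUN)
print(f"solar-mass black hole: about {t_sun:.1e} years")

# Cubic scaling: doubling the mass (hence the radius) multiplies the time by 8.
print(evaporation_time_years(2 * M_SUN) / t_sun)
```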

However, the fact that black holes disappear does not play nicely with another core belief in physics: reversibility.

Unitary evolution and thermal radiation

A core tenet of quantum mechanics is unitary evolution: every operation that happens to a quantum state must be reversible (invertible). That is: if we know the final state and the set and order of operations performed, we should be able to invert the operations and get back the initial state. No information is lost. However, something weird happens with an evaporating black hole. First, let us quickly review pure and mixed quantum states. A pure state is a quantum state that can be described by a single ket vector, while a mixed state represents a classical (probabilistic) mixture of pure states and can be expressed using density matrices. For example, in both the pure state |\psi\rangle = \frac{|0\rangle + |1\rangle}{\sqrt{2}} and the mixed state \rho = \frac{1}{2}(|0\rangle\langle0| + |1\rangle\langle1|), one would measure 1 half the time and 0 half the time. However, in the latter one would not observe any quantum effects (think interference patterns of the double-slit experiment).

People outside of the black hole will not be able to measure the objects (quantum degrees of freedom) that are inside the black hole. They will only be able to perform measurements on a subset of the information: the one available outside of the event horizon. So, the state they would measure would be a mixed state. A simple example to explain what this means is that if the state of the particles near the horizon is:

|\Psi\rangle_{init} = \frac{|0\rangle_A|0\rangle_B +|1\rangle_A|1\rangle_B} {\sqrt{2}}

tracing over the qubit A leaves us with the density matrix:

\rho_{obs} = \frac{|0\rangle_{B}\langle0| + |1\rangle_{B}\langle1|}{2} = \frac{1}{2} \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}

which is a classical mixed state (50% of the time it results in 1 and 50% of the time in 0). The off-diagonal entries of the density matrix encode the “quantum interference” of the quantum state. Here they are zero, so in some sense we have lost the “quantum” aspect of the information.
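Here is a small pure-Python sketch of both points, with helper names invented for illustration. It checks that the pure state (|0\rangle + |1\rangle)/\sqrt{2} and the maximally mixed state give the same statistics in the 0/1 basis but differ when measured in the |+\rangle basis, and that tracing qubit A out of the Bell pair leaves qubit B maximally mixed.

```python
import math

S = 1.0 / math.sqrt(2.0)
ZERO = [1 + 0j, 0j]
PLUS = [S + 0j, S + 0j]

def outer(ket):
    """Density matrix |ket><ket| as a nested list."""
    return [[a * b.conjugate() for b in ket] for a in ket]

def probability(rho, ket):
    """<ket| rho |ket>: probability of finding the system in state ket."""
    n = len(ket)
    total = sum(ket[i].conjugate() * rho[i][j] * ket[j]
                for i in range(n) for j in range(n))
    return total.real

def trace_out_first_qubit(psi):
    """psi[a][b] are two-qubit amplitudes; returns the 2x2 density matrix of B."""
    return [[sum(psi[a][b] * psi[a][bp].conjugate() for a in range(2))
             for bp in range(2)] for b in range(2)]

rho_pure = outer(PLUS)
rho_mixed = [[0.5 + 0j, 0j], [0j, 0.5 + 0j]]

# Same statistics in the computational basis...
print(probability(rho_pure, ZERO), probability(rho_mixed, ZERO))
# ...but a measurement in the |+> basis tells them apart.
print(probability(rho_pure, PLUS), probability(rho_mixed, PLUS))

# Bell pair (|00> + |11>)/sqrt(2): tracing out A leaves B with no off-diagonals.
bell = [[S + 0j, 0j], [0j, S + 0j]]
rho_b = trace_out_first_qubit(bell)
print(rho_b)
```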

In fact, Hawking went and traced over the field degrees of freedom that were hidden behind the event horizon, and found something surprising: the mixed state was thermal! It acted “as if” it is being emitted by some object with temperature “T” which does not depend on what formed the black hole and solely depends on the mass of the black hole. Now, we have the information paradox:

  • Physics perspective: Now, once the black hole evaporates, we are left with this mixed thermal state, and there is no way to precisely reconstruct the initial state that formed the black hole: the black hole has taken away information! Once the black hole is gone, the information of what went into the black hole is gone for good. Nobody living in the post-black-hole universe could figure out exactly what went into the black hole, even if they had full knowledge of the radiation. Another way to derive a contradiction is that the process of black hole evaporation, when combined with the disappearance of the black hole, implies that a pure state has evolved into a mixed state, something which is impossible via unitary time evolution! Pure states only become mixed states whenever we decide to perform a partial trace; they never become mixed because of Schrodinger’s equation, which governs the evolution of quantum states.
  • CS perspective: We live in a world where only invertible functions are allowed. However, we are given this exotic function – the black hole – which seems to be a genuine random one-to-many function. There is no way to determine the input deterministically given the output of the function.

What gives? If the process of black hole evaporation is truly “non-unitary,” it would be a first for physics. We have no way to make sense of quantum mechanics without the assumption of unitary operations and reversibility; hence, it does not seem very conservative to get rid of it.

Physicists don’t know exactly how information is conserved, but they think that if they assume that it is, it will help them figure out something about quantum gravity. Most physicists believe that the process of black hole evaporation should indeed be unitary. The information of what went into the black hole is being released via the radiation in a way too subtle for us to currently understand. What does this mean?

  • Physics perspective: Somehow, after the black hole is gone, the final state we observe, after tracing over the degrees of freedom taken away by the black hole, is pure and encodes all information about what went inside the black hole. That is: Tr_{inside} (|\Psi\rangle_{init}\langle\Psi|) = pure
  • CS perspective: Somehow, the exotic black hole function seems random but actually is pseudo-random as well as injective and given the output and enough time, we can decode it and determine the input (think hash functions!).

However, this causes yet another unwanted consequence: the violation of the no-cloning theorem!

Xeroxing problem and black hole complementarity

The no-cloning theorem simply states that an arbitrary quantum state cannot be copied. In other words, if you have one qubit representing some initial state, no matter what operations you do, you cannot end up with two qubits with the same state you started with. How do our assumptions violate this?

Say you are outside the black hole and send in a qubit with some information (input to the function). You collect the radiation corresponding to the qubit (output of the function) that came out. Now you decode this radiation (output) to determine the state of infalling matter (input). Aha! You have violated the no-cloning theorem as you have two copies of the same state: one inside and one outside the black hole.

So wait, again, what gives?

One possible resolution is to postulate that the inside of the black hole just does not exist. However, that doesn’t seem very conservative. According to Einstein’s theory of relativity, locally speaking, there is nothing particularly special about the horizon: hence, one should be able to cross the horizon and move towards the singularity peacefully.

The crucial observation is that for the person who jumped into the black hole, the outside universe may as well not exist; they can not escape. Extending this further, perhaps, somebody on the outside does not believe the interior of the black hole exists and somebody on the inside does not believe the exterior exists and they are both right. This hypothesis, formulated in the early 1990s, has been given the name of Black Hole Complementarity. The word “complementarity” comes from the fact that two observers give different yet complementary views of the world.

In this view, according to someone on the outside, instead of entering the black hole at some finite time, the infalling observer will instead be stopped at some region very close to the horizon, which is quite hot when you get up close. Then, the Hawking radiation coming off of the horizon will hit the observer on its way out, carrying the information about them which has been plastered on the horizon. So the outside observer, who is free to collect this radiation, should be able to reconstruct all the information about the person who went in. Of course, that person will have burned up near the horizon and will be dead.

From the infalling observer’s perspective, however, they were able to pass peacefully through the horizon and sail on to the singularity. So from their perspective, they live, while from the outside it looks like they died. However, no contradiction can be reached, because nobody has access to both realities.

But why is that? Couldn’t the outside observer see the infalling observer die and then rocket themselves straight into the black hole themselves to meet the alive person once again before they hit the singularity, thus producing a contradiction?

The core idea is that it must take some time for the infalling observer to “thermalize” (equilibrate) on the horizon: enough time for the infalling observer to reach the singularity and hence become completely inaccessible. Calculations do show this to be true. In fact, we can already sense a taste of complexity theory even in this argument: we are assuming that some process is slower than some other process.

In summary, according to the BHC worldview, the information outside the horizon is redundant with the information inside the horizon.

But, in 2012, a new paradox, the Firewall paradox, was introduced by AMPS [3]. This paradox seems to be immune to BHC: the paradox exists even if we assume everything we have discussed till now. The physics principle we violate, in this case, is the monogamy of entanglement.

Monogamy of entanglement and Page time

Before we state the Firewall paradox, we must introduce two key concepts.

Monogamy of entanglement

Monogamy of entanglement is a statement about the maximum entanglement a particle can share with other particles. More precisely, if two particles A and B are maximally entangled with each other, they cannot be entangled at all with a third particle C. Two maximally entangled particles have saturated both of their “entanglement quotas”. In order for them to have correlations with other particles, they must decrease their entanglement with each other.

Monogamy of entanglement can be understood as a static version of the no-cloning theorem. Here is a short proof sketch of why polygamy of entanglement implies the violation of no-cloning theorem.

Let’s take a short detour to explain quantum teleportation:

Say you have three particles A, B, and C with A and B maximally entangled (Bell pair), and C is an arbitrary quantum state:

|\Phi^+\rangle_{AB} = \frac{1}{\sqrt{2}} (|0\rangle_A \otimes |0\rangle_{B} + |1\rangle_A \otimes |1\rangle_{B})

|\psi\rangle_C = \alpha |0\rangle_C + \beta|1\rangle_C

We can write their total state as:

|\psi \rangle_{C}\otimes |\Phi ^{+}\rangle_{AB}=(\alpha |0\rangle_{C}+\beta |1\rangle_{C})\otimes {\frac {1}{\sqrt {2}}}(|0\rangle_{A}\otimes |0\rangle_{B}+|1\rangle_{A}\otimes |1\rangle_{B})

Re-arranging and pairing A and C, the state simplifies to:

|\psi \rangle_{C}\otimes |\Phi ^{+}\rangle_{AB} = \frac {1}{2}{\Big \lbrack } |\Phi ^{+}\rangle_{AC}\otimes (\alpha |0\rangle_{B}+\beta |1\rangle_{B}) + |\Phi ^{-}\rangle_{AC}\otimes (\alpha |0\rangle_{B}-\beta |1\rangle_{B}) + |\Psi ^{+}\rangle_{AC}\otimes (\beta |0\rangle_{B}+\alpha |1\rangle_{B}) + |\Psi ^{-}\rangle_{AC}\otimes (\beta |0\rangle_{B}-\alpha |1\rangle_{B}){\Big \rbrack }

which means that if one does a Bell pair measurement on A and C, based on the measurement outcome, we know exactly which state B is projected to and by using rotations can make the state of B equal to the initial state of C. Hence, we teleported quantum information from C to B.

Now, assume that A was maximally entangled to both B and D. Then by doing the same procedure, we could teleport quantum information from C to both B and D and hence, violate the no-cloning theorem!
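The teleportation decomposition can be checked numerically. The sketch below is only an illustration with made-up helper names: it projects qubits A and C of |\psi\rangle_C \otimes |\Phi^+\rangle_{AB} onto each of the four Bell states (written with A as the first qubit) and compares what is left on B against the four terms of the expansion, for one arbitrary choice of \alpha and \beta.

```python
import math

S = 1.0 / math.sqrt(2.0)

# Bell states on the pair (A, C), stored as real coefficients bell[a][c].
BELL = {
    "Phi+": [[S, 0.0], [0.0, S]],
    "Phi-": [[S, 0.0], [0.0, -S]],
    "Psi+": [[0.0, S], [S, 0.0]],
    "Psi-": [[0.0, S], [-S, 0.0]],
}

def b_after_projection(alpha, beta, bell):
    """Unnormalized amplitudes left on B after projecting (A, C) onto `bell`."""
    psi_c = [alpha, beta]
    out = [0j, 0j]
    for a in range(2):
        for c in range(2):
            # In |psi>_C |Phi+>_AB the amplitude of |c>_C |a>_A |b>_B is
            # psi_c[c] * S when b == a, so <a|<c| leaves psi_c[c] * S * |a>_B.
            out[a] += bell[a][c] * psi_c[c] * S
    return out

alpha, beta = 0.6, 0.8j   # arbitrary normalized example state
expected = {
    "Phi+": [alpha, beta],
    "Phi-": [alpha, -beta],
    "Psi+": [beta, alpha],
    "Psi-": [beta, -alpha],
}
results = {}
for name, bell in BELL.items():
    got = b_after_projection(alpha, beta, bell)
    # Each term in the expansion carries the overall factor 1/2.
    results[name] = all(abs(g - 0.5 * e) < 1e-12
                        for g, e in zip(got, expected[name]))
    print(name, results[name])
```

In every branch B ends up in a state that differs from \alpha|0\rangle + \beta|1\rangle only by a known bit flip and/or sign flip, which is exactly what the correcting rotations undo.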

Page time

Named after Don Page, the “Page time” refers to the time when the black hole has emitted enough of its energy in the form of Hawking radiation that its entropy has (approximately) halved. Now the question is, what’s so special about the Page time?

First note that the rank of the density matrix is closely related to its purity (or mixedness). For example, a completely mixed state is the diagonal matrix:

\rho_{obs} = \frac{1}{4} \begin{pmatrix} 1 & 0 & 0 & 0\\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix}

which has maximal rank (2^{2}). Furthermore, a completely pure state |\Psi\rangle \langle\Psi| can always be represented as (if we just change the basis and make the first column/row represent |\Psi\rangle):

\rho_{obs} =  \begin{pmatrix} 1 & 0 & 0 & 0\\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0  \end{pmatrix}

which has rank 1.

Imagine we have watched a black hole form and begin emitting Hawking radiation. Say we start collecting this radiation. The density matrix of the radiation will have the form:

\rho_{obs} = \sum_{i=1}^{2^{n-k}} p_i |\Psi_i\rangle\langle\Psi_i|

where n is the total number of qubits in our initial state, k is the number of qubits outside (in form of radiation), and p_i is the probability of each state. We are simply tracing over the degrees of freedom inside the black hole (as there are n-k degrees inside the black hole, dimensionality of this space is 2^{n-k}).

Don Page proposed the following graph of what he thought entanglement entropy of this density matrix should look like. It is fittingly called the “Page curve.”

The Page Curve
  • If k<\frac{n}{2}, rank(\rho) = 2^{k}, as there are enough terms in the sum to get a maximally ranked matrix. And hence, we get maximally mixed states. In the beginning, the radiation we collect at early times will still remain heavily entangled with the degrees of freedom near the black hole, and as such the state will look mixed to us because we can not yet observe all the complicated entanglement. As more and more information leaves the black hole in the form of Hawking radiation, we are “tracing out” fewer and fewer of the near-horizon degrees of freedom. The dimension of our density matrix grows bigger and bigger, and because the outgoing radiation is still so entangled with the near-horizon degrees of freedom, the density matrix will still have off-diagonal terms which are essentially zero. Hence, the state entropy increases linearly.
  • But if k>\frac{n}{2}, by the same argument, rank(\rho) = 2^{n-k} < 2^{k}. Hence, the density matrix becomes more and more pure. Once the black hole’s entropy has reduced by half, the dimension of the Hilbert space we are tracing out finally becomes smaller than the dimension of the Hilbert space we are not tracing out. The off-diagonal terms spring into our density matrix, growing in size and number as the black hole continues to shrink. Finally, once the black hole is gone, we can easily see that all the resulting radiation is in a pure state.
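The rank-counting argument in the two bullets above already gives an idealized Page curve. The sketch below simply takes the log of the largest rank the radiation density matrix can have; it is an upper bound on the entanglement entropy in bits, not a calculation of the true curve, and the function name is made up.

```python
def idealized_radiation_entropy(n, k):
    """log2 of the maximal rank of rho when k of n qubits have been radiated."""
    return min(k, n - k)

n = 10
curve = [idealized_radiation_entropy(n, k) for k in range(n + 1)]
print(curve)  # [0, 1, 2, 3, 4, 5, 4, 3, 2, 1, 0]
```

The turnover at k = n/2 is the Page time in this toy picture.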

The entanglement entropy of the outgoing radiation finally starts decreasing, as we are finally able to start seeing entanglements between all this seemingly random radiation we have painstakingly collected. Some people like to say that if one could calculate the Page curve from first principles, the information paradox would be solved. Now we are ready to state the firewall paradox.

The Firewall Paradox

Say Alice collects all the Hawking radiation coming out of a black hole. At roughly 1.5 times the Page time, Alice is able to see significant entanglement in all the radiation she has collected. Alice then dives into the black hole and sees an outgoing Hawking mode escaping. Given the Page curve, we know that adding this outgoing mode must decrease the entropy of our observed mixed state. In other words, it must make our observed density matrix purer, and hence it must be entangled with the radiation we have already collected.

(Another way to think about this: let’s say that a random quantum circuit at the horizon scrambles the information in a non-trivial yet injective way in order for radiation particles to encode the information regarding what went inside the black hole. The output qubits of the circuit must be highly entangled due to the random circuit.)

However, given our discussion on Hawking radiation about short-range entanglement, the outgoing mode must be maximally entangled with an in-falling partner mode. This contradicts monogamy of entanglement! The outgoing mode cannot be entangled both with the radiation Alice has already collected and also maximally entangled with the nearby infalling mode!

So, to summarize, what did we do? We started with the existence of black holes and, through our game of radical conservatism, modified how physics works around them in order to make sure the following dear physics principles are not violated by these special objects:

  • Second Law of Thermodynamics
  • Objects with entropy emit thermal radiation
  • Unitary evolution and reversibility
  • No-cloning theorem
  • Monogamy of entanglement

And finally, ended with the Firewall paradox.

So, for the last time in this blog post, what gives?

  • Firewall solution: The first solution to the paradox is the existence of a firewall at the horizon. The only way to not have the short-range entanglement discussed is if there is very high energy density at the horizon. However, this violates the “no-drama” theorem and Einstein’s equivalence principle of general relativity which states that locally there should be nothing special about the horizon. If firewalls did exist, an actual wall of fire could randomly appear out of nowhere in front of us right now if a future black hole would have its horizon near us. Hence, this solution is not very popular.
  • Remnant solution: One possible resolution would be that the black hole never “poofs”: some quantum gravity effect we do not yet understand stabilizes it instead, allowing some Planck-sized object to stick around. Such an object would be called a “remnant.” The so-called “remnant solution” to the information paradox is not a very popular one. People don’t like the idea of a very tiny, low-mass object holding an absurdly large amount of information.
  • No unitary evolution: Perhaps, black holes are special objects which actually lose information! This would mean that black hole physics (the quantum theory of gravity) would be considerably different compared to quantum field theory.
  • Computational complexity solution?: Can anyone ever observe this violation? And if not, does that resolve the paradox? This will be covered in our next blog post by Parth and Chi-Ning.


  1. Scott Aaronson. The complexity of quantum states and transformations: from quantum money to black holes. arXiv preprint arXiv:1607.05256, 2016.
  2. Daniel Harlow. Jerusalem lectures on black holes and quantum information. Reviews of Modern Physics, 88(1):015002, 2016.
  3. Ahmed Almheiri, Donald Marolf, Joseph Polchinski, and James Sully. Black holes: complementarity or firewalls? Journal of High Energy Physics, 2013(2):62, 2013.

Introduction to AMP and the Replica Trick

January 26, 2019

(This post is from the lecture by Yueqi Sheng)

In this post, we will talk about detecting phase transitions using
Approximate-Message-Passing (AMP), which is an extension of
Belief-Propagation to “dense” models. We will also discuss the Replica
Symmetric trick, which is a heuristic method of analyzing phase
transitions. We focus on the Rademacher spiked Wigner model (defined
below), and show how both these methods yield the same phase transition
in this setting.

The Rademacher spiked Wigner model (RSW) is the following. We are given
observations Y = \frac{\lambda}{n}xx^T + \frac{1}{\sqrt{n}}W where
x \in \{\pm 1\}^n (sampled uniformly) is the true signal and W is a
Gaussian-Orthogonal-Ensemble (GOE) matrix:
W_{i, j} \sim \mathbb{N}(0, 1) for i \neq j and
W_{i, i} \sim \mathbb{N}(0, 2). Here \lambda is the signal to noise
ratio. The goal is to approximately recover x.

The question here is: how small can \lambda be such that it is
impossible to recover anything reasonably correlated with the
ground-truth x? And what do the approximate-message-passing algorithm
(or the replica method) have to say about this?

To answer the first question, one can think of the task here as being to
distinguish Y \sim \frac{\lambda}{n}xx^T + \frac{1}{\sqrt{n}}W vs
Y \sim W. One approach to distinguishing these distributions is to
look at the spectrum of the observation matrix Y. (In fact, it turns
out that this is an asymptotically optimal distinguisher [1]). The spectrum of Y behaves as ([2]):

  • When \lambda \leq 1, the empirical distribution of eigenvalues in
    the spiked model still follows the semicircle law, with the top
    eigenvalue \approx 2

  • When \lambda > 1, we start to see an eigenvalue > 2 in the
    planted model.
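A quick way to see this behaviour is to sample the model and estimate the top eigenvalue. The sketch below is a toy experiment under stated assumptions: it uses only the standard library, a small n (so finite-size effects are visible), and shifted power iteration in place of a proper eigensolver; the helper names and the choice \lambda = 3 are arbitrary.

```python
import math
import random

def spiked_wigner(n, lam, rng):
    """Sample (x, Y) with Y = (lam / n) x x^T + W / sqrt(n), W a GOE matrix."""
    x = [rng.choice((-1, 1)) for _ in range(n)]
    y = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i, n):
            std = math.sqrt(2.0) if i == j else 1.0  # variance 2 on the diagonal
            w = rng.gauss(0.0, std)
            y[i][j] = y[j][i] = lam / n * x[i] * x[j] + w / math.sqrt(n)
    return x, y

def top_eigenvalue(y, rng, iters=150, shift=3.0):
    """Power iteration on Y + shift * I, so the top of the spectrum dominates."""
    n = len(y)
    v = [rng.gauss(0.0, 1.0) for _ in range(n)]
    for _ in range(iters):
        w = [sum(row[j] * v[j] for j in range(n)) + shift * v[i]
             for i, row in enumerate(y)]
        norm = math.sqrt(sum(c * c for c in w))
        v = [c / norm for c in w]
    yv = [sum(row[j] * v[j] for j in range(n)) for row in y]
    return sum(a * b for a, b in zip(v, yv))  # Rayleigh quotient of Y

rng = random.Random(0)
n = 120
_, y_null = spiked_wigner(n, 0.0, rng)     # pure noise
_, y_spiked = spiked_wigner(n, 3.0, rng)   # strong planted signal
top_null = top_eigenvalue(y_null, rng)
top_spiked = top_eigenvalue(y_spiked, rng)
print("noise only:", top_null)
print("lambda = 3:", top_spiked)
```

For pure noise the estimate should sit near the semicircle edge at 2, while for \lambda well above 1 an outlier eigenvalue separates from the bulk.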

Approximate message passing

This section approximately follows the exposition in [3].

First, note that in the Rademacher spiked Wigner model, the posterior
distribution of the signal \sigma conditioned on the observation Y
is: \Pr[\sigma | Y] \propto \Pr[Y | \sigma] \propto \prod_{i \neq j} \exp(\lambda Y_{i, j} \sigma_i \sigma_j /2 ). This
defines a graphical-model (or “factor-graph”), over which we can perform
Belief-Propagation to infer the posterior distribution of \sigma.
However, in this case the factor-graph is dense (the distribution is a
product of potentials \exp(\lambda Y_{i, j} \sigma_i\sigma_j) for all
pairs of i, j).

In the previous blog post, we saw belief propagation works great when the underlying interaction
graph is sparse. Intuitively, this is because G is locally tree-like,
which allows us to assume the messages are independent random
variables. In a dense model, this no longer holds. One can think of a dense
model as one in which each node receives a weak signal from all its neighbors.

In the dense model setting, a class of algorithms called Approximate
message passing (AMP) has been proposed as an alternative to BP. We will
define AMP for RSW in terms of its state evolution.

State evolution of AMP for Rademacher spiked Wigner model

Recall that in BP, we wish to infer the posterior distribution of
\sigma, and the messages we pass between nodes correspond to marginal
probability distribution over values on nodes. In our setting, since the
distributions are over \{\pm 1\}, we can represent distributions by
their expected values. Let m^t_{u \to v} \in [-1, 1] denote the
message from u to v at time t. That is, m_{u \to v} corresponds
to the expected value {{\mathbb{E}}}[\sigma_u].

To derive the BP update rules, we want to compute the expectation
{{\mathbb{E}}}[\sigma_v] of a node v, given the
messages {{\mathbb{E}}}[\sigma_u] for u \neq v. We can
do this using the posterior distribution of the RSW, \Pr[\sigma | Y],
which we computed above.
\displaystyle \Pr[\sigma_v = 1 | Y, \{\sigma_u\}_{u \neq v}] = \frac{ \prod_u \exp(\lambda Y_{u, v} \sigma_u) }{ \prod_u \exp(\lambda Y_{u, v} \sigma_u) + \prod_u \exp(-\lambda Y_{u, v} \sigma_u) }

And similarly for \Pr[\sigma_v = -1 | Y, \{\sigma_u\}_{u \neq v}].
From the above, we can take expectations over \sigma_u, and express
{{\mathbb{E}}}[\sigma_v] in terms of
\{{{\mathbb{E}}}[\sigma_u]\}_{u \neq v}. Doing this (and
using the heuristic assumption that the distribution of \sigma is a
product distribution), we find that the BP state update can be written
m^{t}_{u \to v} = f(\sum_{w \neq v}f^{-1}(A_{w, u} m^{t - 1}_{w \to u}))
where the interaction matrix A_{w, u} = \lambda Y_{w, u}, and
f(x) = \tanh(x) = \frac{\exp(x) - \exp(-x)}{\exp(x) + \exp(-x)}.

Now, Taylor expanding f^{-1} around 0, we find
m^{t}_{u \to v} = f\left( (\sum_{w \neq v} A_{w, u} m^{t - 1}_{w \to u}) + O(1/\sqrt{n}) \right)
since the terms A_{w, u} are of order O(1/\sqrt{n}).

At this point, we could try dropping the “non-backtracking” condition
w \neq v from the above sum (since the node v contributes at most
O(1/\sqrt{n}) to the sum anyway), to get the state update:
m^{t}_{u} = f\left( \sum_{w} A_{w, u} m^{t - 1}_{w} \right) (note the messages no longer
depend on receiver – so we write m_u in place of m_{u \to v}).
However, this simplification turns out not to work for estimating the
signal. The problem is that the “backtracking” terms which we added
amplify over two iterations.

In AMP, we simply perform the above procedure, except we add a
correction term to account for the backtracking issue above. Given u,
for all v, the AMP update is:
m^{t}_{u \to v} = m^{t}_u = f(\sum_{w}A_{w, u} m^{t - 1}_{w}) + [\text{some correction term}]

The correction term corresponds to error introduced by the backtracking
terms. Suppose everything is good until step t - 2. We will examine
the influence of backtracking term to a node v through length 2 loops.
At time t - 1, v exerts Y_{v, u}m^{t - 2}_v additional influence on
each of its neighbors u. At time t, v receives roughly
Y_{u, v}^2m^{t - 2}_v. Since Y_{u, v}^2 has magnitude
\approx \frac{1}{n} and we need to sum over all of v’s neighbors,
this error term is too large to ignore. To characterize the exact form of
correction, we simply do a Taylor expansion

m^{t}_v = \sum_{u}f(Y_{u, v}m^{t - 1}_u) = \sum_{u}f(Y_{u, v} \left(\sum_{w}f(Y_{w, u}m^{t - 2}_w) - f(Y_{u, v}m^{t - 2}_w)\right) )\\ \approx \sum_u f(Y_{u, v} m^{t - 1}_u) - Y_{u, v}f'(m^{t - 1}_u)m^{t - 2}_v\\ \approx \sum_u f(Y_{u, v} m^{t - 1}_u) - \frac{1}{n}\sum_{u}f'(m^{t - 1}_u)m^{t - 2}_v
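To see an AMP iteration with a correction term in action, here is a toy pure-Python sketch. It is an assumption-laden illustration rather than the exact update derived above: it uses a commonly written form, m^{t+1} = \tanh(\lambda Y m^t - \lambda^2 (1 - \frac{1}{n}\sum_i (m^t_i)^2) m^{t-1}), in which the correction sits inside the tanh, and it starts from a weakly informative initialization that peeks at the true signal purely for demonstration. All function names are made up.

```python
import math
import random

def spiked_wigner(n, lam, rng):
    """Sample (x, Y) with Y = (lam / n) x x^T + W / sqrt(n), W a GOE matrix."""
    x = [rng.choice((-1, 1)) for _ in range(n)]
    y = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i, n):
            std = math.sqrt(2.0) if i == j else 1.0
            w = rng.gauss(0.0, std)
            y[i][j] = y[j][i] = lam / n * x[i] * x[j] + w / math.sqrt(n)
    return x, y

def amp(y, lam, m0, iters=15):
    """Iterate m <- tanh(lam * Y m - b * m_prev), b = lam^2 * (1 - mean(m^2))."""
    n = len(y)
    m_prev = [0.0] * n          # no correction on the first step
    m = list(m0)
    for _ in range(iters):
        b = lam * lam * (1.0 - sum(c * c for c in m) / n)  # correction weight
        field = [lam * sum(row[j] * m[j] for j in range(n)) - b * m_prev[i]
                 for i, row in enumerate(y)]
        m_prev, m = m, [math.tanh(h) for h in field]
    return m

rng = random.Random(1)
n, lam = 150, 3.0
x, y = spiked_wigner(n, lam, rng)
m0 = [0.1 * xi for xi in x]     # hypothetical warm start, for demonstration only
m = amp(y, lam, m0)
overlap = abs(sum(mi * xi for mi, xi in zip(m, x))) / n
print("overlap with ground truth:", overlap)
```

The overlap printed at the end is the quantity that the state evolution analysis tracks.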

State evolution of AMP

In this section we attempt to obtain the phase transition of Rademacher
spiked Wigner model via looking at m^{\infty}.

We assume that each message could be written as a sum of signal term and
noise term. m^t = \mu_t x + \sigma_t g where
g \sim \mathbb{N}(0, I). To understand the dynamics of AMP (and find its phase
transition), we need to look at how the signal \mu_t and noise
\sigma_t evolve with t.

We do the following simplification: ignore the correction term and
assume each time we obtain an independent noise g.

m^{t} = Yf(m^{t - 1}) = (\frac{\lambda}{n}xx^T + \frac{1}{\sqrt{n}}W)f(m^{t - 1}) = \frac{\lambda}{n} < f(m^{t - 1}), x > x + \frac{1}{\sqrt{n}} Wf(m^{t - 1})

Here, we see that \mu_t = \frac{\lambda}{n}< f(m^{t - 1}), x>
and \sigma_t = \frac{1}{\sqrt{n}}Wf(m^{t - 1}).

Note that \mu_{t} is essentially proportional to the overlap between
the ground truth and the current belief, since the function f keeps the
magnitude of the current beliefs bounded.

\frac{\lambda}{n} <f(m^{t - 1}), x>= \frac{\lambda}{n} <f(\mu_{t - 1}x + \sigma_{t - 1}g), x> \approx\lambda {{\mathbb{E}}}_{X \sim unif(\pm 1), G\sim \mathbb{N}(0, 1)}[X f(\mu_{t - 1}X + \sigma_{t - 1}G)] = \lambda {{\mathbb{E}}}_G[f(\mu_{t - 1} + \sigma_{t - 1}G)]

For the noise term, each coordinate of \sigma_t is a Gaussian random
variable with 0 mean and variance

\frac{1}{n} \sum_v f(m^{t - 1})_v^2 \approx {{\mathbb{E}}}_{X, G}[f(\mu_{t - 1}X + \sigma_{t - 1}G)^2] = {{\mathbb{E}}}_{G}[f(\mu_{t - 1} + \sigma_{t - 1}G)^2]

It was shown in [4] that we can introduce a new
parameter \gamma_t s.t.
\gamma_t = \lambda^2 {{\mathbb{E}}}[f(\gamma_{t - 1} + \sqrt{\gamma_{t - 1}}G)]
As t \to \infty, it turns out that \mu_t = \frac{\gamma_t}{\lambda} and
\sigma_t^2 = \frac{\gamma_t}{\lambda^2}. To study the behavior of
m^t as t \to \infty, it is enough to track the evolution of
\gamma_t.

This heuristic analysis of AMP actually gives a phase transition at
\lambda = 1 (in fact, the analysis of AMP can be done rigorously as in [5]):

  • For \lambda < 1: If \gamma_t \approx 0, then |\gamma_t + \sqrt{\gamma_t}G| < 1 w.h.p., so f(x) \approx x on this range and \gamma_{t + 1} \approx \lambda^2 \gamma_t < \gamma_t. Taking t \to \infty, we have \gamma_{\infty} = 0, which means the AMP solution has no overlap with the ground truth.

  • For \lambda > 1: In this case, AMP’s solution has some correlation with the ground truth.

(Figure omitted; from [6])
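The two regimes can be seen numerically by iterating the scalar recursion for \gamma_t. The sketch below is only illustrative: it assumes f = \tanh (the function that appears in the replica fixed-point equation later in this post), approximates the Gaussian expectation with Gauss-Hermite quadrature, and the function names, starting value and number of steps are our own choices.

```python
import numpy as np
from numpy.polynomial.hermite_e import hermegauss

# Gauss-Hermite nodes/weights for the weight exp(-x^2 / 2). Normalising the
# weights turns the quadrature rule into an expectation over G ~ N(0, 1).
NODES, WEIGHTS = hermegauss(80)
WEIGHTS = WEIGHTS / WEIGHTS.sum()


def gaussian_mean(func):
    """Approximate E[func(G)] for G ~ N(0, 1)."""
    return float(np.dot(WEIGHTS, func(NODES)))


def state_evolution(lam, gamma0=0.01, steps=50):
    """Iterate gamma_{t+1} = lam^2 * E[f(gamma_t + sqrt(gamma_t) G)].

    Assumption: f = tanh. gamma0 and steps are illustrative choices.
    """
    gamma = gamma0
    for _ in range(steps):
        mean = gaussian_mean(lambda g: np.tanh(gamma + np.sqrt(gamma) * g))
        gamma = max(lam**2 * mean, 0.0)  # guard against round-off below zero
    return gamma


if __name__ == "__main__":
    for lam in (0.8, 1.5):
        print(f"lambda = {lam}: gamma after 50 steps = {state_evolution(lam):.6g}")
```

For \lambda = 0.8 the iterates shrink towards 0, while for \lambda = 1.5 they settle at a strictly positive value, in line with the two bullets above.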

Replica symmetry trick

Another way of obtaining the phase transition is via a non-rigorous
analytic method called the replica method. Although non-rigorous, this
method from statistical physics has been used to predict the fixed point
of many message passing algorithms and has the advantage of being easy
to simulate. In our case, we will see that we obtain the same phase
transition temperature as AMP above. The method is non-rigorous due to
several assumptions made during the computation.

Outline of replica method

Recall that we are interested in minimizing the free energy of a given
system f(\beta, Y) = \frac{1}{\beta n} \log Z(\beta, Y) where Z is
the partition function as before:
Z(\beta, Y) = \sum_{x \in \{\pm 1\}^n} \exp(-\beta H(Y, x)) and
H(Y, x) = -<Y, x^Tx> = -xYx^T = -\sum_{i, j} Y_{i, j}x_ix_j.

In the replica method, Y is not fixed but is a random variable. The
assumption is that as n \to \infty, the free energy doesn’t vary with Y
too much, so we look at the mean of f(\beta, Y) over Y to approximate the free
energy of the system.

f(\beta) = \lim_{n \to \infty}\frac{1}{\beta n}{{\mathbb{E}}}_{Y}[\log Z(\beta, Y)]

f(\beta) is called the free energy density and the goal now is to
compute the free energy density as a function of only \beta, the
inverse temperature of the system.

The replica method was first proposed as a simplification of the
computation of f(\beta).

It is generally hard to compute f(\beta) in closed form. A
naive attempt to approximate f(\beta) is to simply pull the log out of the expectation:
g(\beta) = \frac{1}{\beta n}\log {{\mathbb{E}}}_Y[Z(\beta, Y)]
Unfortunately g(\beta) and f(\beta) are quite different quantities,
at least when the temperature is low. Intuitively, f(\beta) looks at a
system with a fixed Y, while in g(\beta), x and Y are allowed to
fluctuate together. When the temperature is high, Y doesn’t play a big
role in the system, so the two could be close. However, when the temperature is
low, there could be a problem. Let \beta \to \infty. Then
f(\beta) \approx \int_Y (\beta x_Y Y x_Y)\mu(Y) dY,
g(\beta) \approx \log \int_Y \exp(\beta x_Y Y x_Y)\mu(Y)dY \approx \beta x^* Yx^*.
Here x_Y denotes a lowest-energy configuration for the given Y, and the last expression is evaluated at the pair (x^*, Y) that dominates the integral.
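The gap between the two quantities can be observed by brute force on a tiny system. The sketch below is not part of the original argument: it assumes a pure-noise Y = W / \sqrt{n} (no spike), n = 8 spins so that all 2^n configurations can be enumerated, and an empirical average over sampled Y in place of the true expectation. All names are hypothetical.

```python
import itertools
import numpy as np


def log_partition(beta, Y, configs):
    """log Z(beta, Y) with H(Y, x) = -x Y x^T, via a stable log-sum-exp."""
    energies = -np.einsum("ci,ij,cj->c", configs, Y, configs)
    a = -beta * energies
    amax = a.max()
    return amax + np.log(np.exp(a - amax).sum())


def quenched_and_annealed(beta, n=8, samples=200, seed=0):
    """Empirical (1/(beta n)) E[log Z] and (1/(beta n)) log E[Z]."""
    rng = np.random.default_rng(seed)
    configs = np.array(list(itertools.product([-1.0, 1.0], repeat=n)))
    logs = []
    for _ in range(samples):
        G = rng.standard_normal((n, n))
        W = (G + G.T) / np.sqrt(2)  # symmetric: off-diagonal var 1, diagonal var 2
        logs.append(log_partition(beta, W / np.sqrt(n), configs))
    logs = np.array(logs)
    quenched = logs.mean() / (beta * n)
    lmax = logs.max()
    annealed = (lmax + np.log(np.exp(logs - lmax).mean())) / (beta * n)
    return quenched, annealed


if __name__ == "__main__":
    for beta in (0.1, 2.0):
        q, a = quenched_and_annealed(beta)
        print(f"beta = {beta}: quenched = {q:.4f}, annealed = {a:.4f}, gap = {a - q:.4f}")
```

By Jensen's inequality the quenched value never exceeds the annealed one, and in this toy setting the gap is much larger at low temperature (large \beta) than at high temperature.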

While {{\mathbb{E}}}_X[\log(f(X))] is hard to compute,
{{\mathbb{E}}}[f(X)^r] is a much easier quantity. The
replica trick starts from rewriting f(\beta) in terms of moments of Z.
Recall that x^r \approx 1 + r \log x for r \approx 0 and
\ln(1 + x)\approx x for x \approx 0. Using these, we can rewrite f(\beta) in the following way:

Claim 1. Let f_r(\beta) = \frac{1}{r \beta n}\ln[{{\mathbb{E}}}_Y[Z(\beta, Y)^r]]
Then, f(\beta) = \lim_{r \to 0}f_r(\beta)
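A heuristic justification of Claim 1, spelling out the two approximations recalled above (a sketch only; it ignores the question of when the limit may be exchanged with the expectation):

```latex
\begin{aligned}
\mathbb{E}_Y[Z^r] &= \mathbb{E}_Y\bigl[e^{r \ln Z}\bigr] \approx 1 + r\,\mathbb{E}_Y[\ln Z] && (r \approx 0),\\
\ln \mathbb{E}_Y[Z^r] &\approx \ln\bigl(1 + r\,\mathbb{E}_Y[\ln Z]\bigr) \approx r\,\mathbb{E}_Y[\ln Z],\\
f_r(\beta) = \frac{1}{r \beta n}\ln \mathbb{E}_Y[Z^r] &\approx \frac{1}{\beta n}\,\mathbb{E}_Y[\ln Z] && (r \to 0).
\end{aligned}
```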

The idea of the replica method is quite simple:

  • Define a function f(r, \beta) for r \in \mathbb{Z}_+ s.t. f(r, \beta) = f_r(\beta) for all such r.

  • Extend f(r, \beta) analytically to all r \in {{\mathbb{R}}}_+ and take the limit of r \to 0.

The second step may sound crazy, but for some unexplained reason, it has
been surprisingly effective at making correct predictions.

The term replica comes from the way we compute
{{\mathbb{E}}}[Z^r] in Claim 1. We expand the r-th moment
in terms of r replicas of the system:

Z(\beta, Y)^r = (\sum_x \exp(-\beta H(Y, x)))^r = \sum_{x^1, \cdots, x^r} \prod_{k = 1}^r \exp(-\beta H(Y, x^k))
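The expansion can be verified by brute force on a tiny system. The sketch below assumes n = 3, r = 2 and an arbitrary symmetric Y; these values and the function name are our own, chosen only for illustration.

```python
import itertools
import numpy as np


def replica_expansion(n=3, r=2, beta=0.7, seed=0):
    """Return (Z^r, sum over r-tuples of replicas of the product of weights)."""
    rng = np.random.default_rng(seed)
    G = rng.standard_normal((n, n))
    Y = (G + G.T) / np.sqrt(2 * n)  # an arbitrary symmetric interaction matrix
    configs = [np.array(c) for c in itertools.product([-1.0, 1.0], repeat=n)]

    def weight(x):
        # exp(-beta H(Y, x)) with H(Y, x) = -x Y x^T
        return np.exp(beta * (x @ Y @ x))

    Z = sum(weight(x) for x in configs)
    replicated = sum(
        np.prod([weight(x) for x in replicas])
        for replicas in itertools.product(configs, repeat=r)
    )
    return Z**r, replicated


if __name__ == "__main__":
    lhs, rhs = replica_expansion()
    print(lhs, rhs)
```

Each term of the right-hand sum corresponds to one joint configuration of r independent copies (replicas) of the system sharing the same Y.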

For Rademacher spiked Wigner model

In this section, we will see how one can apply the replica trick to
obtain the phase transition in the Rademacher spiked Wigner model. Recall
that given a hidden a \in \{\pm 1\}^n, the observable is
Y = \frac{\lambda}{n}a^Ta + \frac{1}{\sqrt n} W where
W_{i, j} \sim \mathcal{N}(0, 1) for i \neq j and W_{i, i} \sim \mathcal{N}(0, 2).
We are interested in finding the smallest \lambda for which we can still
recover a solution with some correlation to the ground truth a. Note
that \{W_{i, i}\} is not so important here, since x_i^2 = 1 carries
no information.

Motivated by the posterior {{\mathbb{P}}}[x|Y], the system we
set up for the Rademacher spiked Wigner model is the following:

  • the system consists of n particles and the interactions between
    the particles are given by Y

  • the signal-to-noise ratio \lambda plays the role of the inverse temperature \beta

Following the steps above, we begin by computing
f(r, \beta) = \frac{1}{r\beta n}\ln{{\mathbb{E}}}_Y[Z^r]
for r \in \mathbb{Z}_+. Denote X^k = (x^k)^Tx^k where x^k is the
kth replica of the system.

{{\mathbb{E}}}_Y[Z^r] = \int_Y \sum_{x^1, \cdots, x^r} \exp(\beta \sum_k <Y, X^k>) \mu(Y) dY\\ = \int_Y \sum_{x^1, \cdots, x^r} \exp(\beta <Y, \sum_k X^k>) \mu(Y) dY

We then simplify the above expression with a technical claim.

Claim 2. Let Y = A + \frac{1}{\sqrt{n}}W where A is a fixed matrix and
W is the GOE matrix defined as above. Then,
\int_Y \exp(\beta<Y, X>) \mu(Y) dY = \exp(\frac{\beta^2}{n}{{\|{X}\|_{F}}}^2 + \frac{\beta}{2} <A, X>)
where \mu is the distribution of Y.
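The {{\|{X}\|_{F}}}^2 term in Claim 2 is a Gaussian moment generating function computation. A sketch, assuming X is symmetric (as it is for X = \sum_k (x^k)^Tx^k) and W is symmetric with independent entries W_{i, j} \sim \mathcal{N}(0, 1) for i < j and W_{i, i} \sim \mathcal{N}(0, 2):

```latex
\begin{aligned}
\langle W, X\rangle &= \sum_i W_{i,i}X_{i,i} + 2\sum_{i<j} W_{i,j}X_{i,j}
 \;\sim\; \mathcal{N}\Bigl(0,\; 2\sum_i X_{i,i}^2 + 4\sum_{i<j}X_{i,j}^2\Bigr)
 = \mathcal{N}\bigl(0,\; 2\|X\|_F^2\bigr),\\
\mathbb{E}_W\Bigl[\exp\Bigl(\tfrac{\beta}{\sqrt{n}}\langle W, X\rangle\Bigr)\Bigr]
 &= \exp\Bigl(\tfrac{1}{2}\cdot\tfrac{\beta^2}{n}\cdot 2\|X\|_F^2\Bigr)
 = \exp\Bigl(\tfrac{\beta^2}{n}\|X\|_F^2\Bigr).
\end{aligned}
```

The second line uses {{\mathbb{E}}}[e^{tG}] = e^{t^2\sigma^2/2} for G \sim \mathcal{N}(0, \sigma^2). The remaining factor in Claim 2 comes from the fixed matrix A, which is not random and can be pulled out of the expectation.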

Denote X = \sum_k X^k. Applying Claim 2 with
A = \frac{\beta}{n}a^Ta, we have
{{\mathbb{E}}}_Y[Z^r] = \sum_{x^1, \cdots, x^r} \exp(\frac{\beta^2}{n}{{\|{X}\|_{F}}}^2 + \frac{\beta^2}{2n} <a^Ta, X>)
To understand the terms inside the exponent better, we can rewrite them
in terms of overlaps between replicas:

{{\|{X}\|_{F}}}^2 = \sum_{i, j}X_{i, j}^2 = \sum_{i, j}(\sum_{k = 1}^r x^k_ix^k_j)^2 =\sum_{i, j}(\sum_{k = 1}^r x^k_ix^k_j)(\sum_{l = 1}^r x^l_ix^l_j)\\ = \sum_{k, l} (\sum_{i = 1}^n x^k_ix^{l}_i)^2 = \sum_{k, l} <x^k, x^l>^2

where the last equality follows from rearranging and switching the inner
and outer summations.

Using a similar trick, we can view the other term as

<a^Ta, X> = \sum_{i, j}\sum_{k = 1}^rx^k_ix^k_ja_ia_j = \sum_{k = 1}^r (\sum_{i = 1}^n a_ix^k_i)^2 = \sum_{k}<a, x^k>^2

Note that Q_{k, l} = <x^k, x^l> represents the overlap between the
kth and lth replicas and Q_k = <a, x^k> represents the
overlap between the kth replica and the ground truth vector.
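Both overlap identities are exact and easy to check numerically. The sketch below draws random \pm 1 replicas and a random ground truth (the sizes n = 6, r = 3 and the function name are arbitrary choices for illustration) and compares the two sides of each identity.

```python
import numpy as np


def overlap_identities(n=6, r=3, seed=0):
    """Return (||X||_F^2, sum_{k,l} Q_{k,l}^2, <a^T a, X>, sum_k Q_k^2)."""
    rng = np.random.default_rng(seed)
    a = rng.choice([-1, 1], size=n)         # ground truth vector
    xs = rng.choice([-1, 1], size=(r, n))   # row k is the replica x^k
    X = sum(np.outer(x, x) for x in xs)     # X = sum_k (x^k)^T x^k
    Q = xs @ xs.T                           # Q[k, l] = <x^k, x^l>
    q = xs @ a                              # q[k] = <a, x^k>
    frob_lhs = int((X**2).sum())
    frob_rhs = int((Q**2).sum())
    spike_lhs = int((np.outer(a, a) * X).sum())
    spike_rhs = int((q**2).sum())
    return frob_lhs, frob_rhs, spike_lhs, spike_rhs


if __name__ == "__main__":
    print(overlap_identities())
```

The point of the rewriting is that the exponent depends on the n-dimensional replicas only through the (r + 1) \times (r + 1) table of overlaps.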

In the end, we get for any integer r, (Equation 1):

\displaystyle f(r, \beta) = \frac{1}{r\beta n}\ln(\sum_{x^1, \cdots, x^r} \exp(\frac{\beta^2}{n}\sum_{k, l}Q_{k, l}^2 + \frac{\beta^2}{2n}\sum_k Q_k^2)) \\ = \frac{1}{r\beta n} \ln(\sum_{Q}\nu_{x^k}(Q)\exp(\frac{\beta^2}{n}\sum_{k, l}Q_{k, l}^2 + \frac{\beta^2}{2n}\sum_k Q_k^2))

Our goal becomes to approximate this quantity. Intuitively, if we think
of the normalized overlaps Q_{k, l}/n as entries of an (r + 1) \times (r + 1) matrix Q
with Q(i,i) = 1, then Q is the average of n i.i.d. matrices. So for
uniformly random replicas we expect the off-diagonal entries Q_{j, k}, j \neq k,
to be of order \frac{1}{\sqrt{n}} w.h.p. In the
remaining part, we find the correct Q by rewriting Equation 1.

Observe that, by introducing a new variable Z_{k, l} for k \neq l and
using the Gaussian integral identity, we have (Equation 4):

\exp(\frac{\beta^2}{n}Q_{k, l}^2) = \sqrt{\frac{n}{4\pi}}\int_{Z_{k, l}} \exp(-\frac{n}{4}Z_{k, l}^2 + \beta Q_{k, l}Z_{k, l})dZ_{k, l}

\exp(\frac{\beta^2}{2n}Q_k^2) = \sqrt{\frac{2n}{\pi}}\int_{Z_k}\exp(-(2n)Z_k^2 + 2\beta Q_{k}Z_k)dZ_k
Replacing each \exp(\frac{\beta^2}{n}Q_{k, l}^2) by such an integral, we
have (Equation 2):

\begin{gathered} {{\mathbb{E}}}[Z^r] = \sum_{x^1, \cdots, x^r} \exp(\frac{\beta^2}{n}\sum_{k, l}Q_{k, l}^2 + \frac{\beta^2}{2n}\sum_k Q_k^2) \\ = C\sum_{x^1, \cdots, x^r} \exp(\beta^2 n)\int_{Z}\exp(-\frac{n}{4}\sum_{k \neq l}Z_{k, l}^2 - \frac{n}{2}\sum_k Z_k^2 + \beta \sum_{k \neq l}Z_{k, l}Q_{k, l} + 2\beta\sum_kZ_k Q_k) dZ \\ =C\exp(\beta^2 n) \int_{Z}\exp(-\frac{n}{4}\sum_{k \neq l}Z_{k, l}^2 - \frac{n}{2}\sum_k Z_k^2 + \ln(\sum_{x^1,\cdots, x^r}\exp(\beta \sum_{k\neq l}Z_{k, l}Q_{k, l} + 2\beta\sum_kZ_k Q_k))) dZ \end{gathered}

where C is the constant given by introducing the Gaussian integrals.
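The Gaussian integral identity (Equation 4) used to decouple the squared overlaps can be checked numerically. The sketch below evaluates the integral with a plain Riemann sum on a fine grid; the parameter values, grid size and function name are illustrative assumptions.

```python
import numpy as np


def equation4_sides(beta, Q, n, num=200001):
    """Return (lhs, rhs) of
    exp(beta^2 Q^2 / n) = sqrt(n / (4 pi)) * int exp(-n z^2 / 4 + beta Q z) dz.
    """
    lhs = np.exp(beta**2 * Q**2 / n)
    center = 2 * beta * Q / n          # maximiser of -n z^2 / 4 + beta Q z
    half_width = 12 * np.sqrt(2 / n)   # 12 standard deviations of the Gaussian factor
    z = np.linspace(center - half_width, center + half_width, num)
    dz = z[1] - z[0]
    integrand = np.exp(-n / 4 * z**2 + beta * Q * z)
    rhs = np.sqrt(n / (4 * np.pi)) * integrand.sum() * dz
    return lhs, rhs


if __name__ == "__main__":
    print(equation4_sides(beta=1.2, Q=3, n=10))
```

The identity trades a term that is quadratic in the overlap Q_{k, l} for one that is linear in it, at the price of an extra Gaussian integration variable.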

To compute the integral in (Equation 2), we need to cheat a little bit and take
n \to \infty before letting r \to 0. Note that the free energy density
is defined with the limits in the opposite order:
f(\beta) = \lim_{n \to \infty}\lim_{r \to 0}\frac{1}{r\beta n}\ln {{\mathbb{E}}}_Y[Z(\beta, Y)^r]
This is the second assumption made in the replica method and it is
commonly believed that switching the order is okay here. Physically,
this is plausible because we believe intrinsic physical quantities
should not depend on the system size.

Now the Laplace method tells us when n \to \infty, the integral in (Equation 2) is dominated by the max of the exponent.

Theorem 1 (Laplace Method). Let h: {{\mathbb{R}}}^d \to {{\mathbb{R}}}. Then

\int e^{nh(x)} dx \approx e^{nh(x^*)}(\frac{2\pi}{n})^{\frac{d}{2}}\frac{1}{\sqrt{det(H)}}

where x^* = argmax_x \{h(x)\} and H is the Hessian of -h evaluated at the point x^*.
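A one-dimensional sanity check of the Laplace method. The test function h(x) = -x^2/2 - x^4/4 is our own choice for illustration: it is maximised at x^* = 0 with h(0) = 0 and |h''(0)| = 1, so the predicted value of the integral is simply \sqrt{2\pi/n}.

```python
import numpy as np


def h(x):
    # Illustrative choice: unique maximum at x* = 0, h(0) = 0, h''(0) = -1.
    return -0.5 * x**2 - 0.25 * x**4


def laplace_ratio(n, half_width=6.0, num=400001):
    """Ratio of the integral of exp(n h(x)) to its Laplace approximation."""
    x = np.linspace(-half_width, half_width, num)
    dx = x[1] - x[0]
    exact = np.exp(n * h(x)).sum() * dx   # Riemann sum on a fine grid
    approx = np.sqrt(2 * np.pi / n)       # e^{n h(0)} (2 pi / n)^{1/2} / sqrt(|h''(0)|)
    return exact / approx


if __name__ == "__main__":
    for n in (10, 100, 1000):
        print(f"n = {n}: integral / Laplace approximation = {laplace_ratio(n):.5f}")
```

The ratio approaches 1 as n grows, which is the sense in which the integral is dominated by the maximum of the exponent.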

Apply the Laplace method with
h(Z) = -\frac{1}{2}\sum_{0 \leq k < l \leq r}Z_{k, l}^2 + \frac{1}{n}\ln(\sum_{x^1,\cdots, x^r}\exp(\beta \sum_{k \neq l}Z_{k, l}Q_{k, l} + 2\beta\sum_kZ_k Q_k))
What’s left to do is to find the critical point of h. Fixing a pair k, l and taking the
derivative with respect to Z_{k, l} gives
-Z_{k, l} + \frac{\beta}{n}\frac{\sum_{x^1,\cdots, x^r}Q_{k, l}\exp(\beta \sum_{k \neq l}Z_{k, l}Q_{k, l} + 2\beta\sum_kZ_k Q_k)}{A(Z)} = 0
where A(Z) = \sum_{x^1,\cdots, x^r}\exp(\beta \sum_{k \neq l}Z_{k, l}Q_{k, l} + 2\beta\sum_kZ_k Q_k). In words, Z_{k, l} equals \frac{\beta}{n} times the average of Q_{k, l} under the weights that define A(Z).

We now need to find a saddle point of h where the Hessian is PSD. To
do that, we assume that the order of the replicas does not matter,
which is referred to as the replica symmetric case (see footnote 1). The simplest form
of Z is the following: \forall k, l > 0, Z_{k, l} = y and
Z_{k} = y for some y. This also implies that Q_{k, l} = q for some
q and y =\frac{\beta}{n} q.

Plugging this back into Equation 2 gives (Equation 3):

{{\mathbb{E}}}[Z^r] = C\exp(\beta^2 n)\exp(-\frac{n}{2}(\frac{r^2 - r}{2})y^2 - \frac{n}{2}ry^2 + \ln(\sum_{x^1,\cdots, x^r}\exp(y\beta\sum_{k \neq l}Q_{k, l} + 2y\beta \sum_k Q_k)))

To obtain f(r, \beta), we only need to deal with the last term in
(Equation 3) as r \to 0. Using the fact that Z_{k, l} = y for all
k, l and using the same trick of introducing a new Gaussian integral as
in (Equation 4) we have
\lim_{r \to 0}\frac{1}{r}\ln(\sum_{x^1,\cdots, x^r}\exp(y\beta\sum_{k \neq l}Q_{k, l} + 2y\beta \sum_k Q_k)) = -\beta + {{\mathbb{E}}}_{z \sim \mathcal{N}(0, 1)}[\log(2\cosh(y\beta + \sqrt{y\beta}z))]

Since we want the solution that minimizes the free energy,
taking the derivative of the current f w.r.t. y and setting it to zero gives
\frac{y}{\beta} = {{\mathbb{E}}}_z[\tanh(y\beta + \sqrt{y\beta}z)]
which matches the fixed point of AMP: writing \gamma = y\beta and \lambda = \beta, this is \gamma = \lambda^2{{\mathbb{E}}}_z[\tanh(\gamma + \sqrt{\gamma}z)]. Plugging in q and y gives
f(\beta). The curve of f(\beta) looks like the figure below, where
the solid line is the curve of f(\beta) with the given q and the
dotted line is the curve given by setting all variables to 0.

[Figure: f(\beta) as a function of \beta. Solid line: f(\beta) with the given q. Dotted line: all variables set to 0.]



[1] Amelia Perry, Alexander S. Wein, Afonso S. Bandeira, and Ankur Moitra. Optimality and sub-optimality of PCA for spiked random matrices and synchronization. arXiv preprint arXiv:1609.05573, 2016.
[2] D. Féral and S. Péché. The largest eigenvalue of rank one deformation of large Wigner matrices. Communications in Mathematical Physics, 272:185–228, May 2007.
[3] Afonso S. Bandeira, Amelia Perry, and Alexander S. Wein. Notes on computational-to-statistical gaps: predictions using statistical physics. arXiv preprint arXiv:1803.11132, 2018.
[4] Yash Deshpande, Emmanuel Abbe, and Andrea Montanari. Asymptotic mutual information for the binary stochastic block model. In Information Theory (ISIT), 2016 IEEE International Symposium on, pages 185–189. IEEE, 2016.
[5] Adel Javanmard and Andrea Montanari. State evolution for general approximate message passing algorithms, with applications to spatial coupling. Information and Inference: A Journal of the IMA, 2(2):115–144, 2013.
[6] A. Perry, A. S. Wein, and A. S. Bandeira. Statistical limits of spiked tensor models. arXiv e-prints, December 2016.

  1. It turns out that for this problem, replica symmetry is the only case. We
    will not talk about replica symmetry breaking here, which
    intuitively means we partition the replicas into groups and recurse.