Statistical physics has deep connections to many computational problems, including statistical inference, counting and sampling, and optimization. Perhaps especially compelling are the field’s insights and intuitions regarding “average-case complexity” and information-computation gaps. These are topics for which the traditional theoretical CS approaches seem ill-suited, while on the other hand statistical physics has supplied a rich (albeit not always mathematically rigorous) theory.

Statistical physics is the first topic in the seminar course I am co-teaching with Boaz this fall, and one of our primary goals is to explore this theory. This blog post is a re-working of a lecture I gave in class this past Friday. It is meant to serve as an introduction to statistical physics, and is composed of two parts: in the first part, I introduce the basic concepts from statistical physics in a hands-on manner, by demonstrating a phase transition for the Ising model on the complete graph. In the second part, I introduce random k-SAT and the satisfiability conjecture, and give some moment-method based proofs of bounds on the satisfiability threshold.

Update on September 16, 3:48pm: the first version of this post contained an incorrect plot of the energy density of the Ising model on the complete graph, which I have amended below.

## I. From particle interactions to macroscopic behaviors

In statistical physics, the goal is to understand how materials behave on a macroscopic scale based on a simple model of particle-particle interactions.

For example, consider a block of iron. In a block of iron, we have many iron particles, and each has a net $\{\pm 1\}$-polarization or “spin” which is induced by the quantum spins of its unpaired electrons. On the microscopic scale, nearby iron atoms “want” to have the same spin. From what I was able to gather on Wikipedia, this is because the unpaired electrons in the distinct iron atoms repel each other, and if two nearby iron atoms have the same spins, then this allows them to be in a physical configuration where the atoms are further apart in space, which results in a lower energy state (because of the repulsion between electrons).

When most of the particles in a block of iron have correlated spins, then on a macroscopic scale we observe this correlation as the phenomenon of magnetism (or ferromagnetism if we want to be technically correct).

In the 1890’s, Pierre Curie showed that if you heat up a block of iron (introducing energy into the system), it eventually loses its magnetization. In fact, magnetization exhibits a phase transition: there is a critical temperature, $T_c$, below which a block of iron will act as a magnet, and above which it will suddenly lose its magnetism. This is called the “Curie temperature”. This phase transition is in contrast to the alternative, in which the iron would gradually lose its magnetization as it is heated.

### The Ising Model

We’ll now set up a simple model of the microscopic particle-particle interactions, and see how the global phenomenon of the magnetization phase transition emerges. This is called the Ising model, and it is one of the more canonical models in statistical physics.

Suppse that we have $n$ iron atoms, and that their interactions are described by the (for simplicity unweighted) graph $G$ with adjacency matrix $A_G$. For example, we may think of the atoms as being arranged in a 3D cubic lattice, and then $G$ would be the 3D cubic lattice graph. We give each atom a label in $[n] = \{1,\ldots,n\}$, and we associate with each atom $i \in [n]$ a spin $x_i \in \{\pm 1\}$.

For each choice of spins or state $x \in \{\pm 1\}^n$ we associate the total energy

$E(x) = - x^\top A_G x= \sum_{(i,j) \in G} - x_i x_j$.

If two interacting particles have the same spin, then they are in a “lower energy” configuration, and then they contribute $-1$ to the total energy. If two neighboring particles have opposite spins, then they are in a “higher energy” configuration, and they contribute $+1$ to the total energy.

We also introduce a temperature parameter $T$. At each $T$, we want to describe what a “typical” configuration for our block of iron looks like. When $T=0$, there is no kinetic energy in the system, so we expect the system to be in the lowest-energy state, i.e. all atoms have the same spin. As the temperature increases, the kinetic energy also increases, and we will begin to see more anomalies.

In statistical physics, the “description” takes the form of a probability distribution over states $x$. To this end we define the Boltzmann distribution, with density function $P_{\beta}(x)$:

$P_{\beta}(x) \propto \exp(-\beta E(x)), \qquad \text{where } \beta = \frac{1}{T}.$

As $T\to 0$, $\beta\to\infty$, $P_{\beta}(x)$ becomes supported entirely on the $x$ that minimize $E(x)$; we call these the ground states (for connected $G$ these are exactly $x = \pm \vec{1}$). On the other hand as $T \to \infty, \beta \to 0$, all $x \in \{\pm 1\}^n$ are weighted equally according to $P_{\beta}$.

Above we have defined the Boltzmann distribution to be proportional to $\exp(-\beta E(x))$. To spell it out,

$P_{\beta}(x) = \frac{1}{Z(\beta)} \exp(-\beta E(x)), \qquad \text{where } Z(\beta) = \sum_{x \in \{\pm 1\}^n} \exp(-\beta E(x)).$

The normalizing quantity $Z(\beta)$ is referred to as the partition function, and is interesting in its own right. For example, from $Z(\beta)$ we can compute the free energy $F(\beta)$ of the system, as well as the internal energy $U(\beta)$ and the entropy $S(\beta)$:

$F(\beta) = -\frac{1}{\beta} \ln Z(\beta), \qquad \qquad U(\beta) = \frac{\partial}{\partial \beta} (\beta F(\beta)), \qquad \qquad S(\beta) = \beta^2 \frac{\partial}{\partial \beta} F(\beta).$

Using some straightforward calculus, we can then see that $S(\beta)$ is the Shannon entropy,

$S(\beta) = - \sum_{x} P_{\beta}(x)\ln (P_{\beta}(x)),$

that $U(\beta)$ is the average energy in the system,

$U(\beta) = \sum_{x} P_{\beta}(x) \cdot E(x),$

and that the free energy is the difference of the internal energy and the product of the temperature $T = \frac{1}{\beta}$ and the entropy,

$F(\beta) = U(\beta) - T \cdot S(\beta),$

just like the classical thermodynamic definitons!

### Why these functions?

The free energy, internal energy, and entropy encode information about the typical behavior of the system at temperature $\beta$. We can get some intuition by considering the extremes, $\beta \to \infty$ and $\beta \to 0$.

In cold systems with $\beta \to \infty$, if we let $E_0$ be the energy of the ground state, $N_0 = |\{x : E(x) = E_0\}|$ be the number of ground state configurations, and $\Delta_E = \min_{x : E(x) \textgreater E_0} E(x) - E_0$ be the energy gap, then

$Z(\beta) = N_0 \exp(-\beta E_0)\cdot\left(1 + O\left(\exp(-\beta \Delta_E)\right)\right),$

where the $O(\cdot)$ notation hides factors that do not depend on $\beta$. From this it isn’t hard to work out that

\begin{aligned} F(\beta) &= E_0 -\frac{1}{\beta} \ln(N_0) + O\left(\exp(-\beta \Delta_E)\right),\\ E(\beta) &= E_0 + O\left(\exp(-\beta \Delta_E)\right)\\ S(\beta) &= \ln N_0 + O\left(\exp(-\beta \Delta_E)\right). \end{aligned}

We can see that the behavior of the system is dominated by the few ground states. As $\beta \to \infty$, all of the free energy can be attributed to the internal energy term.

On the other hand, as $\beta \to 0$,

\begin{aligned} F(\beta) &= \mathbb{E}_{x\sim\{\pm 1\}^n} E(x) - \frac{n}{\beta} + O(\beta),\\ U(\beta) &= \mathbb{E}_{x\sim\{\pm 1\}^n} E(x) + O(\beta),\\ S(\beta) &= n + O(\beta), \end{aligned}

and the behavior of the system is chaotic, with the free energy dominated by the entropy term.

### Phase transition at the critical temperature.

We say that the system undergoes a phase transition at $\beta_c$ if the energy density $f(\beta) = \lim_{n\to\infty}\frac{1}{n}F(\beta)$ is not analytic at $\beta = \beta_c$. Often, this comes from a shift in the relative contributions of the internal energy and entropy terms to $F(\beta)$. Phase transitions are often associated as well with a qualitative change in system behavior.

For example, we’ll now show that for the Ising model with $G = K_n$ the complete graph with self loops, the system has a phase transition at $\beta_c = \frac{1}{2}$ (the self-loops don’t make much physical sense, but are convenient to work with). Furthermore, we’ll show that this phase transition corresponds to a qualitative change in the system, i.e. the loss of magnetism.

Define the magnetization of the system with spins $x$ to be $m(x) = \frac{1}{n}\sum_i x_i$. If $\lim_{n \to \infty} m(x) \not\to 0$, then we say the system is magnetized.

In the complete graph, normalized so that the total interaction of each particle is $1$, there is a direct relationship between the energy and the magnetization:

$E(x) = \frac{1}{n}\sum_{i, j} - x_i x_j = - n\cdot m(x)^2.$

### Computing the energy density.

The magnetization takes values $\frac{k}{n}$ for $k\in\{-n,\ldots, n\}$. So, letting $N_k$ be the number of states with magnetization $k/n$, we have that

$Z(\beta) = \sum_{x} \exp(-\beta(- n\cdot m(x)^2)) = \sum_{k = -n}^{n} N_k \cdot \exp\left(\beta n \left(\frac{k}{n}\right)^2\right).$

Now, $N_k$ is just the number of strings with Hamming weight $\frac{1}{2}(n+k)$, so $N_k = \binom{n}{(n+k)/2}$. By Stirling’s approximation $N_k \approx \exp(n \cdot H(\frac{1+k/n}{2}))$, where $H$ is the entropy function, so up to lower-order terms

$Z(\beta) \approx \sum_{\substack{k\in [\pm n]\\ k+n \equiv_2 0}} \exp\left( \beta n \left(\frac{k}{n}\right)^2 + H\left(\frac{1 + \frac{k}{n}}{2}\right) n\right).$

Now we apply the following simplification: for $x \in \mathbb{R}^n$, $\|x\|_{\infty} \le \|x\|_1 \le n \|x\|_{\infty}$, and then $\log(\|x\|_1) = \log \|x\|_{\infty} + O(\log n)$. Treating our summands as the entries of the vector $x$, from this we have,

$\log(Z(\beta)) = \max_{k \in \{-n,\ldots, n\}} \beta n \left(\frac{k}{n}\right)^2 + H\left(\frac{1}{2}\left(1+\frac{k}{n}\right)\cdot n\right) + O\left(\log n\right).$

By definition of the energy density,

$f(\beta) = \lim_{n \to \infty} \left(- \frac{1}{\beta n} \log Z(\beta)\right) = -\lim_{n\to\infty} \max_{k \in \{-n,\ldots, n\}} \left(\frac{k}{n}\right)^2 + \frac{1}{\beta}H\left(\frac{1}{2}(1+\frac{k}{n})\right),$

and since $n \to \infty$ independently of $\beta$, we also have

$f(\beta) = -\left(\max_{\delta \in [-1,1]} \delta^2 + \frac{1}{\beta}H\left(\frac{1+\delta}{2}\right)\right),$

because the error from rounding $\delta \in [-1,1]$ to the nearest factor of $2/n$ is $O(1/n)$.

We can see that the first term in the expression for $f(\beta)$ corresponds to the square of the magnetization (and therefore the energy); the more magnetized the system is, the larger the contribution from the first term. The second term corresponds to the entropy, or the number of configurations in the support; the larger the support, the larger the contribution of the second term. As $T = \frac{1}{\beta}\to \infty$, the contribution of the entropy term overwhelms the contribution of the energy term; this is consistent with our physical intuition.

### A phase transition.

We’ll now demonstrate that there is indeed a phase transition in $\beta$. To do so, we solve for this maximum. Taking the derivative with respect to $\delta$, we have that

$\frac{\partial}{\partial \delta} \left( \delta^2 + \frac{1}{\beta}H\left(\frac{1+\delta}{2}\right)\right) = 2\delta + \frac{1}{2\beta} \ln \left(\frac{1-\delta}{1+\delta}\right),$

so the derivative is $0$ whenever $\delta = \frac{1}{4\beta} \ln\left(\frac{1+\delta}{1-\delta}\right)$. From this, we can check the maxima. When $\beta \textgreater \frac{1}{2}$, there are two maxima equidistant from the origin, corresponding to negatively or positively-magnetized states. When $\beta \textless \frac{1}{2}$, the maximizer is $0$, corresponding to an unmagnetized state.

Given the maximizers, we now have the energy density. When we plot the energy density, the phase transition at $\beta = \frac{1}{2}$ is subtle (an earlier version of this post contained a mistaken plot):

But when we plot the derivative, we can see that it is not smooth at $\beta = \frac{1}{2}$:

And with some calculus it is possible to show that the second derivative is indeed not continuous at $\beta = \frac{1}{2}$.

Qualitatively, it is convincing that this phase transition in the energy density is related to a transition in the magnetization (because the maximizing $\delta$ corresponds to the typical magnetization). One can make this formal by performing a similar calculation to show that the internal energy undergoes a phase transition, which in this case is proportional to the expected squared magnetization, $\mathbb{E}_{P_\beta}[-n \cdot m(x)^2]$.

### Beyond the complete graph

The Ising model on the complete graph $K_n$ (also called the Curie-Weiss model) is perhaps not a very convincing model for a physical block of iron; we expect that locality should govern the strength of the interactions. But because the energy and the magnetization are related so simply, it is easy to solve.

Solutions are also known for the 1D and 2D grids; solving it on higher-dimensional lattices, as well as in many other interesting settings, remains open. Interestingly, the conformal bootstrap method that Boaz mentioned has been used towards solving the Ising model on higher-dimensional grids.

## II. Constraint Satisfaction Problems

For those familiar with constraint satisfaction problems (CSPs), it may have already been clear that the Ising model is a CSP. The spins $x \in \{\pm 1\}^n$ are Boolean variables, and the energy function $E(x)$ is an objective function corresponding to the EQUALITY CSP on $G$ (a pretty boring CSP, when taken without negations). The Boltzmann distribution gives a probability distribution over assignments to the variables $x$, and the temperature $T$ determines the objective value of a typical $x \sim P_{\beta}$.

We can similarly define the energy, Boltzmann distribution, and free energy/entropy for any CSP (and even to continuous domains such as $x \in \mathbb{R}^n$). Especially popular with statistical physicists are:

• CSPs (such as the Ising model) over grids and other lattices.
• Gaussian spin glasses: CSPs over $x \in \{\pm 1\}^n$ or $x \in \mathbb{R}^n$ where the energy function is proportional to $\langle x^{\otimes k}, J\rangle$, where $J$ is an order-$k$ symmetric tensor with entries sampled i.i.d. from $\mathcal{N}(0,1)$. The Ising model on a graph with random Gaussian weights is an example for $k = 2$.
• Random instances of $k$-SAT: Of the $\binom{n}{k} \cdot 2^k$ possible clauses (with negations) on $n$ variables, $m$ clauses $C_1,\ldots,C_m$ are sampled uniformly at random, and the energy function is the number of satisfied clauses, $E(x) = |\{ i \in [m] : C_i(x) = 1\}|$.
• Random instances of other Boolean and larger-alphabet CSPs.

In some cases, these CSPs are reasonable models for physical systems; in
other cases, they are primarily of theoretical interest.

### Algorithmic questions

As theoretical computer scientists, we are used to seeing CSPs in the contex of optimization. In statistical physics, the goal is to understand the qualitative behavior of the system as described by the Boltzmann distribution. They ask algorithmic questions such as:

• Can we estimate the free energy $F(\beta)$? The partition function $Z(\beta)$? And, relatedly,
• Can we sample from $P_{\beta}$?

But these tasks are not so different from optimization. For example, if our system is an instance of 3SAT, when $T=0$, the Boltzmann distribution is the uniform distribution over maximally satisfying assignments, and so estimating $Z(\infty)$ is equivalent to deciding the SAT formula, and sampling from $P_{\infty}$ is equivalent to solving the SAT formula. As $T$ increases, sampling from $P_{\beta}$ corresponds to sampling an approximate solution.

Clearly in the worst case, these tasks are NP-Hard (and even #P-hard). But even for random instances these algorithmic questions are interesting.

### Phase transitions in satisfiability

In random $k$-SAT, the system is controlled not only by the temperature $T$ but also by the clause density $\alpha = \frac{m}{n}$. For the remainder of the post, we will focus on the zero-temperature regime $T = 0$, and we will see that $k$-SAT exhibits phase transitions in $\alpha$ as well.

The most natural “physical” trait to track in a $k$-SAT formula is whether or not it is satisfiable. When $\alpha = 0$, $k$-SAT instances are clearly satisfiable, because they have no constraints. Similarly when $\alpha = n^k \cdot 2^k$, random $k$-SAT instances cannot be satisfiable, because for any set of $k$ variables they will contain all $2^k$ possible $k$-SAT constraints (clearly unsatisfiable). It is natural to ask: is there a satisfiability phase transition in $\alpha$?

For $k = 2$, one can show that the answer is yes. For $k \ge 3$, numerical evidence strongly points to this being the case; further, the following theorem of Friedgut gives a partial answer:

Theorem:
For there exists a function $\alpha_c(n)$ such that for any $\epsilon \textgreater 0$, if $\phi$ is a random $k$-SAT formula on $n$ variables with $\alpha n$ clauses, then

$\lim_{n \to \infty} \Pr[\phi \text{ is satisfiable}] = \begin{cases} 1 & \text{ if } \alpha \textless (1-\epsilon) \alpha_c(n)\\ 0 & \text{ if } \alpha \textgreater (1+\epsilon) \alpha_c(n) \end{cases}.$

However, this theorem allows for the possibility that the threshold depends on $n$. From a statistical physics standpoint, this would be ridiculous, as it suggests that the behavior of the system depends on the number of particles that participate in it. We state the commonly held stronger conjecture:

Conjecture:
for all $k \ge 3$, there exists a constant $\alpha_c$ depending only on $k$ such that if $\phi$ is a random $k$-SAT instance in $n$ variables and $\alpha n$ clauses, then

$\lim_{n \to \infty} \Pr[\phi \text{ is satisfiable}] = \begin{cases} 1 & \text{ if } \alpha \textless \alpha_c \\ 1 & \text{ if } \alpha \textgreater \alpha_c \end{cases}$

In 2015, Jian Ding, Allan Sly, and Nike Sun established the $k$-SAT conjecture all $k$ larger than some fixed constant $k_0$, and we will be hearing about the proof from Nike later on in the course.

### Bounds on $\alpha_c$ via the method of moments

Let us move to a simpler model of random $k$-SAT formulas, which is a bit easier to work with and is a close approximation to our original model. Instead of sampling $k$-SAT clauses without replacement, we will sample them with replacement and also allow variables to appear multiple times in the same clause (so each literal is chosen uniformly at random). The independence of the clauses makes computations in this model simpler.

We’ll prove the following bounds. The upper bound is a fairly straightforward computation, and the lower bound is given by an elegant argument due to Achlioptas and Peres.

Theorem:
For every $k \ge 3$, $2^k \ln 2 - O(k) \le \alpha_c \le 2^k \ln 2 - O(1)$.

Let $\phi$ be a random $k$-SAT formula with $m = \alpha n$ clauses, $C_1,\ldots,C_m$. For an assignment $x \in \{\pm 1\}^n$, let $\phi(x)$ be the $0/1$ indicator that $x$ satisfies $\phi$. Finally, let $Z_{\alpha} = \sum_x \phi(x)$ be the number of satisfying assignments of $\phi$.

#### Upper bound via the first moment method.

We have by Markov’s inequality that

$\Pr[Z_{\alpha} \ge 1] \le \mathbb{E}[Z_{\alpha}] = \sum_{x} \Pr[\phi(x)].$

Fix an assignment $x$. Then by the independence of the clauses,

$\Pr[\phi(x)] = \prod_{j \in [m]} \Pr[C_j(x) = 1] = \left(1 - \frac{1}{2^k}\right)^m,$

since each clause has $2^k -1$ satisfying assignments. Summing over all
$x \in \{\pm 1\}^n$,

$\mathbb{E}[Z_{\alpha}] = 2^n \left(1 - \frac{1}{2^k}\right)^{\alpha n}.$

We can see that if $\alpha \textgreater 2^k \ln 2$, this quantity will go to $0$ with $n$. So we have:

$\alpha_c \le 2^k \ln 2.$

#### Lower bound via the second moment method.

To lower bound $\alpha_c$, we can use the second moment method. We’ll
calculate the second moment of $Z_{\alpha}$. An easy use of Cauchy-Schwarz (for non-negative $X$, $\mathbb{E}[X] \le \sqrt{\mathbb{E}[X^2]\mathbb{E}[I[X \textgreater 0]]}$) implies that if there is a constant $c$ such that

$\lim_{n \to \infty}\frac{\left(\mathbb{E} Z_{\alpha}\right)^2}{\mathbb{E}Z_{\alpha}^2} \ge c$,

then with at least constant probability $Z_{\alpha} \ge 1$, and then Friedgut’s theorem implies that $\alpha \textless \alpha_c$. From above we have an expression for the numerator, so we now set out to bound the second moment of $Z_{\alpha}$. We have that

$\mathbb{E} Z_{\alpha}^2 = \mathbb{E}\left[\left(\sum_{x}I[\phi(x)]\right)^2\right] = \sum_{x,y} \Pr\left[\phi(x)\wedge \phi(y)\right],$

and by the independence of the clauses,

$\Pr\left[\phi(x)\wedge \phi(y)\right] = \prod_{j \in [m]} \Pr[C_j(x) = 1 \wedge C_j(y) = 1] = \left(\Pr[C(x) = 1 \wedge C(y) = 1]\right)^m,$

for a single random $k$-SAT clause $C$. But, for $x,y \in \{\pm 1\}^n$, the events $C(x) = 1$ and $C(y)=1$ are not independent. This is easier to see if we apply inclusion-exclusion,

$\Pr[C(x) = 1 \wedge C(y) = 1] = 1 - \Pr[C(x) = 0] - \Pr[C(y) = 0] + \Pr[C(x) = C(y) = 0],$

The event that both $C(x) = C(y) = 0$ when $C$ is a $k$-SAT clause occurs only when $x$ and $y$ agree exactly on the assignments to the variables of $C$, since otherwise at least one of $x$ or $y$ must be satisfied (because each literal in $C(x)$ is negated for $C(y)$). Thus, this probability depends on the Hamming distance between $x$ and $y$.

For $x,y$ with $\frac{1}{2}\|x-y\|_1 = (1-\delta)n$, the probability that $C$‘s variables are all in the subset on which $x,y$ agree is $\delta^k$ (up to lower order terms). So, we have

$\Pr[C(x) = C(y) = 0] = \delta^k \cdot \frac{1}{2^k},$

and then for $x,y$ with $\frac{1}{2}\|x-y\|_1 = n - \ell$,

$\Pr[C(x) = 1 \wedge C(y) = 1] = 1 - 2\cdot \frac{1}{2^k} + \left(\frac{\ell}{2n}\right)^k.$

Because there are $2^n \cdot \binom{n}{\ell}$ pairs $x,y$ at distance $\frac{1}{2}\|x-y\|_1 = n - \ell$,

$\mathbb{E}\left[Z_{\alpha}^2\right] = \sum_{\ell = 0}^{n} 2^n \cdot \binom{n}{\ell} \cdot \left(1 - \frac{1}{2^{k-1}} + \left(\frac{\ell}{2n}\right)^k\right)^m,$

and using the Stirling bound $\binom{n}{\ell} \le O(\frac{1}{\sqrt{n}})\exp(n \cdot H(\ell/n))$,

$= \sum_{\ell = 0}^{n}O(\frac{1}{\sqrt{n}}) \exp\left( \left(H\left(\frac{\ell}{n}\right) + \ln 2\right)\cdot n + \ln\left(1 - \frac{1}{2^{k-1}} + \left(\frac{\ell}{2n}\right)^k\right) \cdot \alpha n\right).$

Using Laplace’s method, we can show that this sum will be dominated by the $O(\sqrt{n})$ terms around the maximizing summand; so defining $\psi(\delta) = H(\delta) + \ln 2 + \alpha\ln (1 - 2^{-k+1} + \delta^k 2^{-k}))$,

$\mathbb{E}[Z_{\alpha}^2] = O(1) \cdot \exp( n \cdot \max_{\delta \in [0,1]} \psi(\delta)).$

If we want that $\mathbb{E}[Z^2] \le c \cdot \mathbb{E}[Z]^2$, then we require that

$\max_{\delta \in [0,1]} \psi(\delta) \le \frac{1}{n}\ln(\mathbb{E}[Z]^2) = \ln 2 + 2\alpha \ln (1-2^{-k}).$

However, we can see (using calculus) that this inequality does not hold whenever $\alpha \textgreater 0$:

So the naive second moment method fails to establish any lower bound on $\alpha_c$. The problem here is that the second moment is dominated by the large correlations of close strings; whenever $\alpha \textgreater 0$, the sum is dominated by pairs of strings $x,y$ which are closer than $n/2$ in Hamming distance, which is atypical. For strings at Hamming distance $n/2$, what is relevant is $\psi(\frac{1}{2})$, which is always

$\psi(\frac{1}{2}) = H(\frac{1}{2}) + \ln 2 + \alpha \ln \left(1 - \frac{1}{2^{k-1}} + \frac{1}{2^{2k}}\right) = 2\ln 2 + \alpha \ln \left(1 - \frac{1}{2^k}\right)^{2},$

which is equal to $\frac{1}{n}\ln\mathbb{E}[Z_{\alpha}]^2$. At distance $n/2$, the value of $\phi$ on pairs $x,y$ is essentially uncorrelated (one can think of drawing a uniformly random $y$ given $x$), so such pairs are representative of $\mathbb{E}[Z]^2$.

#### A more delicate approach.

To get a good bound, we have to perform a fancier second moment method calculation, due to Achlioptas and Peres. We will re-weight the terms so that more typical pairs, at distance $n/2$, are dominant. Rather than computing $Z_{\alpha}$, we compute $X_{\alpha}$, where

$X_{\alpha} = \sum_{x} \prod_{j \in [m]} w_j(x),$

where $w_j(x) = 0$ if $C_j(x) = 0$, and $w_j(x) = \eta^t$ if $C_j(x)$ is satisfied by $t$ variables. Since $X_{\alpha} \textgreater 0$ only if $\phi$ has satisfying assignments, the goal is to still bound $\mathbb{E}[X_{\alpha}^2] \le c\cdot \mathbb{E}[X_{\alpha}]^2$. Again using the independence of the clauses, we have that

\begin{aligned} \mathbb{E}[X_{\alpha}] = 2^n \cdot \mathbb{E}[w_j(x)]^m = 2^n \left(2^{-k}\sum_{t = 1}^k \binom{k}{t} \eta^t \right)^m = 2^n \left(\left(\frac{1+\eta}{2}\right)^k-2^{-k}\right)^m.\label{eq:expect}\end{aligned}

And calculating again the second moment,

$\mathbb{E}[X_{\alpha}^2] = \sum_{x,y} \prod_{j \in [m]} \mathbb{E}[w_j(x)\cdot w_j(y)].$

For a fixed clause $j$, we can partition the variables in its scope into 4 sets, according to whether $x,y$ agree on the variable, and whether the variable does or does not satisfy the literals of the clause. Suppose that $w_j$ has $a$ variables on which $x,y$ agree, and that of these $t$ are satisfying for $w_j$ and $a - t$ are not.

Then, $w_j$ has $k-a$ variables on which $x,y$ disagree, and any variable which does not satisfy $w_j$ for $x$ must satisfy it for $y$. For a $w_j(x) w_j(y)$ to be nonzero, either there must be at least one literal in the variables on which $x,y$ agree which agrees with $x,y$, or otherwise there must be at least one literal in the variables on which $x,y$ disagree which agrees with each string. Therefore, if $x,y$ agree on a $\delta$-fraction of variables,

\begin{aligned} \mathbb{E}[w_j(x) \cdot w_j(y)] &= \sum_{a = 1}^k \sum_{t=1}^a \eta^{2t + k - a} \cdot \Pr[w_j \text{ has } a \text{ variables in } x \cap y,\ t \in x \cap y \text{ satisfy }w, \ w_j(x),w_j(y) \textgreater 0]\\ &= \left(\sum_{a = 0}^k\sum_{t=0}^a \eta^{k - a + 2t} \cdot \binom{k}{a}\delta^{a}\left(1 - \delta\right)^{k-a} 2^{-a} \binom{a}{t}\right) - \left(\sum_{a=0}^k \eta^{k-a} \binom{k}{a} \delta^a(1-\delta)^{k-a} 2^{-k+1}\right) + \delta^k2^{-k},\end{aligned}

where the first sum ignores the cases when $w_j(x) = 0$ or $w_j(y) = 0$, the second sum subtracts for the cases when $t=0$ the contribution of the terms where all the literals agree with either $x$ or $y$, and the final term accounts for the fact that the term $a = k$ is subtracted
twice. Simplifying,

$\mathbb{E}[w_j(x) \cdot w_j(y)] = \left(\frac{(1+\eta^2)}{2}\delta + \eta(1-\delta)\right)^k - 2^{1-k} \left(\eta(1-\delta) + \delta\right)^k + \left(\frac{\delta}{2}\right)^k.$

Define
$q_{\eta}(\delta) =\left(\frac{(1+\eta^2)}{2}\delta + \eta(1-\delta)\right)^k - 2^{1-k} \left(\eta(1-\delta) + \delta\right)^k + \left(\frac{\delta}{2}\right)^k$.
So we have (using Laplace’s method again)

$\mathbb{E}[X^2] = 2^n \sum_{\ell = 0}^n \binom{n}{\ell} q_\eta\left(\frac{\ell}{n}\right)^{\alpha n} \approx O(1) \cdot \exp(\max_{\delta \in [0,1]} \psi_\eta (\delta) \cdot n),$

For
$\psi_\eta(\delta) = H(\delta) + \ln 2 + \alpha \ln q_{\eta}(\delta).$

When $\delta = \frac{1}{2}$,

$q_{\eta}\left(\frac{1}{2}\right) = \left(\left(\frac{1+\eta}{2}\right)^k - \frac{1}{2^k}\right)^2,$

which is equal to the log of the square of the expectation, so again for the second moment method to succeed, $\delta = \frac{1}{2}$ must be the global maximum.

Guided by this consideration, we can set $\eta$ so that the derivative $q_{\eta}'(\frac{1}{2}) = 0$, so that $\psi_\eta$ achieves a local maximum at $\delta = \frac{1}{2}$. The choice for this (after doing some calculus) turns out to be the positive, real solution to the equation $(1+\eta)^{k-1}(1-\eta) = 1$. With this choice, one can show that the global maximum is indeed achieved at $\delta = \frac{1}{2}$ as long as $\alpha \textless 2^k \ln 2 - O(k)$. Below, we plot $\psi_{\eta}$ with this optimal choice of $\eta$ for several values of $\alpha$ at $k = 5$:

So we have bounded $\alpha_c \ge 2^k \ln 2 - O(k)$.

### Another phase transition

What is the correct answer for $\alpha_c$? We now have it in a window of size $O(k)$. Experimental data and heuristic predictions indicate that it is closer to $2^k \ln 2 - O(1)$ (and in fact for large $k$, Ding, Sly and Sun showed that $\alpha_c$ is a specific constant in the interval $[2^k \ln 2 -2, 2^k \ln 2]$). So why can’t we push the second moment method further?

It turns out that there is a good reason for this, having to do with another phase transition. In fact, we know that $k$-SAT has not only satisfiable and unsatisfiable phases, but also clustering and condensation phases:

In the clustering phase, there are exponentially many clusters of solutions, each containing exponentially many solutions, and each at hamming distance $\Omega(n)$ from each other. In the condensation phase, there are even fewer clusters, but solutions still exist. We can see evidence of this already in the way that the second moment method failed us. When $\alpha \textgreater 2^k \ln 2 - O(k)$, we see that the global maximum of the exponent is actually attained close to $\delta = 1$. This is because solutions with large overlap come to dominate the set of satisfying assignments.

One can also establish the existence of clusters using the method of moments. The trick is to compute the probability that two solutions, $x$ and $y$ with overlap $\delta n$, are both satisfying for $\phi$. In fact, we have already done this. From above,

$\Pr\left[\phi(x) \wedge \phi(y) ~\mid~ \frac{1}{2}\|x - y\|_1 = (1-\delta)n\right] = \left(1 - 2\cdot \frac{1}{2^k} + 2^{-k}\delta^k\right)^{\alpha n}.$

Now, by a union bound, an upper bound on the probability that there exists a pair of solutions $x,y$ with overlap $\delta n$ for any $\delta \in [\delta_1,\delta_2]$ is at most

$\Pr[\exists x,y \ s.t. \ \phi(x) \wedge \phi(y), \ \frac{1}{2}\|x-y\|_1 = (1-\delta)n \text{ for } \delta \in [\delta_1,\delta_2]] \le \sum_{\ell = \delta_1 n}^{\delta_2 n} \binom{n}{\ell} \left(1 - 2^{1-k} + \left(\frac{\ell}{2n}\right)^k\right)^{\alpha n}$
$\le \int_{\delta_1}^{\delta_2} \exp\left( \left(H(\delta) + \ln 2 + \alpha \ln \left(1-2^{1-k} + 2^{-k}\delta^k\right)\right)n\right) d\delta,$

and if the function $\beta(\delta) := H(\delta) + \ln 2 + \alpha\ln (1 + 2^{-k}\delta^k - 2^{1-k})$ is such that $\beta(\delta) \textless 0$ for all $\delta \in [\delta_1,\delta_2]$, then we conclude that the probability that there is a pair of satisfying assignments at distance between $(1-\delta_2)n$ and $(1-\delta_1)n$ is $o(1)$.

Achlioptas and Ricci-Tersenghi showed that for $k \ge 4$, $\alpha = (1-\eta)2^k$, and $\epsilon$ a fixed constant, the above function is less than $0$. Rather than doing the tedious calculus, we can verify by plotting for $k = 8, \alpha = 169$ (with $169 \textless \alpha_c$):

They also use the second moment method to show the clusters are non-empty, that there are exponentially many of them, each containing exponentially many solutions. This gives a proof of the existence of a clustering regime.

### Solution geometry and algorithms

This study of the space of solutions is referred to as solution geometry, and understanding solution geometry turns out to be essential in proving better bounds on the critical density.

Solution geometry is also intimately related to the success of local search algorithms such as belief propagation, and to heuristics for predicting phase transitions such as replica symmetry breaking.

These topics and more to follow, in the coming weeks.

## Resources

In preparing this blog post/lecture I leaned heavily on Chapters 2 and 10 of Marc Mézard and Andrea Montanari’s “Information, Physics, and Computation”, on Chapters 12,13,&14 of Cris Moore and Stephan Mertens’ “The nature of Computation,” and on Dimitris Achlioptas and Federico Ricci-Tersenghi’s manuscript “On the Solution-Space Geometry of Random CSPs”. I also consulted Wikipedia for some physics basics.

This is a kind reminder that the deadline for the early rate registration fees for FOCS 2018 is this Sunday, September 9, 2018.

===================================
FOCS 2018 – Second Call for Participation
===================================

https://www.irif.fr/~focs2018/

The 59th Annual IEEE Symposium on Foundations of Computer Science (FOCS 2018) will take place in Paris, France, on 7-9 October 2018, with workshops and tutorials on October 6.

The program is now available, together with the list of the workshops/tutorials (as well as a list of a number of co-located events), on the conference webpage.

Early registration rate ends September 9, 2018.

All scientific and local information, and a link to the registration page, can be found at https://www.irif.fr/~focs2018/

Looking forward to seeing you in Paris !

(Thanks so much to Scott Aaronson for giving me many pointers, insights, explanations, and corrections that greatly improved this post. As I’m a beginner to physics, the standard caveat holds doubly here: Scott is by no means responsible to any of my remaining technical mistakes and philosophical misconceptions.)

One of the interesting features of physics is the prevalence of “thought experiments”, including Maxwell’s demon, Einstein’s Train, Schrödinger’s cat, and many more. One could think that these experiments are merely “verbal fluff” which obscures the “real math” but there is a reason that physicists return time and again to these types of mental exercises. In a nutshell, this is because while physicists use math to model reality, the mathematical model is not equal to reality.

For example, in the early days of quantum mechanics, several calculations of energy shifts seemed to give out infinite numbers. While initially this was viewed as a sign that something is deeply wrong with quantum mechanics, ultimately it turned out that these infinities canceled each other, as long as you only tried to compute observable quantities. One lesson that physicists drew from this is that while such mathematical inconsistencies may (and in this case quite possibly do) indicate some issue with a theory, they are not a reason to discard it. It is OK if a theory involves mathematical steps that do not make sense, as long as this does not lead to an observable paradox: i.e., an actual “thought experiment” with a nonsensical outcome.

A priori, this seems rather weird. An outsider impression of the enterprise of physics is that it is all about explaining the behavior of larger systems in terms of smaller parts. We explain materials by molecules, molecules by atoms, and atoms by elementary particles. Every term in our mathematical model is supposed to correspond to something “real” in the world.

However, with modern physics, and particular quantum mechanics, this connection breaks down. In quantum mechanics we model the state of the world using a vector (or “wave function”) but the destructiveness of quantum measurements tells us that we can never know all the coordinates of this vector. (This is also related to the so called “uncertainty principle”.) While physicists and philosophers can debate whether these wave functions “really exist”, their existence is not the reason why quantum mechanics is so successful. It is successful because these wave functions yield a mathematically simple model to predict observations. Hence we have moved from trying to explain bigger physical systems in terms of smaller physical systems to trying to explain complicated observations in terms of simpler mathematical models. (Indeed the focus has moved from “things” such as particles to concepts such as forces and symmetries as the most fundamental notions.) These simpler models do not necessarily correspond to any real physical entities that we’d ever be able to observe. Hence such models can in principle contain weird things such as infinite quantities, as long as these don’t mess up our predictions for actual observations.

Nevertheless, there are still real issues in physics that people have not been able to settle. In particular the so called “standard model” uses quantum mechanics to explain the strong force, the weak force, and the electromagnetic force, which dominate over short (i.e., subatomic) distances, but it does not incorporate the force of gravity. Gravity is explained by the theory of general relativity which is inconsistent with quantum mechanics but is predictive for phenomena over larger distances.

By and large physicists believe that quantum mechanics will form the basis for a unified theory, that will involve incorporating gravity into it by putting general relativity on quantum mechanical foundations. One of the most promising approaches in this direction is known as the AdS/CFT correspodence of Maldacena, which we describe briefly below.
Alas, in 2012, Almheiri, Marolf, Polchinski, and Sully (AMPS) gave a description of a mental experiment, known as a the “firewall paradox” that showed a significant issue with any quantum-mechanical description of gravity, including the Ads/CFT correspondence. Harlow and Hayden (see also chapter 6 of Aaronson’s notes and this overview of Susskind) proposed a way to resolve this paradox using computational complexity.

In this post I will briefly discuss these issues. Hopefully someone in Tselil’s and my upcoming seminar will present this in more detail and also write a blog post about it.

### The bulk/boundary correspondence

Edwin Abbott’s 1884 novel “Flatland”, describes a world in which people live in only two dimensions. At some point a sphere visits this world, and opens the eye of one of its inhabitants (the narrator, which is a square) to the fact its two-dimensional world was merely an illusion and the “real world” actually has more dimensions.

Cover of first edition of “Flatland”. Image from Wikipedia, *EC85 Ab264 884f, Houghton Library, Harvard University.

However, modern physics suggest that things might be the other way around: we might actually be living in flatland ourselves. That is, it might be that the true description of our world has one less spatial dimension than what we perceive. For example, though we think we live in three dimensions, perhaps we are merely shadows (or a “hologram”) of a two dimensional description of the world. One can ask how could this be? After all, if our world is “really” two dimensional, what happens when I climb the stairs in my house? The idea is that the geometry of the two-dimensional world is radically different, but it contains all the information that would allow to decode the state of our three dimensional world. You can imagine that when I climb the stairs in my house, my flatland analog goes from the first floor to the second floor in (some encoding of) the two-dimensional blueprint of my house. (Perhaps this lower-dimensional representation is the reason the Wachowskis called their movie “The Matrix” as opposed to “The Tensor”?)

The main idea is that in this “flat” description, gravity does not really exist and physics has a pure quantum mechanical description which is scale free in the sense that the theory is the same independently of distance. Gravity and our spacetime gemoetry emerge in our world via the projection from this lower dimensional space. (This projection is supposed to give rise to some kind of string theory.) As far as I can tell, at the moment physicists can only perform this projection (and even this at a rather heuristic level) under the assumption that our universe is contracting or in physics terminology an “anti de-Sitter (AdS) space”. This is the assumption that the geometry of the universe is hyperbolic and hence one can envision spacetime as being bounded in some finite area of space: some kind of a $d+1$ dimensional cylinder that has a $d$-dimensional boundary. The idea is that all the information on what’s going on in the inside or bulk of the cylinder is encoded in this boundary. One caveat is that our physical universe is actually expanding rather than contracting, but as the theory is hard enough to work out for a contracting space, at the moment they sensibly focus on this more tractable setting. Since the quantum mechanical theory on the boundary is scale free (and also rotation invariant) it is known as a Conformal Field Theory (CFT). Thus this one to one mapping of the boundary and the bulk is also known as the “AdS/CFT correspondence”.

If it is possible to carry over this description, in terms of information it would be possible to describe the universe in purely quantum mechanical terms. One can imagine that the universe starts at some quantum state $|x_0 \rangle$, and at each step in time progresses to the state $|x_{i+1} \rangle = U|x_i \rangle$ where $U$ is some unitary transformation.

In particular this means that information is never lost. However, black holes pose a conundrum to this view since they seem to swallow all information that enters them. Recall that the “escape velocity” of earth – the speed needed to escape the gravitational field and go to space – is about 25,000 mph or Mach 33. In a black hole the “escape velocity” is the speed of light which means that nothing, not even light, can escape it. More specifically, there is a certain region in spacetime which corresponds to the event horizon of a black hole. Once you are in this event horizon then you have passed the point of no return since even if you travel in the speed of light, you will not be able to escape. Though it might take a very long time, eventually you will perish in the black hole’s so called “singularity”.

New Yorker magazine, August 27, 2018

Entering the event horizon should not feel particularly special (a condition physicists colorfully refer to as “no drama”). Indeed, as far as I know, it is theoretically possible that 10 years from now a black hole would be created in our solar system with a radius larger than 100 light years. If this future event will happen, this means that we are already in a black hole event horizon even though we don’t know it.

The above seems to mean that information that enters the black hole is irrevocably lost, contradicting unitarity. However, physicists now believe that through a phenomenon known as Hawking radiation black holes might actually emit the information that was contained in them. That is, if the $n$ qubits that enter the event horizon are in the state $|x\rangle$ then (up to a unitary tranformation) the qubits that are emitted in the radiation would be in the state $|x \rangle$ as well, and hence no information is lost. Indeed, Hawking himself conceded the bet he made with Preskill on information loss.
Nevertheless, there is one fly in this ointment. If we drop an $n$ qubit state $|x\rangle$ in this black hole, then they are eventually radiated (in the same state, up to an invertible transformation), but the original $n$ qubits never come out. (It is a black hole after all.) Since we now have two copies of these qubits (one inside the black hole and one outside it), this seems to violate the famous “no cloning principle” of quantum mechanics which says that you can’t copy a qubit. Luckily however, this seemed to be one more of those cases where it is an issue with the math that could never effect an actual observer. The reason is that an observer inside the black hole event horizon can never come out, while an observer outside can never peer inside. Thus, even if the no cloning principle is violated in our mathematical model of the whole universe, no such violation would have been seen by either an outside or an inside observer. In fact, even if Alice – a brave observer outside the event horizon – obtained the state $|x\rangle$ of the Hawking radiation and then jumped with it into the event horizon so that she can see a violation of the no-cloning principle then it wouldn’t work. The reason is that by the time all the $n$ qubits are radiated, the black hole fully evaporates and inside the black hole the original qubits have already entered the singularity. Hence Alice would not be able to “catch the black hole in the act” of cloning qubits.
What AMPS noticed is that a more sophisticated (yet equally brave) observer could actually obtain a violation of quantum mechanics. The idea is the following. Alice will wait until almost all (say 99 percent) of the black hole evaporated, which means that at this point she can observe $0.99n$ of the qubits of the Hawking radiation $|R \rangle$, while there are still about $0.01n$ qubits inside the event horizon that have not yet reached the singularity. So far, this does not seem to be any violation of the no cloning principle, but it turns out that entanglement (which you can think of as the quantum analog of mutual information) plays a subtle role. Specifically, for information to be preserved the radiation will be in a highly entangled state, which means that in particular if we look at the qubit $|A \rangle$ that has just radiated from the event horizon then it will be highly entangled with the $0.99n$ qubits $|R \rangle$ we observed before.

On the other hand, from the continuity of spacetime, if we look at a qubit $|B\rangle$ that is just adjacent to $|A \rangle$ but inside the event horizon then it will be highly entangled with $|A \rangle$ as well. For our classical intuition, this seems to be fine: a $\{0,1\}$-valued random variable $A$ could have large (say at least $0.9$) mutual information with two distinct random variables $R$ and $B$. But quantum entanglement behaves differently: it satisfies a notion known as monogamy of entanglement, which implies that the sum of entanglement of a qubit $|A \rangle$ with two disjoint registers can be at most one. (Monogamy of entangelement is actually equivalent to the no cloning principle, see for example slide 14 here.)

Specifically, Alice could use a unitary transformation to “distill” from $|R\rangle$ a qubit $|C\rangle$ that is highly entangled with $|A\rangle$ and then jump with $|A\rangle$ and $|C\rangle$ into the event horizon to observe there a triple of qubits $(|A\rangle, |B\rangle, |C \rangle)$ which violates the monogamy of entanglement.

One potential solution to the AMPS paradox is to drop the assumption that spacetime is continuous at the event horizon. This would mean that there is a huge energy barrier (i.e., a “firewall”) at the event horizon. Alas, a huge wall of fire is as close as one can get to the definition of “drama” without involving Omarose Manigault Newman.

The “firewall paradox” is a matter of great debate among physicists. (For example after the AMPS paper came out, a “rapid response workshop” was organized for people to suggest possible solutions.) As mentioned above, Daniel Harlow and Patrick Hayden suggested a fascinating way to resolve this paradox. They observed that to actually run this experiment, Alice would have to apply a certain “entanglement distillation” unitary $D$ to the $0.99n$ qubits of the Hawking radiation. However, under reasonable complexity assumptions, computing $D$ would require an exponential number of quantum gates!. This means that by the time Alice is done with the computation, the black hole is likely to completely evaporate, and hence there would be nothing left to jump into!

The above is by no means the last word of this story. Other approaches for resolving this paradox have been put forward, as well as ways to poke holes in the Harlow-Hayden resolution. Nor is it the only appearance of complexity in the AdS/CFT correspondence or quantum gravity at large. Indeed, the whole approach places much more emphasis on the information content of the world as opposed to the more traditional view of spacetime as the fundamental “canvas” for our universe. Hence information and computation play a key role in understanding how our spacetime can emerge from the conformal picture.

In the fall seminar, we will learn more about these issues, and will report here as we do so.

Congratulations to Johan Håstad for winning the 2018 Knuth prize! Johan of course has done groundbreaking works from constructing pseudorandom generators based on one way functions, through his famous switching lemma, to his PCP theorem that continues to this day to be the blueprint for much of the work in hardness of approximation. A most deserving winner!

Johan will be presented with the award at the upcoming FOCS.

At the recommendation of Craig Gentry, I recently read the book “Factor Man” by Matt Ginsberg. This book is about a computer scientist that discovers an efficient algorithm for SAT, which starts off a international game of intrigue involving the FBI, NSA, Chinese spies, Swiss banks, and even some characters we know such as Steven Rudich and Scott Aaronson.
(However there is no mention of Scott’s role as the criminal mastermind behind the great Philadelphia Airport Heist.)

While it’s by no means “great literature”, Factor Man is a fun page-turner. I think it can be a particularly enjoyable read for computer scientists, as it might prompt you to come up with your own scenarios as to how things would play out if someone discovers such an algorithm. Unsurprisingly, the book is not technically perfect. The technical error that annoyed me the most was that the protagonist demonstrates his algorithm by factoring integers of sizes that can in fact be fairly easily factored today (the book refers to factoring 128 or 256 bit numbers as impressive, while 768 bit integers of general form have been factored, see also this page and this paper). If you just imagine that when the book says “$n$ bit” numbers it actually means $n$ byte numbers then this is fine. Network security researchers might also take issue with other points in the book (such as the ability of the protagonist to use gmail and blogspot without being identified by neither the NSA nor Google, as well as using a SAT algorithm to provide a “final security patch” for a product).

Regardless of these technical issues, I recommend reading this book if you’re the type of person that enjoys both computer science and spy thrillers, and I do plan to mention it to students taking my introduction to theoretical CS  course.

There is something cool about physics. Black holes, anti-matter, “God’s particle”: it all sounds so exciting. While our TCS “mental experiments” typically involve restricting the inputs of constant-depth circuits, physicists talk about jumping into black holes while holding a dictionary. Physicists also have a knack for names: notions such as “uncertainty principle” or “monogamy of entanglement” sound so much cooler than “Cauchy-Schwarz Inequality” or “Distributive Law”.

But a deeper reason to envy physicists is that (with certain important exceptions) they often have fairly good intuitions into how their systems of interest behave, even if they can’t always prove them. In contrast, we theoretical computer scientists are more often than not completely “in the dark”. A $t$-step algorithm to compute the mapping $x \mapsto f(x)$ can be modeled as $t$ applications of some simple local update rule to the initial state $x$. Studying the evolution of systems is the bread-and-butter of physics, but many physical intuitions fail for the progression of an algorithm’s computation. For example, for general algorithms, we do not have a natural sense in which the state of the system after $0.9t$ steps is “closer” to $f(x)$ than it is to $x$. The intermediate state of a general algorithm is rather non informative. Similarly, we do not have a nice, even conjectural, way to characterize the set of functions that can be computed by $t$ steps: the lack of such clean “complexity measures” is strongly related to the natural proofs barrier for proving circuit lower bounds.

But there are some algorithms for which better “physical intuition” exists. Many optimization algorithms have a “potential function” that improves at every step, and other algorithms such as Monte-Carlo Markov-Chain sampling, are inspired by and can be analyzed using physics intuition. These connections have been recently explored, resulting in both new algorithms as well as better understanding of algorithmic techniques for optimization and learning, as well as the regimes in which they apply.
There are also cases where the computer-science intuition can help in analyzing physical systems. Quantum computers are of course one example, but apparently there are other interesting physical systems (maybe even black holes? see also this summer school) which are “disordered” enough that the best way of thinking of them might be to treat them as random circuits of certain complexity. More generally, in recent years physicists have began to view information and computation as an increasingly useful lens through which to understand physics. The “it from qubit” perspective, whereby spacetime emerges from information rather than the other way around, is growing in popularity.
Finally, on a more “meta” level, the task of doing theoretical physics itself can be thought of as a computational problem. In a fascinating talk, physicist Nima Arkani-Hamed discusses the problem of finding a theory of physics as essentially solving an optimization problem in the “space of ideas”. Specifically, it is a non convex problem, and so a local optimum is not necessarily a global one. Arkani-Hamed calls classical physics “the top of a local mountain in the space of ideas”, i.e., a local optimum, while quantum mechanics is the top of a “taller mountain”. The reason it was hard to make the leap to quantum mechanics is exactly because they “they’re not smoothly connected”.

By classical physics being a local optimum, we mean that if you try to “tweak” classical physics by turning (in his words) “knobs, and little wheels, and twiddles”, you will only get a theory that is less beautiful and with less explanatory power. To get to the better theory of quantum mechanics, one needs to make a conceptual jump, rather than a series of small tweaks. Just like classical physics, quantum mechanics itself is a local optimum, for which every small “tweak” will only make it less beautiful and predictive. This is one explanation as to why it has been so difficult to find the grander theory that unifies general relativity and quantum mechanics. As Harkani-Hamed says, to find such a theory “there’s going to have to be a jump of a comparable magnitude, in the jump that people have to make in going from classical to quantum”. In a related point to “it from qubit”, he also says that “many, many of us suspect that the notion of spacetime can’t be fundamental and it has to be replaced by something else.”
Interestingly, in certain settings convex optimization can be applied to explore the “space of ideas”. In particular some works on the “bootstrap method” use semidefinite programming to explore the space of quantum field theories that satisfy certain symmetries. It turns out that sometimes the constraints that one can derive from these symmetries are so powerful, that they completely determine the theory.

### A fall seminar

The bottom line is that we’re seeing a more and more interesting exchange of ideas between physics and theoretical computer science. As I’m sure I already demonstrated in some cringe-worthy statements above, I know very little about this interface, but am interested in finding out more.

So, Tselil Schramm and I will be running a graduate seminar this fall. We will be learning together with the seminar participants about some of these connections, and hopefully by the end of the term we will all be a little less ignorant.

Some of the topics we will discuss include:

• Connections between statistical physics and algorithms, understanding the physics predictions for hard and easy regimes via phase transitions.
• Quantum information theory: quantum-inspired classical results, as well as classical algorithms for quantum problems.
• The conformal bootstrap: exploring the space of possible physical theories using semidefinite programming.
• Black holes, bulk/boundary correspondence, and computational complexity.
• Quantum superiority – understanding the current proposals for demonstrating exponential speedups for quantum computers, and the evidence for their classical difficulty.
• Quantum Hamiltonian Complexity – the quantum analog of constraint satisfaction problems, with questions such as the existence of a “Quantum PCP Theorem”.

Each one of these is probably worth a semester course on its own, and is typically presented to people with significant physics background. But we are hoping we can create a “tasting menu” and manage to take away some insights and ideas from each of those areas, even if we can’t cover the whole ground.

Participants in the seminar will not only present papers or surveys, but also write a blog post about them, which I will post here, so stay tuned for more information.

[Unrelated note: Huge congratulations to Costis Daskalakis – winner of the 2018 Nevanlinna medal!]

As part of the CRYPTO 2018 conference (August 19-23, Santa Barbara, CA), there is a set of of affiliated events. The conference organizers (Tal Rabin, Elette Boyle, and Fabrice Benhamouda)  asked me  to advertise the workshop Beyond Crypto: A TCS Perspective (itself organized by Yuval Ishai and Guy Rothblum) on August 19th that can be of particular interest to theoretical computer scientists.

Among the speakers will be:

• Aleksander Madry (MIT) will talk about “Machine Learning and Security: The Good, the Bad, and the Hopeful”
• Cynthia Dwork (Harvard) will talk about “Theory for Society: Crypto on Steroids”.
• Virginia Vassilevska Williams (MIT) will talk about “A Fine-Grained Approach to Complexity”
• Mary Wootters (Stanford) will talk about “Cryptography, Local Decoding, and Distributed Storage”
• Scott Aaronson (UT Austin) will talk about “Certified Randomness from Quantum Supremacy”.

(I am also speaking in this workshop, and my talk is titled “On Optimal Algorithms and Assumption Factories”).