### Outline

Sampling from thermal states was one of the first and (initially) most important uses of computers. In this blog post, we will discuss both classical and quantum Gibbs distributions, also known as thermal equilibrium states. We will then discuss Markov chains that have Gibbs distributions as stationary distributions. This leads into a discussion of the equivalence of mixing in time (i.e. the Markov chain quickly equilibrates over time) and mixing in space (i.e. sites that are far apart have small correlation). For the classical case, this equivalence is known. After discussing what is known classically, we will discuss difficulties that arise in the quantum case, including (approximate) Quantum Markov states and the equivalence of mixing in the quantum case.

# Gibbs distributions

We have already learned about phase transitions in a previous blog post, but they are important, so we will review them again. The **Gibbs** or **thermal distribution** is defined as follows: Suppose that we have an **energy function** $E : \{0,1\}^n \to \mathbb{R}$, which takes $n$-bit strings to real numbers. Usually, $E = \sum_{i=1}^m E_i$, where each term $E_i$ depends only on a few bits. For example, the energy might be the number of unsatisfied clauses in a 3-SAT formula, or it may arise from the Ising model. The Gibbs distribution at temperature $T > 0$ is

$$p(x) = \frac{e^{-E(x)/T}}{\sum_{y \in \{0,1\}^n} e^{-E(y)/T}},$$

where the normalization factor in the denominator, also called the **partition function**, is $Z = \sum_{y \in \{0,1\}^n} e^{-E(y)/T}$. Another, perhaps more operational, way to define the Gibbs distribution is:

$$p = \operatorname*{arg\,max}_{q \in \Delta(\{0,1\}^n)} H(q) \quad \text{subject to} \quad \langle q, E \rangle = \bar{E}.$$

In this expression, $\Delta(\{0,1\}^n)$ is the set of probability distributions on $\{0,1\}^n$, $H$ is the Shannon entropy, and $\bar{E}$ is a constant representing the average energy. We are thinking of probability distributions and $E$ as vectors of size $2^n$. It turns out that if we solve this optimization problem, then the Gibbs distribution is the unique solution.
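To make the two definitions concrete, here is a minimal sketch that builds the Gibbs distribution $p(x) \propto e^{-E(x)/T}$ by brute-force enumeration for a tiny system. The energy function `chain_energy` (the number of disagreeing neighbours on a path of bits, an Ising-like toy), the helper names, and the parameter values are all assumptions chosen for illustration. The helper `shift_mass` moves probability between two states of equal energy, which keeps both the normalization and the average energy fixed, so comparing entropies before and after gives a small numerical sanity check of the maximum-entropy characterization (not a proof).

```python
import itertools
import math


def chain_energy(x):
    """Toy energy (an assumption for illustration): number of adjacent bits that disagree."""
    return sum(1 for a, b in zip(x, x[1:]) if a != b)


def gibbs_distribution(energy, n, T):
    """Enumerate {0,1}^n and return (states, probabilities, Z) for p(x) = exp(-E(x)/T) / Z."""
    states = list(itertools.product((0, 1), repeat=n))
    weights = [math.exp(-energy(x) / T) for x in states]
    Z = sum(weights)  # partition function
    return states, [w / Z for w in weights], Z


def entropy(p):
    """Shannon entropy in nats."""
    return -sum(q * math.log(q) for q in p if q > 0)


def mean_energy(states, p, energy):
    """Average energy <p, E>."""
    return sum(q * energy(x) for x, q in zip(states, p))


def shift_mass(states, p, src, dst, eps):
    """Move eps of probability from src to dst.

    If E(src) == E(dst), the result has the same average energy as p.
    """
    q = list(p)
    q[states.index(src)] -= eps
    q[states.index(dst)] += eps
    return q


if __name__ == "__main__":
    states, p, Z = gibbs_distribution(chain_energy, n=3, T=1.0)
    q = shift_mass(states, p, src=(1, 1, 1), dst=(0, 0, 0), eps=0.05)
    print("Z =", Z)
    print("mean energy (Gibbs, perturbed):",
          mean_energy(states, p, chain_energy), mean_energy(states, q, chain_energy))
    print("entropy     (Gibbs, perturbed):", entropy(p), entropy(q))
```

Enumeration costs $2^n$ time, so this only works for very small $n$; that cost is exactly why sampling methods are needed for larger systems.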

## Uses of Gibbs distributions

Why is it useful to work with Gibbs distributions?

Gibbs distributions arise naturally in statistical physics systems, such as constraint satisfaction problems (CSPs), the Ising model, and spin glasses. One approach to deal with Gibbs distributions is through belief propagation (BP), which yields exact inference on tree graphical models and sometimes phase transition predictions on loopy graphs. Instead, we will focus on a different approach, namely, *sampling* from the Gibbs distribution.

If we want to minimize the energy $E$ (say, to find a 3-SAT solution), we can use **simulated annealing**. The idea of annealing is that we want to produce a crystal; a crystal is the lowest energy configuration of molecules. If we heat up the substance to a liquid and then cool it quickly, we will not get a nice crystal, because little bits of the material will point in different directions. In order to form a crystal, we need to cool the system slowly.

In computer science terms, we take a sample from a high temperature because sampling is generally easier at a higher temperature than at a lower temperature. We then use that sample as the starting point for an equilibration process at a slightly lower temperature, and repeat this procedure. If we reach zero temperature, then we are sampling from the minimizers of $E$. In practice, the system will usually stop mixing before we get to zero temperature, but this is a good heuristic. You can think of this process as gradient descent, with some additional randomness.
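The annealing procedure described above can be sketched as follows. The energy is the number of unsatisfied clauses of a 3-SAT formula, as in the earlier example. Everything else is an assumption made for illustration: the equilibration step is a single-bit-flip Metropolis update (one common choice of Markov chain whose stationary distribution is the Gibbs distribution at temperature `T`), the cooling schedule is geometric, and the function names, clause encoding, and default parameters are hypothetical.

```python
import math
import random


def unsatisfied(clauses, x):
    """Energy E(x): number of clauses with no true literal.

    A literal is a nonzero int: +i means bit i-1 equals 1, -i means bit i-1 equals 0.
    """
    count = 0
    for clause in clauses:
        if not any((x[abs(lit) - 1] == 1) == (lit > 0) for lit in clause):
            count += 1
    return count


def metropolis_sweep(x, energy, T, rng):
    """Propose flipping each bit once at temperature T; modifies x in place.

    A flip that changes the energy by delta is accepted with
    probability min(1, exp(-delta / T)). Returns the energy of x afterwards.
    """
    e = energy(x)
    for i in range(len(x)):
        x[i] ^= 1  # propose flipping bit i
        e_new = energy(x)
        delta = e_new - e
        if delta <= 0 or rng.random() < math.exp(-delta / T):
            e = e_new  # accept
        else:
            x[i] ^= 1  # reject: undo the flip
    return e


def simulated_annealing(energy, n, T_start=2.0, T_end=0.05,
                        cooling=0.9, sweeps_per_T=20, seed=0):
    """Equilibrate at a sequence of decreasing temperatures, reusing the last sample."""
    rng = random.Random(seed)
    x = [rng.randint(0, 1) for _ in range(n)]
    best_x, best_e = list(x), energy(x)
    T = T_start
    while T > T_end:
        for _ in range(sweeps_per_T):
            e = metropolis_sweep(x, energy, T, rng)
            if e < best_e:
                best_x, best_e = list(x), e
        T *= cooling  # cool slowly (geometric schedule, an assumption)
    return best_x, best_e


if __name__ == "__main__":
    # A small made-up formula on 4 variables.
    clauses = [(1, 2, 3), (-1, -2, 4), (2, -3, -4), (-1, 3, 4)]
    x, e = simulated_annealing(lambda y: unsatisfied(clauses, y), n=4)
    print("assignment:", x, "unsatisfied clauses:", e)
```

The sketch recomputes the full energy after every proposed flip, which is wasteful; since each term of the energy depends only on a few bits, a practical implementation would update only the clauses that contain the flipped variable.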

Gibbs distributions are used to simulate physical systems.

Gibbs distributions are used in Bayesian inference due to the Hammersley-Clifford theorem, which will be discussed next.

Gibbs distributions are also connected to multiplicative weights for linear programming (not discussed in this blog post).

## Bayesian inference & the Hammersley-Clifford theorem

In order to present the Hammersley-Clifford theorem, we must first discuss Markov networks. For this part, we will generalize our setup to a finite alphabet $\Sigma$, so the energy function is now a function $E : \Sigma^n \to \mathbb{R}$.

### Markov chains

First, let us recall the idea of a **Markov chain** with variables $X_1$, $X_2$, $X_3$.