Skip to content

By Boaz Barak and Jarosław Błasiok

[Jarek gave a great informal exposition of this paper on Friday, and I asked him to make it into a blog post. My only knowledge of the paper is from Jarek’s explanations: my main contribution to this post was to delete some details and insert some inaccuracies –Boaz.]

A recent exciting paper of Ran Raz and Avishay Tal solved a longstanding problem in quantum complexity theory: the existence of an oracle under which the class $\mathbf{BQP}$ is not contained in the polynomial hierarchy. The proof is elegant and actually not extremely complicated, and in this blog post we discuss the main idea behind it. For some of the motivation and background of this problem, see the posts of Scott Aaronson and Lance Fortnow. As is actually often the case in such relativization result, the proof does not have much to do with either $\mathbf{BQP}$ or the polynomial hierarchy. Rather at its heart is a new lower bound against the perennial “victim” of circuit lower bounds: the class $\mathbf{AC}_0$ of constant depth circuits with AND, OR and NOT gates.

Rather than thinking about a single choice of an oracle, it turns out that it’s better to think of a distribution over oracles, and (for notational convenience) as a distribution $D$ over pairs of oracles $X,Y:\{0,1\}^n \rightarrow \{ \pm 1 \}$.
They show that:

1. There is a quantum algorithm that can make one quantum query to each of these oracles, and using this distinguish with advantage at least $\epsilon = 1/poly(n)$ between the case that $X$ and $Y$ are sampled from $D$, and the case that $X$ and $Y$ are sampled from the uniform distribution. (Set $\epsilon = 1/n$ for concreteness.)
2. For every constant depth polynomial size circuit $C$ on $2N$ inputs (where $N = 2^n$), the probability that $C$ outputs $1$ on $(X,Y)$ sampled from $D$ differs by at most $\tilde{O}(\tfrac{1}{\sqrt{N}}))= \tilde{O}(2^{-n/2})$ from the probability it outputs $1$ on $(X,Y)$ sampled from the uniform distribution.

Together 1 and 2 imply the main results by standard arguments.

## The distribution $D$

The distribution $D$ they use is (a variant of) the “Forrelation” distribution proposed for this task by Aaronson. Namely, they choose $Y$ to be $\epsilon$ correlated with the Fourier transform of $X$.

Specifically however, it is better to think about this distribution as follows:

• Sample $(X,Y)$ as correlated Gaussian random variables over $\mathbb{R}^{2N}$ with the covariance matrix $\begin{pmatrix} I & H \\ H^{-1} & I \end{pmatrix}$ where $H$ is the $N\times N$ Hadamard matrix defined as $H_{x,y} = \tfrac{1}{\sqrt{N}}(-1)^{\langle x,y \rangle}$ for every $x,y \in \{0,1\}^n$. Another way to think about this is that $X$ is a standard normal, and $Y=HX$. Note that for any $i, j$, $\mathbb{E} X_i Y_j = \pm \frac{1}{\sqrt{N}}$.
• Then let $D$ be the distribution over $\{ \pm 1 \}^{2N}$ that has the expectation $(\epsilon X,\epsilon Y)$. That is, for $i,j \in \{1,\ldots,N\}$, we choose $D_i$ as a $\pm 1$ random variable with expectation $\epsilon X_i$ and $D_{N+j}$ as a $\pm 1$ random variable with expectation $\epsilon Y_j$. If $|X_i|$ or $|Y_j|$ are larger than $1/\epsilon$ and each coordinate is a standard Normal then we truncate them appropriately, but because we set $\epsilon=1/n$ this will only happen with probability $2^{-\Omega(n^2)}$ which we can treat as negligible and ignore this event from now on.

## Distinguishing $D$ from uniform is easy for quantum algorithms

The heart of the proof is in the classical part, and so we will merely sketch why this problem is easy for quantum algorithms. A Quantum Algorithm that gets query access to $X$ and $Y$, which we think of as functions mapping $\alpha\in \{0,1\}^n$ to $\{ \pm 1\}$ can obtain the state

$\sum_{\alpha} X(\alpha) |0\rangle |\alpha \rangle + \sum_{\alpha} Y(\alpha) |1\rangle |\alpha \rangle$

The algorithm can then perform the Quantum Fourier Transform (in this case applying Hadamard to its qubits $2..n+1$) to change this to the state

$\sum_{\alpha} \hat{X}(\alpha) |0\rangle |\alpha \rangle + \sum_{\alpha} Y(\alpha) |1\rangle |\alpha \rangle$

But in our case, $Y$ will be $\epsilon$ correlated with $\hat{X}$ and so this can be shown to have the form

$\epsilon (|0\rangle + |1\rangle)\rho + \rho'$

where $\rho$ is some state on the last $n$ qubits.

If we now perform the Hadamard operation on the first qubit and measure it, we will be about $\epsilon$ more likely to see a zero than to see a 1. On the other hand it is not hard to show that if the input was uniform then we will see zero and one with equal probability. The advantage of $\epsilon$ can be amplified by making more queries or considering “tensored” forms of the same oracles.

## $D$ is pseudorandom for AC0 circuits

We now come to the heart of the proof: showing that it is hard to distinguish $D$ from the uniform distribution for $AC_0$ circuits. One of the challenges of obtaining such an oracle separation is that the standard approach of showing such a result, which is to show that $D$ fools low degree polynomials, is not available to us. Indeed, our quantum algorithm itself is a degree 2 polyonomial in $X$ and $Y$!

At a high level, the proof shows that AC0 circuits can be approximated (to a certain extent) by very special kind of low degree polynomials – ones that are sparse in the sense that the L1 norm of their coefficients is not much larger than the L2 norm.

Anyway, we are getting ahead of ourselves. Let $f:\{ \pm 1 \}^{2N} \rightarrow \{ \pm 1 \}$ be computable by a polynomial size constant depth $AC_0$ circuit. Our goal is to show the following result:

Main Theorem: $|\mathbb{E} f(D) - \mathbb{E} f(U) | \leq \tilde{O}(\frac{1}{\sqrt{N}})$

We will identify such a function $f$ with its multilinear extention, and hence think of $f$ as the polynomial $f(Z) = \sum_{S \subseteq [2N]} \hat{f}(S)\mathbb{E} \prod_{i\in S}Z_i$. Note that in this notation, $\mathbb{E} f(U) = \hat{f}(\emptyset) = f(0)$.

We start with the following simple observation: for every multilinear map $f:\mathbb{R}^{2N} \rightarrow \mathbb{R}$, and $(X,Y) \in \mathbb{R}^{2N}$, if we sample from a product distribution $D$ where the expectation of each coordinate matches the coordinate in $(X,Y)$, then $\mathbb{E} f(D) = f(X,Y)$. This means that for the purposes of analyzing the main theorem, we could just as well imagine that $D$ is sampled from the normal multivariate distribution $Z$ over $\mathbb{R}^{2N}$ with correlation matrix $\epsilon^2 \begin{pmatrix} I & H \\ H^{-1} & I \end{pmatrix}$.

Hence the result will follow from the following result:

Main Lemma Let $Z$ be any multivariate Gaussian distribution over $\mathbb{R}^M$ such that $\mathbb{E} Z_i^2 = o( \log^{-1/2} (M/\delta))$ and $|\mathbb{E} Z_i Z_j | \leq \delta$ for $i\neq j$. Then $Z$ is $\tilde{O}(\delta)$ pseudorandom w.r.t. multilinear polynomials computable by polynomial sized circuits of constant depth.

The $\tilde{O}$ notation hides factors poly-logarithmic in $M$. In our case, the distribution $Z$ satisfies $\mathbb{E} Z_i^2 = \epsilon^2 = 1/n$, and $|\mathbb{E}Z_iZ_j| \leq \tfrac{1}{\sqrt{N}}$ and so we get that $Z$ (and hence our original $D$) is $\tilde{O}(\tfrac{1}{\sqrt{N}})$ pseudorandom, completing the proof of the main theorem.

Note that the Main Lemma is of interest in its own right – it says that for a multivariate Gaussian to fool AC0, it is enough that the second moments sufficiently decay as long as the diagonal entries are moderately bounded away from 1. (However, for multivariate Gaussians, a bound on the second moments also implies decay of higher moments by Isserlis’ theorem.)

### Proof of the Main Lemma

The Main Lemma is proved by a “Hybrid Argument”. (Related to arguments used by Chattopadhyay, Hatami, Hosseini and Lovett for constructing pseudorandom generators.) We will set $T = (M/\delta)^{10}$. Since a sum of Gaussians is a Gaussian, for every multivariate Gaussian $Z$, we can write

$Z = \tfrac{1}{\sqrt{T}}\sum_{i=1}^T Z^i$

where $Z^i$ is identically distributed to $Z$.

We write $Z^{\leq i} = \tfrac{1}{\sqrt{T}}\sum_{j=1}^i Z^j$. To prove the Main Lemma we need to show that $|\mathbb{E}f(Z^{\leq T}) - f(0)| \leq \tilde{O}(\delta)$ for every constant-depth poly-size computable multilinear $f$. To do so we will establish the following inequality:

$| \mathbb{E} f(Z^{\leq i}+\tfrac{1}{\sqrt{T}}Z^{i+1}) - \mathbb{E} f(Z^{\leq i}) | \leq \tilde{O}(\tfrac{\delta}{T}) \;\;\;\;(*)$
for every $i=0,1,\ldots,T-1$. (Writing $Z^{\leq 0}$ for the $0$ vector.)
We can obtain the main lemma by summing up $(*)$ over all $i=0,\ldots, T-1$.

The main tool we will use to establish $(*)$ is the following structural result on AC0, which is again of independent interest:

L1 bound theorem: Let $f:\mathbb{R}^M \rightarrow \mathbb{R}$ be a function computable by an $\mathbf{AC}_0$ circuit of quasipolynomial size and constant depth, and let $z_0 \in [-1/2,+1/2]^M$ and $\delta<1/100$. Then the function $g:\mathbb{R}^M \rightarrow \mathbb{R}$ defined as $g(z) = f(\delta z+z_0)-f(z_0)$ satisfies that
$\sum_{U \subseteq [M], |U|=k} |\hat{g}(U)| \leq (2 \delta)^k (\log N)^{O(k)}$ for every $k \geq 1$.

For $z_0 = 0$ those bounds on Fourier coefficients were proven by Avishay Tal. It turns out that general case can be essentially reduced to this case by a clever random restriction.

The L1 bound theorem implies $(*)$, by writing $g(z)=f(\tfrac{1}{\sqrt{T}}z+z_0)-f(z_0)$ for $z_0$ sampled from $Z^{\leq i}$. (The event $z_0 \in [-1/2,+1/2]^M$ holds with high probability. It can be shown that conditioning on this event changes the relevant expectations in $(*)$ by at most an additive term that is proportional to the failure probability, which is smaller than $\frac{\delta}{T}$.)

We can then bound the lefthand side of $(*)$, by

$\sum_{k \geq 1}\sum_{U \subseteq [M], |U|=k} |\hat{g}(U)| \mathbb{E} \prod_{i\in U}Z_i \leq \sum_{k \geq 1}(\tfrac{2}{\sqrt{T}})^k (\log N)^{O(k)} \mathbb{E} \prod_{i\in U}Z_i$
the moment for $k=1$ vanishes. In the moments arising from $k\geq 3$, the term $(\tfrac{1}{\sqrt{T}})^k$ makes the contribution negligible, even when we sum it over $T$ terms — by Isserlis’ theorem  we mentioned above we can bound $|\mathbb{E} \prod_{i \in U} Z_i| \leq (\delta k)^k \leq T^{0.1k}$. The only interesting term is when $k=2$ where we get exactly $\tilde{O}(\delta/T)$ as desired. (As an aside, Isserlis’ Theorem gives a way to compute the moments of any multivariate Gaussian, and has a beautiful two line proof using Taylor expansion of $\mathbb{E} \exp(\sum \lambda_i Z_i) = \exp(0.5 \mathbb{E} (\sum \lambda_i Z_i)^2)$ in the formal variables $\lambda_1,\ldots,\lambda_M$.)

### Proof of the L1 bound theorem

We now show the proof of the L1 bound theorem. We say that function $h$ is a restriction of $f$, if $h(z) = f(r(z))$ where $r(z)_i \in \{1, -1, z_i\}$.

Claim: For every fixed $z_0 \in [-1/2, 1/2]^M$ and multilinear $f$ there exist a random $h$ which is restriction of $f$, such that
$g(z) = f(\delta z + z_0) = \mathbb{E}_{h} h(2 \delta z).$

Indeed, for given $z_0$, take a variable $\tilde{z} \in \mathbb{R}^M$, such that on each coordinate independently we have $\tilde{z}_i = 2\delta z_i$ with probability $\frac{1}{2}$, $\tilde{z_i} = 1$ with probability $(z_0)_i/2 + \frac{1}{4}$, and $\tilde{z}_i = -1$ otherwise. Note that $\mathbb{E} \tilde{z} = \delta z + z_0$. Since $f$ is multilinear, and $\tilde{z}$ is independent across coordinates, we have
$g(z) = f(\delta z + z_0) = f(\mathbb{E} \tilde{z}) = \mathbb{E} f(\tilde{z}) = \mathbb{E} h(2 \delta z),$
where $h$ depending on realization of $\tilde{z}$ is a corresponding restriction of $f$ — it fixes coordinates where $\tilde{z}_i \in \{\pm 1\}$ accordingly.

Now the L1 bound theorem follows by linearity of Fourier transform: for specific $U$ we have $\hat{g}(U) = \mathbb{E}_h (2\delta)^{|U|} \hat{h}(U)$. Every realization of $h$ as a restriction of an $\mathbf{AC}_0$ circuit is again an $\mathbf{AC}_0$ circuit (both size and depth can only decrease), hence we can apply the bounds on Fourier coeffiecients of $\mathbf{AC}_0$ circuits to $h$, namely $\sum_{|U| = k} |\hat{h}(U)| \leq (\log N)^{O(k)}$. Plugging those together we get
$\sum_{U \subset [M], |U| = k} |\hat{g}(U)| \leq \mathbb{E}_h \sum_{|U| = k} |\hat{h}(U)| (2\delta)^k \leq (2\delta)^k (\log N)^{O(k)}.$

which is what we needed to prove.

A couple of years ago I posted about Theory Life Hacks. “Life” might be a bit too grand, but what I really mean are technological tools that can help in the day to day work. In the last two years, perhaps because of taking more responsibilities, I found myself using more technology and so I thought  thought I’d post what I’m using these days, and most importantly hope to hear from readers.

## Running courses

I am now teaching a large course at Harvard with a staff of about 18 people. I use slack for communication with the staff, Piazza for communication with the students, Gradescope for submitting student work and grading exams, while the course data is kept in a private GitHub repository (see below).

## Markdown

These days I use markdown for almost everything. For me it is basically plaintext where I can also write  math between dollar signs. I use markdown to write my all my lecture notes including my 500+ pages intro to TCS book. I use it to write short technical notes and emails: in particular the Markdown Here extension  lets me write math in gmail, and Hackmd is a good way to share short notes. Finally, I am also using it to produce presentations for my courses. (For one-off presentations such as  talks, I still prefer PowerPoint.)

For much of these I use pandoc to transform markdown into various formats, including latex/pdf, html, and html slides. I still use Atom as my main editor, and it’s only become better over time. I use the sync-setting package to synchronize settings between machines and autosave to avoid losing my work. It took me forever to figure out how to enable spell checking using the spell-check package on Windows. The trick  is to enable the SPELLCHECKER_PREFER_HUNSPELL environment variable.

## Text extenders and clipboard managers

Two apps that save me a ton of time are Breevy (a text expander) and Ditto (a clipboard manager). A clipboard manager simply remembers all the things you copied. For example, if you are filling out a recommendation or review form and need to copy stuff from your text file, you can copy all the fields first and then paste them one by one. A text expander allows  you to expand short pieces of text into longer ones. For example, these days I am co director of undergraduate studies at Harvard and find often the need to write the same email again and again. For example, if I type :dusadmit in any window, it automatically expands to

Dear X,

Thanks for your email and your interest in Harvard. For information about admission to the Harvard undergraduate program, please see the admissions website,  https://college.harvard.edu/admissions, which has information about the application process and requirements. Some additional information for international students is at  https://college.harvard.edu/admissions/application-process/international-applicants.

Information for prospective undergrads interested in the School Engineering and Applied Sciences (SEAS) is at  https://www.seas.harvard.edu/audiences/prospective-undergraduates, and some information about Computer Science is at  https://www.seas.harvard.edu/programs/computer-science and  https://cs50.harvard.edu/guide.

Please note that the Computer Science Directors of Undergraduate Studies are not directly involved in the Harvard admissions process, and so questions about admissions should be directed to the admissions office.

Best and good luck,

Boaz Barak

Gordon McKay Professor of Computer Science
John A. Paulson School of Engineering and Applied Sciences
Harvard University

The text expander even stops to ask me for the name so X above is replaced with it. This way I answer with pertinent and helpful information without spending all my time typing. For mass personalized emails, I use the Mail Merge for Gmail add on.

I also use the text expander for various other goals. For example, no matter in which program I am, if I type \bits it always expands to \{0,1\}.  (Indeed I find myself using less and less custom latex commands these days.) I also use it to automate my workflow with git, which is still my favorite version control system. If I type :gitsub it gets expanded to git submodule foreach 'git add -A ; git commit -m wip ; git pull --rebase ; git push'; git add -A ; git commit -m wip ; git pull --rebase ; git push which basically updates my repository in full. I use it also to remember folder names: if I type :cs121 then it expands to C:\DBOX\WORK\Courses\CS121\

## Version control, websites

As mentioned above, I use git for keeping track of papers, courses, and many other projects. You can host git repositories on GitHub, Bit Bucket, or Gitlab, and there are other services as well. I like to put my git repositories inside my dropbox folder even though you are not supposed to do that. The way I handle it is that I have for each project two subfolders Laptop and Desktop that contain two copies of the same repository. I only use Laptop on my laptop  and Desktop on my desktop and so I never touch the same folder from two computers. So far it seems to work.

Netlify is a great service that allows you to keep a website updated from a repository. It works well together with Hugo, which  is a website generator that allows you to write your website using.. you guessed it .. markdown.

## Windows specific tools

These days I find myself in the command line a lot for compiling markdown, pushing things to git etc.., cmder is a nice console app for Windows. Sumatra is still my favorite pdf viewer. For drawing quick doodles using the surface pen, the sketchpad app that comes with Windows (“Windows Ink Workspace”) is actually quite nice.  I sometimes need to write some code snippets – anaconda and jupyter lab are really nice (and not windows specific). And of course I use MikTex.

Anyway, that’s it about my technology use..  would be curious to learn what other people are using to save time and effort.

This spring I taught Cryptography at Harvard (as usual my lecture notes are online ).

Teaching it was a great fun because of the fantastic group of students that took the course.  They were genuinely interested in cryptography, and kept asking me extremely interesting questions and had excellent insights. Thanks also to Yael Kalai and Bruce Schneier for giving guest lectures in this course.

Though I haven’t done so in a while, I decided to  do course projects this term. I was very impressed on what these students (most of whom are undergraduates) managed to achieve in  one term. So, since a blog is not much use if you can’t show off your students,  without further ado here are the projects in the course:

• A Cryptanalysis of IOTA’s Curl Hash Function / Michael Colavita and Garrett Tanzer. The IOTA crypto currency initially used a tailor made hash function known as “Curl” or “Curl-P”. Unfortunately, a collision was found by Heilman et al. However, Heilman did not make their attack code available. In this project, Colavita and Tanzer managed to reproduce their result, and provide a complete description, as well as online available code for the collision-finding algorithm. Even if you are not interested in IOTA, reading this project is a great way to get familiar with some cryptanalytic techniques. Like any good cryptanalytic work, they even got flamed on Reddit.

• DeCERT: A Decentralized Certificate Authority / Leo Hentschker. This project describes what is a novel (to me at least) approach to deal with the issue of trust in certificates: decentralize the decision of what certificate to trust by voting on a block chain. Leo built the website http://decert.io/ where one can issue a certificate as well as voting on it using tokens based on the Ethereum currency. This was much more than I could have expected from someone to do as a course project!

• OpAwesome: The Good, the Bad, and the Fuzzy in the Secure Database Landscape: Sophie Hilgard and Wilson Qin. The project does a thorough exploration of the design space for encrypted data bases. It seems that every few months we hear of another break where millions of people’s personal information was compromised, and a theoretical cryptographer might wonder why don’t people simply encrypt all the databases with fully homomorphic encryption? Of course, even I know that this is not yet realistic, and generally there are tradeoffs between the levels of security and efficiency (and not less important, backward compatibility with legacy systems). The project covers the various solutions in this space, including some that are relatively easy to use as a “plug in replacement” for non encrypted data bases, but whose security is also limited, and others that have stronger notions of security. They also discuss their preliminary work on opAwsome: a secure database that allows for smoother tradeoffs between communication and client-side computation on the one hand, and security on the other hand.

• Fully Homomorphic Image Processing in the Cloud: William Fu, Raymond Lin, Daniel Inge. Fully homomorphic encryption is notoriously ineffieicnt, but partially homomorphic encryption might sometimes be good enough. This project implements homomorphic compression of images. This can be useful for example to run a machine learning classifier on an image (such classifiers often work on compressed images). I was surpised this can actually be done, but they even have code (using Microsoft’s SEAL library) available on this repository.

• Introduction to Encrypted Voting Protocols / Eric Bornstein: Voting underlies many of the most important decisions we as a society have to undertake. The importance of the security of (and public confidence in) voting procedures cannot be overstated, and yet many current implementations leave something to be desired. This comprehensive survey discusses the differnt goals we wish voting protocols to achieve, including accuracy, privacy, and freedom from coercion, and various cryptographic, as well as paper based (and combinations thereof) ways to achieve them. In particular it covers the ideas of Mix Nets (for anonymity), Visual Cryptography (for usablity) , and Vote/Anti-vote pairs (for receipt freeness). If you have any interest in voting, you’d probably want to take a look.

• Proof of sequential work / Meena Jagadeesan, Alec Sun, and Alex Wei: Proofs of work have recently been of great interest to cryptographers, as they underlie the Bitcoin blockchain. However, the type of proofs used by Bitcoin and its ilk are inherently parallelizable, which gives some limits to their usability. For example, one cannot use them to measure elapsed time- only total cost. Proofs of sequential work adress this by providing a “puzzle” that can only be solved by $N$ sequential steps. The project surveys some of the older constructions as well as the very recent one of Cohen and Pietrzak, and discusses ideas for addressing some of the open problems in this area.

• Analysis of vulnerabilities and mitigations in Multiparty Computation / Leor Fishman: This project analyzed some subtle but important issues in multiparty secure computation. One of the interesting challenges in the area of multiparty computation is the idea of covert computation: how can you hide not just what is being computed but even the mere fact that something is being computed? The project considers a strong (but well motivated!) notion of security where even some of the parties to the protocol would not know the identity of others, and shows some negative results. It then also studies issues of communication patterns in MPC, and in particular protocols that divide the communications into smaller groups or “quorums”. The project shows that the effect of such optimization on security is stronger than what one might have thought initially.

• A secure zero knowledge two factor authentication protocol / Albert Chalom, Nathan Wolfe: This project studies how one achieves the intersection of several goals in authentication: handling fuzzy (and potentially low entropy) authentication data, such as fingerprint etc.., but while also making sure that (as is done with passwords) this data is not stored in the clear in the server. They show a “proof of concept” using two party secure computation.

• Shor’s Algorithm: Andrei Laurentiu Ciupan. Shor’s algorithm is of course a fundamental result that has energized much research on both quantum computing and cryptography over the last two decades. This project is a short (8 page) but almost complete (except for some results on continued fractions) description of the algorithm and its analysis.

• Query and Communication Lower Bounds for Key-Exchange Protocols: a Survey / Noah Golowich: This project touches a topic that’s dear to my heart – understanding the possiblity of implementing key exchange based only on an idealized random oracle, without making hardness of assumptions on structured problems such as lattices, factoring, etc.. It surveys some of this highly technical literature, and mentions a very recent result on communication tradeoffs and some open problems.

• A survey on quantum secure cryptographic systems / Tomoka Kan: This projects the landscape of cryptographic schemes whose security was analyzed in the setting where not only computation but also communication is quantum. This setting allows the adversary some extra powers, as they can make “super position queries” to the other parties, and the issue of security is quite subtle and interesting.

As I mentioned before, I am teaching CS 121  at Harvard, and have written my own text, with the (not very original) title “Introduction to Theoretical Computer Science” . I am hoping for this text to turn into a published textbook in the next year or two. Toward this end, I would be grateful for any comments or suggestions on the text, as I will be working on it over the summer.

You can use the issues page on the GitHub repository to make any suggestions, corrections,  reports on bugs and typos, and so on.

I also hope that others instructors would consider using this text in their courses, and would welcome suggestions as to what I can do to help in that. In particular, over the summer I plan to add more exercises (including some solved ones) and more examples, as well as more explanations and “proof ideas” for many of the theorems.

The main differences between this text and “classical approaches” to intro theory courses such as Sipser’s are the following:

Circuits / straightline programs  as the basic model instead of automata

I do not start with finite automata as the basic computational model, but rather with straight-line programs (or, equivalently Boolean circuits) in an extremely simple programming language which I call the “NAND programming language” since its only operation is assigning to one variable the NAND of two others.

Automata are discussed later in the course, after Turing machines and undecidability, as an example for a restricted computational model where problems such as halting are effectively solvable. This actually corresponds to the historical ordering: Boolean algebra goes back to Boole’s work in the 1850’s, Turing machines and undecidability were of course discovered in the 1930’s, while finite automata were introduced in the 1943 work of McCulloch and Pitts but only really understood in the seminal 1959 work of Rabin and Scott. More importantly, the main current practical motivations for restricted models such as regular and context free languages (whether it is for parsing, for analyzing liveness and safety, or even for routing on software defined networks) are precisely because these are tractable models in which semantic questions can be effectively answered. This motivation can be better appreciated after students see the undecidability of semantic properties of general computing models.

Moreover, the Boolean circuit / straightline programs model is extremely simple to both describe and analyze, and some of the main lessons of the theory of computation, including the notions of the duality between code and data, and the idea of universality, can already be seen in this context. Nonuniform computation also gives essential background for some exciting topics I like to cover later in the course. For example, cryptography (making sense of notions such as “256 bits of security” or difficulty of bitcoin mining), pseudorandom generators and the conjecture that BPP=P, and quantum computing. Circuits of course also yield the simplest proof of the Cook Levin theorem.

A programming language instead of Turing machines for modeling uniform computation

Instead of Turing Machines,  I introduce uniform computation using an equivalent model obtained by extending the straightline NAND programming language mentioned above to include loops and arrays (I call the resulting programming language “NAND++”). I do define Turing machines and show the equivalence to NAND++ program, so the students can connect this both to the historical context as well to other literature they may encounter.

I believe that the programming language formalism makes this model more concrete and familiar to the students than Turing machines. For examples, a result such as the equivalence of Turing machines with two dimensional tapes and one dimensional tapes is now described as showing that a language with two dimensional arrays can be “transpiled” into a language with one dimensional arrays only (i.e., two dimensional arrays can be implemented via “syntactic sugar”).

Moreover, because the NAND++ programming language is extremely simple and fully specified,  it is easy to show its formal equivalence with TM’s. In fact,  since NAND++ is essentially obtained by adding a “feedback loop” to Boolean circuits, this makes the Cook Levin theorem easier to prove. I even implemented the Cook Levin reduction in a Python notebook, and so students can see how one can transform a NAND++ program  $P$ into a graph $G$ and a number $k$ such that $G$ has a cut of size $k$ if and only if the program had an input that makes it output $1$. Alas, these graphs tend to be quite large:

I also introduce yet another extension of NAND++ that allows indirect access to arrays, and hence is essentially equivalent to the standard RAM machine model used (implicitly) in algorithms courses. (I call this language NAND<<.) Using a RAM based model makes the distinction between notions such as $O(n)$ and $O(n^2)$ time more meaningful, and makes the time complexity classes correspond to the informal definitions of linear and quadratic time that students encountered in their algorithms lectures (or their whiteboard coding interviews..).

More advanced topics

Reducing the time dedicated to automata (and eliminating context free languages, though I do touch on them in the text in an optional section) allows to spend more time on topics  such randomness and computation, the interactions between proofs and programs (including Gödel’s incompleteness, interactive proof systems, and even a bit on the $\lambda$-calculus and the Curry-Howard correspondence), cryptography, and quantum computing.

The book is still work in progress, but I think is already quite usable as a textbook. Indeed, much of the feedback I got on the course was that people were happy with the text (though perhaps they merely preferred it to the lectures due to my teaching abilities 🙂 ). One student even got me this coffee mug:

In any case, I hope to get more comments as I work on it over the summer. Regardless of whether or not it’s eventually published as a book, I intend to keep a version of it freely available online.

There is an old joke that an economist would not lift a $100 bill lying on the sidewalk, since in equilibrium it should not be there. But economist Bryan Caplan from George Mason University believes the world is leaving trillion dollars or so on the sidewalk. The culprit in his mind is education, which developed countries tend to invest about 5-6% of their GDP on: Correspondingly, the number of years people spend in school has been growing: Most people would think this is a good thing. First of all, an educated population is a good in itself. Second, this seems correlated with growth in general: However, Caplan believes education in high school and above is mostly a waste of time and money. (He is also suspicious about K-8 education, which he calls a “daycare center for kids”.) So in his mind, the GDP growth of advanced countries has been in spite of their investment in education. Who knows how far the U.S. could have come if the average years in school would have stayed at 8 or 10. Caplan’s theory is that people’s productivity is just a function of innate skill, that is not changed at all by education. They only go to high school, college, and beyond due to an “arms race” of credentials aided by ease of access to education. If you own a business and believe Caplan’s theory, you’d probably want to set up shop as far away from universities as possible, where you could get quality workers on the cheap that did not waste time on education. Perhaps Amazon should hire Caplan as a consultant, they seem to be doing the exact opposite. Caplan also wrote a book. Scott Aaronson is not impressed. Quanta magazine has an excellent article by Erica Klarreich on the recent progress on the 2 to 2 conjecture, which I have blogged about before. The article does not go into the technical details, but gives a good perspective on what’s been done and what are the challenges ahead. Eric Posner and Glen Weyl have a new book on “Radical Markets” which I view as to some extent taking a “computer sciency approach” to some of the basic problems of society, economics and democracy. It’s “computer sciency” in the sense that their approach doesn’t place too high of a premium on “backwards compatibility” and also in the sense that, while they talk about transforming basic notions of property rights and even human rights in physical countries, their ideas might end up being more applicable to “digital countries” such as cryptocurrencies and online communities. Indeed, as Eli Ben-Sasson wrote here, many of the unanswered questions in cryptocurrencies are less about crypto and more about governance. (A third more technical sense is that it seems that some of their ideas, such as quadratic voting, correspond to replacing the 1 norm with the 2 norm, which is of course a theme we have seen before. ) Avi Wigderson posted a self contained version of the epilogue of his book on mathematics and computation. This epilogue gives a panoramic view of TCS’s past, present, and future, as well as its connections with its near and not so near neighbors. Regardless of your background, you will learn something from this book. Finally, I’ve recently started to be interested in connections between physics and algorithms, and in particular the relations between statistical physics and algorithmic difficulty. I highly recommend these lecture notes by Afonso Bandiera, Amelia Perry, and Alex Wein. In a tough year for our community, Amelia was another brilliant student that we lost much much too soon, and these notes are dedicated in her memory. (Announcement from Ilias Diakonikolas and David Kempe) We are pleased to announce that we will provide pooled, subsidized child care at STOC 2018. The cost will be$40 per day per child for
regular conference attendees, and \$20 per day per child for students.
For more detailed information, including how to register for STOC 2018
childcare, see http://acm-stoc.org/stoc2018/childcare.html

Ilias Diakonikolas and David Kempe (local arrangements chairs)