I just came back from a wonderful TheoryFest in LA. There was a fantastic program,  including not just the paper presentations, but also tutorials, keynote talks, plenary short papers, and workshops, as well as other events including the junior/senior lunches, STOC 50th birthday, and probably others that I am forgetting right now.

Still, while we are all taking down our TheoryFest trees, we can remind ourselves that there are other holidays on the theory calendar. In particular FOCS 2018 will be in October in Paris, and it also contains a “workshop and tutorial day”.  If you want to organize a workshop or tutorial, see the call for proposals. Key points are:

• Workshop and Tutorial Day: Saturday, October 6, 2018 (Paris)
• Workshop and Tutorial Co-Chairs: Robert Kleinberg and James R. Lee
• Submission deadline: August 1st, 2018
• Send proposals and questions to focs2018workshops@gmail.com

There are worse things in life than organizing a workshop in Paris..

Guest post by Sam Hopkins

I just got back from dinner with some of the great speakers who will be at our TheoryFest workshop tomorrow afternoon on computational thresholds for average-case problems, and I am very excited for what’s coming! Since I didn’t get much chance to introduce the speakers in my last post, and not all of them are the “usual suspects” for a STOC workshop, I’d like to take a few paragraphs to do so here, and discuss a little further what they might speak about.

The talks will run the gamut from statistical physics to machine learning to good old theory of computing, and in particular will aim to address questions at their 3-wise intersection. This intersection is full of open problems which are both interesting and approachable. I hope to see lots of people there!

On to the speakers:

Florent KrzakalaFlorent is a statistical physicist at the Sorbonne in Paris. For more than 10 years he has been one of the leaders in studying high-dimensional statistical inference and average-case computational problems through the lens of statistical physics.

One of my favorite lines Florent’s work was the construction (with several others) of random measurement matrices $A$ and corresponding sparse-signal reconstruction algorithms which can recover a random sparse vector $x \in \mathbb{R}^n$ with $\rho n$ nonzero entries from the measurement $Ax$where the dimension of $A$ is only $\rho n \times \rho$. (That is, algorithms which recover a $\rho n$-sparse vector from only $\rho n$ measurements — not $\rho n \log n$, or even $10 \rho n$!) This involved bringing together insights from compressed sensing and spin-glass theory in a rather remarkable way.

Lately, Florent tells me he has been interested in the physics of matrix factorization and completion problems, and of neural networks. Tomorrow he is going to discuss computational-versus-statistical gaps for a variety of high-dimensional inference problems, addressing the question of why polynomial-time inference algorithms don’t necessarily achieve information-theoretically-optimal results from a statistical physics perspective.

Nike SunNike is a probabilist & computer scientist at UC Berkeley, before which she was the Schramm postdoctoral fellow at MIT and MSR. Many of you may know her from the tour de force work (joint with Jian Ding and Allan Sly) establishing the $k$-SAT threshold for large $k$ — this was the biggest progress in years on the problem in random CSPs.

Her research spans many topics in probability, but tomorrow she is discussing random CSPs, on which she is among the world experts. In this area her work has made great steps in rigorous-izing the predictions of statistical physics regarding the geometry of the space of solutions to a random CSP instance. This geometry is rich: at some clause densities random CSPs seem to have well-connected spaces of solutions, and at others the solution spaces are shattered. In between are a wealth of phase transitions whose algorithmic implications remain poorly understood (read: a great source of open problems!).

Jacob SteinardtJacob is a graduating PhD student at Stanford, and a rising star in provable approaches to machine learning. One of his focuses of late has been design of algorithms for learning problems in the presence of untrustworthy data, model misspecification, and other challenges the real world imposes on the idealized learning settings we often see here at STOC.

His paper with Moses Charikar and Greg Valiant on list-learning has attracted a lot of attention and sparked several follow-up works at this year’s STOC. In that paper, Jacob and his coauthors realized that the problem of learning parameters of a distribution when only a tiny fraction (say 0.001 percent) of your samples come from that distribution provides a generalization of and useful perspective on many classic learning problems, like learning mixture models. (In the list learning setting, one aims to output a list of candidate parameters such that one of the sets of parameters is close to the parameters of the distribution you wanted to learn.)

Jacob’s work has explored the notion that polynomial-time tractability of learning problems is intimately related to robustness. A caricature: if any learning problem solvable in polynomial time can also be solved in polynomial time in the presence of some untrustworthy data, then polynomial-time algorithms cannot solve inference problems which are statistically impossible in the face of untrustworthy data. This offers an intriguing explanation for the existence of average-case/statistical problems which are unsolvable by polynomial-time algorithms, in spite of their information-theoretic (i.e. exponential-time) solvability.

I will also give a talk.

See you there!

Guest post by Sam Hopkins

TheoryFest is in full swing in Los Angeles! There is lots to be written about the wealth of STOC talks, invited papers, keynotes, and (maybe most importantly) the excellent food hall across the street which is swarming with theorists and mathematicians. But for now I want to bring to your attention the workshops planned for Friday.

As always, the schedule is packed with interesting talks! There are workshops on major themes in theory, like

Since I helped organize the last of these, let me say a bit about it.

Recently we have seen a surge of interest in algorithms and lower bounds for average-case problems. One driver of this surge has been theory’s increasing connection with machine learning, which has suggested a number of high-dimensional and noisy statistical inference problems to study. Another driver has been the substantial progress on long-standing problems regarding random instances—for example, a series of recent works proving many old conjectures on random k-SAT instances, and another series of works analyzing algorithms for community detection and recovery in very sparse random graphs.

A common theme in average-case computational problems is that their statistical difficulty (that is, whether they are solvable by any algorithm at all, regardless of running time) is not predictive of their algorithmic difficulty (that is, whether they are solvable by polynomial-time algorithms).

A now-standard example is the $k$-community stochastic blockmodel. In this problem, $n$ nodes are randomly partitioned into $k$ communities, and then a sparse random graph of average degree $d$ is sampled on those vertices is sampled by including an edge $\{i,j\}$ with probability $(d/n)(1 - \epsilon/k)$ if $i,j$ are in differing communities and otherwise with probability $(d/n)(1 + (1-1/k)\epsilon)$ (this produces a graph of average degree $d$).

It turns out that so long as $d > C k \log k / \epsilon^2$ for a big-enough constant $C$, in exponential time one may recover the underlying communities from the graph (approximately, in the sense that it is possible to partition the graph into $k$ classes such that this partitioning has nonvanishing correlation with the ground-truth partition).

However, it is conjectured that this community recovery is possible in polynomial time if and only if (!!) $d > k^2/\epsilon^2$. This threshold is conjectured to be sharp, in the sense that if $d < 0.99 k^2 / \epsilon^2$ no polynomial-time algorithm should exist, while one is known to when $d > 1.01 k^2 / \epsilon^2$.

This kind of threshold phenomenon is ubiquitous for high-dimensional inference problems, and remains largely unexplained. Of course, we are used to algorithmic intractability! But the appearance of intractability for average-case problems seems not to be explained by worst-case theories of hardness (like NP-completeness). And very little is known about what features of an average-case problem are predictive of computational tractability, and which might predict intractability (while in the worst case we know that, say, NP-completeness is an excellent predictor of intractability). The sharpness of a threshold like $d = k^2/\epsilon^2$ is striking and mysterious in a field where ignoring logarithmic factors is practically a (inter)national pastime.

We will have four talks, spanning algorithms and complexity, with (at least) four perspectives on computational thresholds for average-case problems and related matters. Florent Krzakala will lead off with a statistical-physics perspective on algorithms and hardness for matrix and tensor factorization problems. Nike Sun will give us a taste of the wild world of random constraint satisfaction problems. Jacob Steinhardt will talk about the relationship of outlier robustness and inaccurate/partial problem specification to efficient algorithms for statistical inference. Finally, I will survey some recent developments at the intersection of convex programming, the Sum of Squares method, and high-dimensional inference problems.

I hope to see you there!

We had last week the “Women In Theory” workshop at Harvard. Thanks to the efforts of the organizers  Tal Rabin, Shubhangi Saraf, and Lisa Zhang, as well as the hard work of our staff at Harvard, it was (in my opinion) a huge success. But there is still so much more to be done. A recent post by a CS instructor claims that “we have harvested the low-hanging fruit by eliminating overt discrimination and revamping policies and procedures that favored men” and that “having 20 percent women in tech is probably the best we are likely to achieve”. I think this is deeply misguided. Unfortunately, as I’ve heard even during last week’s workshop, there is plenty of both overt and less overt discrimination that is still going around. For some examples see the blogs of Margo Seltzer and  Alice Silverberg (the latter of which I accidentally discovered while looking for some information about trilinear maps).

I don’t think this is only about percentages. Regardless of ratios, I will personally be happy when I hear from my female colleagues and students that they feel fully included and respected. But we still have a long way to go to get there.

It’s that time of the year again, where you can see TheoryFest decorations in every home,  and TheoryFest music is playing in shops and on the radio. For those making the pilgrimage to Los Angeles, I hope to see you there.

There are many great papers, workshops, and other content in the program, but I am particularly looking forward to the Invited Short Plenary talks (and not just because I was involved in their selection..). I think this is a very unique program not just in theory, but among all CS conferences, where the audience gets a chance to see a collection of fantastic works that appeared in a variety of venues. What other event brings together a talk about routing algorithms in data centers and category theory? Or a talk about mechanism design for school choice and the local minima landscape for machine learning?

The talks this year will be the following (in order of presentations – the sessions will be Monday-Thursday at 4 or 4:10pm, the STOC best papers and Lydia Kavraki’s Athena lecture will also be given in these sessions):

• Sanjoy Dasgupta: A Neural Algorithm for Similarity Search (Science)
• Monia Ghobadi: Programming the Topology of Networks: Technology and Algorithms (SIGCOMM 2016)
• Irene Lo: Dynamic Matching in School Choice:Efficient Seat Reassignment after Late Cancellations     (MATCH-UP 2017)
• Vyes Sekar: Rethinking Network Flow Monitoring with UnivMon (SIGCOMM 2016)
• Rong Ge: Optimization Landscape for Matrix Completion  (NIPS 2016 and ICML 2017)
• Sergey Bravyi: Quantum Advantage with Shallow Circuits (QIP 2018)
• Sam Staton: Higher-Order Probability Theory  (LICS 2017)
• Tengyu Ma: Generalization and Equilibrium in Generative Adversarial Nets (GANs) (ICML 2017)
• Dan Suciu: Optimal Query Processing meets Information Theory (PODS 2016)

By Boaz Barak and Jarosław Błasiok

[Jarek gave a great informal exposition of this paper on Friday, and I asked him to make it into a blog post. My only knowledge of the paper is from Jarek’s explanations: my main contribution to this post was to delete some details and insert some inaccuracies –Boaz.]

A recent exciting paper of Ran Raz and Avishay Tal solved a longstanding problem in quantum complexity theory: the existence of an oracle under which the class $\mathbf{BQP}$ is not contained in the polynomial hierarchy. The proof is elegant and actually not extremely complicated, and in this blog post we discuss the main idea behind it. For some of the motivation and background of this problem, see the posts of Scott Aaronson and Lance Fortnow. As is actually often the case in such relativization result, the proof does not have much to do with either $\mathbf{BQP}$ or the polynomial hierarchy. Rather at its heart is a new lower bound against the perennial “victim” of circuit lower bounds: the class $\mathbf{AC}_0$ of constant depth circuits with AND, OR and NOT gates.

Rather than thinking about a single choice of an oracle, it turns out that it’s better to think of a distribution over oracles, and (for notational convenience) as a distribution $D$ over pairs of oracles $X,Y:\{0,1\}^n \rightarrow \{ \pm 1 \}$.
They show that:

1. There is a quantum algorithm that can make one quantum query to each of these oracles, and using this distinguish with advantage at least $\epsilon = 1/poly(n)$ between the case that $X$ and $Y$ are sampled from $D$, and the case that $X$ and $Y$ are sampled from the uniform distribution. (Set $\epsilon = 1/n$ for concreteness.)
2. For every constant depth polynomial size circuit $C$ on $2N$ inputs (where $N = 2^n$), the probability that $C$ outputs $1$ on $(X,Y)$ sampled from $D$ differs by at most $\tilde{O}(\tfrac{1}{\sqrt{N}}))= \tilde{O}(2^{-n/2})$ from the probability it outputs $1$ on $(X,Y)$ sampled from the uniform distribution.

Together 1 and 2 imply the main results by standard arguments.

## The distribution $D$

The distribution $D$ they use is (a variant of) the “Forrelation” distribution proposed for this task by Aaronson. Namely, they choose $Y$ to be $\epsilon$ correlated with the Fourier transform of $X$.

• Sample $(X,Y)$ as correlated Gaussian random variables over $\mathbb{R}^{2N}$ with the covariance matrix $\begin{pmatrix} I & H \\ H^{-1} & I \end{pmatrix}$ where $H$ is the $N\times N$ Hadamard matrix defined as $H_{x,y} = \tfrac{1}{\sqrt{N}}(-1)^{\langle x,y \rangle}$ for every $x,y \in \{0,1\}^n$. Another way to think about this is that $X$ is a standard normal, and $Y=HX$. Note that for any $i, j$, $\mathbb{E} X_i Y_j = \pm \frac{1}{\sqrt{N}}$.
• Then let $D$ be the distribution over $\{ \pm 1 \}^{2N}$ that has the expectation $(\epsilon X,\epsilon Y)$. That is, for $i,j \in \{1,\ldots,N\}$, we choose $D_i$ as a $\pm 1$ random variable with expectation $\epsilon X_i$ and $D_{N+j}$ as a $\pm 1$ random variable with expectation $\epsilon Y_j$. If $|X_i|$ or $|Y_j|$ are larger than $1/\epsilon$ and each coordinate is a standard Normal then we truncate them appropriately, but because we set $\epsilon=1/n$ this will only happen with probability $2^{-\Omega(n^2)}$ which we can treat as negligible and ignore this event from now on.

## Distinguishing $D$ from uniform is easy for quantum algorithms

The heart of the proof is in the classical part, and so we will merely sketch why this problem is easy for quantum algorithms. A Quantum Algorithm that gets query access to $X$ and $Y$, which we think of as functions mapping $\alpha\in \{0,1\}^n$ to $\{ \pm 1\}$ can obtain the state

$\sum_{\alpha} X(\alpha) |0\rangle |\alpha \rangle + \sum_{\alpha} Y(\alpha) |1\rangle |\alpha \rangle$

The algorithm can then perform the Quantum Fourier Transform (in this case applying Hadamard to its qubits $2..n+1$) to change this to the state

$\sum_{\alpha} \hat{X}(\alpha) |0\rangle |\alpha \rangle + \sum_{\alpha} Y(\alpha) |1\rangle |\alpha \rangle$

But in our case, $Y$ will be $\epsilon$ correlated with $\hat{X}$ and so this can be shown to have the form

$\epsilon (|0\rangle + |1\rangle)\rho + \rho'$

where $\rho$ is some state on the last $n$ qubits.

If we now perform the Hadamard operation on the first qubit and measure it, we will be about $\epsilon$ more likely to see a zero than to see a 1. On the other hand it is not hard to show that if the input was uniform then we will see zero and one with equal probability. The advantage of $\epsilon$ can be amplified by making more queries or considering “tensored” forms of the same oracles.

## $D$ is pseudorandom for AC0 circuits

We now come to the heart of the proof: showing that it is hard to distinguish $D$ from the uniform distribution for $AC_0$ circuits. One of the challenges of obtaining such an oracle separation is that the standard approach of showing such a result, which is to show that $D$ fools low degree polynomials, is not available to us. Indeed, our quantum algorithm itself is a degree 2 polyonomial in $X$ and $Y$!

At a high level, the proof shows that AC0 circuits can be approximated (to a certain extent) by very special kind of low degree polynomials – ones that are sparse in the sense that the L1 norm of their coefficients is not much larger than the L2 norm.

Anyway, we are getting ahead of ourselves. Let $f:\{ \pm 1 \}^{2N} \rightarrow \{ \pm 1 \}$ be computable by a polynomial size constant depth $AC_0$ circuit. Our goal is to show the following result:

Main Theorem: $|\mathbb{E} f(D) - \mathbb{E} f(U) | \leq \tilde{O}(\frac{1}{\sqrt{N}})$

We will identify such a function $f$ with its multilinear extention, and hence think of $f$ as the polynomial $f(Z) = \sum_{S \subseteq [2N]} \hat{f}(S)\mathbb{E} \prod_{i\in S}Z_i$. Note that in this notation, $\mathbb{E} f(U) = \hat{f}(\emptyset) = f(0)$.

We start with the following simple observation: for every multilinear map $f:\mathbb{R}^{2N} \rightarrow \mathbb{R}$, and $(X,Y) \in \mathbb{R}^{2N}$, if we sample from a product distribution $D$ where the expectation of each coordinate matches the coordinate in $(X,Y)$, then $\mathbb{E} f(D) = f(X,Y)$. This means that for the purposes of analyzing the main theorem, we could just as well imagine that $D$ is sampled from the normal multivariate distribution $Z$ over $\mathbb{R}^{2N}$ with correlation matrix $\epsilon^2 \begin{pmatrix} I & H \\ H^{-1} & I \end{pmatrix}$.

Hence the result will follow from the following result:

Main Lemma Let $Z$ be any multivariate Gaussian distribution over $\mathbb{R}^M$ such that $\mathbb{E} Z_i^2 = o( \log^{-1/2} (M/\delta))$ and $|\mathbb{E} Z_i Z_j | \leq \delta$ for $i\neq j$. Then $Z$ is $\tilde{O}(\delta)$ pseudorandom w.r.t. multilinear polynomials computable by polynomial sized circuits of constant depth.

The $\tilde{O}$ notation hides factors poly-logarithmic in $M$. In our case, the distribution $Z$ satisfies $\mathbb{E} Z_i^2 = \epsilon^2 = 1/n$, and $|\mathbb{E}Z_iZ_j| \leq \tfrac{1}{\sqrt{N}}$ and so we get that $Z$ (and hence our original $D$) is $\tilde{O}(\tfrac{1}{\sqrt{N}})$ pseudorandom, completing the proof of the main theorem.

Note that the Main Lemma is of interest in its own right – it says that for a multivariate Gaussian to fool AC0, it is enough that the second moments sufficiently decay as long as the diagonal entries are moderately bounded away from 1. (However, for multivariate Gaussians, a bound on the second moments also implies decay of higher moments by Isserlis’ theorem.)

### Proof of the Main Lemma

The Main Lemma is proved by a “Hybrid Argument”. (Related to arguments used by Chattopadhyay, Hatami, Hosseini and Lovett for constructing pseudorandom generators.) We will set $T = (M/\delta)^{10}$. Since a sum of Gaussians is a Gaussian, for every multivariate Gaussian $Z$, we can write

$Z = \tfrac{1}{\sqrt{T}}\sum_{i=1}^T Z^i$

where $Z^i$ is identically distributed to $Z$.

We write $Z^{\leq i} = \tfrac{1}{\sqrt{T}}\sum_{j=1}^i Z^j$. To prove the Main Lemma we need to show that $|\mathbb{E}f(Z^{\leq T}) - f(0)| \leq \tilde{O}(\delta)$ for every constant-depth poly-size computable multilinear $f$. To do so we will establish the following inequality:

$| \mathbb{E} f(Z^{\leq i}+\tfrac{1}{\sqrt{T}}Z^{i+1}) - \mathbb{E} f(Z^{\leq i}) | \leq \tilde{O}(\tfrac{\delta}{T}) \;\;\;\;(*)$
for every $i=0,1,\ldots,T-1$. (Writing $Z^{\leq 0}$ for the $0$ vector.)
We can obtain the main lemma by summing up $(*)$ over all $i=0,\ldots, T-1$.

The main tool we will use to establish $(*)$ is the following structural result on AC0, which is again of independent interest:

L1 bound theorem: Let $f:\mathbb{R}^M \rightarrow \mathbb{R}$ be a function computable by an $\mathbf{AC}_0$ circuit of quasipolynomial size and constant depth, and let $z_0 \in [-1/2,+1/2]^M$ and $\delta<1/100$. Then the function $g:\mathbb{R}^M \rightarrow \mathbb{R}$ defined as $g(z) = f(\delta z+z_0)-f(z_0)$ satisfies that
$\sum_{U \subseteq [M], |U|=k} |\hat{g}(U)| \leq (2 \delta)^k (\log N)^{O(k)}$ for every $k \geq 1$.

For $z_0 = 0$ those bounds on Fourier coefficients were proven by Avishay Tal. It turns out that general case can be essentially reduced to this case by a clever random restriction.

The L1 bound theorem implies $(*)$, by writing $g(z)=f(\tfrac{1}{\sqrt{T}}z+z_0)-f(z_0)$ for $z_0$ sampled from $Z^{\leq i}$. (The event $z_0 \in [-1/2,+1/2]^M$ holds with high probability. It can be shown that conditioning on this event changes the relevant expectations in $(*)$ by at most an additive term that is proportional to the failure probability, which is smaller than $\frac{\delta}{T}$.)

We can then bound the lefthand side of $(*)$, by

$\sum_{k \geq 1}\sum_{U \subseteq [M], |U|=k} |\hat{g}(U)| \mathbb{E} \prod_{i\in U}Z_i \leq \sum_{k \geq 1}(\tfrac{2}{\sqrt{T}})^k (\log N)^{O(k)} \mathbb{E} \prod_{i\in U}Z_i$
the moment for $k=1$ vanishes. In the moments arising from $k\geq 3$, the term $(\tfrac{1}{\sqrt{T}})^k$ makes the contribution negligible, even when we sum it over $T$ terms — by Isserlis’ theorem  we mentioned above we can bound $|\mathbb{E} \prod_{i \in U} Z_i| \leq (\delta k)^k \leq T^{0.1k}$. The only interesting term is when $k=2$ where we get exactly $\tilde{O}(\delta/T)$ as desired. (As an aside, Isserlis’ Theorem gives a way to compute the moments of any multivariate Gaussian, and has a beautiful two line proof using Taylor expansion of $\mathbb{E} \exp(\sum \lambda_i Z_i) = \exp(0.5 \mathbb{E} (\sum \lambda_i Z_i)^2)$ in the formal variables $\lambda_1,\ldots,\lambda_M$.)

### Proof of the L1 bound theorem

We now show the proof of the L1 bound theorem. We say that function $h$ is a restriction of $f$, if $h(z) = f(r(z))$ where $r(z)_i \in \{1, -1, z_i\}$.

Claim: For every fixed $z_0 \in [-1/2, 1/2]^M$ and multilinear $f$ there exist a random $h$ which is restriction of $f$, such that
$g(z) = f(\delta z + z_0) = \mathbb{E}_{h} h(2 \delta z).$

Indeed, for given $z_0$, take a variable $\tilde{z} \in \mathbb{R}^M$, such that on each coordinate independently we have $\tilde{z}_i = 2\delta z_i$ with probability $\frac{1}{2}$, $\tilde{z_i} = 1$ with probability $(z_0)_i/2 + \frac{1}{4}$, and $\tilde{z}_i = -1$ otherwise. Note that $\mathbb{E} \tilde{z} = \delta z + z_0$. Since $f$ is multilinear, and $\tilde{z}$ is independent across coordinates, we have
$g(z) = f(\delta z + z_0) = f(\mathbb{E} \tilde{z}) = \mathbb{E} f(\tilde{z}) = \mathbb{E} h(2 \delta z),$
where $h$ depending on realization of $\tilde{z}$ is a corresponding restriction of $f$ — it fixes coordinates where $\tilde{z}_i \in \{\pm 1\}$ accordingly.

Now the L1 bound theorem follows by linearity of Fourier transform: for specific $U$ we have $\hat{g}(U) = \mathbb{E}_h (2\delta)^{|U|} \hat{h}(U)$. Every realization of $h$ as a restriction of an $\mathbf{AC}_0$ circuit is again an $\mathbf{AC}_0$ circuit (both size and depth can only decrease), hence we can apply the bounds on Fourier coeffiecients of $\mathbf{AC}_0$ circuits to $h$, namely $\sum_{|U| = k} |\hat{h}(U)| \leq (\log N)^{O(k)}$. Plugging those together we get
$\sum_{U \subset [M], |U| = k} |\hat{g}(U)| \leq \mathbb{E}_h \sum_{|U| = k} |\hat{h}(U)| (2\delta)^k \leq (2\delta)^k (\log N)^{O(k)}.$

which is what we needed to prove.

A couple of years ago I posted about Theory Life Hacks. “Life” might be a bit too grand, but what I really mean are technological tools that can help in the day to day work. In the last two years, perhaps because of taking more responsibilities, I found myself using more technology and so I thought  thought I’d post what I’m using these days, and most importantly hope to hear from readers.

## Running courses

I am now teaching a large course at Harvard with a staff of about 18 people. I use slack for communication with the staff, Piazza for communication with the students, Gradescope for submitting student work and grading exams, while the course data is kept in a private GitHub repository (see below).

## Markdown

These days I use markdown for almost everything. For me it is basically plaintext where I can also write  math between dollar signs. I use markdown to write my all my lecture notes including my 500+ pages intro to TCS book. I use it to write short technical notes and emails: in particular the Markdown Here extension  lets me write math in gmail, and Hackmd is a good way to share short notes. Finally, I am also using it to produce presentations for my courses. (For one-off presentations such as  talks, I still prefer PowerPoint.)

For much of these I use pandoc to transform markdown into various formats, including latex/pdf, html, and html slides. I still use Atom as my main editor, and it’s only become better over time. I use the sync-setting package to synchronize settings between machines and autosave to avoid losing my work. It took me forever to figure out how to enable spell checking using the spell-check package on Windows. The trick  is to enable the SPELLCHECKER_PREFER_HUNSPELL environment variable.

## Text extenders and clipboard managers

Two apps that save me a ton of time are Breevy (a text expander) and Ditto (a clipboard manager). A clipboard manager simply remembers all the things you copied. For example, if you are filling out a recommendation or review form and need to copy stuff from your text file, you can copy all the fields first and then paste them one by one. A text expander allows  you to expand short pieces of text into longer ones. For example, these days I am co director of undergraduate studies at Harvard and find often the need to write the same email again and again. For example, if I type :dusadmit in any window, it automatically expands to

Dear X,

Information for prospective undergrads interested in the School Engineering and Applied Sciences (SEAS) is at  https://www.seas.harvard.edu/audiences/prospective-undergraduates, and some information about Computer Science is at  https://www.seas.harvard.edu/programs/computer-science and  https://cs50.harvard.edu/guide.

Best and good luck,

Boaz Barak

Gordon McKay Professor of Computer Science
John A. Paulson School of Engineering and Applied Sciences
Harvard University

The text expander even stops to ask me for the name so X above is replaced with it. This way I answer with pertinent and helpful information without spending all my time typing. For mass personalized emails, I use the Mail Merge for Gmail add on.

I also use the text expander for various other goals. For example, no matter in which program I am, if I type \bits it always expands to \{0,1\}.  (Indeed I find myself using less and less custom latex commands these days.) I also use it to automate my workflow with git, which is still my favorite version control system. If I type :gitsub it gets expanded to git submodule foreach 'git add -A ; git commit -m wip ; git pull --rebase ; git push'; git add -A ; git commit -m wip ; git pull --rebase ; git push which basically updates my repository in full. I use it also to remember folder names: if I type :cs121 then it expands to C:\DBOX\WORK\Courses\CS121\

## Version control, websites

As mentioned above, I use git for keeping track of papers, courses, and many other projects. You can host git repositories on GitHub, Bit Bucket, or Gitlab, and there are other services as well. I like to put my git repositories inside my dropbox folder even though you are not supposed to do that. The way I handle it is that I have for each project two subfolders Laptop and Desktop that contain two copies of the same repository. I only use Laptop on my laptop  and Desktop on my desktop and so I never touch the same folder from two computers. So far it seems to work.

Netlify is a great service that allows you to keep a website updated from a repository. It works well together with Hugo, which  is a website generator that allows you to write your website using.. you guessed it .. markdown.

## Windows specific tools

These days I find myself in the command line a lot for compiling markdown, pushing things to git etc.., cmder is a nice console app for Windows. Sumatra is still my favorite pdf viewer. For drawing quick doodles using the surface pen, the sketchpad app that comes with Windows (“Windows Ink Workspace”) is actually quite nice.  I sometimes need to write some code snippets – anaconda and jupyter lab are really nice (and not windows specific). And of course I use MikTex.

Anyway, that’s it about my technology use..  would be curious to learn what other people are using to save time and effort.