## Theoryfest recap and FOCS call for workshops

I just came back from a wonderful TheoryFest in LA. There was a fantastic program, including not just the paper presentations, but also tutorials, keynote talks, plenary short papers, and workshops, as well as other events including the junior/senior lunches, STOC 50th birthday, and probably others that I am forgetting right now.

Still, while we are all taking down our TheoryFest trees, we can remind ourselves that there are other holidays on the theory calendar. In particular FOCS 2018 will be in October in Paris, and it also contains a “workshop and tutorial day”. If you want to organize a workshop or tutorial, see the call for proposals. Key points are:

- Workshop and Tutorial Day:
**Saturday, October 6, 2018**(Paris) - Workshop and Tutorial Co-Chairs:
**Robert Kleinberg**and**James R. Lee** - Submission deadline:
**August 1st, 2018** - Notification:
**August 6th, 2018** - Send proposals and questions to
**focs2018workshops@gmail.com**

There are worse things in life than organizing a workshop in Paris..

*Guest post by Sam Hopkins*

I just got back from dinner with some of the great speakers who will be at our TheoryFest workshop tomorrow afternoon on computational thresholds for average-case problems, and I am very excited for what’s coming! Since I didn’t get much chance to introduce the speakers in my last post, and not all of them are the “usual suspects” for a STOC workshop, I’d like to take a few paragraphs to do so here, and discuss a little further what they might speak about.

The talks will run the gamut from statistical physics to machine learning to good old theory of computing, and in particular will aim to address questions at their 3-wise intersection. This intersection is full of open problems which are both interesting and approachable. I hope to see lots of people there!

On to the speakers:

**Florent Krzakala: **Florent is a statistical physicist at the Sorbonne in Paris. For more than 10 years he has been one of the leaders in studying high-dimensional statistical inference and average-case computational problems through the lens of statistical physics.

One of my favorite lines Florent’s work was the construction (with several others) of random measurement matrices and corresponding sparse-signal reconstruction algorithms which can recover a random sparse vector with nonzero entries from the measurement , *where the dimension of is only *. (That is, algorithms which recover a -sparse vector from only measurements — not , or even !) This involved bringing together insights from compressed sensing and spin-glass theory in a rather remarkable way.

Lately, Florent tells me he has been interested in the physics of matrix factorization and completion problems, and of neural networks. Tomorrow he is going to discuss computational-versus-statistical gaps for a variety of high-dimensional inference problems, addressing the question of why polynomial-time inference algorithms don’t necessarily achieve information-theoretically-optimal results from a statistical physics perspective.

**Nike Sun: **Nike is a probabilist & computer scientist at UC Berkeley, before which she was the Schramm postdoctoral fellow at MIT and MSR. Many of you may know her from the tour de force work (joint with Jian Ding and Allan Sly) establishing the -SAT threshold for large — this was the biggest progress in years on **the **problem in random CSPs.

Her research spans many topics in probability, but tomorrow she is discussing random CSPs, on which she is among the world experts. In this area her work has made great steps in rigorous-izing the predictions of statistical physics regarding the *geometry of the space of solutions to a random CSP instance. *This geometry is rich: at some clause densities random CSPs seem to have *well-connected *spaces of solutions, and at others the solution spaces are *shattered. *In between are a wealth of phase transitions whose algorithmic implications remain poorly understood (read: a great source of open problems!).

**Jacob Steinardt: **Jacob is a graduating PhD student at Stanford, and a rising star in provable approaches to machine learning. One of his focuses of late has been design of algorithms for* *learning problems in the presence of untrustworthy data, model misspecification, and other challenges the real world imposes on the idealized learning settings we often see here at STOC.

His paper with Moses Charikar and Greg Valiant on *list-learning *has attracted a lot of attention and sparked several follow-up works at this year’s STOC. In that paper, Jacob and his coauthors realized that the problem of learning parameters of a distribution when only a tiny fraction (say 0.001 percent) of your samples come from that distribution provides a generalization of and useful perspective on many classic learning problems, like learning mixture models. (In the list learning setting, one aims to output a *list *of candidate parameters such that one of the sets of parameters is close to the parameters of the distribution you wanted to learn.)

Jacob’s work has explored the notion that polynomial-time tractability of learning problems is intimately related to robustness. A caricature: if any learning problem solvable in polynomial time can also be solved in polynomial time in the presence of some untrustworthy data, then polynomial-time algorithms cannot solve inference problems which are *statistically impossible in the face of untrustworthy data. *This offers an intriguing explanation for the existence of average-case/statistical problems which are unsolvable by polynomial-time algorithms, in spite of their information-theoretic (i.e. exponential-time) solvability.

**I **will also give a talk.

See you there!

**Edit: **correct some attributions.

*Guest post by Sam Hopkins*

TheoryFest is in full swing in Los Angeles! There is lots to be written about the wealth of STOC talks, invited papers, keynotes, and (maybe most importantly) the excellent food hall across the street which is swarming with theorists and mathematicians. But for now I want to bring to your attention the workshops planned for Friday.

As always, the schedule is packed with interesting talks! There are workshops on major themes in theory, like

- deep learning
- fairness in algorithms
- recent constructions of extractors
- Vijay Vazirani’s birthday, and
- average-case problems and computational thresholds

Since I helped organize the last of these, let me say a bit about it.

Recently we have seen a surge of interest in algorithms and lower bounds for average-case problems. One driver of this surge has been theory’s increasing connection with machine learning, which has suggested a number of high-dimensional and noisy statistical inference problems to study. Another driver has been the substantial progress on long-standing problems regarding random instances—for example, a series of recent works proving many old conjectures on random *k*-SAT instances, and another series of works analyzing algorithms for community detection and recovery in very sparse random graphs.

A common theme in average-case computational problems is that their *statistical *difficulty (that is, whether they are solvable by any algorithm at all, regardless of running time) is not predictive of their *algorithmic *difficulty (that is, whether they are solvable by polynomial-time algorithms).

A now-standard example is the -community stochastic blockmodel. In this problem, nodes are randomly partitioned into communities, and then a sparse random graph of average degree is sampled on those vertices is sampled by including an edge with probability if are in differing communities and otherwise with probability (this produces a graph of average degree ).

It turns out that so long as for a big-enough constant , in exponential time one may recover the underlying communities from the graph (*approximately, *in the sense that it is possible to partition the graph into classes such that this partitioning has nonvanishing correlation with the ground-truth partition).

However, it is conjectured that this community recovery is possible in polynomial time *if and only if (!!) *. This threshold is conjectured to be *sharp*, in the sense that if no polynomial-time algorithm should exist, while one is known to when .

This kind of threshold phenomenon is ubiquitous for high-dimensional inference problems, and remains largely unexplained. Of course, we are used to algorithmic intractability! But the appearance of intractability for average-case problems seems not to be explained by worst-case theories of hardness (like NP-completeness). And very little is known about what features of an average-case problem are predictive of computational tractability, and which might predict intractability (while in the worst case we know that, say, NP-completeness is an excellent predictor of intractability). The *sharpness *of a threshold like is striking and mysterious in a field where ignoring logarithmic factors is practically a (inter)national pastime.

We will have four talks, spanning algorithms and complexity, with (at least) four perspectives on computational thresholds for average-case problems and related matters. **Florent Krzakala** will lead off with a statistical-physics perspective on algorithms and hardness for matrix and tensor factorization problems. **Nike Sun** will give us a taste of the wild world of random constraint satisfaction problems.** Jacob Steinhardt** will talk about the relationship of outlier robustness and inaccurate/partial problem specification to efficient algorithms for statistical inference. Finally, I will survey some recent developments at the intersection of convex programming, the Sum of Squares method, and high-dimensional inference problems.

I hope to see you there!

## Women in theory (and CS in general)

We had last week the “Women In Theory” workshop at Harvard. Thanks to the efforts of the organizers Tal Rabin, Shubhangi Saraf, and Lisa Zhang, as well as the hard work of our staff at Harvard, it was (in my opinion) a huge success. But there is still so much more to be done. A recent post by a CS instructor claims that “we have harvested the low-hanging fruit by eliminating overt discrimination and revamping policies and procedures that favored men” and that “having 20 percent women in tech is probably the best we are likely to achieve”. I think this is deeply misguided. Unfortunately, as I’ve heard even during last week’s workshop, there is plenty of both overt and less overt discrimination that is still going around. For some examples see the blogs of Margo Seltzer and Alice Silverberg (the latter of which I accidentally discovered while looking for some information about trilinear maps).

I don’t think this is only about percentages. Regardless of ratios, I will personally be happy when I hear from my female colleagues and students that they feel fully included and respected. But we still have a long way to go to get there.

## Celebrating TheoryFest

It’s that time of the year again, where you can see TheoryFest decorations in every home, and TheoryFest music is playing in shops and on the radio. For those making the pilgrimage to Los Angeles, I hope to see you there.

There are many great papers, workshops, and other content in the program, but I am particularly looking forward to the Invited Short Plenary talks (and not just because I was involved in their selection..). I think this is a very unique program not just in theory, but among all CS conferences, where the audience gets a chance to see a collection of fantastic works that appeared in a variety of venues. What other event brings together a talk about routing algorithms in data centers and category theory? Or a talk about mechanism design for school choice and the local minima landscape for machine learning?

The talks this year will be the following (in order of presentations – the sessions will be Monday-Thursday at 4 or 4:10pm, the STOC best papers and Lydia Kavraki’s Athena lecture will also be given in these sessions):

*Sanjoy Dasgupta***:**(Science)**A Neural Algorithm for Similarity Search***Monia Ghobadi:*(SIGCOMM 2016)**Programming the Topology of Networks: Technology and Algorithms***Irene Lo:*(MATCH-UP 2017)**Dynamic Matching in School Choice:Efficient Seat Reassignment after Late Cancellations***Vyes Sekar:*(SIGCOMM 2016)**Rethinking Network Flow Monitoring with UnivMon***Rong Ge:*(NIPS 2016 and ICML 2017)**Optimization Landscape for Matrix Completion***Sergey Bravyi:***Quantum Advantage with Shallow Circuits**(QIP 2018)*Sam Staton:*(LICS 2017)**Higher-Order Probability Theory***Tengyu Ma:*(ICML 2017)**Generalization and Equilibrium in Generative Adversarial Nets (GANs)***Dan Suciu:***Optimal Query Processing meets Information Theory**(PODS 2016)

## On the Raz-Tal oracle separation of BQP and PH

By Boaz Barak and Jarosław Błasiok

*[Jarek gave a great informal exposition of this paper on Friday, and I asked him to make it into a blog post. My only knowledge of the paper is from Jarek’s explanations: my main contribution to this post was to delete some details and insert some inaccuracies –Boaz.]*

A recent exciting paper of Ran Raz and Avishay Tal solved a longstanding problem in quantum complexity theory: the existence of an oracle under which the class is not contained in the polynomial hierarchy. The proof is elegant and actually not extremely complicated, and in this blog post we discuss the main idea behind it. For some of the motivation and background of this problem, see the posts of Scott Aaronson and Lance Fortnow. As is actually often the case in such relativization result, the proof does not have much to do with either or the polynomial hierarchy. Rather at its heart is a new lower bound against the perennial “victim” of circuit lower bounds: the class of constant depth circuits with AND, OR and NOT gates.

Rather than thinking about a single choice of an oracle, it turns out that it’s better to think of a *distribution* over oracles, and (for notational convenience) as a distribution over *pairs* of oracles .

They show that:

- There is a quantum algorithm that can make one quantum query to each of these oracles, and using this distinguish with advantage at least between the case that and are sampled from , and the case that and are sampled from the uniform distribution. (Set for concreteness.)
- For every constant depth polynomial size circuit on inputs (where ), the probability that outputs on sampled from differs by at most from the probability it outputs on sampled from the uniform distribution.

Together 1 and 2 imply the main results by standard arguments.

## The distribution

The distribution they use is (a variant of) the “Forrelation” distribution proposed for this task by Aaronson. Namely, they choose to be correlated with the Fourier transform of .

Specifically however, it is better to think about this distribution as follows:

- Sample as correlated Gaussian random variables over with the covariance matrix where is the Hadamard matrix defined as for every . Another way to think about this is that is a standard normal, and . Note that for any , .
- Then let be the distribution over that has the expectation . That is, for , we choose as a random variable with expectation and as a random variable with expectation . If or are larger than and each coordinate is a standard Normal then we truncate them appropriately, but because we set this will only happen with probability which we can treat as negligible and ignore this event from now on.

## Distinguishing from uniform is easy for quantum algorithms

The heart of the proof is in the classical part, and so we will merely sketch why this problem is easy for quantum algorithms. A Quantum Algorithm that gets query access to and , which we think of as functions mapping to can obtain the state

The algorithm can then perform the Quantum Fourier Transform (in this case applying Hadamard to its qubits ) to change this to the state

But in our case, will be correlated with and so this can be shown to have the form

where is some state on the last qubits.

If we now perform the Hadamard operation on the first qubit and measure it, we will be about more likely to see a zero than to see a 1. On the other hand it is not hard to show that if the input was uniform then we will see zero and one with equal probability. The advantage of can be amplified by making more queries or considering “tensored” forms of the same oracles.

## is pseudorandom for AC0 circuits

We now come to the heart of the proof: showing that it is *hard* to distinguish from the uniform distribution for circuits. One of the challenges of obtaining such an oracle separation is that the standard approach of showing such a result, which is to show that fools low degree polynomials, is not available to us. Indeed, our quantum algorithm itself is a degree 2 polyonomial in and !

At a high level, the proof shows that AC0 circuits can be approximated (to a certain extent) by very special kind of low degree polynomials – ones that are *sparse* in the sense that the L1 norm of their coefficients is not much larger than the L2 norm.

Anyway, we are getting ahead of ourselves. Let be computable by a polynomial size constant depth circuit. Our goal is to show the following result:

**Main Theorem:**

We will identify such a function with its *multilinear* extention, and hence think of as the polynomial . Note that in this notation, .

We start with the following simple observation: for every multilinear map , and , if we sample from a product distribution where the expectation of each coordinate matches the coordinate in , then . This means that for the purposes of analyzing the main theorem, we could just as well imagine that is sampled from the normal multivariate distribution over with correlation matrix .

Hence the result will follow from the following result:

**Main Lemma** Let be any multivariate Gaussian distribution over such that and for . Then is pseudorandom w.r.t. multilinear polynomials computable by polynomial sized circuits of constant depth.

The notation hides factors poly-logarithmic in . In our case, the distribution satisfies , and and so we get that (and hence our original ) is pseudorandom, completing the proof of the main theorem.

Note that the Main Lemma is of interest in its own right – it says that for a multivariate Gaussian to fool AC0, it is enough that the second moments sufficiently decay as long as the diagonal entries are moderately bounded away from 1. (However, for multivariate Gaussians, a bound on the second moments also implies decay of higher moments by Isserlis’ theorem.)

### Proof of the Main Lemma

The Main Lemma is proved by a “Hybrid Argument”. (Related to arguments used by Chattopadhyay, Hatami, Hosseini and Lovett for constructing pseudorandom generators.) We will set . Since a sum of Gaussians is a Gaussian, for every multivariate Gaussian , we can write

where is identically distributed to .

We write . To prove the Main Lemma we need to show that for every constant-depth poly-size computable multilinear . To do so we will establish the following inequality:

for every . (Writing for the vector.)

We can obtain the main lemma by summing up over all .

The main tool we will use to establish is the following structural result on AC0, which is again of independent interest:

**L1 bound theorem:** Let be a function computable by an circuit of quasipolynomial size and constant depth, and let and . Then the function defined as satisfies that

for every .

For those bounds on Fourier coefficients were proven by Avishay Tal. It turns out that general case can be essentially reduced to this case by a clever random restriction.

The L1 bound theorem implies , by writing for sampled from . (The event holds with high probability. It can be shown that conditioning on this event changes the relevant expectations in by at most an additive term that is proportional to the failure probability, which is smaller than .)

We can then bound the lefthand side of , by

the moment for vanishes. In the moments arising from , the term makes the contribution negligible, even when we sum it over terms — by Isserlis’ theorem we mentioned above we can bound . The only interesting term is when where we get exactly as desired. (As an aside, Isserlis’ Theorem gives a way to compute the moments of any multivariate Gaussian, and has a beautiful two line proof using Taylor expansion of in the formal variables .)

### Proof of the L1 bound theorem

We now show the proof of the L1 bound theorem. We say that function is a restriction of , if where .

**Claim:** For every fixed and multilinear there exist a random which is restriction of , such that

Indeed, for given , take a variable , such that on each coordinate independently we have with probability , with probability , and otherwise. Note that . Since is multilinear, and is independent across coordinates, we have

where depending on realization of is a corresponding restriction of — it fixes coordinates where accordingly.

Now the L1 bound theorem follows by linearity of Fourier transform: for specific we have . Every realization of as a restriction of an circuit is again an circuit (both size and depth can only decrease), hence we can apply the bounds on Fourier coeffiecients of circuits to , namely . Plugging those together we get

which is what we needed to prove.

## Theory Life-Hacks II

A couple of years ago I posted about Theory Life Hacks. “Life” might be a bit too grand, but what I really mean are technological tools that can help in the day to day work. In the last two years, perhaps because of taking more responsibilities, I found myself using more technology and so I thought thought I’d post what I’m using these days, and most importantly hope to hear from readers.

## Running courses

I am now teaching a large course at Harvard with a staff of about 18 people. I use slack for communication with the staff, Piazza for communication with the students, Gradescope for submitting student work and grading exams, while the course data is kept in a private GitHub repository (see below).

## Markdown

These days I use markdown for almost everything. For me it is basically plaintext where I can also write math between dollar signs. I use markdown to write my all my lecture notes including my 500+ pages intro to TCS book. I use it to write short technical notes and emails: in particular the Markdown Here extension lets me write math in gmail, and Hackmd is a good way to share short notes. Finally, I am also using it to produce presentations for my courses. (For one-off presentations such as talks, I still prefer PowerPoint.)

For much of these I use pandoc to transform markdown into various formats, including latex/pdf, html, and html slides. I still use Atom as my main editor, and it’s only become better over time. I use the sync-setting package to synchronize settings between machines and autosave to avoid losing my work. It took me forever to figure out how to enable spell checking using the spell-check package on Windows. The trick is to enable the `SPELLCHECKER_PREFER_HUNSPELL`

environment variable.

## Text extenders and clipboard managers

Two apps that save me a ton of time are Breevy (a text expander) and Ditto (a clipboard manager). A *clipboard manager* simply remembers all the things you copied. For example, if you are filling out a recommendation or review form and need to copy stuff from your text file, you can copy all the fields first and then paste them one by one. A *text expander* allows you to expand short pieces of text into longer ones. For example, these days I am co director of undergraduate studies at Harvard and find often the need to write the same email again and again. For example, if I type `:dusadmit`

in any window, it automatically expands to

Dear X,

Thanks for your email and your interest in Harvard. For information about admission to the Harvard undergraduate program, please see the admissions website, https://college.harvard.edu/admissions, which has information about the application process and requirements. Some additional information for international students is at https://college.harvard.edu/admissions/application-process/international-applicants.

Information for prospective undergrads interested in the School Engineering and Applied Sciences (SEAS) is at https://www.seas.harvard.edu/audiences/prospective-undergraduates, and some information about Computer Science is at https://www.seas.harvard.edu/programs/computer-science and https://cs50.harvard.edu/guide.

Please note that the Computer Science Directors of Undergraduate Studies are not directly involved in the Harvard admissions process, and so questions about admissions should be directed to the admissions office.

Best and good luck,

Boaz Barak

Gordon McKay Professor of Computer Science

John A. Paulson School of Engineering and Applied Sciences

Harvard University

The text expander even stops to ask me for the name so `X`

above is replaced with it. This way I answer with pertinent and helpful information without spending all my time typing. For mass personalized emails, I use the Mail Merge for Gmail add on.

I also use the text expander for various other goals. For example, no matter in which program I am, if I type `\bits`

it always expands to `\{0,1\}`

. (Indeed I find myself using less and less custom latex commands these days.) I also use it to automate my workflow with git, which is still my favorite version control system. If I type `:gitsub`

it gets expanded to `git submodule foreach 'git add -A ; git commit -m wip ; git pull --rebase ; git push'; git add -A ; git commit -m wip ; git pull --rebase ; git push`

which basically updates my repository in full. I use it also to remember folder names: if I type `:cs121`

then it expands to `C:\DBOX\WORK\Courses\CS121\`

## Version control, websites

As mentioned above, I use git for keeping track of papers, courses, and many other projects. You can host git repositories on GitHub, Bit Bucket, or Gitlab, and there are other services as well. I like to put my git repositories inside my dropbox folder even though you are not supposed to do that. The way I handle it is that I have for each project two subfolders `Laptop`

and `Desktop`

that contain two copies of the same repository. I only use `Laptop`

on my laptop and `Desktop`

on my desktop and so I never touch the same folder from two computers. So far it seems to work.

Netlify is a great service that allows you to keep a website updated from a repository. It works well together with Hugo, which is a website generator that allows you to write your website using.. you guessed it .. markdown.

## Windows specific tools

These days I find myself in the command line a lot for compiling markdown, pushing things to git etc.., cmder is a nice console app for Windows. Sumatra is still my favorite pdf viewer. For drawing quick doodles using the surface pen, the sketchpad app that comes with Windows (“Windows Ink Workspace”) is actually quite nice. I sometimes need to write some code snippets – anaconda and jupyter lab are really nice (and not windows specific). And of course I use MikTex.

Anyway, that’s it about my technology use.. would be curious to learn what other people are using to save time and effort.