In May 2016, after Donald Trump effectively became the Republican nominee for president, I wrote the following blog post. I ended up not publishing it – this has always been a technical blog (and also more of a group blog, at the time). While the damage of a Donald Trump presidency was hypothetical at the time, it is now very real, and in a second term the stakes are only higher. I once again hope American readers of this blog will do what they can to support Joe Biden.

Note: this is not an invitation for a debate on who to vote for in the comments. At this point, if you are educated and follow the news (as I imagine all readers of this blog do) and are not already convinced that Donald Trump is a danger to this country and the world, then nothing I say will change your mind. Similarly, nothing you say will change mine. Hence this is just a call for those readers who already support Joe Biden to make sure they vote, and to think of how they can help with their money or time in other ways.

I’m with Her

Boaz Barak / May 3, 2016

The Republican electorate, in their infinite wisdom, have just (essentially) finalized the selection of Donald Trump as their nominee for president of the United States.

While I have my political views, I don’t consider myself a very political person, and I have not (as far as I remember) ever written about politics on this blog. However, extreme times call for extreme measures. It is tempting to be fatalistic or cynical about this process, and to believe that whether Donald Trump or Hillary Clinton is elected doesn’t make much of a difference. Some people believe that the presidency shapes the person more than the other way around, and others feel that all politicians are corrupt anyway, or that Hillary is not much better than Trump since she voted for the Iraq war and gave paid speeches to Wall Street firms. I think the last two presidencies, of George W. Bush and Barack Obama, demonstrated that the identity of the president makes a huge difference. All evidence indicates that with Trump this difference will be entirely in the negative direction.

While his chances might not be great, this is not a bet I’m comfortable taking. I plan to support Hillary Clinton as much as I can, and hope that other American readers of this blog will do the same.

One of the fascinating lines of research in recent years has been a convergence between the statistical physics and theoretical computer science points of view on optimization problems. This blog post is mainly a note to myself (i.e., I’m the “dummy” 😃), trying to work out some basic facts in this line of work. It was inspired by this excellent talk of Eliran Subag, itself part of a great Simons Institute workshop whose talks I am still planning to watch. I am posting this in case it’s useful for others, but it is quite rough, missing many references, and I imagine it has both math mistakes and inaccuracies in how I refer to the literature – I would be grateful for comments!

In computer science, optimization is the task of finding an assignment x that minimizes some function f(x). In statistical physics we think of x as the state of a system of n particles, and f(x) as the energy of this state. Finding the minimizing assignment corresponds to finding the ground state, and another natural computational problem is sampling from the Gibbs distribution, where the probability of x is proportional to exp(−β f(x)) for some inverse temperature β > 0.
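As a toy illustration of the Gibbs distribution (my own example, with a made-up energy function), one can compute the Gibbs weights of a tiny system exhaustively and watch them concentrate on the ground state as β grows:

```python
import numpy as np

# Toy illustration (my own example, not from the post): exact Gibbs weights
# for a tiny system with states x in {0,1}^3 and a made-up energy f.
def f(x):
    # hypothetical energy: number of adjacent equal bits
    return sum(int(x[i] == x[i + 1]) for i in range(len(x) - 1))

states = [(a, b, c) for a in (0, 1) for b in (0, 1) for c in (0, 1)]
beta = 2.0  # inverse temperature; larger beta favors low-energy states
weights = np.array([np.exp(-beta * f(x)) for x in states])
probs = weights / weights.sum()  # Gibbs probabilities, proportional to exp(-beta * f)

ground = min(states, key=f)  # a ground state (minimum-energy assignment)
print(ground, probs.max())
```

At β = 0 the distribution is uniform; as β → ∞ it concentrates on the ground states, which is why sampling at low temperature is at least as hard as optimization.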

Two prototypical examples of such problems are:

Random 3SAT – in this case x ∈ {0,1}^n and f(x) is the number of clauses violated by the assignment x, for a random formula.

Sherrington-Kirkpatrick model – in this case x ∈ {±1}^n and f(x) = Σ_{i,j} J_{i,j} x_i x_j, where the J_{i,j} are independent centered normal variables (with an appropriate scaling of the variances of the off-diagonal and diagonal entries). Another way to say it is that f(x) = ⟨x, Jx⟩, where J is the matrix whose entries are chosen i.i.d. from a centered Gaussian.
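In code, sampling an instance of this model is a one-liner; here is a sketch (the 1/n variance scaling is my own normalization choice for illustration, set so that a typical state's energy has standard deviation on the order of √n):

```python
import numpy as np

# Sketch of sampling an SK-type instance (the 1/n variance is my own
# normalization choice; a typical state's energy then has std ~ sqrt(n)).
rng = np.random.default_rng(0)
n = 200
J = rng.normal(0.0, np.sqrt(1.0 / n), size=(n, n))

def energy(x, J):
    # f(x) = <x, J x> for x in {+-1}^n
    return x @ J @ x

# Brute force over 2^n states is infeasible; a random state is "typical",
# with energy of mean zero and standard deviation ~ sqrt(n).
x = rng.choice([-1.0, 1.0], size=n)
print(energy(x, J))
```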

The physics and CS intuition is that these two problems have very different computational properties. For random 3SAT (of the appropriate density), it is believed that the set of solutions is “shattered” in the sense that it is partitioned into exponentially many clusters, separated from one another by large distances. It is conjectured that in this setting the problem will be computationally hard. Similarly, from the statistical physics point of view, it is conjectured that if we were to start with the uniform distribution (i.e., a “hot” system) and “lower the temperature” (increase β) at a rate that is not exponentially slow, then we will get “stuck” at a “metastable” state. This is analogous to how, when we heat up sand and then cool it quickly, rather than returning to its original state the sand gets stuck in the metastable state of glass.

In contrast, for the Sherrington-Kirkpatrick (SK) model the geometry is more subtle, but interestingly this enables better algorithms. The SK model is extremely widely studied, with hundreds of papers, and was the inspiration for the simulated annealing algorithm. If memory serves me right, Sherrington and Kirkpatrick made the wrong conjecture on the energy of the ground state, and then Parisi came up in 1979 with a wonderful and hugely influential way to compute this value. Parisi’s calculation was heuristic, but about 30 years later, first Talagrand and later Panchenko rigorously proved many of Parisi’s conjectures. (See this survey of Panchenko.)

Recently Montanari gave a polynomial-time algorithm to find a state with energy that is arbitrarily close to the ground state’s. The algorithm relies on Parisi’s framework, and in particular on the fact that the solution space has a property known as “full replica symmetry breaking (RSB)” / “ultrametricity”. Parisi’s derivations (and hence also Montanari’s analysis) are highly elaborate, and I admit that I have not yet been able to fully follow them. The nice thing is that (as we’ll see) it is possible to describe at least some of the algorithmic results without going into this theory. At the end of the post I will briefly discuss some of the relation to this theory, which is the underlying inspiration for Subag’s results described here.

Note: These papers and this blog post deal with the search problem of finding a solution that minimizes the objective. The refutation problem, of certifying that this minimum is at least some given value, has also been widely studied. The computational complexity of these two problems need not be identical. In particular, there are cases where the search problem has an efficient algorithm achieving the true minimum value, but the best known efficient refutation algorithm can only certify a much weaker bound.

Analysis of a simpler setting

Luckily, there is a similar computational problem for which the analysis of the analogous algorithm (discovered by Subag, and a partial inspiration for Montanari’s work) is much simpler. Specifically, we consider the case where x is an element of the unit sphere in R^n, and f is a low-degree polynomial with random Gaussian coefficients. Concretely, we let f(x) = Σ_p γ_p ⟨T_p, x^{⊗p}⟩, where for every p, T_p is a random tensor of order p whose coefficients are all chosen i.i.d. from N(0,1). (We assume that the polynomial does not have constant or linear components, i.e., γ_0 = γ_1 = 0.)

Depending on the vector γ = (γ_2, γ_3, …), the computational and geometric properties of this problem can vary considerably. The case γ = (1, 0, 0, …) (i.e., only the quadratic term has a non-zero coefficient) corresponds to finding the unit vector minimizing ⟨x, Ax⟩ for a random matrix A, which of course corresponds to the efficiently solvable minimum eigenvector problem. In contrast, the case γ = (0, 1, 0, …) corresponds to finding a rank-one component of a random three-tensor, which is believed to be computationally difficult. The Parisi calculations give a precise condition on the vector γ (the “full RSB” condition) such that if it holds then the solution space has the full RSB property (and hence the problem is computationally easy), and if it does not hold then the solution space does not have this property (and potentially the problem is hard).

These calculations also give rise to the following theorem:

Theorem (Chen and Sen, Proposition 2): If the full RSB condition holds, then in the limit n → ∞, min_{‖x‖=1} f(x)/√n → −∫_0^1 √(ν″(t)) dt, where ν(t) = Σ_p γ_p² t^p. (That is, ν″ is the second derivative of this univariate polynomial.)

We will not discuss the proof of this theorem, but rather how, taking it as a black box, it leads to an algorithm for minimizing f that achieves a near-optimal value (assuming the full RSB condition holds).

It is a nice exercise to show that for every two unit vectors x, y, E[f(x) f(y)] = Σ_p γ_p² ⟨x, y⟩^p = ν(⟨x, y⟩). Hence for any unit vector x, f(x) is a random variable with mean zero and standard deviation √(ν(1)). Since (after some coarsening) the number of unit vectors in n dimensions can be thought of as 2^{Cn} for some constant C, and we expect the probability of deviating t standard deviations to be exp(−Θ(t²)), the minimum value of f should be −c√n for some constant c. However, determining this constant is non-trivial, and doing so is the result of the Parisi theory.

To get a better sense for the quantity ∫_0^1 √(ν″(t)) dt, let’s consider two simple cases:

If γ = (1, 0, 0, …) (i.e., f(x) = ⟨x, Ax⟩ for a random matrix A with i.i.d. N(0,1) entries) then ν(t) = t² and ν″(t) = 2, meaning that ∫_0^1 √(ν″(t)) dt = √2. This turns out to be the actual minimum value. Indeed, in this case f(x) = ⟨x, ((A + A^⊤)/2) x⟩. But the matrix (A + A^⊤)/2’s non-diagonal entries are distributed like N(0, 1/2) and the diagonal entries like N(0, 1), which means that (A + A^⊤)/2 is distributed as 1/√2 times a random matrix from the Gaussian Orthogonal Ensemble (GOE), where the variance is 1 for off-diagonal entries and 2 for diagonal entries. The minimum eigenvalue of such matrices is known to be −(2 + o(1))√n with high probability.
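The GOE edge is easy to check numerically; the following is a quick sketch (standard facts, my own code):

```python
import numpy as np

# Numerical check of a standard fact (my own code): for a GOE matrix with
# N(0,1) off-diagonal and N(0,2) diagonal entries, the minimum eigenvalue
# is about -2*sqrt(n) for large n.
rng = np.random.default_rng(1)
n = 1000
G = rng.normal(size=(n, n))
M = (G + G.T) / np.sqrt(2)  # symmetric; off-diagonal variance 1, diagonal variance 2
lam_min = np.linalg.eigvalsh(M)[0]  # eigvalsh returns eigenvalues in ascending order
print(lam_min / np.sqrt(n))  # close to -2
```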

If γ = (0, 1, 0, …) (i.e., f(x) = ⟨T_3, x^{⊗3}⟩ for a random 3-tensor T_3) then the full RSB condition does not hold. Indeed, in this case ν(t) = t³, and ∫_0^1 √(ν″(t)) dt = ∫_0^1 √(6t) dt = (2/3)√6 ≈ 1.63 gives the value achieved by the algorithm for large n. However I believe (though didn’t find the reference) that the actual minimum tends to −c√n with c strictly larger than this value.

While the particular form of the full RSB condition is not important for this post, there are several equivalent ways to state it; see Proposition 1 in Subag’s paper. One of them is that the function t ↦ ν″(t)^{−1/2} (note the negative exponent) is concave on the interval (0, 1]. It can be shown that this condition cannot be satisfied if γ_2 = 0, and that for every setting of the other coefficients, if γ_2 is large enough then it will be satisfied.

The central result of Subag’s paper is the following:

Theorem: For every ε > 0, there is a polynomial-time algorithm such that on input a random f chosen according to the distribution above (satisfying the full RSB condition), with high probability the algorithm outputs a unit vector x such that f(x) ≤ −(1 − ε)√n ∫_0^1 √(ν″(t)) dt.

The algorithm itself, and the idea behind the analysis are quite simple. In some sense it’s the second algorithm you would think of (or at least the second algorithm according to some ordering).

The first algorithm one would think of is gradient descent. We start at some initial point x_0, and repeat the transformation x ← x − η∇f(x) for some small η (normalizing the norm). Unfortunately, we will generally run into saddle points when we do so, with the gradient being zero. In fact, for simplicity, below we will always make the pessimistic assumption that we are constantly at a saddle point. (This assumption turns out to be true in the actual algorithm, and if it were not then we could always use gradient descent until we hit a saddle.)

The second algorithm one could think of would be to use the Hessian instead of the gradient. That is, repeat the transformation x ← x − δu, where u is a minimal eigenvector of ∇²f(x) (i.e., of the Hessian matrix H with H_{i,j} = ∂²f/∂x_i∂x_j(x)). By the Taylor approximation (and since we assume the gradient is zero), the change in the objective will be roughly (δ²/2) λ_min(∇²f(x)). (Because we assume the gradient vanishes, it will not make a difference whether we update with −δu or +δu, but we use the minus sign for consistency with gradient descent.)

The above approach is promising, but we still need some control over the norm of x. The way that Subag handles this is that he starts with x_0 = 0, and at each step moves by δ in a direction orthogonal to all previous steps, so that within 1/δ² steps he gets to a unit-norm vector. That is, the algorithm is as follows:

Algorithm:

Input: the polynomial f and a step size δ > 0.

Goal: Find a unit vector x approximately minimizing f(x).

Initialize x_0 = 0.

For i = 1, …, 1/δ²: i. Let u_i be a unit vector such that u_i is orthogonal to u_1, …, u_{i−1} and u_i^⊤ ∇²f(x_{i−1}) u_i ≈ λ_min(∇²f(x_{i−1})). (Since the bottom eigenspace of the Hessian has large dimension, we can find a vector that is not only a nearly minimal eigenvector but is also orthogonal to all prior ones. Also, as mentioned, we assume that ∇f(x_{i−1}) = 0.) ii. Set x_i = x_{i−1} − δ u_i.

Output x_{1/δ²}.

(The fact that the number of steps is 1/δ² and not, say, 1/δ is absolutely crucial for the algorithm’s success; without it we would not have been able to use the second-order contribution that arises from the Hessian.)

If we define λ_i to be the minimum eigenvalue of the Hessian at step i, we get that the final objective value achieved by the algorithm satisfies f(x_{1/δ²}) ≈ (δ²/2) Σ_{i=1}^{1/δ²} λ_i.

Now due to rotation invariance, the distribution of ∇²f at the point x is the same as at the point (‖x‖, 0, …, 0). Using concentration of measure arguments, it can be shown that the minimum eigenvalue of ∇²f(x) will be close with high probability to the minimum eigenvalue of ∇²f at (‖x‖, 0, …, 0). Since ‖x_i‖ = √i · δ, we can write

f(x_{1/δ²}) ≈ (δ²/2) Σ_{i=1}^{1/δ²} λ_min(√i · δ)

where λ_min(r) is the minimum eigenvalue of ∇²f at the point (r, 0, …, 0). Taking δ to zero, we get that (approximately) the value of the solution output by the algorithm will satisfy

f(x) ≈ (1/2) ∫_0^1 λ_min(√s) ds

and hence the result will be completed by showing that

λ_min(r) = −(2 + o(1)) √n · √(ν″(r²)).

To do this, let’s recall the definition of f:

f(x) = Σ_p γ_p ⟨T_p, x^{⊗p}⟩,

where for every p, T_p is a random tensor of order p whose coefficients are all chosen i.i.d. from N(0,1).

For simplicity, let’s assume that only γ_2 and γ_3 are non-zero, and hence

f(x) = γ_2 ⟨T_2, x^{⊗2}⟩ + γ_3 ⟨T_3, x^{⊗3}⟩.

(The calculations in the general case are similar.)

The (i,j) entry of ∇²f equals ∂²f/∂x_i∂x_j. The contribution of the γ_2 component to this term only arises from the terms corresponding to either T_2[i,j] or T_2[j,i], and hence for i ≠ j it equals γ_2(T_2[i,j] + T_2[j,i]), which is distributed like N(0, 2γ_2²). For i = j, since ∂²x_i²/∂x_i² = 2, the contribution equals 2γ_2 T_2[i,i], which is distributed like N(0, 4γ_2²).

Evaluating at the point x = (r, 0, …, 0), the contribution from the γ_3 component comes (in the case i ≠ j) from all the terms involving x_i x_j x_1, that is, the six entries of T_3 indexed by orderings of (i, j, 1); each contributes γ_3 r times an independent Gaussian, so the total is distributed like N(0, 6γ_3² r²). For i = j the contribution will be from the three terms involving x_i x_i x_1, with each yielding a contribution of 2γ_3 r times an independent Gaussian, and hence the result will be distributed like N(0, 12γ_3² r²).

(More generally, for larger p, the number of terms for distinct i ≠ j is p(p−1), each contributing a Gaussian of standard deviation γ_p r^{p−2}, while for i = j we have p(p−1)/2 terms, each contributing a Gaussian of standard deviation 2γ_p r^{p−2}.)

Since the sum of Gaussians is a Gaussian, we get that the (i,j) entry of ∇²f at (r, 0, …, 0) is distributed like a Gaussian with variance ν″(r²) for i ≠ j, and twice that for i = j. This means that the minimum eigenvalue of ∇²f equals √(ν″(r²)) times the minimum eigenvalue of a random n×n matrix M from the Gaussian Orthogonal Ensemble (i.e., M is sampled via M_{i,j} ~ N(0,1) for i ≠ j, M_{i,i} ~ N(0,2)). As mentioned above, it is known that this minimum eigenvalue is −(2+o(1))√n, and in fact by the semi-circle law, for every ε > 0, the number of eigenvalues of value at most −(2−ε)√n is Ω(n), and so we can also pick a near-minimal eigenvector that is orthogonal to the previous directions. QED

Full replica symmetry breaking and ultra-metricity

The point of this blog post is that, at least in the “mixed spin” case considered by Subag, we can understand what the algorithm does and the value it achieves without needing to go into the theory of the geometry of the solution space. But let me briefly discuss some of this theory. (As I mentioned, I am still reading through it, and so this part should be read with big grains of salt.)

The key object studied in this line of work is the probability distribution of the dot product ⟨x, x′⟩ for x and x′ sampled independently from the Gibbs distribution induced by f. (This probability distribution will depend on the number of dimensions n, but we consider the limit n → ∞.)

Intuitively, there are several ways this probability distribution can behave, depending on how the solution space is “clustered”:

If all solutions are inside a single “cluster”, in the sense that they are all of the form x = x* + e, where x* is the “center” of the cluster and e is some random vector, then ⟨x, x′⟩ will be concentrated on the point ‖x*‖².

If the solutions are inside a finite number of clusters, with centers x*_1, …, x*_k, then the support of the distribution will be on the points ⟨x*_i, x*_j⟩.

Suppose that the solutions are inside a hierarchy of clusters. That is, suppose we have some rooted tree T (e.g., think of a full binary tree of some depth), and we associate a vector v_u with every vertex u of T, with the property that v_u is orthogonal to all the vectors associated with u’s ancestors in the tree. Now imagine that the distribution is obtained by taking a random leaf u of T and outputting the sum of v_w over all vertices w on the path from the root to u. In such a case the dot product of x and x′ will be Σ ‖v_w‖², where the sum is taken over all the common ancestors w of the corresponding leaves. As the dimension and the depth of the tree go to infinity, the distribution over dot products can have continuous support, and it is this setting (specifically when the support is an interval [0, q]) that is known as full replica symmetry breaking. Because the dot product is determined by the deepest common ancestor, for every three vectors x, x′, x″ in the support of the distribution, ⟨x, x″⟩ ≥ min(⟨x, x′⟩, ⟨x′, x″⟩). It is this condition that is known as ultra-metricity.
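The tree construction above can be illustrated with a toy computation (the dimension and depth here are my own toy choices): assign an orthogonal direction to each vertex of a depth-2 binary tree, let each leaf's vector be the sum of the directions on its root-to-leaf path, and check that overlaps are determined by deepest common ancestors:

```python
import numpy as np

# Toy illustration (my own parameters): a depth-2 binary tree whose vertices
# get pairwise-orthogonal directions; a leaf's vector is the sum of the
# directions on its root-to-leaf path.
n = 64
basis = np.eye(n)
root = basis[0]
children = [basis[1], basis[2]]
leaves = []
for i, c in enumerate(children):
    for j in range(2):
        leaves.append(root + c + basis[3 + 2 * i + j])

def overlap(u, v):
    return float(u @ v)

# Leaves under the same child share two ancestors (overlap 2); leaves under
# different children share only the root (overlap 1).
assert overlap(leaves[0], leaves[1]) == 2.0
assert overlap(leaves[0], leaves[2]) == 1.0

# Ultrametricity: for any three distinct leaves, the two smallest pairwise
# overlaps coincide, since they are set by the same common ancestor.
for a in range(4):
    for b in range(4):
        for c in range(4):
            if len({a, b, c}) == 3:
                os = sorted([overlap(leaves[a], leaves[b]),
                             overlap(leaves[b], leaves[c]),
                             overlap(leaves[a], leaves[c])])
                assert os[0] == os[1]
```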

In Subag’s algorithm, as mentioned above, at any given step we could make an update of either −δu_i or +δu_i. If we think of all the 2^{1/δ²} possible choices for the signs in the steps of the algorithm, we see that the algorithm does not only produce a single vector but actually 2^{1/δ²} such vectors, arranged in an ultrametric tree just as above. Indeed, this ultrametric structure was the inspiration for the algorithm, and is the reason why the algorithm produces the correct result precisely in the full replica symmetry breaking regime.

Acknowledgements: Thanks to Tselil Schramm for helpful comments.

A central puzzle of deep learning is the question of generalization. In other words, what can we deduce from the training performance of a neural network about its test performance on fresh unseen examples? An influential paper of Zhang, Bengio, Hardt, Recht, and Vinyals showed that the answer could be “nothing at all.”

Zhang et al. gave examples where modern deep neural networks achieve 100% accuracy on classifying their training data, but their performance on unseen data may be no better than chance. Therefore we cannot give meaningful guarantees for deep learning using traditional “generalization bounds” that bound the difference between test and train performance by some quantity that tends to zero as the number of datapoints increases. This is why (to quote their title), Zhang et al. claimed that “understanding deep learning requires rethinking generalization”.

But what if the issue isn’t that we’ve been doing generalization bounds wrong, but rather that we’ve been doing deep learning (or more accurately, supervised deep learning) wrong?

Self Supervised + Simple fit (SSS) learning

To explain what we mean, let’s take a small detour to contrast “traditional” or “end-to-end” supervised learning with a different approach to supervised learning, which we’ll call here “Self-Supervised + Simple fit” or “SSS algorithms.” (While the name “SSS algorithms” is new, the approach itself has a long history and has recently been used with great success in practice; our work gives no new methods—only new analysis.)

The classical or “end-to-end” approach for supervised learning can be phrased as “ask and you shall receive”. Given labeled data, you ask (i.e., run an optimizer) for a complex classifier (e.g., a deep neural net) that fits the data (i.e., outputs the given labels on the given data points) and hope that it will be successful on future, unseen data points as well. End-to-end supervised learning achieves state-of-the-art results for many classification problems, particularly on computer vision datasets such as ImageNet and CIFAR-10.

However, end-to-end learning does not directly correspond to the way humans learn to recognize objects (see also this talk of LeCun). A baby may see millions of images in the first year of her life, but most of them do not come with explicit labels. After seeing those images, a baby can make future classifications using very few labeled examples. For example, it might be enough to show her once what is a dog and what is a cat for her to correctly classify future dogs and cats, even if they look quite different from these examples.

In recent years, practitioners have proposed algorithms that are more similar to human learning than supervised learning. Such methods separate the process into two stages. In the first stage, we do representation learning whereby we use unlabeled data to learn a representation: a complex map (e.g., a deep neural net) mapping the inputs into some “representation space.” In the second stage, we fit a simple classifier (e.g., a linear threshold function) to the representation of the datapoints and the given labels. We call such algorithms “Self-Supervision + Simple fit” or SSS algorithms. (Note that, unlike other representation-learning based classifiers, the complex representation is “frozen” and not “fine-tuned” in the second stage, where only a simple classifier is used on top of it.)

While we don’t have a formal definition, a “good representation” should make downstream tasks easier, in the sense of allowing for fewer examples or simpler classifiers. We typically learn a representation via self-supervision, whereby one finds a representation minimizing an objective function that intuitively requires some “insight” into the data. Approaches for self-supervision include reconstruction, where the objective involves recovering data points from partial information (e.g., recovering missing words or pixels), and contrastive learning, where the objective is to find a representation that makes similar points close and dissimilar points far (e.g., in Euclidean space).
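To make the contrastive idea concrete, here is a minimal InfoNCE-style loss in numpy; this is a generic simplification of contrastive objectives (the function name, temperature value, and setup are my own illustration, not a specific method from the literature):

```python
import numpy as np

# A minimal contrastive-style objective (my simplification): pull each point
# toward its "similar" partner and push it away from the rest of the batch.
def contrastive_loss(z, z_pos, temperature=0.5):
    """z, z_pos: (batch, dim) representations of two 'views' of the data."""
    z = z / np.linalg.norm(z, axis=1, keepdims=True)
    z_pos = z_pos / np.linalg.norm(z_pos, axis=1, keepdims=True)
    logits = (z @ z_pos.T) / temperature       # pairwise cosine similarities
    # Row i's positive is column i; the other columns act as negatives.
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_probs))

rng = np.random.default_rng(0)
z = rng.normal(size=(8, 16))
loss_random = contrastive_loss(z, rng.normal(size=(8, 16)))       # unrelated views
loss_aligned = contrastive_loss(z, z + 0.01 * rng.normal(size=(8, 16)))  # matched views
assert loss_aligned < loss_random
```

Matched views put large similarities on the diagonal of the similarity matrix and therefore yield a lower loss than random pairings, which is exactly the "similar close, dissimilar far" behavior described above.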

SSS algorithms have traditionally been used in natural language processing, where unlabeled data is plentiful but labeled data for a particular task is often scarce. But recently SSS algorithms have also been used with great success even for vision tasks such as ImageNet and CIFAR-10, where all data is labeled! While SSS algorithms do not yet beat the state-of-the-art supervised learning algorithms, they get pretty close. SSS algorithms also have other practical advantages over end-to-end supervised learning: they can make use of unlabeled data, the representation can be useful for non-classification tasks, and they may have improved out-of-distribution performance. There has also been recent theoretical analysis of contrastive and reconstruction learning under certain statistical assumptions (see Arora et al and Lee et al).

The generalization gap of SSS algorithms

In a recent paper, we show that SSS algorithms not only work in practice, but work in theory too.

Specifically, we show that (1) such algorithms have a small generalization gap in practice, and (2) we can prove (under reasonable assumptions) that their generalization gap tends to zero with the number of samples, with bounds that are meaningful for many modern classifiers on the CIFAR-10 and ImageNet datasets. We consider the setting where all data is labeled, and the same dataset is used both for learning the representation and for fitting a simple classifier. The resulting classifier includes the overparameterized representation, and so we cannot simply apply “off the shelf” generalization bounds. Indeed, a priori it’s not at all clear that the generalization gap for SSS algorithms should be small.

To get some intuition for the generalization gap of SSS algorithms, consider the experiment where we inject some label noise into our distribution. That is, we corrupt an ε fraction of the labels in both the train and test sets, replacing them with random labels. Already in the noiseless case (ε = 0), the generalization gap of SSS algorithms is noticeably smaller than that of end-to-end supervised learning. As we increase the noise, the difference becomes starker. End-to-end supervised learning algorithms can always achieve 100% training accuracy, even as the test accuracy deteriorates, since they can “memorize” all the training labels they are given. In contrast, for SSS algorithms both training and test accuracy decrease together as we increase the noise, with training accuracy correlating with test performance.

Our main theoretical result is a formal proof of the above statement. To do so, we consider training with a small amount of label noise (a small ε > 0) and define the following quantities:

The robustness gap is the amount by which training accuracy degrades between the “clean” (ε = 0) experiment and the noisy one. (In this and all other quantities, the training accuracy is measured with respect to the original uncorrupted labels.)

The memorization gap considers the noisy experiment (ε > 0) and measures the amount by which performance on the corrupted data samples (where we received the wrong label) is worse than performance on the overall training set. If the algorithm can memorize all given labels, it will be perfectly wrong on the corrupted data samples, leading to a large memorization gap.

The rationality gap is the difference between the performance on the corrupted data samples and the performance on unseen test examples. For example, if x is an image of a dog, then it measures the difference between the probability that the classifier outputs “dog” on x when x appears in the training set with a corrupted (non-dog) label, and the probability that it outputs “dog” on x when x is not in the training set at all. Since, intuitively, getting the wrong label should be worse than getting no label at all, we typically expect the rationality gap to be around zero or negative. Formally, we define the rationality gap to be the maximum between zero and the difference above, so it is always non-negative. We think of an algorithm with a significant positive rationality gap as “irrational.”

By summing up the quantities above, we get the following inequality, which we call the RRM bound

generalization gap ≤ robustness gap + rationality gap + memorization gap

In practice, the robustness and rationality gaps are always small, both for end-to-end supervised algorithms (which have a large generalization gap), and for SSS algorithms (which have a small generalization gap). Thus the main contribution to the generalization gap comes from the memorization gap. Roughly speaking, our main result is the following:

If the complexity of the second-stage classifier of an SSS algorithm is smaller than the number of samples then the generalization gap is small.

See the paper for the precise definition of “complexity,” but it is bounded by the number of bits that it takes to describe the simple classifier (no matter how complex the representation used in the first stage is). Our bound yields non-vacuous results in various practical settings; see the figures below or their interactive version.
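To see the RRM decomposition in action, here is a worked numeric example with made-up accuracy numbers (the numbers are my own illustration, not the paper's experiments; the inequality holds automatically, since the three gaps telescope against the generalization gap):

```python
# Worked numeric illustration of the RRM decomposition (made-up numbers):
clean_train_acc = 0.95   # training accuracy with no label noise
noisy_train_acc = 0.92   # training accuracy (w.r.t. true labels) under noise
corrupted_acc = 0.88     # accuracy on the corrupted subset (w.r.t. true labels)
test_acc = 0.89          # accuracy on unseen test examples

robustness = clean_train_acc - noisy_train_acc          # clean vs. noisy training
memorization = noisy_train_acc - corrupted_acc          # overall vs. corrupted subset
rationality = max(0.0, corrupted_acc - test_acc)        # 0 here: the algorithm is "rational"
generalization_gap = clean_train_acc - test_acc

# RRM bound: generalization gap <= robustness + rationality + memorization
assert generalization_gap <= robustness + rationality + memorization + 1e-9
print(robustness, rationality, memorization, generalization_gap)
```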

What’s next

There are still many open questions. Can we prove rigorous bounds on robustness and rationality? We have some preliminary results in the paper, but there is much room for improvement. Similarly, our complexity-based upper bound is far from tight at the moment, though the RRM bound itself is often surprisingly tight. Our work only applies to SSS algorithms, but people have the intuition that even end-to-end supervised learning algorithms implicitly learn a representation. So perhaps these tools can apply to such algorithms as well. As mentioned, we don’t yet have formal definitions for “good representations,” and the choice of the self-supervision task is still somewhat of a “black art” – can we find a more principled approach?

TL;DR: Know of a great recent paper that should be highlighted to the theory community and beyond? Email a nomination to sigact.highlights.nominations@outlook.com by October 19th.

The goal of the SIGACT Research Highlights Committee is to help promote top computer science theory research by identifying results that are of high quality and broad appeal to the general computer science audience. These results would then be recommended for consideration for the CACM Research Highlights section as well as other general-audience computer science research outlets.

Nomination and Selection Process:

The committee solicits two types of nominations:

1) Conference nominations. Each year, the committee will ask the PC chairs of theoretical computer science conferences to select up to three top papers from these conferences (based on both their technical merit and their breadth of interest to a non-theory audience) and forward them to the committee for consideration.

2) Community nominations. The committee will accept nominations from the members of the community. Each such nomination should summarize the contribution of the nominated paper and also argue why this paper is suitable for broader outreach. The nomination should be no more than a page in length and can be submitted at any time by emailing it to sigact.highlights.nominations@outlook.com. Self-nominations are discouraged.

The nomination deadline is Monday, October 19, 2020.

Committee:

The SIGACT Research Highlights Committee currently comprises the following members:

Boaz Barak, Harvard University Irit Dinur, Weizmann Institute of Science Aleksander Mądry, Massachusetts Institute of Technology (chair) Jelani Nelson, University of California, Berkeley

The Highlights of Algorithms conference is a forum for presenting the highlights of recent developments in algorithms and for discussing potential further advances in this area. The conference will provide a broad picture of the latest research in algorithms through a series of invited talks, as well as short talks. Invited talks include top algorithmic surveys and papers from FOCS/STOC/SODA/COLT/PODC/SPAA/ITCS/PODS (2019-20).

When I was in grad school, common advice for beginning grad students was to leaf through the (paper) STOC or FOCS proceedings to find papers that interest you. This is still decent advice (and requires less physical strength these days 🙂 ), but papers are not always the best source for people starting out. More often than we’d like, the author of the 10th paper on a topic writes it for the audience of people who read (in fact, probably wrote) the previous 9 papers.

Talks often do a better job of giving an overview of the field, and one great resource is the videos from the Simons Institute. If you want to get more in-depth information about a particular topic, it’s hard to beat the extended surveys in Foundations and Trends in TCS, as well as the related areas such as Machine Learning and Information Theory.

But if you are not yet sure what topic you’re interested in, or perhaps not even sure if you want to go to grad school, but you just know that you are interested in theory, there is now a great new resource. As I learned from Twitter, Ryan O’Donnell has just finished his TCS Toolkit course. All 99(!) lectures are on YouTube.

The topics are the following (these links are to the handwritten notes; for the lecture videos see the YouTube channel):

Symposium on Simplicity in Algorithms (SOSA) is a conference in theoretical computer science dedicated to advancing algorithms research by promoting simplicity and elegance in the design and analysis of algorithms. The benefits of simplicity are manifold: simpler algorithms manifest a better understanding of the problem at hand; they are more likely to be implemented and trusted by practitioners; they can serve as benchmarks, as an initialization step, or as the basis for a “state of the art” algorithm; they are more easily taught and are more likely to be included in algorithms textbooks; and they attract a broader set of researchers to difficult algorithmic problems.

Papers in all areas of algorithms research are sought. An ideal submission will advance our understanding of an algorithmic problem by, for example, introducing a simpler algorithm, presenting a simpler analysis of an existing algorithm, or offering insights that generally simplify our understanding of important algorithms or computational problems.

We are especially interested in papers that make material more accessible to a wider audience, such as undergraduates, or, for more specialized topics, general algorithms researchers.

Submissions should contain novel ideas or attractive insights, but they are not expected to prove novel theorems. That is, the results themselves can be known, but their presentation must be new. The conference website is https://www.siam.org/conferences/cm/conference/sosa21.

Program Committee Aaron Bernstein, Rutgers University, U.S. Allan Borodin, University of Toronto, Canada Timothy Chan, University of Illinois at Urbana-Champaign, U.S. Edith Cohen, Google Research, U.S. and Tel Aviv University, Israel Vincent Cohen-Addad, Google Research, Zürich, Switzerland Sanjoy Dasgupta, University of California, San Diego, U.S. Michael Elkin, Ben-Gurion University of the Negev, Israel Funda Ergun, NSF and University of Indiana, Bloomington, U.S. Mike Fellows, University of Bergen, Norway Arnold Filtser, Columbia University, U.S. Kasper Green Larsen, Aarhus University, Denmark Andrew Goldberg, Amazon, U.S. John Iacono, Université libre de Bruxelles, Belgium Russell Impagliazzo, University of California, San Diego, U.S. Giuseppe Italiano, LUISS Guido Carli, Rome, Italy Michael Kapralov, École Polytechnique Fédérale de Lausanne (EPFL), Switzerland Anna Karlin, University of Washington, Seattle, U.S. Valerie King, University of Victoria, Canada (Co-chair) Hung Le, University of Victoria, Canada and University of Massachusetts at Amherst, U.S. (Co-chair) Daniel Lokshtanov, University of California, Santa Barbara, U.S. S. Cenk Sahinalp, NIH and University of Indiana, Bloomington, U.S. Jared Saia, University of New Mexico, U.S. Shay Solomon, Tel Aviv University, Israel Mario Szegedy, Alibaba and Rutgers University, U.S. Robert Tarjan, Princeton University, U.S. Seeun William Umboh, University of Sydney, Australia Qin Zhang, University of Indiana, Bloomington, U.S. Uri Zwick, Tel Aviv University, Israel

Steering Committee
Michael A. Bender, Stony Brook University, U.S.
David Karger, Massachusetts Institute of Technology, U.S.
Tsvi Kopelowitz, Bar-Ilan University, Israel
Seth Pettie, University of Michigan, U.S.
Robert Tarjan, Princeton University, U.S.
Mikkel Thorup, University of Copenhagen, Denmark

Submissions and Deadlines: A link to the submission site is available at https://easychair.org/my/conference?conf=sosa21.
Short Abstract Submission and Paper Registration Deadline: August 12, 2020, 4:59 PM Eastern Time
Full Paper Submission Deadline: August 19, 2020, 4:59 PM Eastern Time
Acceptance Notification: early October 2020
Proceedings (posted online): early January 2021

How to Participate
Authors must submit their papers electronically, in PDF format. Submissions should begin with a title page containing the paper title, each author’s name, affiliation, and email address, and an abstract summarizing the contributions of the paper. There is no page limit. The paper should begin with a clear description of the algorithmic problem to be solved, a survey of prior work on the problem—including a candid assessment of that work in terms of simplicity and elegance—and a discussion of the contributions of the paper. The body of the paper should be written for a general theoretical computer science audience and substantiate its main claims with full proofs.

The submission should be typeset in 11-point font, in a single-column format with ample spacing throughout and ample margins all around; submissions ought to be visually easy to read. Brevity is a hallmark of simplicity, and authors are specifically encouraged to submit short and simple papers. Final papers may not exceed fifteen (15) pages in double-column format.

The program committee may designate one or more papers as SOSA Best Papers. All submissions will be considered.

The Simons Institute is organizing a series of lectures on Advances in Boolean Function Analysis, which will highlight a few major developments in the area. The series will feature weekly two-hour lectures from July 15th to Aug 18th. The lectures aim to address both the broad context of the results and their technical details. Though closely related in theme, each lecture will be self-contained. The schedule is attached below (more info at link).

Talks take place on Wednesdays at 10am Pacific time (1pm Eastern). If you can’t catch them live, they will be recorded.

I originally planned this summer to finish the work on my Introduction to Theoretical Computer Science book, and in particular to write the two missing chapters on space complexity and interactive proof systems. Needless to say, this summer did not go as planned and I won’t be able to write these chapters. However, I still intend to go over the existing chapters, fixing typos, adding examples and exercises, and generally making the book friendlier to beginning undergraduate students.

Toward this end, I would be grateful if people post bugs, typos, and suggestions as GitHub issues (I currently have 267 closed and 14 open issues, which I hope to get to soon). Of course, if you are technically inclined and there is a simple local fix, you can also make a pull request.

Aside from these fixes, I am making two more “global” changes to the book. First, I am adding a “non-mathy overview” to each chapter. While some students got a lot from reading the book prior to lectures, others were intimidated by the mathematical notation, so I hope this gentler introduction will be helpful. I am also adding more examples and solved exercises toward this end.

Another change is that I now follow the more traditional order of presenting deterministic finite automata before Turing machines – DFAs are still optional and can be skipped without missing anything, but some instructors find them a good introduction to Turing machines. Thus the order of presentation of material in the book is roughly as follows:

Introduction, representing objects as strings – Representing numbers, lists, etc. Specifying computational tasks as functions mapping binary strings to binary strings, Cantor’s theorem.
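To give a flavor of this material, here is one standard prefix-free trick for encoding a list of natural numbers as a single binary string – double every bit of each number and use “01” as a separator. (This is my own illustrative sketch, not necessarily the exact encoding the book uses.)

```python
def encode_list(nums):
    """Encode a list of naturals as one binary string.

    Each number's binary representation has every bit doubled
    ("00"/"11" pairs), followed by the separator "01"."""
    def prefix_free(n):
        return "".join(b + b for b in bin(n)[2:]) + "01"
    return "".join(prefix_free(n) for n in nums)

def decode_list(s):
    """Invert encode_list by scanning aligned pairs of bits."""
    nums, cur, i = [], "", 0
    while i < len(s):
        if s[i] == s[i + 1]:   # doubled bit: part of the current number
            cur += s[i]
        else:                  # "01" separator: the number is complete
            nums.append(int(cur, 2))
            cur = ""
        i += 2
    return nums

assert decode_list(encode_list([5, 0, 12])) == [5, 0, 12]
```

Since aligned pairs can only be “00”, “11”, or the separator “01”, decoding is unambiguous – the kind of argument the chapter makes precise.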

Finite functions and Boolean circuits – Every function can be computed by some circuit, circuits as straightline programs, representing circuits as strings, universal circuit evaluator, counting lower bound.

Computing on unbounded inputs – DFAs (optional), Turing Machines, equivalence between Turing machines, RAM machines and programming languages, λ calculus (optional), cellular automata (optional)
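A DFA is simple enough to simulate in a few lines, which is part of its appeal as a warm-up for Turing machines. Here is a minimal Python sketch (my own illustration, not code from the book):

```python
def run_dfa(transitions, accept, s):
    """Simulate a DFA: start at state 0, follow one transition per symbol,
    and accept iff the final state is in the accepting set."""
    state = 0
    for ch in s:
        state = transitions[(state, ch)]
    return state in accept

# A two-state DFA over {0,1} accepting strings with an even number of 1s.
even_ones = {(0, "0"): 0, (0, "1"): 1,
             (1, "0"): 1, (1, "1"): 0}
assert run_dfa(even_ones, {0}, "1010") is True
assert run_dfa(even_ones, {0}, "111") is False
```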

Efficient computation – Modeling running time, time hierarchy theorem, P and EXP

NP and NP completeness – Polynomial-time reductions, Cook-Levin Theorem (using circuits), definition of NP using “proof systems”/“verifying algorithms” (no non-deterministic TMs), NP completeness, consequences of P=NP: search to decision, optimal machine learning, etc.

Randomized computation – Worst-case randomized computation, defining BPP, Sipser-Gács, does BPP=P? (a little on derandomization)

Cryptography – One-time pad, necessity of long keys for information-theoretic crypto, pseudorandom generators and stream ciphers, a taste of public key and “magic” (ZKP, FHE, MPC)
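As a taste of the first topic in that chapter, here is a minimal one-time pad sketch in Python (my own illustration, not the book’s code):

```python
import secrets

def otp_encrypt(message: bytes, key: bytes) -> bytes:
    """XOR each message byte with the corresponding key byte.
    The key must be at least as long as the message and never reused."""
    assert len(key) >= len(message)
    return bytes(m ^ k for m, k in zip(message, key))

# Decryption is the same XOR, since (m ^ k) ^ k == m.
msg = b"attack at dawn"
key = secrets.token_bytes(len(msg))
ciphertext = otp_encrypt(msg, key)
assert otp_encrypt(ciphertext, key) == msg
```

The “necessity of long keys” result in the chapter explains why the length requirement in the assertion cannot be relaxed for information-theoretic security.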

Quantum computing – Some quantum mechanics background: the double-slit experiment, Bell’s inequality. Modeling quantum computation. Bird’s-eye view of Shor’s algorithm and the quantum Fourier transform.