The FOCS 2014 call for papers is out. The PC chose to forgo hard page limits for submissions, trusting authors to apply their best judgment to make their submissions as easy to read and review as possible. Toward this end, here are some words of advice for potential authors. None of this is mandatory, but I hope you keep it in mind as you prepare your submission. I believe the effort to make your paper more readable will pay off not just in improving your FOCS submission, but also the future online, proceedings, and journal versions of your paper.
Remember the audience.
One of the challenging aspects of writing a FOCS submission (and in fact any scientific paper) is that it needs to address different types of readers simultaneously. A non-exhaustive list includes:
1) The expert in your field who wants to verify all the details and understand how your approach differs from their 2007 paper.
2) A non-expert reviewer who wants to understand what you did, why the question is motivated, and get some sense of the techniques you used.
3) A PC member (such as yours truly) who was not assigned to review the paper, and wants to get some approximation of the above by just skimming it for a few minutes.
A general rule of thumb for addressing those different readers is to make sure that the paper’s first few sections are accessible to a general theoretical CS audience, while later sections might contain the details that are mainly of interest to experts in the field. This brings us to our next point.
Put your best foot forward.
While there is no hard page limit, FOCS reviewers are not expected to read all submissions in full. In practice, this means you should follow what I call “Impagliazzo’s Rule”: The first X pages of the paper should make the reader want to read the next X pages, for any value of X.
In particular, you should make sure that your results, your techniques, the motivation for your work, its context and novelty compared to prior works are clearly stated early in the paper. If your main theorem is hard to state succinctly, you can state an informal version, or an important special case, adding a forward reference to the place where it’s stated in full.
The above applies not just to the results but to the techniques as well. Don’t wait until the technical section to tell us about your novel ideas. Some of the best written papers follow the introduction with a section such as “Our techniques”, “Proof outline”, “Warmup” or “Toy problem” that illustrates the ideas behind the proofs in an informal and accessible way.
While modesty is a fine virtue, you don’t want to overdo it in a FOCS submission, and hide your contributions in some remote corner of the paper. Of course, you don’t want to go too far in the other direction, and so you should also
Put your worst foot forward.
As scientists, we bend over backwards to show the potential flaws, caveats, and room for improvement in our work, and I expect nothing less from FOCS authors. It can be extremely frustrating for a reviewer to find out that the result is restricted in a significant way only when she reaches Section 4 of the paper. All restrictions, caveats, assumptions, and limitations should be described early in the paper. In fact, some caveats are so major that you shouldn’t wait even until the introduction to state them. For example, if you prove a lower bound that holds only for monotone circuits, then not only should this be clearly stated in the abstract, the word “monotone” should probably appear in the title. Generally speaking, if you’ve made a choice in modeling the problem that makes it easier, you should discuss this and explain what would have changed had you made a different choice. Similarly, any relations and overlap with prior works should be clearly described early in the paper. If the result is a generalization of prior work, explain how the two differ and what motivates the generalization. If it improves in some parameters but is worse in others, a discussion of the significance of these is in order. If there is a related work you are aware of, even if it’s not yet formally published, or was done after your work, you should still cite it and explain the relation between the two works and the chronology.
Kill the suspense.
A scientific paper is not a novel and, ideally, readers should not be kept in suspense or surprised, negatively or positively. The FOCS PC is an incredibly talented group of people, but you should still write your paper in a “foolproof” way, trying to anticipate all the questions and misunderstandings that a reader may have (especially one who needs to review 40 papers under time pressure).
For example, it can be extremely annoying for an author to get a review saying “the proof of the main theorem can be vastly simplified by using X”, where X is the thing you tried first that doesn’t work. The way to avoid this is to add a section titled “First attempt” where you discuss X and explain why it fails. Similarly, if there is a paper that at first look seems related to your work, but turns out to be irrelevant, then you should still cite it and explain why it’s not actually related.
Another annoyance is when the reviews give the impression that the paper was rejected for being “too simple”. I and the rest of the FOCS PC believe that simplicity is a great virtue and never a cause for rejection. But you don’t want the reviewer to be surprised by the simplicity, discovering only late in the paper that the proof is a 3 line reduction to some prior work. If the proof is simple then be proud of this fact, and announce it right from the start. Similarly, if the proof of a particular lemma involves some routine applications of a standard method, you don’t need to remove it or move it to the appendix, but do add a sentence saying this at the proof’s start, so the less detail-oriented reviewers will know to skip ahead. This applies in the other case as well: if the proof involves a novel twist or a subtle point, you should add a sentence alerting the reader to look out for that point.
Writing a scientific paper is often a hard task, and I and the rest of the PC deeply appreciate your decision to send us your work and make the effort to communicate it as clearly as possible. We hope you find the above suggestions useful, and are looking forward to reading your submission.
In this series of three posts I want to discuss some recent and old advances in discrepancy theory and their applications to algorithms. Discrepancy minimization is quite a rich and beautiful area as evidenced in these two books. Here I will focus on a specific perspective — that of “Beating the Union Bound” — which, while limiting, captures the main issues in the examples and applications we consider. The description below is particularly efficient in hiding the main combinatorial motivations for the questions in discrepancy theory, but hopefully we will make up for it by other means.
As a preview of what’s to come, in the next post I will discuss Gluskin’s geometric approach to proving the `Six Standard Deviations Suffice’ theorem (Theorem 1 below, but with a constant different from 6) and in the third post we will look at some algorithmic applications and approaches. Much of the content in the posts is based on discussions with Shachar Lovett and Oded Regev, but all mistakes are mine.
We all know what the union bound is. But if you don’t, here’s how the well-known mathematical genius S. Holmes described it: “When you have eliminated the impossible, whatever remains, however improbable, must be the truth”. In non-sleuth terms this says that if you have arbitrary events $A_1, \ldots, A_n$ over some probability space, then

$\Pr[A_1 \cup \cdots \cup A_n] \le \Pr[A_1] + \cdots + \Pr[A_n]. \qquad (1)$

In particular, if the latter sum is less than $1$, then there exists a sample point where none of the events occur.
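The inequality is easy to sanity-check exhaustively on a toy probability space; here is a minimal sketch (the three-coin space and the events are of course just an illustration of mine, not part of the argument):

```python
import itertools
import fractions

# Exact check of the union bound Pr[A1 ∪ ... ∪ An] <= sum_i Pr[Ai]
# on a toy space: three independent fair coins, Ai = "coin i is heads".
space = list(itertools.product([0, 1], repeat=3))  # 8 equally likely outcomes

def pr(event):
    # exact probability of an event over the uniform toy space
    return fractions.Fraction(sum(1 for w in space if event(w)), len(space))

events = [lambda w, i=i: w[i] == 1 for i in range(3)]
lhs = pr(lambda w: any(e(w) for e in events))   # Pr[union] = 7/8
rhs = sum(pr(e) for e in events)                # sum of Pr[Ai] = 3/2
assert lhs <= rhs
```

Here the union bound is far from tight (7/8 versus 3/2); it is most useful when the individual probabilities are tiny.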
By now, we have seen several strangely simple and powerful applications of this simple precept – e.g., existence of good Ramsey graphs (Erdős), existence of good error correcting codes (Shannon), existence of good metric embeddings (Johnson-Lindenstrauss). And the union bound and further variants of it are one of the main techniques we have for showing existential results.
However, the union bound is indeed quite naive in many important contexts and is not always enough to get what we want (as nothing seems to be these days). One particularly striking example of this is the Lovász local lemma (LLL). But let us not digress along this (beautiful) path. Here we will discuss how “beating” the union bound can lead to some important results in the context of discrepancy theory.
Discrepancy and Beating the Union Bound: Six Suffice

A basic probabilistic inequality which we use day-in and day-out is the Chernoff bound. The special case of interest to us is the following: for any vector $a \in \{-1,1\}^n$ and any $t > 0$,

$\Pr_{\epsilon \sim \{-1,1\}^n}\left[\,|\langle a, \epsilon \rangle| > t\sqrt{n}\,\right] \le 2e^{-t^2/2}. \qquad (2)$

In particular, if we choose $t = 2\sqrt{\ln(2n)}$, then the right hand side above is less than $1/n$ and, by a union bound over vectors $a_1, \ldots, a_n \in \{-1,1\}^n$, we get that there exists $\epsilon \in \{-1,1\}^n$ such that for every $j$, $|\langle a_j, \epsilon \rangle| = O(\sqrt{n \log n})$. Can we do better? It is important to note that the above argument is actually tight for uniformly random signs and we want to do better by a careful choice of signs. This is exactly what Spencer showed in his seminal “Six Standard Deviations Suffice” result (1985) (Spencer’s proof as well as all other proofs easily generalize to the case when there are more vectors than the degrees of freedom and we only require each vector to have bounded entries.):

Theorem 1 (Spencer) For any vectors $a_1, \ldots, a_n \in \{-1,1\}^n$, there exist signs $\epsilon \in \{-1,1\}^n$ such that $|\langle a_j, \epsilon \rangle| \le 6\sqrt{n}$ for every $j$.
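To see the union-bound guarantee in action, here is a quick simulation (the instance — random sign vectors — and the sizes are arbitrary choices of mine, just to illustrate the scale of the discrepancy of random signs):

```python
import random
import math

# For uniformly random signs eps, the maximum inner product
# max_j |<a_j, eps>| concentrates around sqrt(n log n), matching the
# Chernoff + union bound analysis. (Toy instance: random +/-1 vectors.)
random.seed(0)
n = 512
A = [[random.choice((-1, 1)) for _ in range(n)] for _ in range(n)]
eps = [random.choice((-1, 1)) for _ in range(n)]
disc = max(abs(sum(a[i] * eps[i] for i in range(n))) for a in A)

# The union bound promises roughly sqrt(2 n ln(2n)); Spencer guarantees
# that SOME eps achieves 6*sqrt(n), but random eps will not find it.
bound = math.sqrt(2 * n * math.log(2 * n))
assert 0 < disc <= 2 * bound  # generous slack for the random fluctuation
```

The point of Theorem 1 is precisely that the extra $\sqrt{\log n}$ factor seen here for random signs can be removed by choosing the signs carefully.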
We will see a proof of the theorem in the next post.
Discrepancy and Beating the Union Bound (and some): Paving Conjecture

As discussed in this post a few months back, the stunning result (adjective is mine) of Adam Marcus, Daniel Spielman and Nikhil Srivastava proving the paving conjecture (and hence resolving the Kadison-Singer problem) can also be cast in a discrepancy language. Let me repeat this from Nikhil’s post for completeness. Let us look at a specific variant of the ‘Matrix Chernoff Bound’ (see Nikhil’s post for references; here $\|\cdot\|$ denotes the spectral norm of a matrix):

Theorem 2 Given symmetric $d \times d$ matrices $A_1, \ldots, A_n$,

$\mathbb{E}_{\epsilon \sim \{-1,1\}^n}\left\| \sum_{i=1}^n \epsilon_i A_i \right\| \le O(\sqrt{\log d}) \cdot \left\| \sum_{i=1}^n A_i^2 \right\|^{1/2}. \qquad (3)$

Note that the above is a substantial generalization of Equation 2, which corresponds to the special case when the matrices $A_i$ are diagonal matrices with entries given by the vectors $a_1, \ldots, a_n$. In a very naive but still meaningful way, one can view the $\sqrt{\log d}$ factor on the right hand side as again appearing partially because of a union bound.
Just as for the scalar case, the $\sqrt{\log d}$ factor is necessary in general for uniformly random signs. Can we do better? There are two facets to this question, both of which seem to be quite important and basic (and perhaps correspondingly hard):

- Question 1: Can the $\sqrt{\log d}$ factor be improved by picking the signs carefully instead of uniformly at random?
- Question 2: Can the $\sqrt{\log d}$ factor be improved for uniformly random signs under some nice conditions on the matrices $A_i$?
Here we will focus on the first question, but let me say a couple of words about the second one. In all known examples showing tightness of the matrix Chernoff bound, the matrices seem to have a lot of commutative structure (e.g., diagonal matrices) and intuitively non-commutativity should help us in improving the bound. Perhaps there is a quantitative way of capturing non-commutativity to do better (see the discussion here for one such attempt).
Regarding the first question, Spencer’s theorem gives a positive answer in the special case when the $A_i$ are diagonal matrices with $\pm 1$ entries (or more generally, bounded entries). The breakthrough result of Marcus, Spielman and Srivastava amounts to giving an exact characterization of when we can do better if the matrices $A_i$ are rank one positive semi-definite matrices.
Discrepancy and Beating the Union Bound: Matrix Spencer Theorem?

The above discussions prompt the following question:
Conjecture For any symmetric $d \times d$ matrices $A_1, \ldots, A_n$ with $\|A_i\| \le 1$, there exist signs $\epsilon \in \{-1,1\}^n$ such that $\left\|\sum_{i=1}^n \epsilon_i A_i\right\| = O(\sqrt{n})$.
The above conjecture is a strong generalization of Spencer’s theorem, which corresponds to the matrices $A_i$ being diagonal. The earliest reference I am aware of for it is this paper. I can’t think of any concrete applications of the conjecture, but I quite like it because of the simplicity of the statement. Admittedly, I also have a personal bias: my first foray into discrepancy theory (with Shachar Lovett and Oded Regev) was to study this question.
One can view the conjecture as giving a partial answer to Question 1. In the case when $\|A_i\| \le 1$ for all $i$, $\left\|\sum_i A_i^2\right\| \le n$, so that the right hand side of Equation 3 is $O(\sqrt{n \log d})$. Thus, the above conjecture beats the matrix Chernoff bound by getting rid of the $\sqrt{\log d}$ term, but instead introduces a term depending on the “sum-of-norms” of the $A_i$ as opposed to having a dependence on the “norm-of-the-sum” of the $A_i^2$.
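On toy sizes one can compare the best choice of signs against a random choice by brute force; here is a sketch, with randomly generated matrices standing in for an actual hard instance (sizes are tiny, so this is a sanity check, not evidence about asymptotics):

```python
import itertools
import numpy as np

# Toy illustration of Question 1: for a few small symmetric matrices of
# spectral norm at most 1, compare the minimum of ||sum_i eps_i A_i||
# over ALL sign patterns with the value attained by random signs.
rng = np.random.default_rng(0)
n, d = 8, 8
mats = []
for _ in range(n):
    B = rng.standard_normal((d, d))
    A = (B + B.T) / 2
    A /= np.linalg.norm(A, 2)        # normalize spectral norm to 1
    mats.append(A)

def norm_of_signed_sum(signs):
    return np.linalg.norm(sum(s * A for s, A in zip(signs, mats)), 2)

best = min(norm_of_signed_sum(s)
           for s in itertools.product((-1, 1), repeat=n))
random_value = norm_of_signed_sum(rng.choice((-1, 1), size=n))

# The minimum over all sign patterns is, of course, at most the random one.
assert best <= random_value + 1e-9
```

Already at these sizes the best signs typically do noticeably better than random ones, which is the phenomenon the conjecture asks to quantify.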
In the next post, I will discuss Gluskin’s proof of Theorem 1, which for now (to me) seems to be the most promising approach for proving the conjecture.
Thanks to Shachar Lovett, Oded Regev and Nikhil Srivastava for helpful suggestions, comments, and corrections during the preparation of this post.
Today, we have a guest post from Frank McSherry talking about a clever approach to using Differential Privacy for handling pesky dependencies that get in the way of proving measure concentration results.
In this post I’ll explain a cute use of differential privacy as a tool in probabilistic analysis. This is a great example of differential privacy being useful for something other than privacy itself, although there are other good examples too. The main problem we’ll be looking at is the analysis of mostly independent random variables, through the lens of a clustering problem I worked on many years ago.
In the problem of clustering data drawn from a Gaussian mixture, you assume that you are provided access to a large volume of data, each of which is a sample from one of a few high-dimensional Gaussian distributions. Each of the Gaussian distributions is determined by a mean vector and a covariance matrix, and each Gaussian has a mixing weight describing its relative fraction of the overall population of samples. There are a few goals you might have from such a collection of data, but the one we are going to look at is the task of clustering the data into parts corresponding to the Gaussian distributions from which they were drawn. Under what conditions on the Gaussians is such a thing possible? [please note: this work is a bit old, and does not reflect state of the art results in the area; rather, we are using it to highlight the super-cool use of differential privacy].
The main problem is that while each coordinate of a Gaussian distribution is concentrated, there are a large number of coordinates, and the proximity of a sample to the mean vector is not particularly great. You end up with bounds that look a bit like

$\|x - \mu\|^2 \le d\sigma^2 \cdot (1 + o(1))$ with probability $1 - \delta$,

where $x$ is the sample, $\mu$ is the mean vector, $d$ the ambient dimensionality, and $\sigma^2$ the norm of the covariance matrix. The probability $\delta$ gets determined by thinking really hard and its specific value won’t be particularly important for us here.
Dimitris Achlioptas and I had an algorithm for doing this based on spectral projections: we would find the optimal low-rank subspace determined by the singular value decomposition (the rank taken to be $k$, the number of Gaussians in the mixture) and argue that under some separation conditions involving the means and covariance matrices, the space spanned by these projections was basically the same as the space spanned by the mean vectors of the Gaussian distributions. This is great because when you project a Gaussian sample, you are projecting its mean plus some noise. As the true mean lies in the target space, it stays where it is. When you project Gaussian noise onto a fixed subspace, it stays Gaussian, but with far fewer dimensions. The particular form of these results looks something like this, with a projection matrix $P$ applied to the sample before subtracting from $\mu$:

$\|Px - \mu\|^2 \le k'\sigma^2 \cdot (1 + o(1))$ with probability $1 - \delta$,

where $k'$ is close to $k$. This means that while $Px$ stays centered on $\mu$, the contribution of the noise more-or-less vanishes. At least, the $d$ is reduced to $k'$ and that can be quite a lot. Hooray!
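The dimension-reduction effect for a fixed (noise-independent) subspace is easy to see numerically; a sketch with hypothetical sizes:

```python
import numpy as np

# Project d-dimensional Gaussian noise onto a FIXED k-dimensional
# subspace (chosen independently of the noise). The squared norm drops
# from about d*sigma^2 to about k*sigma^2.
rng = np.random.default_rng(1)
d, k, sigma = 1000, 5, 1.0

# A fixed orthonormal basis for a k-dimensional subspace.
Q, _ = np.linalg.qr(rng.standard_normal((d, k)))
P = Q @ Q.T                      # projection matrix onto the subspace

noise = sigma * rng.standard_normal(d)
full = float(noise @ noise)      # ~ d * sigma^2
proj = P @ noise
reduced = float(proj @ proj)     # ~ k * sigma^2, since the subspace is fixed

assert reduced <= full           # a projection never increases the norm
```

The catch discussed next is exactly the word "fixed": the SVD subspace is computed from the data, so it is not independent of the noise being projected.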
The problem is that this “more-or-less vanishes” thing is really only true when the target space and the random noise are independent. However, since the optimal low-rank subspace was determined from the data, it isn’t independent of any of the noise we are projecting. It’s slightly dependent on the noise, and in the wrong way (it attempts to accommodate the noise, which isn’t what we want if we want to project *out* the noise). In particular, you don’t immediately get access to the sorts of bounds above.
You could do things the way Dimitris and I did, which was a fairly complicated mess of cross-training (randomly partition the data and use each half to determine the subspace for the other), and you end up with a paper that spends most of its time determining algorithms and notation to enforce the independence (the cross-training needs to be done recursively, but we can’t afford to cut the samples in half at each level, blah blah, brain explodes). You can read all about it here. We’re going to do things a bit simpler now.
Enter differential privacy. Recall, for a moment, the informal statement of differential privacy: a randomized computation has differential privacy if the probability of any output occurrence is not substantially adjusted when a single input element is added or removed. What a nice privacy definition!
Now, let’s think of it slightly differently, in terms of dependence and independence. If $M(D)$ is the result of a differentially private computation $M$ on a dataset $D$, then $M(D)$ is not substantially differently distributed than the same computation run on $D \setminus \{x\}$. If this second distribution enjoys some nice property with high probability, for example due to its independence from $x$, then it remains very likely that $M(D)$ has the property as well. The probability that the property no longer holds can only increase by a factor of $e^\epsilon$ when we add $x$ back in to the input.
For example, let’s consider the probability of the property: “the squared length of the projection of $x - \mu$ onto the optimal subspace is much larger than $k\sigma^2$”. When the input data are $D \setminus \{x\}$, resulting in a projection-valued random variable we’ll name $P_{D \setminus \{x\}}$, this probability is small because $P_{D \setminus \{x\}}$ is independent of $x$.

When the input data are $D$, resulting in a projection-valued random variable $P_D$, this probability is not easily bounded by independence, but can be bounded by differential privacy: if the computation producing $P_D$ is $\epsilon$-differentially private, then the probability can increase by at most $e^\epsilon$:

$\Pr\left[\|P_D(x - \mu)\|^2 \text{ is large}\right] \le e^\epsilon \cdot \Pr\left[\|P_{D \setminus \{x\}}(x - \mu)\|^2 \text{ is large}\right].$
We can even live with fairly beefy values of $\epsilon$ and still get a result here; a modest constant will do for concreteness.
Now let’s discuss differentially private optimal subspace computation. One standard way to compute optimal low dimensional subspaces is by taking the covariance matrix of the data and computing its top singular vectors, using the singular value decomposition. One standard way to release the covariance matrix while preserving differential privacy is to add Laplace noise proportional to the largest magnitude permitted of any of the data points, to each of the entries of the covariance matrix. Since our Gaussian data are nicely concentrated, they aren’t likely to be terribly enormous, and a data-independent upper bound will work great here.
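A sketch of this pipeline in NumPy — the entry bound and the noise scale below are placeholders of mine, not a calibrated sensitivity analysis:

```python
import numpy as np

# Differentially private subspace computation, sketched: add symmetric
# Laplace noise to the covariance matrix of (bounded) data, then take
# the top singular vectors of the noisy covariance.
rng = np.random.default_rng(2)
n, d, k = 5000, 50, 2
data = rng.standard_normal((n, d))   # stand-in for the mixture samples
B = 10.0                             # assumed bound on data magnitudes
eps = 1.0                            # illustrative privacy parameter

cov = data.T @ data
noise = np.triu(rng.laplace(scale=B * B / eps, size=(d, d)))
noise = noise + noise.T - np.diag(np.diag(noise))  # keep it symmetric
noisy_cov = cov + noise

# Top-k subspace from the noisy covariance via the SVD.
U, s, _ = np.linalg.svd(noisy_cov)
subspace = U[:, :k]

assert noisy_cov.shape == (d, d)
assert np.allclose(noisy_cov, noisy_cov.T)
assert subspace.shape == (d, k)
```

Real use would calibrate the Laplace scale to the actual sensitivity of the covariance entries; the point here is only the shape of the computation.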
What we get is a noisy covariance matrix, which we then subject to the singular value decomposition. There are some nice theorems about how the SVD is robust in the presence of noise, which is actually the same reason it is used as a tool to filter out all of that noise that the Gaussian distributions added in in the first place. So, even though we added a bunch of noise to the covariance matrix, we still get a fairly decent approximation to the space spanned by the means of the Gaussian distributions (as long as the number of samples is larger than the dimensionality of the samples). At the same time, because the resulting subspace is differentially private with respect to the samples, we can still use the concentration bounds typically reserved for the projection of random noise on independent subspaces, as long as we can absorb a factor of $e^\epsilon$.
At its heart, this problem was about recovering a tenuous independence which was very important for simplicity (and arguably tractability) of analysis. It shows up in lots of learning problems, especially in validating models: we would typically split data into test and training, to permit an evaluation of learned results without the issue of overfitting. Here differential privacy makes things simpler: if your learning process is differentially private, you did not overfit your data (much).
This week’s post touches on subjects spanning almost 2000 years — we start with a cryptographic problem and go back in time to discover a theorem that could be known to the Greeks. Its content is based on a paper co-authored with Anton Mityagin and Kobbi Nissim that appeared in ANTS VII in 2006. The paper was motivated by a cryptographic question, previously introduced by Claus-Peter Schnorr, but the machinery that we ended up using had more to do with extremal graph theory, projective geometry, and combinatorial number theory. It fits in nicely with the overarching theme of this blog, which is interconnectedness of mathematics and CS theory, and leaves open several intriguing questions at the intersection of these areas.
Consider the problem of computing the discrete logarithm in a generic group of a known prime order $q$: given two random elements $g$ and $h$, find $x$ so that $g^x = h$. Instead of having access to the group itself, we may only manipulate encodings of its elements (basically, a random mapping of the group to a sufficiently large alphabet) via a group oracle. The group oracle accepts encodings of two elements and returns the encoding of their product. Think of it as a model of an abstract group, where the result of multiplying two group elements is treated as a new formal variable.
Let us try solving the discrete logarithm problem in this model. Given the encodings of two elements $g$ and $h$, one can multiply them, obtaining the encoding of $gh$, square the result, etc. In general, it is possible to compute (encodings of) elements of the form $g^a h^b = g^{a + bx}$, where $(a, b)$ are pairs of integers modulo $q$ (all arithmetic not involving $g$ or $h$ is going to be modulo $q$ from now on). Of course, there can be multiple ways of arriving at the same element. For instance, $gh = hg$ (as the group is of prime order, it is necessarily Abelian). Unless we do it on purpose, all elements that we obtain from the group oracle are going to be distinct with an overwhelming probability over $x$ (assume that the group order is large). Indeed, if $(a, b) \neq (c, d)$, then $g^{a + bx} = g^{c + dx}$ only when $a + bx \equiv c + dx \pmod{q}$, which happens for random $x$ with probability at most $1/q$. On the other hand, if we do get a non-trivial relationship, we can recover $x = (a - c)(d - b)^{-1}$ right away.
In other words, the group oracle keeps outputting some random encodings that tell us nothing useful about the elements $g$ and $h$ (we could sample encodings from the same distribution ourselves, without access to the oracle), until it returns an element that we did see before, which immediately gives away the answer to the discrete logarithm problem.
If $x$ is chosen uniformly at random from $\mathbb{Z}_q$, the success probability of any algorithm in the generic group model making no more than $t$ group operations is bounded by $O(t^2/q)$: each pair of elements output by the group oracle collides with probability at most $1/q$, there are at most $\binom{t}{2}$ such pairs, union bound, check and mate. A formal version of this handwavy argument is due to Victor Shoup, who gives a tight (up to a constant) bound on the success probability of any algorithm for solving the discrete logarithm problem in the generic group model.
A simple algorithm matches this bound. Let $n = \lceil \sqrt{q} \rceil$. Compute $h, hg, hg^2, \ldots, hg^{n-1}$ (by repeated multiplications by $g$), compute $g^n$ (by repeated multiplications by $g$), and, using the elements already available, compute $g^{2n}, g^{3n}, \ldots, g^{n^2}$. Since $n^2 \ge q$, there’s going to be a collision between $hg^i$ and $g^{jn}$ for some $i$ and $j$, which yields $x = jn - i$. This algorithm is known as the baby-step giant-step method — we are making “baby” steps when we are multiplying by powers of $g$, and “giant” steps when we are computing powers of $g^n$. The discrete logarithm problem is thus solved with probability 1.
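Here is a minimal sketch of baby-step giant-step, instantiated in the concrete group $\mathbb{Z}_p^*$ rather than the generic-group oracle (the tiny prime is just for illustration):

```python
from math import isqrt

def bsgs(g, h, p):
    """Solve g^x = h (mod p) by baby-step giant-step; p is prime."""
    m = isqrt(p - 1) + 1
    # Baby steps: table of g^j for j = 0..m-1.
    table, baby = {}, 1
    for j in range(m):
        table.setdefault(baby, j)
        baby = baby * g % p
    # Giant steps: h * (g^{-m})^i for i = 0..m-1.
    factor = pow(g, -m, p)       # modular inverse of g^m (Python 3.8+)
    gamma = h
    for i in range(m):
        if gamma in table:
            return i * m + table[gamma]
        gamma = gamma * factor % p
    return None

p, g, x = 101, 2, 37
assert bsgs(g, pow(g, x, p), p) == x
```

The table holds the $O(\sqrt{p})$ baby steps, and each giant step is one lookup, matching the $\Theta(\sqrt{q})$ operation count above.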
The above argument suggests that in order to solve the discrete logarithm problem in the generic group model one would want to maximize the probability of observing a collision. Collisions have a simple geometric interpretation: each time the algorithm computes $g^{a + bx}$, it draws the line $y = a + bX$ in the space $\mathbb{Z}_q \times \mathbb{Z}_q$. An element $x_0$ is “covered” if two lines intersect above this element: $a + bx_0 = c + dx_0$. The adversary is trying to cover as many elements as possible with the fewest number of lines.
As we have just seen, the number of group operations required to solve the discrete logarithm problem in the generic group when $g$ and $h$ are chosen uniformly at random is $\Theta(\sqrt{q})$. The question becomes much more interesting if we constrain the joint distribution of $g$ and $h$.

What is the complexity of the discrete logarithm problem, measured as the number of group operations, if $x$ is sampled uniformly from a subset $S \subseteq \mathbb{Z}_q$?

It turns out that this question has been answered for some simple sets $S$, but it is wide open in general.
II. Geometric Formulation
We re-formulate the problem using the language of finite field geometry.
Given a subset $S$ of $\mathbb{Z}_q$, define its DL-complexity, denoted as $\mathrm{DL}(S)$, as the minimal number of lines in $\mathbb{Z}_q \times \mathbb{Z}_q$ whose intersection points, projected to the $x$-axis, cover $S$.
In the notation of the previous section, the adversary is drawing lines $y = a + bX$. It scores a hit when two lines intersect above a point $x_0 \in S$, i.e., $a + bx_0 = c + dx_0$. The adversary’s goal is to cover the entire set $S$ with the smallest number of lines, which would correspond to solving the discrete logarithm problem with certainty when $x$ is sampled from $S$.
What are the most basic facts about $\mathrm{DL}(S)$?
- $\mathrm{DL}(S) = O(\sqrt{q})$. Indeed, we know that the (generic) baby-step giant-step algorithm covers the entire $\mathbb{Z}_q$ with $O(\sqrt{q})$ lines.
- $\mathrm{DL}(S) \le |S| + 1$ — duh! It suffices to draw a single line and one line for each element of $S$.
- $\mathrm{DL}(S) \ge \sqrt{2|S|}$: if $t$ lines can cover the entire $S$, then the number of intersection points, which is less than $t^2/2$, is at least $|S|$.
Putting these bounds together on this schematic picture drawn in the log-log scale, we can see that $\mathrm{DL}(S)$ lives inside the shaded triangle.

The most intriguing part of the triangle is the upper-left corner, marked with the target sign, that corresponds to sets that are as small as $O(\sqrt{q})$ but have the property that solving the discrete logarithm problem in these subsets is as hard as in the entire $\mathbb{Z}_q$. How can we get there, or just get closer? But first, why do we care at all?
One, rather vague, motivation is that we are interested in characterizing these subsets because they capture the complexity of the discrete logarithm problem. Another, due to Claus-Peter Schnorr, who defined the problem in 2000, is that the amount of entropy needed to sample an element of such a set is half of what a uniform exponent requires. The observation that got us going back in 2005 was that modular exponentiation takes time that depends on the exponent. Wouldn’t it be nice if we could choose exponents that allowed for faster exponentiation algorithms? These exponents could cover only a fraction of the entire space, which naturally led us to the question of understanding the discrete logarithm problem restricted to a subset, which turned out to be very interesting in its own right.
The first result, going back to Schnorr, is very encouraging:
For a random $S \subseteq \mathbb{Z}_q$ of size $\sqrt{q}$, $\mathrm{DL}(S) = \Omega(\sqrt{q}/\log q)$ with probability at least $1 - o(1)$.
It means a random subset has essentially maximal possible DL-complexity (up to a logarithmic factor) with very high probability. Unfortunately, using (truly) random subsets forecloses the possibility of extracting any gains in exponentiation relative to the average case. Second, it really does not quite answer the question of whether any specific sets are particularly hard for the discrete logarithm problem.
In the rest of this post we explore several approaches towards constructing explicit sets and sets with succinct representation for which we can prove a lower bound on their DL-complexity stronger than $\sqrt{2|S|}$.
III. A first attempt
Rather than trying to solve the problem in full generality, let’s constrain the definition of $\mathrm{DL}(S)$ to capture only generalizations of the baby-step giant-step method. Let us call this restriction $\mathrm{BSGS1}$, defined as follows:
Given a subset $S$ of $\mathbb{Z}_q$, let $\mathrm{BSGS1}(S)$ be the minimal number $t$ so that $S$ is covered by the intersections of two sets of lines $H = \{y = c_i\}$ and $L = \{y = X + b_j\}$, where $|H| + |L| \le t$.
Recall that the intersection of two lines covers an element of $S$ if these lines intersect at a point whose first coordinate is in $S$.
The definition of $\mathrm{BSGS1}$ complexity considers only horizontal lines (analogous to the giant steps of the algorithm, the elements $g^{jn}$) and parallel slanted lines (corresponding to the elements $hg^i$). The 1 in BSGS1 refers to the fact that all slanted lines have slope of exactly 1 (for now — this condition will be relaxed later).
Can we come up with a constraint on $S$ that would guarantee that $\mathrm{BSGS1}(S) \gg \sqrt{|S|}$? It turns out that we can.
Assume for a moment that all pairwise sums of elements in $S$ are distinct, i.e., no four elements satisfy the following equation: $a + b = c + d$, where $a, b, c, d \in S$, unless $\{a, b\} = \{c, d\}$. If this is the case, at least one of the intersection points of the lines in the following configuration will miss an element of $S$:
To see why it is so, observe that $(c_1 - b_1) + (c_2 - b_2) = (c_1 - b_2) + (c_2 - b_1)$, where $c_1, c_2$ are the intercepts of the two horizontal lines and $b_1, b_2$ those of the two slanted lines — a contradiction with $S$’s not having solutions to this equation.
We now introduce one more way of thinking about these lines that are trying to hit elements of $S$ (we promise it is going to be the last!). Associate lines with the vertices of a graph and draw an edge between two vertices if the intersection point of the corresponding lines projects to an element of $S$ (“kills an element of $S$”).
If all pairwise sums of $S$ are distinct, then the graph whose nodes are the horizontal and slanted lines does not have a 4-cycle. This property alone is sufficient to bound the total number of edges in a graph on $t$ vertices (and thus the number of elements of $S$ hit by these lines) to be less than $O(t^{3/2})$. If the graph is bipartite, which is our case, this bound is known as the Zarankiewicz problem, and it can be established via a simple counting argument.
If $t$ lines cannot cover more than $O(t^{3/2})$ elements of $S$, it means that $\mathrm{BSGS1}(S) = \Omega(|S|^{2/3})$.
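The edge bound for 4-cycle-free bipartite graphs can be verified by brute force on a tiny case; the exact value 9 below is the known Zarankiewicz number $z(4,4;2,2)$:

```python
from itertools import combinations

# Maximum edges in a 4-by-4 bipartite graph with no 4-cycle, i.e., no
# two left vertices sharing two common right neighbors. Each left
# vertex's neighborhood is a 4-bit mask over the right vertices.
def c4_free(rows):
    return all(bin(a & b).count("1") <= 1 for a, b in combinations(rows, 2))

best = 0
for r0 in range(16):
    for r1 in range(16):
        for r2 in range(16):
            for r3 in range(16):
                rows = (r0, r1, r2, r3)
                if c4_free(rows):
                    best = max(best, sum(bin(r).count("1") for r in rows))

assert best == 9
```

So with $t = 8$ lines, at most 9 elements can be killed, already below the trivial $\binom{t}{2}$ count, and in general the count grows only like $t^{3/2}$.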
What’s left to do is to construct sets whose pairwise sums never repeat. They are known as modular Sidon sets, with several beautiful constructions resulting in sets of astonishingly high density. Ponder it for a moment: we want a subset of $\mathbb{Z}_q$ such that no two pairs of its elements sum to the same thing. Obviously, by the pigeonhole principle, the size of such a set is $O(\sqrt{q})$. This bound is tight, as there exist — explicit, and efficiently enumerable — sets of size $\Omega(\sqrt{q})$!
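A simple checker for the modular Sidon property, together with a greedy baseline construction (the greedy set is far sparser than the near-optimal explicit constructions mentioned above — it is only here to make the definition concrete):

```python
from itertools import combinations_with_replacement

def is_sidon(s, n):
    """Check that all pairwise sums (a+b, a <= b) of s are distinct mod n."""
    sums = [(a + b) % n
            for a, b in combinations_with_replacement(sorted(s), 2)]
    return len(sums) == len(set(sums))

def greedy_sidon(n):
    """Greedily grow a Sidon set in Z_n (a baseline, not optimal)."""
    s = []
    for x in range(n):
        if is_sidon(s + [x], n):
            s.append(x)
    return s

s = greedy_sidon(101)
assert is_sidon(s, 101)
assert not is_sidon([0, 1, 2], 101)   # 0 + 2 = 1 + 1, so not Sidon
```

Greedy construction already gives sets of nontrivial size; the explicit constructions achieve the optimal $\Omega(\sqrt{q})$ density.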
Notice that when two lines cover an element of $S$, their coefficients satisfy an especially simple condition: if a horizontal line $y = c$ and a slanted line $y = X + b$ intersect above $x_0$, then $x_0 = c - b$. Let $C = \{c_i\}$ and $B = \{-b_j\}$. If all of $S$ is covered by intersections between the lines of $H$ and $L$, then $S \subseteq C + B$, where $C + B$ is the sumset of $C$ and $B$. Using the language of additive combinatorics, Erdős and D. Newman posed in 1977 the problem of constructing sets that cannot be covered by sumsets of small sets. They proved that the set of “small squares” has this property — in our terminology, a non-trivial lower bound on its $\mathrm{BSGS1}$-complexity.
IV. Moving upwards
Let’s relax the constraint of the previous definition by allowing two classes of lines — horizontal and arbitrarily slanted — where the only hits that count are due to intersections between lines of different classes. Call the resulting measure of complexity $\mathrm{BSGS2}$:

Given a subset $S$ of $\mathbb{Z}_q$, let $\mathrm{BSGS2}(S)$ be the minimal number $t$ so that $S$ is covered by the intersections of two classes of lines $H$ and $L$, with $|H| + |L| \le t$, where only intersections between lines of different classes count towards covering $S$.
By analogy with the previous argument, we’d like to identify a local property of $S$ that will result in a non-trivial bound on $\mathrm{BSGS2}(S)$. More concretely, we should be looking for some condition on a small number of elements of $S$ that makes them difficult to cover by few lines of two different classes.
Fortunately, one such property is not that difficult to find. Consider the following drawing:
The intercept theorem (known also as Thales’ theorem) implies that the corresponding segment ratios in the drawing are equal, and consequently (applying it a second time) that the six covered elements satisfy a certain quadratic relation, which we denote by (*).

Conversely, if a 6-tuple of elements violates (*), these points cannot all be covered at once by three horizontal and two slanted lines.
Consider again the bipartite graph drawn on the sets of horizontal and slanted lines, where two lines are adjacent in the graph if their intersection point covers an element of $S$. What is the maximal density of this graph if it is prohibited from containing the subgraph $K_{2,3}$? Somewhat surprisingly, the answer is asymptotically the same as before, namely, the number of edges in the graph is $O(t^{3/2})$. Therefore, if the set $S$ avoids 6-tuples satisfying (*), then $\mathrm{BSGS2}(S) = \Omega(|S|^{2/3})$.
What about constructing sets with that property? The short answer is that we don’t know how to do so explicitly, but at least there exist sets satisfying this property that admit a succinct representation.
V. Going all the way
Having flexed our muscles on the watered-down notions of sets’ DL-complexity, let us try to extend our technique to handle the most general case of unrestricted lines, where anything goes and all intersections count towards the attacker’s goal of covering the entire set .
Once again, we’d like to find a local property with global repercussions. Concretely, we should be looking for a configuration of lines whose intersection points satisfy some avoidable condition, similar to the quadratic polynomial (*) of the previous section. It may seem that we should look no further than Menelaus’ theorem, which gives us just that. If your projective geometry is a bit rusty, Menelaus’ theorem applies to the six intersection points of four lines in the plane:
It states, in the form most relevant to us, that
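As a sanity check, the signed-ratio form of Menelaus’ theorem can be verified numerically: take three lines forming a triangle, cut it with a fourth (the transversal), and check that the product of signed ratios along the sides is −1. The four sample lines below are arbitrary choices of ours, and exact rational arithmetic avoids floating-point noise:

```python
from fractions import Fraction as F

def intersect(l1, l2):
    """Intersection point of lines a*x + b*y = c, each given as (a, b, c)."""
    (a1, b1, c1), (a2, b2, c2) = l1, l2
    det = a1 * b2 - a2 * b1                      # nonzero for non-parallel lines
    return (F(c1 * b2 - c2 * b1, det), F(a1 * c2 - a2 * c1, det))

def sratio(p, q, r):
    """Signed ratio pq/qr for three collinear points p, q, r."""
    i = 0 if r[0] != q[0] else 1                 # pick a coordinate that varies
    return (q[i] - p[i]) / (r[i] - q[i])

# Three lines forming a triangle, plus a transversal (arbitrary sample data).
L1, L2, L3, L4 = (0, 1, 0), (1, 0, 0), (1, 1, 4), (1, -2, 1)
A, B, C = intersect(L2, L3), intersect(L1, L3), intersect(L1, L2)   # vertices
D, E, Fp = intersect(L4, L1), intersect(L4, L2), intersect(L4, L3)  # cuts of sides BC, CA, AB

# Menelaus: the product of signed ratios (BD/DC) * (CE/EA) * (AF/FB) equals -1.
product = sratio(B, D, C) * sratio(C, E, A) * sratio(A, Fp, B)
print(product)  # -1
```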
It seems like a nice local property, but what about its global ramifications? Namely, if we manage to construct a set such that no 6-tuple satisfies the cubic polynomial (**), what can we say about the number of lines required to cover that set? Well, our luck runs out here. Recall that we used the local property to guarantee that the graph, whose nodes corresponded to lines and whose edges corresponded to elements of the set covered by intersection points, excluded a certain subgraph: first a 4-cycle, then K_{3,2}. Unfortunately, if the graph excludes a complete graph on four vertices, which is what Menelaus’ theorem guarantees for sets avoiding (**), the number of edges in that graph can still be quadratic in the number of vertices. This is a consequence of Turán’s theorem (or, more generally, Erdős–Stone), which yields no subquadratic bound unless the excluded subgraph is bipartite.
The only path forward is to find a Menelaus-like theorem that allows us to exclude a bipartite graph. It turns out that the minimal such configuration involves seven lines and 12 intersection points:
Most compactly, the theorem states that the following determinant evaluates to 0:
Using the same argument as before, if the set avoids solutions to the above equation on 12 variables and total degree 6, the “hit” graph defined over the lines avoids the K_{4,3} graph (four lines of one class, three of the other, and all 12 intersections between them). A variant of the Zarankiewicz bound guarantees that such a graph on m vertices has O(m^{5/3}) edges (the exponent in the Zarankiewicz bound depends only on the size of the smaller part of the excluded bipartite graph). Since each element of the set corresponds to at least one edge of the “hit” graph, a set of size n requires Ω(n^{3/5}) lines, which is better than the trivial bound Ω(n^{1/2}). Finding explicit constructions remains a challenge, although it is easy to demonstrate the existence of such sets with succinct representation by the probabilistic method.
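The exponent bookkeeping in these arguments is mechanical and can be sketched in a few lines (the helper name below is ours; s denotes the size of the smaller side of the excluded complete bipartite graph):

```python
from fractions import Fraction

def dl_exponent(s):
    """Kővári–Sós–Turán: a K_{s,t}-free graph (s <= t) on m vertices has
    O(m^(2 - 1/s)) edges.  A set of size n, each element consuming an edge,
    then needs m = Omega(n^(1 / (2 - 1/s))) = Omega(n^(s / (2s - 1))) lines."""
    return 1 / (2 - Fraction(1, s))

print(dl_exponent(2))  # 2/3 -- excluding a 4-cycle K_{2,2} (or K_{3,2})
print(dl_exponent(3))  # 3/5 -- excluding K_{4,3}, via bipartite Menelaus
```

The exponent improves toward 1 as s grows, which is why excluding a bipartite graph with a larger small side is worth the effort of a 12-point theorem.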
VI. Bipartite Menelaus’ Theorem and Open Problems
Even though our original motivation was rooted in cryptography, we ended up proving a fact of projective geometry. In an equivalent form, which is most similar to the standard formulation of Menelaus’ theorem, it asserts that
where the line segments are signed: positive if they point in the same direction as the line they are part of (for some arbitrary but fixed orientation), and negative otherwise.
The classic (and classical — after all, Menelaus of Alexandria lived in the first century AD) theorem is implied by ours. Indeed, in the degenerate case when , , and , following a furious round of cancellations, we end up with Menelaus’. This explains why we refer to our “12-point’’ theorem as bipartite Menelaus’: it is the minimal Menelaus-like theorem that involves lines separated into two classes.
We did search far and wide for evidence that this theorem had been known before, and came up empty. In retrospect, such a theorem is inevitable: the number of intersection points (i.e., equations) grows quadratically in the number of lines, while each line requires only two free parameters to describe. But this counting argument gives no insight into why bipartite Menelaus’ theorem is what it is. Is there a purely geometric proof? Is it a consequence of a simpler/deeper fact about projective geometries over finite fields? We’d love to know.
Let’s measure our progress against the initial goal of finding explicit sets that are as hard as the entire group against the discrete-logarithm-finding adversary. We are not there yet: we did develop some machinery for arguing that some sets are more resistant than the most pessimistic square-root bound implies, but these sets are hard to construct and too small to be useful. What about proving that some natural sets, such as the sets of squares, as in Erdős-Newman, or cubes, have high DL-complexity? It is conceivable that the combinatorial approach based on excluded subgraphs is not sufficient to get us to the sweet spot of sets of size and DL-complexity . What can?
A necessary disclaimer: the generic group model is just that, a model. Any instantiation of the abstract group allows direct observation of the group elements, and may enable attacks not captured by the model. For instance, the representation of group elements as integers modulo a prime has enough structural properties that index calculus is exponentially more effective there than any generic algorithm. On the positive side, for many groups, such as some elliptic curves or prime-order subgroups of the multiplicative group modulo p for sufficiently large p, no algorithms for finding discrete logarithms faster than generic methods are presently known. This motivates studying generic groups as a useful abstraction of many groups of cryptographic significance.
The abstract (generic) group model was introduced in papers by Nechaev and Shoup, where the hardness of the discrete logarithm in that model was shown to be Ω(√p) for groups of prime order p. Several generic methods for computing discrete logarithms with similar total running time are known: Shanks’ baby-step giant-step method, and Pollard’s rho and kangaroo (lambda) methods. These algorithms can be adapted to intervals of length N, where they run in time O(√N), matching the pessimistic square-root bound. For small-weight subsets see the work of Stinson and references therein. Canetti put forward a variant of the Decisional Diffie-Hellman assumption where one of the exponents is sampled from an arbitrary distribution of bounded min-entropy. Chateauneuf, Ling, and Stinson gave a combinatorial characterization of algorithms for computing discrete logarithms in terms of slope coverings, and showed how weak Sidon sets are related to optimal algorithms. Erdős and Newman defined the notion of bases for subsets of , which corresponds (up to a factor of 4) to BSGS1-complexity in . They showed that random subsets of size have bases of size , and that for sets of squares the basis is . Subsuming the counting argument of Erdős and Newman, Schnorr proved that the discrete logarithm problem has essentially optimal (up to a logarithmic factor) hardness on random subsets. Resolving the question of Erdős and Newman, Alon, Bukh, and Sudakov showed that for sets of size exactly , even their restricted DL-complexity is . They also extended the analysis of BSGS1-complexity from the set of squares to higher powers.
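To make the square-root bound concrete, here is a minimal Python sketch of Shanks’ baby-step giant-step method mentioned above; the prime, generator, and target are illustrative choices of ours:

```python
from math import isqrt

def bsgs(g, h, p):
    """Find x with g^x = h (mod p) in O(sqrt(p)) group operations, or None."""
    m = isqrt(p - 1) + 1                         # ceiling of sqrt(group order)
    baby = {pow(g, j, p): j for j in range(m)}   # baby steps: g^0, ..., g^(m-1)
    step = pow(g, -m, p)                         # g^(-m) mod p (Python 3.8+)
    gamma = h
    for i in range(m):                           # giant steps: h, h*g^-m, h*g^-2m, ...
        if gamma in baby:
            return i * m + baby[gamma]           # x = i*m + j
        gamma = gamma * step % p
    return None

print(bsgs(2, pow(2, 57, 101), 101))  # 57
```

The time/memory tradeoff is visible in the code: m baby steps stored, at most m giant steps probed, so roughly 2√p group operations in total.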
This is a followup to the previous post on program obfuscation written jointly with Guy Rothblum.
The problem of program obfuscation is fascinating. The question at hand is whether one can transform a program (say, described as a Boolean circuit) into a form that is executable (i.e., has the same input/output behavior), but is otherwise completely unintelligible. This problem was originally formalized by Barak et al. [BGI+01], who constructed (contrived) function families that are not obfuscatable under the natural definition of virtual black-box (VBB) security, as well as under various relaxations of it. Loosely speaking, VBB security requires that anything that can be efficiently computed given an obfuscation of a program could also be efficiently computed from black-box access to the program.
The result of Barak et al. (and followup work [GK05]) left researchers quite pessimistic about solving the general problem of program obfuscation, and (until recently) most of the effort was on obfuscating very simple functions, such as point functions [Canetti97] (functions that are zero everywhere except for a single input), variants of point functions, hyper-planes [CRV10], conjunctions [BR13a], and CNFs [BR13b].
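To get a feel for what a positive result looks like, here is a folklore-style sketch of point-function obfuscation via a salted hash. This is a heuristic in the random-oracle spirit, not Canetti’s actual construction [Canetti97], and all names below are ours:

```python
import hashlib
import os

def obfuscate_point(x: bytes):
    """Return a program computing the point function I_x (1 on x, 0 elsewhere)
    that stores only a salted hash of x, never x itself."""
    salt = os.urandom(16)
    digest = hashlib.sha256(salt + x).digest()

    def obfuscated(y: bytes) -> int:
        # Functionality is preserved, yet recovering x from (salt, digest)
        # heuristically requires inverting the hash.
        return 1 if hashlib.sha256(salt + y).digest() == digest else 0

    return obfuscated

f = obfuscate_point(b"secret password")
print(f(b"secret password"), f(b"wrong guess"))  # 1 0
```

Note that this idea leans on the function being zero almost everywhere; it says nothing about the high pseudo-entropy families discussed next.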
The challenging and cryptographically meaningful function families to obfuscate are those that have high “pseudo-entropy”. Extreme examples are pseudo-random functions and decryption algorithms. Note that it is trivial (and uninteresting) to obfuscate functions that are learnable from black-box access.
The focus of the previous post was on recent exciting progress on program obfuscation, initiated by the fascinating recent work of Garg et al. [GGHRSW13], which gave the first plausible candidate for general-purpose obfuscation. They conjectured that it is an indistinguishability obfuscator; i.e., given any two circuits C_1 and C_2 of the same size and computing the same function, no polynomial-time adversary can distinguish between the obfuscation of C_1 and that of C_2. Unfortunately, it is not clear how meaningful this notion is, since an indistinguishability obfuscator is not guaranteed to hide the secrets of the underlying program (or circuit); indeed, as we know from [BGI+01], there are function families that can always be reverse engineered. One motivation for indistinguishability obfuscation is that it was proven equivalent to “best possible” obfuscation by Goldwasser and Rothblum [GR07].
The work of Garg et al. gave optimism to many cryptographers, and several tried to analyze and prove security of their construction (and its variations) [GGHRSW13,BR13c,BGKPS13], with limited success. To date, a variant of the construction is known to be VBB secure against a subclass of adversaries, known as algebraic adversaries. However, we have no evidence as to its security level against non-algebraic adversaries. Moreover, as mentioned above, even if we were able to prove that it is an indistinguishability obfuscator, it is still unclear how meaningful it is.
Nevertheless, to my surprise, subsequent to [GGHRSW13] a flood of results has appeared showing that indistinguishability obfuscation suffices for many other cryptographic applications, such as the construction of public-key encryption from private-key encryption, the existence of deniable encryption, multi-input functional encryption, multiparty key exchange, broadcast encryption, traitor tracing, and more [SW13,GGHRSW13,HSW13,GGJS13,BZ13,BCP13,BCPR13].
Unfortunately, in this post we diverge from the optimistic view of the crowd. In a somewhat strange twist, in joint work with Cohn and Goldwasser [CGK14], we show that there are negative implications of [GGHRSW13] to accompany the positive ones. In particular, we show that the existence of indistinguishability obfuscation implies strong limitations on the possibility of VBB obfuscation with a universal simulator for any function family with high pseudo-entropy. What is VBB obfuscation with a universal simulator?
The Barak et al. definition of VBB obfuscation of a circuit family requires that for each probabilistic polynomial-time (PPT) adversary A there exists a PPT simulator S that succeeds in simulating the output of A, when A is given the obfuscated circuit but S is given only black-box access to the circuit. Unfortunately, this definition does not say how hard (or easy) it is to find this simulator S. This sufficed for the Barak et al. work, as they were after showing impossibility results and thus were happy to work with an obfuscation definition that didn’t address how one may find S.
A stronger and arguably more meaningful definition requires that there exist a *universal* PPT simulator capable of simulating any PPT adversary A given the code of A. Such a definition is referred to as VBB with a universal simulator. Ideally, we would like to construct an obfuscator that is VBB secure with a universal simulator. However, given the negative result of [BGI+01] we know that we cannot hope to construct such an obfuscator for all function families, and we must focus on specific function families. That said, it may be the case that all “natural” function families are obfuscatable.
In [CGK14] we show that, assuming the existence of indistinguishability obfuscation, *all* function families with super-polynomial “pseudo-entropy” cannot be VBB obfuscated with a universal simulator. Informally, a function family has super-polynomial pseudo-entropy if, given black-box access to the function, it appears to have super-polynomial min-entropy. Such families include all pseudo-random function families, as well as every semantically secure secret-key and public-key encryption scheme, and any secure digital signature scheme (where randomness is generated using a pseudo-random function).
We obtain this result by exploiting a connection between obfuscation with a universal simulator and obfuscation with auxiliary inputs, and by showing new impossibility results for obfuscation with auxiliary inputs.
In light of this, where should we be heading? It seems that in the quest for positive results, we should either restrict our attention to function families that do not have super-polynomial pseudo-entropy, or try to bypass these negative results by considering relaxed definitions of security. If we stick to our goal of VBB security for functions with super-polynomial pseudo-entropy, we will need to use non-black-box techniques, where the simulator uses the adversary in an inefficient manner. To my knowledge, to date such a technique has been used only once, in [Canetti97].
A class of functions without super-polynomial pseudo-entropy that would be interesting to study is that of evasive function families: functions that are zero almost everywhere, such that any PPT adversary given oracle access to a random function in the family cannot find a non-zero input. We have no negative results for such families, and some partial positive results are known [BBCPKS13].
These are fascinating questions and I am excited to see new developments in the upcoming months.
[BBCPKS13] Boaz Barak, Nir Bitansky, Ran Canetti, Omer Paneth, Yael Tauman Kalai, Amit Sahai: Obfuscation for Evasive Functions. Cryptology ePrint Archive, Report 2013/668
[BGKPS13] Boaz Barak, Sanjam Garg, Yael Tauman Kalai, Omer Paneth, Amit Sahai: Protecting Obfuscation against Algebraic Attacks. Cryptology ePrint Archive, Report 2013/631
[BGI+01] Boaz Barak, Oded Goldreich, Russell Impagliazzo, Steven Rudich, Amit Sahai, Salil P. Vadhan, Ke Yang: On the (im)possibility of obfuscating programs. Crypto 2001, Journal of the ACM 2012
[BCPR13] Nir Bitansky, Ran Canetti, Omer Paneth, Alon Rosen: Indistinguishability Obfuscation vs. Auxiliary-Input Extractable Functions: One Must Fall. Cryptology ePrint Archive, Report 2013/641
[BZ13] Dan Boneh, Mark Zhandry: Multiparty Key Exchange, Efficient Traitor Tracing, and More from Indistinguishability Obfuscation. Cryptology ePrint Archive, Report 2013/642
[BCP13] Elette Boyle, Kai-Min Chung, Rafael Pass: On Extractability Obfuscation. Cryptology ePrint Archive, Report 2013/650
[BR13a] Zvika Brakerski, Guy N. Rothblum: Obfuscating Conjunctions. CRYPTO 2013
[BR13b] Zvika Brakerski, Guy N. Rothblum: Black-Box Obfuscation for d-CNFs. ITCS 2014
[BR13c] Zvika Brakerski, Guy N. Rothblum: Virtual Black-Box Obfuscation for All Circuits via Generic Graded Encoding. TCC 2014
[Canetti97] Ran Canetti: Towards Realizing Random Oracles: Hash Functions That Hide All Partial Information. CRYPTO 1997
[CRV10] Ran Canetti, Guy N. Rothblum, Mayank Varia: Obfuscation of Hyperplane Membership. TCC 2010
[CGK14] Henry Cohn, Shafi Goldwasser, Yael Tauman Kalai: The Impossibility of Obfuscation with a Universal Simulator. IACR Cryptology ePrint Archive, Report 2013/665
[GGHRSW13] Sanjam Garg, Craig Gentry, Shai Halevi, Mariana Raykova, Amit Sahai, Brent Waters: Candidate Indistinguishability Obfuscation and Functional Encryption for all circuits. FOCS 2013
[GGJS13] Shafi Goldwasser, Vipul Goyal, Abhishek Jain, Amit Sahai: Multi-Input Functional Encryption. IACR Cryptology ePrint Archive, Report 2013/727
[GK05] Shafi Goldwasser, Yael Tauman Kalai: On the Impossibility of Obfuscation with Auxiliary Input. FOCS 2005
[GR07] Shafi Goldwasser, Guy N. Rothblum: On Best-Possible Obfuscation. TCC 2007
[HSW13] Susan Hohenberger, Amit Sahai, Brent Waters: Replacing a random oracle: Full domain hash from indistinguishability obfuscation. IACR Cryptology ePrint Archive, Report 2013/509
[SW13] Amit Sahai, Brent Waters: How to Use Indistinguishability Obfuscation: Deniable Encryption, and More. IACR Cryptology ePrint Archive, Report 2013/454
(joint post by Yael Kalai and Guy Rothblum)
It feels especially appropriate to write about recent developments in cryptography and code obfuscation while basking in the afterglow of a wonderful workshop at the Weizmann Institute of Science, celebrating the work of Shafi Goldwasser and Silvio Micali, this year’s Turing Award recipients. Shafi and Silvio repeatedly demonstrated that in cryptography we can obtain seemingly impossible or self-contradictory goals, such as zero-knowledge proofs that convey no information beyond their validity, or pseudorandom functions whose input-output behavior appears completely random (even though they have a succinct description).
Our blog post is about another such “pie in the sky” problem in cryptography: code obfuscation. The question at hand is whether one can transform a program (say, described as a Boolean circuit), maintaining its input/output behavior but making it otherwise unintelligible. This problem was originally formalized by Barak et al. [BGI+01] (following earlier work by Hada [Hada00]). However, rather than providing tools to obfuscate programs, Barak et al. gave impossibility results. They considered the natural definition of virtual-black-box obfuscation (VBB-Obf): anything that can be computed efficiently given a program’s obfuscation should be efficiently computable from black-box access to the program. This natural definition is quite strong, and in particular general-purpose VBB-Obf (under the “right” formalization) has fantastic cryptographic applications. Unfortunately, Barak et al. proved a strong negative result, showing that general-purpose VBB obfuscation is impossible. Namely, they constructed a (contrived) function family for which there exists a PPT adversary that, given *any* code of a function f in the family, can find the secret key associated with f, whereas this key remains completely hidden given only black-box access to f. Thus, access to the code is *very different* from black-box access, and the family seems very difficult to obfuscate in any meaningful sense.
Following this thought provoking work, much effort has been devoted by the cryptographic community to constructing obfuscators for natural classes of programs. However until recently, all known obfuscation candidates were for limited classes of functions, such as (for example) point functions [Canetti97] (functions that are zero everywhere except for a single input), variants of point functions, hyper-planes [CRV10], conjunctions [BR13a], and CNFs [BR13b]. It was not clear how to extend these works to get obfuscation for more complex classes of functions, and until recently there weren’t even suggestions for candidates or heuristics.
This changed with a fascinating recent work of Garg et al. [GGHRSW13]. They propose a candidate general-purpose program obfuscator. Namely, they construct an obfuscator that takes as input any program (or circuit) and outputs another program (or circuit) that has the same functionality as the input program, and seems to hide secret information. The big question is whether this candidate construction indeed has a “meaningful” secrecy guarantee. One can simply assume that the [GGHRSW13] obfuscator is secure. However, due to the negative result of [BGI+01], assuming that the [GGHRSW13] obfuscator always offers a “meaningful” security guarantee is simply false.
To bypass these negative results, [GGHRSW13] study the possibility that their obfuscator is an indistinguishability obfuscator (Ind-Obf); i.e., given any two circuits C and C’ of the same size that compute the same function f, no polynomial-time adversary can distinguish between the obfuscation of C and the obfuscation of C’. There are no known impossibility results for Ind-Obf. Indistinguishability obfuscation provides an intuitively appealing notion of security via an equivalence to “best possible” obfuscation, a notion put forth by Goldwasser and Rothblum [GR07]. This notion makes the relaxed requirement that the obfuscated program leak as little information as any other program with the same functionality (and of similar size). Further, in a fascinating recent work, Sahai and Waters [SW13] show that Ind-Obf has many exciting cryptographic applications (e.g., deniable encryption [CDNO97]).
In [GGHRSW13], and in followup works [BR13c,BGKPS13], there is some evidence that this obfuscator and variants of it do have some secrecy guarantees. The obfuscator of Garg et al. makes use of multilinear maps, a powerful new cryptographic tool introduced by [GGH13]. Loosely speaking, such maps allow one to encode elements in a way that lets one efficiently add encodings, multiply encodings (a bounded number of times), and check whether an encoding is an encoding of zero. [BR13c] prove that a variant of the [GGHRSW13] obfuscator does indeed satisfy the Ind-Obf definition if the adversary is limited to “algebraic” attacks, meaning that it can only add and multiply the multilinear encodings and check whether an encoding is an encoding of zero, but cannot do anything else with them. That is, they prove security for a limited class of attackers. Moreover, [BR13c,BGKPS13] show that variants of the construction even satisfy the stronger VBB-Obf definition against “algebraic” attacks.
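To make the graded-encoding interface concrete, here is a toy, completely insecure sketch of the API described above. Real candidates such as [GGH13] realize this interface with noisy lattice-based encodings that actually hide the values; every name and parameter below is illustrative only:

```python
from dataclasses import dataclass

Q = 2**61 - 1   # toy plaintext modulus (illustrative)
KAPPA = 3       # multilinearity: products of up to KAPPA level-1 encodings

@dataclass(frozen=True)
class Enc:
    value: int  # a real scheme stores a noisy, hiding encoding, not the value
    level: int

def encode(a, level=1):
    return Enc(a % Q, level)

def add(e1, e2):
    assert e1.level == e2.level, "addition only at equal levels"
    return Enc((e1.value + e2.value) % Q, e1.level)

def sub(e1, e2):
    assert e1.level == e2.level, "subtraction only at equal levels"
    return Enc((e1.value - e2.value) % Q, e1.level)

def mul(e1, e2):
    assert e1.level + e2.level <= KAPPA, "only a bounded number of multiplications"
    return Enc((e1.value * e2.value) % Q, e1.level + e2.level)

def is_zero(e):
    assert e.level == KAPPA, "zero-testing only at the top level"
    return e.value == 0

# Compare two degree-3 products by zero-testing their difference at the top level.
lhs = mul(mul(encode(3), encode(5)), encode(7))
rhs = mul(encode(5), mul(encode(7), encode(3)))
print(is_zero(sub(lhs, rhs)))  # True
```

An “algebraic” adversary, in the sense above, is one restricted to exactly these four operations; the impossibility results exploit adversaries that do more, e.g., feed the encodings themselves as input to another circuit.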
Of course, the attentive reader may be left puzzled by this state of affairs, as Barak et al. showed that satisfying VBB-Obf is impossible! There is no contradiction, because there is no reason for attackers to limit themselves to algebraic attacks. For example, an attacker can feed the obfuscated circuit (which contains these encodings) as input to another circuit. Barak et al. [BGI+01] make use of such attackers to obtain their negative results.
To summarize (and interpret) the recent developments:
- Using powerful new cryptographic tools (multilinear maps), Garg et al. present a candidate for obfuscation that may provide meaningful security guarantees.
Namely, it is a candidate for Indistinguishability Obfuscation, which provides an appealing semantic notion of “best possible security”, and has exciting cryptographic applications.
- There is some evidence for the security of this construction and variants of it: we have obfuscators that provably resist the rich family of “algebraic attacks”.
- We know that adversaries may mount non-algebraic attacks against obfuscators, and indeed restricting our attention to algebraic attackers lets us bypass known impossibility results.
- It is an outstanding open problem to either prove the security of an Indistinguishability Obfuscator under standard assumptions (or under any “falsifiable” assumption), or to show impossibility for general-purpose Indistinguishability Obfuscation.
In a follow-up post, Yael will describe even more recent work that leads her to be pessimistic about the possibility of obtaining strong positive results on obfuscation for many natural classes of functions.
In conclusion, there have been many exciting recent developments in cryptography (perhaps most notably in the study of fully homomorphic encryption), and it appears that we may be on the brink of another exciting wave of developments in the study of code obfuscation. At the very least, there are fascinating new foundational problems for the field to study.
Going back to the Weizmann workshop, one recurring theme was participants recounting how, again and again (as early as the early 80s), the question has been raised: “what is left to do in cryptography?” Again and again, however, we are surprised and delighted by new conceptual and technical breakthroughs in the field. Nowadays it seems clear that while much has been done in cryptography, even more remains to be explored.
[BGI+01] Boaz Barak, Oded Goldreich, Russell Impagliazzo, Steven Rudich, Amit Sahai, Salil P. Vadhan, Ke Yang: On the (im)possibility of obfuscating programs. Crypto 2001, Journal of the ACM 2012
[BGKPS13] Boaz Barak, Sanjam Garg, Yael Tauman Kalai, Omer Paneth, Amit Sahai: Protecting Obfuscation against Algebraic Attacks. Cryptology ePrint Archive, Report 2013/631
[BR13a] Zvika Brakerski, Guy N. Rothblum: Obfuscating Conjunctions. CRYPTO 2013
[BR13b] Zvika Brakerski, Guy N. Rothblum: Black-Box Obfuscation for d-CNFs. ITCS 2014
[BR13c] Zvika Brakerski, Guy N. Rothblum: Virtual Black-Box Obfuscation for All Circuits via Generic Graded Encoding. TCC 2014
[CDNO97] Ran Canetti, Cynthia Dwork, Moni Naor, Rafail Ostrovsky: Deniable Encryption. CRYPTO 1997
[Canetti97] Ran Canetti: Towards Realizing Random Oracles: Hash Functions That Hide All Partial Information. CRYPTO 1997
[CRV10] Ran Canetti, Guy N. Rothblum, Mayank Varia: Obfuscation of Hyperplane Membership. TCC 2010
[GGH13] Sanjam Garg, Craig Gentry, Shai Halevi: Candidate Multilinear Maps from Ideal Lattices. EUROCRYPT 2013
[GGHRSW13] Sanjam Garg, Craig Gentry, Shai Halevi, Mariana Raykova, Amit Sahai, Brent Waters: Candidate Indistinguishability Obfuscation and Functional Encryption for all circuits. FOCS 2013
[GR07] Shafi Goldwasser, Guy N. Rothblum: On Best-Possible Obfuscation. TCC 2007
[Hada00] Satoshi Hada: Zero-Knowledge and Code Obfuscation. ASIACRYPT 2000
[SW13] Amit Sahai, Brent Waters: How to Use Indistinguishability Obfuscation: Deniable Encryption, and More. Cryptology ePrint Archive, Report 2013/454
The various MSR labs are looking for postdocs and full-time researchers in many scientific fields, including all areas of theoretical Computer Science. You can apply via this website. Please don’t forget to specify in the form all the labs you may be interested in.
For Microsoft Research Silicon Valley, applications submitted by December first will receive full consideration.
The deadline for the Schramm Postdoctoral Fellowship (joint with MIT math and Microsoft Research New England) is also on December first.
As Omer says: don’t trust the machine. Make sure that somebody relevant knows you are applying.