There is a longstanding feud in statistics between “frequentists” and “Bayesians”. One way to understand these two camps is to consider the following statements:
The probability that my paternal great great grandfather had blue eyes is 5%
The probability that the -th digit of is 7 is 10%
A frquentist would say that these statements are meaningless— even if I don’t know his eye color, my great great grandfather either had blue eyes or didn’t, and there is no sense in which it can be blue with probability that is neither 0 or 1. Similarly, the -th digit of is either 7 or isn’t. A Bayesian would say that such probabilities make perfect sense, and in particular one would be completely rational to take, say, an 11 to 1 bet that the -th digit of is 7. Moreover, almost any real-life application of probability theory involves probabilities over events that are in principle pre-determined. After all, even the results of a toss of a die are completely determined by the initial conditions. (This is of course a caricature of both positions, which are subtler than that; also many statisticians are pragmatists who can use either a frequentist or Bayesian perspective in different situations; you can do far worse than watching Michael Jordan’s talk on this issue.)
As theoretical computer scientists, we are naturally frequentists. For us a probability is a measure over some sample space. Moreover, our canonical activity is worst case analysis of algorithms, which meshes well with the frequentist point of view that doesn’t assume any prior distribution over the inputs one might encounter in practice. The canonical notion of pseudo randomness corresponds to a computational analog of frequentist probability. In this blog post I will discuss a different notion of “pseudo randomness”, arising from the study of computational proof systems and convex programming, that can be viewed as a computational analog of Bayesian probability. Borrowing a page from quantum mechanics, this notion will invole objects that behave like probability measure but where some of the probabilities can actually be negative.
“Classical” pseudo-randomness theory
Before describing this new notion, it might be useful to discuss the “classical” notion of pseudo-randomness as was developed in the 1980’s works of Shamir, Blum, Micali, Yao, Goldwasser and others. Recall that a probability measure is a function where is the sample space (which you can think of as the set of binary strings of length for some ), and satisfies that for all and . For a function , we define the expectation of under to be .
Nore that two distributions and are identical if and only if for every function . Pseudorandomness arises from relaxing this notion to consider only functions that are efficiently computable. That is, loosely speaking, two measures and are computationaly indistinguishable if for every computationally bounded test (i.e., function that can be efficiently computable), and are very close to one another. We say that is pseudorandom if it is computationally indistinguishable from the uniform distribution (i.e., the measure ).
I deliberatly omitted the definition of “computationally bounded” and it can change in different settings. Sometimes we consider the set of all tests that can computed by an efficient Turing machine or Boolean circuit (and then need to rely on computational assumptions), and in some applications one can consider more restricted families of tests (such as in Braverman’s result where he showed that distributions with polylogarithmic independence fool all small constant-depth circuits, or the result of Green and Tao that a certain distribution on “almost primes” fools a particular class of tests related to arithmetic progressions).
Pseudorandomness and computational indistinguishability have found a great number of applications in many areas of computer science and mathematics, see Salil Vadhan’s monograph for more on this beautiful area.
Bayesian pseudo-distributions arise from taking a computational analog of Bayesian probability. Consider the following scenario: suppose that we are given a SAT formula that has a unique assignment . A frequentist would say that there is no sense to a statement such as “the probability that is 60%”. Even a (non computational) Bayesian might have some qualms about defending such a statement. After all, the observation completely determines , and so according to standard Bayesian analysis, the only posterior distribution that makes sense is the one that puts all weight on the single string . But yet to a computer scientist it’s clear that simply because is completely determined by the formula , it doesn’t mean that we can actually recover it. Hence we have some uncertainty about it, and encoding this uncertaintly via probabilities seems rather natural.
But how do we actually go about it? Rather than considering efficient tests, we will focus our attention on efficient deduction or proof systems. That is, we say that is a pseudo distribution consistent with if it satisfies that for every function and number such that a computationally bounded observer can deduce the conclusion that from the assumption that satisfies . As above we are deliberately vague on how we model the deductive power of computationally bounded observers, but in fact I also glossed over on an even more basic point— what kind of an object is ? The initial guess is that is simply another distribution — it might not be the true distribution that is supported on the unique satisfying assignment, but simply has a different support. However one can quickly see that this is not possible: under any reasonable model of computation, a bounded observer would be able to deduce that for every clause of . For example, if contains the clause then clearly it is trivial to deduce that every distribution over satisfying assignments would satisfy . But note that if was supported on even a single string that failed to satisfy this clause then it would have been the case that . Since the same reasoning applies to all clauses, we can conslude that the only distribution that is consistent with is the distribution over the unique satisfying assignment.
The above seems to trivialize the definition, and again rule out the possiblity of statements such as ““. However, this only happened because we insisted on being an actual distribution. The definition didn’t require that, but merely required the existence of the operator mapping to the number where is in some class that computationally bounded observers can argue about. Such an operator need not correspond to actual distributions, though if it is linear (which makes sense when we think of a class closed under taking linear combinations), then it must have the form for some function . In fact, we will identify this function with . Surely a bounded observer can deduce that and hence we can assume . The only constraint we might not be able to enforce is that for all . Hence we can think of as a “distribution” that can take negative probabilities, as long as for “simple” functions , it is still the case that whenever a computationally bounded can prove that is non-negative.
The sum of squares proof system (also known as Positivstallensatz) is one concrete candidate for a bounded deduction system. There the simple functions correspond to low degree polynomials, and (roughly speaking) a computationally bounded prover can deduce from and that and and can use the axiom that for every low degree . This gives rise to a rather clean notion of pseudo-distributions where we can make precise statements such as “” where is a distribution over satisfying assignments of , even when has a unique satisfying assignments. In fact, we can even make such statements when has no satisfying assignments, as long as a computationally bounded prover cannot certify that this is the case. Perhaps even more exciting is that we have some reasons to believe (or at least hope) that in some contexts this proof system is optimal in the sense that it captures the capabalities of every computationally bounded observer. There are still many more questions in this research area. See my lecture notes and survey for more on this topic.
I do believe that this notion of “pseudo distribution” can be applicable beyond the sum-of-squares algorithm. For example, in number theory there are many heuristics that treat deterministic objects (such as the primes) as if they were chosen at random, and I wonder if these ideas can help formalize such intuitions. A similar situation arises in the study of random constraint satisfaction problems using local algorithms. For a random SAT formula with variables and clauses, the Belief Propagation algorithm predicts correctly the entropy of the distribution over satisfying assignments for up to roughly 3.86, but then diverges from the truth and in particular predicts that this entropy is positive (and hence the formula is satisfiable) when can be as large as 4.667, whereas the true threshold is roughly 4.267 (see Chapter 14 of this excellent book). One can imagine a notion of “belief propagation pesudo-distribution” which coincides with the actual distribution up to some density but then diverges from it as the problem becomes more computationally difficult. I am also wondering if it can be related to cryptographic notions such as “inaccesible entropy” (e.g., see here) or to questions in computational economics where we try to model the beliefs of computationally bounded agents.
As a cryptographer, I am used to the viewpoint that computational problems are presumed hard unless proven otherwise. We cryptographers constantly come up with new hardness assumptions, and generally use the heuristic that if a problem doesn’t have an obvious algorithm then it must be hard. We’ve had some interesting failures (see here for a recent example), but this heuristic seems to be successful more often than not.
In most other fields of study, people have the opposite intuition. Algorithms researchers are generally in the business of solving problems, and not giving up any time the obvious approach doesn’t work, and so take the viewpoint that a problem is presumed easy unless it is proven hard. In other fields people seem to take it a step further and claim that a problem is easy even if it is proven to be hard. People working on SAT solving and model checking solve SAT instances on a regular basis, leading some of them to claim that SAT is easy in practice. Similarly, many problems in machine learning are NP hard, but are yet solved all the time by current heuristics (see here for more).
While finding a Nash equilibrium is considered a hard computational problem, economists like to point out (as Eva Tardos told me recently) that there is milk on the shelves in stores, which means that producers and consumers managed somehow to come to an equilibrium where the market clears. (Interestingly, it seems that the process in which people do so is not far from the same heuristics used in machine learning.)
One could say that cryptography is the outlier because the instances there are designed to be hard. Perhaps hardness is this rare phenomenon that needs to be searched for, and P equals NP for “natural” instances. On the other hand, by counting arguments a priori we would expect a “random” or “generic” problem to be hard. Moreover, perhaps in those other fields the problems are designed to be easy. After all, when we use SAT to do model checking, we do so to verify the correctness of a system that was in fact designed by some human to be correct, and would perhaps have been verifiable (with a lot of effort) by a human as well. So perhaps it’s not so surprising that algorithms are able to mimic this reasoning. Similarly, machine learning is mostly about trying to do tasks that humans already succeed in. One could also argue that as a society we design rules that would make achieving equilibrium tractable, and perhaps for this reason we use a fixed price for milk as opposed to some complicated contracts.
So is hardness the rule or the exception? I don’t really know. Perhaps the physicists should decide, as the problems they study come from nature, who presumably doesn’t have any bias toward hardness or easiness. Or does she? After all, nature needs to compute as well (at least in the forward direction). Random constraint satisfaction problems, such as random 3SAT, seem to be a meeting point for physicists, computational complexity folks, cryptographers, and more, and a recent paper makes some algorithmic progress on 3SAT, though it does seem to still be bounded away from the threshold, and it is possible the algorithmic picture clarifies for kSAT and kXOR as k grows.
What do you think?
On Saturday October 17, the day immediately before FOCS 2015, the Simons Institute for the Theory of Computing will host a celebration of the work of Dick Karp on the occasion of his 80th birthday. This event, which features a dozen speakers from across the spectrum of theoretical Computer Science, will also be held at the DoubleTree Marina Hotel, the same venue as FOCS itself. All participants are asked to register to attend; registration is free, and is done in conjunction with FOCS registration. Visit KARPfest80 for more information.
- A lovely interview with Cynthia Dwork in the New York Times on bias in computations. In particular, discussing our work (joint with Moritz Hardt, Toni Pitassi and Rich Zemel) on Fairness through Awareness.
- Our Science article, The reusable holdout: Preserving validity in adaptive data analysis (joint with Cynthia Dwork, Vitaly Feldman, Moritz Hardt, Toniann Pitassi and Aaron Roth), is out.
update (8/16/2015): On Privacy and Anonymisation (with Cynthia Dwork and Salil Vadhan) in The Economist
[Below is a guest post from Sanjeev Arora on his redesign of the traditional graduate algorithms course to be a better match for today’s students. –Boaz]
For the last two years I have tried new ideas in teaching algorithms at the graduate level. The course is directed at first year CS grads, but is also taken by grads from related disciplines, and many advanced undergrads. (Links to course homepage, and single file with all course materials.)
The course may be interesting to you if, like me, you are rethinking the traditional choice of topics. The following were my thoughts behind the redesign:
- The environment for algorithms design and use has greatly changed since the 1980s. Problems tend to be less cleanly stated (as opposed to “bipartite matching” or “maximum flow”) and often involve high-dimensional and/or noisy inputs. Continuous optimization is increasingly important.
- As the last theory course my students (grad or undergrad) might take for the rest of their lives, it should somewhat fill in holes in their undergraduate CS education: information/coding theory, economic utility and game theory, decision-making under uncertainty, cryptography (anything beyond the RSA cryptosystem), etc.
- Programming assignments need to be brought back! CS students like hands-on learning: an algorithm becomes real only once they see it run on real data. Also, computer scientists today —whether in industry or academia—rely on subroutine libraries and scripting languages. A few lines in Matlab and Scipy can be written in minutes and run on datasets of millions or billions of numbers. No JAVA or C++ needed! Algorithms education should weave in such powerful tools. It is beneficial even for theory students play with them.
Sample programming assignments: (a) (compression via SVD) given a 512 x 512 grayscale image, treat it as a matrix and take its rank k approximation via SVD, for k=15, 30,45,60. Use mat2gray in matlab to render this new matrix as a grayscale image and see what k suffices for realistic recovery. (b) You are given S&P stock price data for 10 years. run online gradient descent to manage a portfolio (Lecture 16), and report what returns you get with various parameter settings.
Students are allowed to do a final project in lieu of a final, and many choose to apply algorithms to some real world problem they are interested in. Sample projects are also listed on the course page.
I welcome your comments, suggestions, and links to other relevant course materials on the web!
Indistinguishability Obfuscation and Multi-linear Maps: A Brave New World – Guest Post by Ran Canetti
A bunch of us hapless cryptographers got the following boilerplate comment from the FOCS’15 PC:
“Overall, submissions related to multi-linear maps and indistinguishability obfuscation were held to a somewhat higher standard. The PC expressed some concern with the recent flurry of activities pertaining to multi-linear maps and indistinguishability obfuscation, given how little we understand and can say and *prove* about the underlying hardness assumptions”.
This comment was clearly written with the best of intentions, to explain views expressed at the PC deliberations. And I’m thankful to it – mainly since it made the underlying misconceptions so explicit that it mandated a response. So, after discussing and commiserating with colleagues here at Simons, and after amusing ourselves with some analogues of above statement (e.g., “results on NP completeness are held to a higher standard given how little we understand and can say and *prove* about the hardness solving SAT in polynomial time”), I decided to try to write an – obviously subjective – account for the recent developments in multilinear maps and indistinguishability obfuscation (IO) and why this exciting research should be embraced and highlighted rather than “held to a somewhat higher standard” — in spite of how little we understand about the underlying assumptions. The account is aimed at the general CS-theorist.
Let me start by giving rough definitions of the concepts involved. An Indistinguishability Obfuscator (IO) is a randomized algorithm O that takes as input a circuit C and outputs a (distribution over) circuits O(C) with the properties that:
- C and O(C) have the same functionality,
- O(C) is only polynomially larger than C
- for any two same-size, functuinally equivalent circuits C and C’ we have that O(C) ~ O(C’) (i.e., the distributions over strings representing O(C) and O(C’) are computationally indistinguishable).
IO has been proposed as a notion of obfuscation in 2000 (Hada, Barak-Goldreich-Impagliazzo-Sahai-Vadhan-Yang). Indeed, it is arguably a clean and appealing notion – in some sense the natural extension of semantic security of standard encryption to “functionality-preserving encryption of programs”. However, it has been largely viewed as too weak to be of real applicability or interest. (There were also no candidate polytime IO schemes, but this in my eyes is a secondary point, see below.)
Things changed dramatically in 2013 when Sahai and Waters demonstrated how IO schemes can be ingeniously combined with other rather “mundane” cryptographic constructs to do some amazing things. Since then dozens of papers came about that extend the SW techniques and apply them to obtain even more amazing things – that by now have transcended crypto and spilled over to other areas. (e.g.: deniable encryption, succinct delegation, succinct multi-party computation with hardly any interaction, one message succinct witness hiding and witness indistinguishable proofs, hash functions with random-oracle-like properties, hardness results for PPAD, and many more). In fact, think about a result in your area that assumes that some computation is done inside a black box – most probably IO can replace that assumption in one way or another…
Still, my (subjective but distinct) feeling is that we are far from understanding the limits and full power of IO. Furthermore, the study of IO has brought with it a whole new toolbox of techniques that are intriguing in their own right, and teach us about the power and limitations of working with “encrypted computations”.
So far I have not mentioned any candidate constructions of IO – and indeed the above study is arguably valuable as a pure study of this amazing concept, even without any candidate constructions. (Paraphrasing Levin on quantum computers, one can take the viewpoint that the above is the study of impossibility results for IO…)
However, unlike quantum computers, here we also have candidate constructions. This is where multilinear maps come to play.
Multi-linear maps are this cool new technical tool (or set of tools) that was recently put forth. (The general concept was proposed by Boneh and Silverberg around 2000, and the first candidate construction of one of the current variants was presented in 2012 by Garg, Gentry and Halevi.) Essentially, a multilinear map scheme is a fully homomorphic encryption scheme where the public key provides, in addition to the ability to encrypt elements and perform homomorphic operations on ciphertexts, also the ability to partially decrypt ciphertexts under certain restrictions. There are many incomparable variants of this general paradigm, which differ both in the functionality provided and in the security guarantees. Indeed, variants appear to be closely tied to candidate constructions. Furthermore, our understanding of what’s possible here has been evolving considerably, with multiple new constructions, attacks, and fixes reported.
Still, the number and variety of applications of multi-linear maps makes it clear that this “family of primitives” is extremely powerful and well worth studying – both at the level of candidate constructions, at the level of finding the “right” computational abstractions, and at the level of applications. In a sense, we are here back to the 70’s: we are faced with this new set of algebraic and number theoretic tools, and are struggling to find good ways to use them and abstract them.
Indeed, some of the most powerful applications of multilinear maps are candidate constructions of IO schemes. The first such candidate construction (by Garg, Gentry, Halevi, Raykova, Sahai and Waters in 2013) came with only heuristic arguments for security; However more rigorous analyses of this and other constructions, based on well-defined formulations of multi-linear map variants, soon followed suite. Some of these analyses have eventually been “broken” in the sense that we currently don’t have candidate constructions that satisfy the properties they assume. Still, other analyses do remain valid. Indeed, there are no attacks against the actual basic IO scheme of Garg et al.
The fact that the only current candidate constructions of IO need to assume existence of some variant of multi-linear maps at some point or another may make it seem as it the two concepts are somehow tied together. However, there is no reason to believe that this is the case. For all we know, multi-linear maps are just the path first uncovered to IO, and other paths may well be found. Similarly, even if IO turns out to be unobtainable for some reason, the study of multilinear maps and their power will still remain very relevant.
So, to sum up this long-winded account:
- IO is a natural and fascinating computational concept. Studying its consequences (both within and outside cryptography) is a well worth endeavor.
- Studying new candidate constructions of IO and/or new analyses of their security is another well worth endeavor.
- Multilinear maps are an intriguing and powerful set of techniques and tools. Finding better candidate constructions and abstractions is of central importance to cryptography. Finding new cool uses of these maps is another intriguing challenge.
- The three should be treated as separate (although touching and potentially interleaving) research efforts.
I’d like to thank Guy Rothblum and Vinod Vaikuntanathan for great comments that significantly improved this post.