# Structure vs. Combinatorics in Computational Complexity

(Also available as a pdf file. Apologies for the many footnotes, feel free to skip them.)

Computational problems come in all different types and from all kinds of applications, arising from engineering as well the mathematical, natural, and social sciences, and involving abstractions such as graphs, strings, numbers, and more. The universe of potential algorithms is just as rich, and so a priori one would expect that the best algorithms for different problems would have all kinds of flavors and running times. However natural computational problems “observed in the wild” often display a curious *dichotomy*— either the running time of the fastest algorithm for the problem is some small polynomial in the input length (e.g., or ) or it is *exponential* (i.e., for some constant ). Moreover, while indeed there is a great variety of efficient algorithms for those problems that admit them, there are some general principles such as *convexity* (i.e., the ability to make local improvements to suboptimal solutions or local extensions to partial ones) that seem to underly a large number of these algorithms.^{1}

To be sure, none of these observations are universal laws. In fact there are theorems showing exceptions to such dichotomies: the Time Hierarchy Theorem says that for essentially any time-complexity function there is a problem whose fastest algorithm runs in time (essentially) . Also, Ladner’s Theorem says that, assuming PNP, there are problems that are neither in P nor are NP-complete. Moreover, there are some *natural* problems with apparent “intermediate complexity”. Perhaps the most well known example is the *Integer Factoring* problem mentioned below. Nevertheless, the phenomenon of dichotomy, and the related phenomenon of recurring algorithmic principles across many problems, seem far too prevalent to be just an accident, and it is these phenomena that are the topic of this essay.

I believe that one reason underlying this pattern is that many computational problems, in particular those arising from combinatorial optimization, are *unstructured*. The lack of structure means that there is not much for an algorithm to exploit and so the problem is either `very easy''--- e.g., the solution space is simple enough so that the problem can be solved by local search or convex optimization<a href="#fn2"><sup>2</sup></a><a id="back2"></a> --- or it is`

very hard”— e.g., it is NP-hard and one can’t do much better than exhaustive search. On the other hand there are some problems that posses a certain (often algebraic) *structure*, which typically is exploitable in some non-trivial algorithmic way. These structured problems are hence never `extremely hard'', but they are also typically not`

extremely easy” since the algorithms solving them tend to be more specialized, taking advantage of their unique properties. In particular, it is harder to understand the complexity of these algebraic problems, and they are more likely to yield algorithmic surprises.

I do not know of a good way to formally classify computational tasks into *combinatorial/unstructured* vs. *algebraic/structured* ones, but in the rest of this essay I try to use some examples to get a better sense of the two sides of this divide. The observations below are not novel, though I am not aware of explicit expositions of such a classification (and would appreciate any pointers, as well as any other questions or critique). As argued below, more study into these questions would be of significant interest, in particular for cryptography and average-case complexity.

** **

## Combinatorial/Unstructured problems

The canonical example of an unstructured combinatorial problem is SAT— the task of determining, given a Boolean formula in variables with the operators , whether there exists an assignment to the variables that makes true. SAT is an NP-complete problem, which means it cannot be solved efficiently unless PNP. In fact, the Exponential Time Hypothesis posits that every algorithm solving SAT must take at least time for some . SAT illustrates the above dichotomy in the sense that its natural restrictions are either as hard as the general, or become easily easily solvable, as in the case of the 2SAT problem (where the formula is in conjunctive normal form with each clause of arity ) that can be solved efficiently via a simple propagation algorithm. This observation applies much more generally than SAT . In particular the widely believed Feder-Vardi dichotomy conjecture states that every constraint satisfaction problem (CSP) is either NP hard or in P. In fact, researchers conjecture (and have partially confirmed) the stronger statement that every CSP can either be solved by some specific algorithms of low polynomial-time (either propagation or generalizations of Gaussian elimination) or is NP hard via a linear blowup reduction from SAT , and hence (under the Exponential Time Hypothesis) cannot be solved faster than time for some .^{3}

Random SAT formulas also display a similar type of dichotomy. Recent research into random –SAT (based also on tools from statistical physics) suggests that they have multiple *thresholds* where the problem changes its nature. When the *density* (i.e., ratio of constraints to variables) of the formula is larger than some number (equal roughly to ) then with high probability the formula is “overconstrained” and no satisfying assignment exists. There is some number (equal roughly to ), such that for , the space of satisfying assignments for a random formula looks roughly like a discrete ball, and, due to this relatively simple geometry some local-search type algorithms can succeed in finding satisfying assignments. However for , satisfying assignments still exist, but the geometry of the solution space becomes vastly different, as it *shatters* into exponentially many clusters, each such cluster separated from the others by a sea of assignments that violate a large number of the constraints, see Figure 1. In this regime no efficient algorithm is known to find the satisfying assignment, and it is possible that this is inherently hard.^{4}

Figure 1: An illustration of the solution space geometry of a random SAT formula, where each point corresponds to an assignment with height being the number of constraints violated by the assignment. The left figure depicts the “ball” regime, where a satisfying assignment can be found at the bottom of a smooth “valley” and hence local algorithms will quickly converge to it. The right figure depicts the “shattered” regime where the surface is very ragged, with an exponential number of crevices and local optima, thus local algorithms (and as far as we know any algorithm) will likely fail to find a satisfying assignment. Figures courtesy of Amin Coja-Oghlan. |

Dichotomy means that when combinatorial problems are hard, then they are typically very hard, not just in the sense of not having a subexponential algorithm, but they also can’t be solved non-trivially in some intermediate computational models that are stronger than P but cannot solve all of NP such as quantum computers, statistical zero knowledge, and others. In particular for combinatorial problems (quoting Lovász) the existence of a *good characterization* (i.e., the ability to efficiently *verify* both the existence and non-existence of a solution) goes hand-in-hand with the existence of a good algorithm. Using complexity jargon, in the realm of combinatorial optimization it seems to hold that PNPcoNP, even though we believe this is false in general. Indeed, for many combinatorial problems such as matching, max flow, planarity, etc.. demonstrating a good characterization is an important step toward finding an efficient algorithm. This is related to the notion of *duality* in convex programming, which is often the method of choice to solve such problems.

Combinatorial problems can be quite useful for *cryptography*. It is possible to obtain one-way functions from random instances of combinatorial problems such as SAT and CLIQUE. Moreover, the problem of attacking a cryptographic primitive such as a block cipher or a hash function can itself be considered a combinatorial problem (and indeed this connection was used for cryptanalysis). However, these are all *private key* cryptographic schemes, and do not allow two parties to communicate securely without first exchanging a secret key. For the latter task we need public key cryptography, and as we discuss below, the currently known and well-studied public key encryption schemes all rely on *algebraic* computational problems.

** Algebraic/Structured problems **

FACTORING is a great example of an algebraic problem; this is the task of finding, given an -bit integer , the prime numbers such that . No polynomial time algorithm is known for FACTORING , but it had seen some non-trivial algorithmic advances. While the natural trial-division algorithm takes roughly steps to solve FACTORING , the Number Field Sieve algorithm, which is the current best, takes roughly steps. FACTORING can also be solved in polynomial-time on *quantum computers* using Shor’s Algorithm. Finally, FACTORING (or more accurately, the decision problem obtained by looking at individual bits of the output) is also in the class NPcoNP, which means that one can efficiently *verify* the value of a particular bit of the answer, no matter if this value is zero or one. These results almost certainly mean that FACTORING is *not* NP complete.

There is another, more subjective sense, in which I find FACTORING different from SAT . I personally would be much more shocked by a -time algorithm for SAT than by a -time algorithm for FACTORING . The reason is that, while people have found clever ways to speed up the time exhaustive search algorithm for SAT (especially on certain types of instances), these approaches all seem to inherently require exponential time, and are not as qualitatively different from exhaustive search in the way that the number field sieve is different from trial division. In contrast, FACTORING clearly has strong algebraic structure that we do not completely understand, and perhaps have not reached the limit of its exploitation by algorithms. To see that this is not completely implausible, consider the problem of computing the discrete logarithm in fields of small characteristic. This problem shares many properties with FACTORING , and it also shared the property of having a best-known running time of until this was recently improved to and then to .

Not all algebraic problems are hard. Factoring univariate polynomials over finite fields can be solved efficiently using the Berlekamp or Cantor-Zassenhaus algorithms (see e.g. chapter 21 here). This algorithm also exemplifies the statement above, that algorithms for algebraic problems are often very specialized and use non-trivial properties of the problem’s structure. For this reason, it’s harder to predict with confidence what is the best algorithm for a given algebraic problems, and over the years we have seen several surprising algorithms for such problems, including, for example, the fast matrix multiplication algorithms, the non-trivial factoring algorithms and deterministic primality testing, as well as the new algorithm for discrete logarithm over small-characteristic fields mentioned above.

**Relation to cryptography.** Algebraic problems are very related to *public key cryptography*. The most widely used public key cryptosystem is RSA, whose security relies on the hardness of FACTORING . The current subexponential algorithms for FACTORING are the reason why we use RSA keys of or bits, even though even the yet-to-built exaflop supercomputers would take thousands of years to perform, say, computational operations. This also demonstrates how fragile is RSA to any surprising algorithmic advances. If the exponent of the best factoring algorithm would halve (i.e., change from to ) then, roughly speaking, to get equivalent security we would need to *square* the size of the key. Since the RSA encryption and decryption algorithms take time which is at least quadratic in the size of the key, that would make RSA pretty impractical.

Cryptosystems based on the discrete logarithm problem in *elliptic curves* yield one alternative to RSA which currently is not known to be broken in subexponential time. Elliptic-curve discrete log is of course also very much an algebraically structured problem, and so, I would argue, one in which further algorithmic surprises are hard to rule out. Moreover, like factoring, this problem can be solved in polynomial time by *quantum computers*, using Shor’s algorithm.

The only other public key cryptosystems that are researched enough to have some confidence in their security are based on decoding problems for linear codes or integer lattices. These problems are not known to have subexponential algorithms, classical or quantum. Moreover, some variants of these problems are actually *NP-hard*. Specifically, theses problem are parameterized by a number which is the *approximation factor*, where smaller means the problem is harder. For example, the shortest vector problem in a lattice can be solved efficiently for (where is some constant and is the dimension of the lattice, which is related to the length of the input), and the problem is NP hard for (where is some function of tending slowly to zero). For this reason lattice problems were once seen as a potential approach to getting both private and public crypto based on the minimal assumption that PNP, which in particular would yield public key crypto based on unstructured problems such as SAT . However, we only know how to get public key crypto from these problems for for some while we have reason to believe that for the problem *does* actually possess algebraic (or at least geometric) structure; this is because in this range the problem has a `good characterization'' (i.e., in NPcoNP or AMcoAM). A similar phenomenon also occurs for other problems such as learning parity with noise and random 3SAT (see discussion in <a href="http://www.eng.tau.ac.il/~bennyap/pubs/ncpkcFull1.pdf">this paper with Applebaum and Wigderson</a>)--- there seem to be two <em>thresholds</em> such that for the problem is hard and arguably unstructured, for the problem becomes useful for public key cryptography, but also seems to suddenly obtain some <em>structure</em> such as a`

good characterization”, while for the problem becomes easy. Another sign of potential structure in lattice problems is the existence of a subexponential quantum algorithm for the hidden subgroup problem in dihedral groups, which is related to these problems.

The bottom line is that based on the currently well studied schemes, structure is strongly associated with (and perhaps even implied by) public key cryptography. This is troubling news, since it makes public key crypto somewhat of an “endangered species” that could be wiped out by a surprising algorithmic advance. Therefore the question of whether structure is *inherently necessary* for public key crypto is not only of mathematical interest but also of practical importance as well. Cryptography is not just an application of this classification but also provides a useful lens on it. The distinction between private key and public key crypto mirrors the distinction between unstructured and structured problems. In the *private key* world, there are many different constructions of (based on current knowledge) apparently secure cryptosystems; in fact, one may conjecture (as was done by Gowers) that if we just combined a large enough number of random reversible local operations then we would obtain a secure block cipher. In contrast, for *public key* cryptography, finding a construction that strikes the right balance between structure and hardness is a very hard task, worthy of a Turing award, and we still only know of a handful or so such constructions.

** A different approach to average case complexity **

I am particularly interested in this classification in the context of *average-case complexity*. In the case of *worst-case* complexity, while we have not yet managed to prove that PNP, complexity theorists achieved something like the next best thing— classifying a large number of problems into hard and easy ones based on this single assumption. We have not been able to replicate this success in average case complexity, and there is a good reason for that. Our main tool for basing one assumption on another one— the *reduction*— is extremely problematic in average case complexity, since there are inherent reasons why a reduction would not preserve the distribution of the inputs. To illustrate this, suppose that we tried to show that an average-case problem is no harder than an average-case problem using a standard Karp reduction (i.e., is a function mapping an -input into a -input such that ). For simplicity, assume that the input distribution for both problems is the uniform distribution. This would imply that for a random , should be distributed close to the uniform distribution over . But we cannot expect this to happen in any reasonable reduction, as all of them add gadgets or blow up the size of the instance in some way, meaning that , in which case is distributed over a subset of of size less than and hence is far from the uniform distribution.^{5}

This difficulty is one reason why the theory of average-case complexity is much less developed than the theory for worst-case complexity, even though average-case complexity is much more relevant for many applications. The observations above suggest that at least for combinatorial problems, we might hope for a different approach: define a *meta conjecture* that stipulates that for a whole class of average-case problems, a certain algorithmic framework yields the optimal efficient algorithm, meaning that beating the performance of that algorithm would be infeasible (e.g., take exponential time). To make things more concrete, consider the following hypothesis from a paper with Kindler and Steurer:

Random CSP Hypothesis.For every predicate , if we let be the problem of estimating the fraction of constraints that can be satisfied for an instance chosen at random, then no efficient algorithm can obtain a better approximation to than , where is the approximation obtained by the canonical semidefinite program (a type of convex relaxation) to this problem.^{6}

Note that this is a much more general conjecture that P NP, which can be reduced to the statement that a single problem (say worst-case SAT ) cannot be efficiently solved. In contrast, the Random CSP Hypothesis contains an unbounded number of hardness conjectures (one for every predicate) that (except in very special cases) are not known to be reducible to one another. Of course, to derive a concrete assumption about a predicate from this hypothesis one needs to calculate , but fortunately for random CSP’s this can be done easily— one can of course run the algorithm, but there is also an analytical expression for this quantity.

Despite it being such a general hypothesis, I don’t think the Random CSP Hypothesis is yet general enough— there may well be significant extensions to this hypothesis that are still true, involving combinatorial problems different than CSP’s, and distributions different than the uniform one. Perhaps with time, researchers will find the `right'' meta conjecture which will capture a large fraction of the problems we consider`

combinatorial”.

At first brush, it might seem that I’m suggesting to trivialize research in average-case complexity by simply assuming all the hardness results we wish for. But of course, there is still a very real challenge to find out if these assumptions are actually true! Given our current state of knowledge, I don’t foresee an unconditional proof of these types of assumptions, or even a reduction to a single problem, any time soon. But as I discussed before this doesn’t mean we can’t gather *evidence* on these meta assumptions. In particular, such assumptions form very `fat targets'' for potential refutations. For example, all we have to do to refute the Random 3CSP Hypothesis is to find a single predicate and a single efficient algorithm such that gives a better approximation factor than for . In fact, there are very natural candidate algorithms to do just that, including in particular more complicated convex programs known as semidefinite programming <em>hierarchies</em>. Analyzing the performance of such algorithms raises some fascinating mathematical questions, many of which we haven't yet been able to solve, and this is a very interesting research area in its own right. With effort and time, if no refutation is found, we might gain confidence in the veracity of such meta assumptions, and obtain a much clearer view of the landscape of average-case complexity, and complexity at large.`

markers of structure” such as subexponential algorithms, quantum algorithms, good characterization, usefulness for public key cryptography, etc.. need always go together would be extremely useful for many applications, and in particular cryptography. Even more speculatively, perhaps thinking about these issues can help towards the goal of

<h2><b> Conclusions </b></h2>

While much of what I discussed consists of anecdotal examples, I believe that some works, such as those related to the Feder-Vardi conjecture or to phase transitions in random CSP's, offer a glimpse of a potential general theory of the complexity of combinatorial problems. I think there is room for some ambitious conjectures to try to illuminate this area. Some of these conjectures might turn out to be false, but we can learn a lot from exploring them. Understanding whether the*unconditional* results. The richness of the space of algorithms is one of the main “excuses” offered for our relatively little success in proving unconditional lower bounds. If indeed this space is much more limited for combinatorial problems, perhaps this can help in finding such proofs.^{7}

*Thanks to Scott Aaronson, Dimitris Achlioptas, Amin Coja-Oghlan, Tim Gowers, and David Steurer for useful comments and discussions.
*

**Footnotes**

^{1} The standard definition of `convexity'' of the solution space of some problem only applies to continuous problems and means that any weighted average of two solutions is also a solution. However, I use`

convexity” here in a broad sense meaning having some non-trivial ways to combine several (full or partial) solutions to create another solution; for example having a matroid structure, or what’s known as `polymorphisms''</a> in the constraint-satisfaction literature.} This phenomenon is also related to the`

unreasonable effectiveness” of the notion of NP-completeness in classifying the complexity of thousands of problems arising from dozens of fields. While a priori you would expect problems in the class NP (i.e., those whose solution can be efficiently certified) to have all types of complexities, for natural problems it is often the case that they are either in P (i.e., efficiently solveable) or are NP-hard (i.e., as hard as any other problem in NP, which often means complexity of , or at least . ↺

^{2} Of course even if the algorithm is simple, analyzing it can be quite challenging, and actually obtaining the fastest algorithm, as opposed to simply one that runs in polynomial time, often requires additional highly non-trivial ideas. ↺

^{3} The main stumbling block for completing the proof is dealing with those CSPs that require a Gaussian-elimination type algorithm to solve; one can make the argument that those CSP’s actually belong to the algebraic side of our classification, further demonstrating that obtaining precise definitions of these notions is still a work in progress. Depending on how it will be resolved, the Unique Games Conjecture, which I discussed here, might also give rise to CSP’s with “intermediate complexity” in the realm of *approximation algorithms*. Interestingly both these issues go away when considering random, noisy, CSP’s, as in this case solving linear equations becomes hard, and solving Unique Games becomes easy. ↺

^{4} The Survey Propagation Algorithm is a very interesting algorithm that arose from statistical physics intuition, and is experimentally better than other algorithms at solving random –SAT formulas for small such . However, it is believed, that at least for larger , it too cannot succeed in the regime where the solution space geometry shatters. The best known algorithm for random –SAT for large is given in this paper of Amin Coja-Oghlan. ↺

^{5} As further argument that reductions should increase the input length, note that if and were shown equivalent by reductions and that *shrink* the size of the input even by a single bit, then by repeating these reductions recursively shows that both and can be solved in polynomial time. This argument can be extended to the case that and are length preserving, under the assumption that is not too close to the identity permutation and that the inputs of length are embedded in the set of inputs of length . One can also use similar arguments to rule out certain types of *probabilistic* reductions, even those that increase the input size, if we assume the reduction is efficiently invertible. ↺

^{6} The notion of `chosen at random'' roughly corresponds to the uniform distribution over inputs, or the uniform distribution with an appropriately`

planted” satisfying assignment, with the precise notion of “estimation” being the appropriate one for these different models; see the paper for details. The Random CSP Hypothesis deals with the *overconstrainted* regime of random SAT formulas, as opposed to the *underconstrained* regime in discussed above in the the context of phase transitions. ↺

^{7} In some sense, such an approach to proving lower bounds is dual to Mulumley’s approach of `Geometric Complexity Theory'' (GCT). (For more information about GCT, see <a href="http://intractability.princeton.edu/blog/2009/12/geometric-complexity-theory-workshop/">the talks in this workshop</a>, and also <a href="http://cstheory.stackexchange.com/questions/17610/wikipedia-style-explanation-of-geometric-complexity-theory">this StackExchange answer</a>, <a href="http://math-www.uni-paderborn.de/agpb/vortrag/pbuerg-CCC12-static-1.pdf">this presentation</a> and <a href="http://www.cs.toronto.edu/~jgrochow/grochow-gct-unity.pdf">this paper</a> ). The GCT approach attempts to use specific properties of structured functions such as the permanent to obtain a lower bound; these properties are actually`

constructive” in the Razborov-Rudich sense of Natural Proofs. If we focused on combinatorial, “unstructured”, problems then we would need to come up with general properties guaranteeing hardness, that would also apply to random functions (which are the ultimate unstructured functions). The Razborov-Rudich result implies such properties would be inherently non-constructive. Valiant’s approach for proving certain types of lower bounds via *Matrix Rigidity* can be thought of as an instance of the latter approach. ↺

### Trackbacks

Comments are closed.

A very nice post. The hypothesis that good characterizations of a combinatorial optimization problem and its complement yield a good algorithm, which we would write as P=NP intersect coNP and which you seem to attribute to a quote of Lovasz actually was made by Jack Edmonds in his famous Paths, Trees, and Flowers paper in 1965. It was motivated by LP duality, but was made even before we knew that solving LP’s is in P.

Thanks Paul. I don’t attribute it to Lovasz, it’s just the one place I found an explicit quote (in his Combinatorial Problems book he says that “The existence of a good characterization tends to go hand in hand with the existence of good decision algorithms”).

I’ve heard that the notion of NP cap coNP was defined by Edmonds, but I don’t remember a discussion on it in the Paths, Trees and Flowers paper (there is an excellent discussion of the notion of “polynomial time” there) – do you know of a good reference for this discussion?

You are right. It isn’t there. What I saw was in some later work referring to Edmonds early papers and as I look at them it is pretty hard to find anything directly there. I know that I saw this conjecture explicitly in Edmonds work but it may have been later.

Great essay!

To focus on one of the issues raised, let me ask a very basic question. What’s the best result known for simulating an *arbitrary* quantum algorithm by a classical computer? If a problem in NP is known to be solvable in polynomial time by a quantum machine, is it even known whether it’s solvable in 2^{cn} time for some c<1 with a classical randomized algorithm?

That’s a good question – I don’t know the answer. While there are non-trivial simulations in terms of space (using polynomial space as opposed to the naive way which would keep track of all amplitude), I don’t know if there is any non-trivial exponential speedup.

A great post! I have a small remark regarding footnote 3: the power of a Gaussian-elimination type algorithm (called few subpowers, published by Idziak et al. in SICOMP 39(7):3023-3037, 2010) is known, see Berman et al. Trans. AMS 362(3):1445-1473, 2010, and it’s now clear that it won’t solve all remaining (open) CSPs. Few known classes combine the two techniques, few subpowers and consistency techniques, but it’s not clear that (a combination of) these two will be enough for the remaining CSPs that are conjectured to be tractable. It may need a completely new algorithmic approach!

A great post! I have a small remark regarding footnote 3: the power of a

Gaussian-elimination type algorithm (called few subpowers, published by Idziak

et al. in SICOMP 39(7):3023-3037, 2010) is known, see Berman et al. Trans. AMS

362(3):1445-1473, 2010, and it’s now clear that it won’t solve all remaining

(open) CSPs. Few known classes combine the two techniques, few subpowers and

consistency techniques, but it’s not clear that (a combination of) these two

will be enough for the remaining CSPs that are conjectured to be tractable. It

may need a completely new algorithmic approach!

Thank you! I am not really an expert on this literature – I seem to have a memory that the problematic CSP’s are those who can count, and hence require some gaussian elimination type algorithm, but indeed perhaps it’s not enough. Are there good references for this discussion of what algorithmic approaches would be needed to resolve it?

its great you cited transition point research into SAT, not everyone in complexity theory is aware of it. think it could be a rosetta stone for unlocking P≠NP, have studied it closely & think have somewhat recently pinpointed the way it interferes with Razborovs monotone circuit proofs, just looking for some expert [cyberspatial?] collaborators on that project/research program/direction/attack angle here…. also great minds think alike, scott aaronson just posted a high voted question on tcs.se along similar lines/theme of this post, overarching reasons why problems are in P

Indeed, I’ve linked to Scott’s question above.

A talk by Peter Winkler today reminded me I should have definitely also mentioned Allan Sly’s paper, that also gives some evidence as to how the threshold when the solution space geometry changes might be the same threshold as when the problem becomes computationally hard (this time in the context of the independent set problem).

Another property where combinatorial and algebraic problems seem to differ is

robustness to noise. Combinatorial problems, perhaps because the algorithms are message passing or convex optimization, often seem to have similar difficulty when noise is added, while algebraic problems tend to be far more brittle, often becoming of very different complexity.A very informative post. Thanks.

Thanks for the very interesting post!

It further encourages me to pursue hash-based digital signatures (http://pqcrypto.org/hash.html), which implement the “digital signature” operation (prove that a string was produced using knowledge of a specific secret, without revealing that secret) assuming only secure hash functions. Secure hash functions are eminently “unstructured” things. They are merely “bit blenders”.

I spent a little time trying to imagine how to get the “public key encryption” operation (encrypt so that only the holder of a specific private key can decrypt) from only secure hash functions, but didn’t get very far. David Molnar once patiently explained to me some reason why such a construction is implausible, but I didn’t understand his explanation. Such a construction would prove that we don’t live in Impagliazzo’s Minicrypt, I guess.

Merkle’s Puzzles (https://en.wikipedia.org/wiki/Merkle%27s_Puzzles) could be built solely out of secure hash functions, and they constitute public key encryption, but they give the interlocutors only a quadratic advantage over the eavesdropper. Does that mean anything about whether we live in Minicrypt?

In any case, it is interesting that of the two operations that are currently done with “structured” systems like RSA and Discrete Log — public key encryption and digital signature — one of them can be replaced by a purely “unstructured” alternative, and the other one can’t (unless quadratic advantage is good enough for you).

Impagliazzo and Rudich proved that any construction that uses (even an ideal) hash function as a black box can give at most (roughly) security, Mahmoody and I improved this bound to , thus proving that Merkle’s puzzle construction is optimal. Of course, this does not rule out non-black-box constructions.

Thanks! In what paper did you prove this n² bound?

The paper with Mahmoody is “Merkle Puzzles are Optimal”, CRYPTO 2009. It can be found at the following link:

BTW I have another paper with Mahmoody showing that any black box construction of signature schemes from hash functions will have to make calls to the hash function where is the security parameter. See

http://arxiv.org/abs/0801.3680

p.s. apologies to all commenters for the delay in moderation – for some reason I wasn’t getting the notification emails.

Beautiful essay!

A few comments:

1. You say: “In this regime no efficient algorithm is known to find the satisfying assignment, and it is possible that this is inherently hard.” This is true, but the only evidence we have is that in this reason search-based algorithms are inefficient. Our search algorithms are all based on resolution, which is a very weak proof system. Thus, there is no serious evidence that this region is inherent hard. It is also possible that this region is easy.

2. There is a class of problems in NP\cap co-NP for which we do not know any efficient algorithm. These include parity games, mean-payoff games, and stochastic games. Should we therefore assume that these problems are not in NP?

3. I think the evidence that CSP has a dichotomy is much stronger than the evidence that the theory of phase transitions offer a general approach to the complexity of combinatorial problems. Furthermore, low-density random problems are easy, even though they have no structure.

Anyway, these are just quibble.

Moshe

Thank you so much Moshe!

I agree the evidence for hardness in this region is so far very limited, and probably only a cryptographer would use it as a basis to conjecture hardness 🙂

I didn’t completely get your comment no 2 (am guessing you meant P and not NP in the end, right?). This class of problems is very interesting, and probably some of them are hard and some are easy. I would call problems in this class at least somewhat structured, and say that it’s far more challenging to get an understanding of their complexity (though notions such as PPAD-completeness and SZK-completeness are a promising direction).

I agree the evidence for CSP dichotomy is much stronger – many parts of your conjecture have already been proven and advances continue to be made. In contrast, it is not even clear what would be compelling evidence that phase transitions correspond to computational hardness (this is related to my previous post) though lower bounds for local search algorithms and hardness of approximation results such as Sly’s can be viewed as initial progress in this direction. As a side note, the easiness in the low density regime (where the solution space looks like the left side of Figure 1) corresponds to my intuition that unstructured problems are either very easy or very hard.

What I meant by comment 2 is that we have some problems in NP\cap co-NP that so far have resisted all attacks. They have structure, but I’d not rush to conjecture that they are in P. (Sorry for the typo.)

I did not mean to imply that structured problems are in P. In contrast, I think structured problems might be exactly those whose complexity is hard to determine and may be somewhere intermediate.

I just say for those problems, I am a bit more uncomfortable to conjecture they are not in P, especially if they have seen several advances in non-trivial algorithms and other non trivial structural results (most prominent example here being graph isomorphism).

We are in agreement.