On the Importance and Risks in Using Sub-Reviewers
I promised to post from time to time about the FOCS 2013 PC work (to demystify the process). So here is a quick update: We got 280 submission (well, 281 submissions but one was just a bad joke). This is up (by more than %10) from FOCS 2012 but on the other hand our PC is also on the large side so the number of papers assigned to a PC member is less than 40, which for FOCS/STOC is very reasonable (I reduced this number a bit by encouraging several authors of misplaced papers to withdraw their submissions).
The first stage of the work was assigning each paper to three PC members (more challenging than I expected), and then the individual review stage of the PC work. In this stage, each PC member reviews his/her assigned papers without discussion with other PC members. This way, each paper gets three relatively independent reviews. We are now ending this stage, and it is a good opportunity to talk about using sub-reviewers.
A PC member can ask an expert in the specific sub-area of a paper for a review (and we then refer to such an expert as a sub-reviewer). The purpose of sub-reviewers is not to lighten the load of PC members (though this could be seen as a positive byproduct), but rather to elicit expertise the PC is missing. One disadvantage of sub-reviewers which I will not consider here is the additional work for the rest of the community. This is a consideration, but hopefully an expert would be interested in reviewing papers that are very close to his/her expertise (and this would not be too much work).
It is my experience that a major source of mistakes for PCs is in the way they use external experts. One fear is an over-confident PC that believes it can judge every work on its own. True enough “in the good old days,” PCs were self-sufficient (especially in the pre-Internet era). But TOC today is much deeper and much wider than “in the good old days.” It is easy to make mistakes if you do not have good understanding of related work. On the other hand, PCs that rely on their sub-reviewers too much are in just a serious problem. The risk is to turn FOCS/STOC into a collection of mini-conferences run by separate sub-communities. This is completely opposite to the purpose of FOCS/STOC: promoting TOC as a joint community and fostering the flow of ideas. For that to happen we must have that papers accepted are of interest to larger fractions of the community. The ideal solution in my eye is to heavily rely on external experts, but to view these reviews as inputs to the PC work rather than as outsourcing of the PC work. I therefore asked each PC member to familiarize himself/herself with its assigned papers and to reach a separate opinion. I know this is an ideal view, but setting the right ideals could be important.
On the importance of the alphabet
In my last post, we saw that the problem of learning juntas, hard as it is over Boolean inputs, seems even worse over other alphabets. Coding theory happens to have a inexhaustible supply of such problems. Some of these are long-standing open problems, others are of a more recent vintage. More of these problems seem to crop up all the time. And as will be clear from the list below, this issue does seem to follow me around.
Here are four of my favorite problems, completely solved for but wide open for any other alphabet (which we take to be some finite field).
Optimal codes of distance d: Arguably the mother of all such problems. How many parity checks must a ary length
code of distance
have? Here we think of
as a constant and
going to infinity. For
the Hamming bound tells us that
parity checks are necessary. The BCH construction tells us that this is also sufficient. For larger alphabets, the best lower bound is still the Hamming bound, and the Hamming bound is still
. Alas the BCH upper bound is now
. For more on this problem and its connections to additive combinatorics, see here.
List-decoding Reed-Muller codes: Given a function , our goal is to recover all degree-d polynomials that are close to
. Of course, we can hope to solve this in reasonable time only if this list is not too large (say a constant independent of the dimension
). The question is what bound on the distance from
(equivalently the error-rate) is enough to ensure that the list is small, the conjecture is that it is the minimum distance of the degree d Reed-Muller code. The degree 1 case is the problem of Hadamard decoding, solved by Goldreich and Levin. In joint work with Klivans and Zuckerman, we solved the
case for all
, and in a subsequent paper, the
case for all
. The case of larger
is open.
The bottleneck in working with higher alphabets is the following. Fix a polynomial P which has good agreement. The Goldreich-Levin and GKZ algorithms “guess” the P on a c-dimensional subspace S, where is a constant. They then use this to recover P on all
dimensional subspaces containing S. Why is this easier? Knowing the right values on a subspace of co-dimension 1 effectively reduces the error-rate by a factor of
. For
, this reduces the error by
and puts us in the unique-decoding regime where things are easy. But for larger
, this error-reduction does not always suffice to specify P unambiguously.
Decoding from errors and erasures: Assume that we have an linear code. Such a code can correct
errors and
erasures. Assume that we have efficient algorithms for both these (the latter is always the case for a linear code). Can we combine these in a black-box fashion to get an efficient algorithm that corrects
errors and
erasures as long as
? When
, the answer is Yes (take a guess!). I don’t know the answer for larger
. I came upon this problem in conversations with Atri Rudra, perhaps there is a reference for it? Or better still, a solution?
Learning Linear Functions with noise: Learning parity with noise is a central problem in learning, that is widely believed to be hard. We are given random examples of points in labeled with some unknown parity function L, the catch is that each label is flipped independently with probability
. A seemingly even harder problem is when the noise is adversarial, and the adversary is allowed to flip an arbitrary
fraction of labels over all points in
. In joint work with Feldman, Khot and Ponnuswami, we show that the problems are in fact equivalent, there is a reduction from adversarial noise to random noise.
In retrospect, perhaps this is not too surprising: learning parity with noise is already so very hard that making the noise adversarial does not make it worse. Well, the same philosophy ought to apply to other alphabets. But I don’t know of a similar reduction for any other alphabet. This paper makes some progress on this problem, reducing adversarial noise to a more benign noise model, but does not solve the problem fully.
If you have a problem of this flavor, I’d love to hear about it. I am unlikely have anything insightful to say, but I’d be happy to commiserate with you about how annoying can be.
Reasons to care: In honor of Scott Aaronson
Update (5/7): This post earned me a spot on the not-so-exclusive club of people called names such as a “narrow-minded” “biased” “religious worshiper” “who doesn’t want to learn something difficult and new” by Luboš Motl. Interestingly, he mostly takes issue with my discounting the possibility that the complexity of SAT is something like or
. Although I do believe such an algorithm is extremely unlikely, its discovery wouldn’t be a bad thing to computational complexity. On the contrary, I imagine that if such an algorithm is discovered then investigating whether it can be improved and implemented would be seen as meriting the investment of a huge amount of resources, possibly larger than those given for the Large Hadron Collider.
[ First, an announcement. Mark Braverman and Bernard Chazzelle are organizing a workshop on May 20-21 on Natural Algorithms and the Sciences , bringing together researchers from computer science, mathematics, physics, biology, and engineering to explore interactions among algorithms, dynamical systems, statistical physics, and complexity theory (in all senses of the term) ]
In some very pleasant though unsurprising news, Scott Aaronson got tenure at MIT. Among his many talents and activities, Scott has done a fantastic amount of outreach and exposition, explaining the concepts of theoretical computer science and quantum computing to other scientists and interested laymen. He also spends considerable energy trying to dispel wrong opinions with amazing patience and resilience (apparently due to his immutable psychological makeup). So, in honor of Scott getting tenure, just after a bruising flamewar with various “interesting characters” he managed to find in some dark corner of the Internet, I thought I’d step up to help him along and try to do my share of dispelling some common myths about complexity and particularly the vs
problem.
P vs. NP: Reasons to Care
The question (and its close variants) is the central question of computer science. In fact, its hard to think of another question in any field that combines such mathematical depth and beauty with great importance to understanding nature. Scott did a pretty good job explaining why
is true. In this brief post I’ll try to outline why I and many others are so intrigued by this question, by trying to answer various objections that are commonly made against its importance. Some of these issues are mentioned in my book with Arora but I thought a more explicit discussion might be worthwhile.
What if
=
but the algorithm for SAT takes
or
time?
In other words, what if SAT has exponential complexity for all inputs up to size for some huge number
, and only starts becoming comparatively easier after that number. This is the equivalent of asking what if the laws of nature we’ve learned so far are true only in the corner of the universe we’ve managed to observe, and are completely different in other areas. Yes, this is a logical possibility, but we have not seen the remotest reason to believe it. Also, in recent years we’ve been exploring the computational universe (in terms of the huge growth of input sizes and computational resources) at a much faster rate than the physical one, and as we do so, it only becomes more important to understand the asymptotic behavior of algorithms. Having said that, I agree that after proving
, the next order of business would be to get a more precise estimate of the number of computational operations needed to solve length-
instances of SAT as a function of
.
The
question is irrelevant since
is only a mathematical model having nothing to do with efficient computation.
The universality of computation, leading to equivalent models whether you study Turing machines, cellular automatons, electronic computers, mathematical deduction systems, or computation by natural systems (see workshop above) or economic markets, has been a surprising and central phenomena, validating the formal study of Turing machines or Boolean circuits. But I do agree is an imperfect model for efficient computation. Yet it is still a fine tool for the question of studying the complexity of SAT or other NP-complete problems. It’s true that
is often too permissive a model, allowing algorithms that are prohibitively expensive to run on today’s technology, but this is not as much of an issue when the question is whether you can beat the exponential-time brute force algorithm. (Indeed, it has been often observed that natural problems either have fairly efficient algorithms with
or
running time, or seem to have exponential (i.e.,
) complexity.)
Perhaps a more serious concern is that may be too restrictive, since it does not allow taking advantage of quantum mechanics (or maybe other physical theories), and is also only concerned with worst case computation. While those concerns are valid in principle, it seems that while quantum computers can do wonders for certain structured problems such integer factoring, their effect on NP-hard problems such as SAT is fairly limited. Also, while it is possible to posit a theoretical world in which
but every
problem is easy to solve on the average, the evidence so far strongly points to that not being the case. So, while one can argue that instead of
we should be asking what’s the size of the smallest quantum circuit that solves SAT on the average case, all evidence points out to these two questions having the same answer. Moreover, at the moment we’re so far from resolving either question that the tools we have to study all those variants are the same. When we start getting close to
, we might want to try to understand the more refined variants such as quantum or average-case complexity.
is not a fundamental property of nature such as the physical laws.
“Fundamental” is a subjective term, and can be used in different senses. The fundamental theorem of algebra can be (and is) derived from more basic axioms, and so this theorem is fundamental not because it’s the foundation for all algebraic facts but because it is central and important. One can make a similar analogy between “high level” laws of nature, positing the impossibility of perpetual motion machines or devices that transmit information faster than light, and the laws governing behavior of elementary particles that may be used to derive them. , positing (essentially) the impossibility of a machine that gets a SAT formula and spits out a satisfying assignment, is comparable to these high level laws in flavor, and I would say also in importance.
is only about practical efficiency whereas the deeper / more fundamental question is which computational problems can be solved at all.
I don’t think anyone that actually studied complexity for even a short while would share this objection. To paraphrase Edmonds (whom we quote in our book), in many contexts the difference between polynomial and exponential is more significant than the difference between exponential and infinite. An example we give in the book is Arthur Ganson’s kinetic sculpture Machine with Concrete. It’s composed of a sequence of 12 gears, each rotating 50 times slower than the previous one. While the first gear rotates 200 times per minute, the last one, at that speed, is to all practical purposes stationary, and indeed is embedded in a block of concrete. So, exponentially hard problems are essentially unsolvable. But it turns out that in most cases the problems we want to solve (from simulating the physical world to finding if there is an input making our program take more than
units of time) are trivially computable, so in such cases the interesting question is whether their complexity is polynomial or exponential.
A related concern is that only involves “elementary” mathematics, as opposed to “deep” questions such as, say, the Riemann hypothesis. While the terms “deep” and “elementary” are not well defined (it seems that, similar to the AI effect, some people consider every problem or result in fields they don’t like as “shallow”), I believe it is again the case that this is not a concern shared by people that actually studied computational complexity. While this field is very young, it already contains several deep and beautiful results, and it is clear that many more will be needed for us to approach resolving
.
Resolving
is not interesting, since we already know the answer is that
.
I can sympathize with this objection, as I too firmly believe that and in fact I believe that concrete problems such as, say, CIRCUIT-SAT cannot be solved in subexponential time. The reason we are looking for the proof is not just to get an extra level of confirmation to what we already know, but because we believe coming up with such a proof will necessarily have to yield new deep insights about the nature of computation. It is impossible to predict what would be the nature of such insights or what would be the extent of their applications; but in particular a better understanding of the dividing line between easy and hard problems can yield on the one hand new algorithms, and on the other hand new encryption schemes that are provably impossible to break.
Even if
=
, we wouldn’t necessarily be able to automate creativity, find hard proofs, beautify symphonies etc..
Some people making this objection are really thinking of the first objection, namely that but the algorithm is so inefficient that it still holds that
in the “observable computational universe”. Others note that it’s not immediately clear how you use an algorithm for SAT to do things such as perfect artificial intelligence, weather prediction, etc… Lance Fortnow has a nice description in his book of a universe in which
(see also Impagliazzo’s survey). Indeed, given the vast amount of NP-complete problems from all science, combined with the ability (if
) to solve even problems higher in the polynomial hierarchy, I think it’s hard to overestimate the implications of an efficient SAT algorithm. But I still do not see achieving this ideal world as the main reason to study
. As I said, even if we don’t have a formal proof, there are plenty of reasons to believe that
. So, I view the fact we can not rule out this hypothetical computational utopia not as a sign that it might be true but rather as demonstrating how much is still left for us to discover in the world of computation.
The New Yorker on P vs NP
A new review is out for Lance Fortnow’s new book “The Golden Ticket: P, NP and the Search for the Impossible“.
In another piece of news: congratulations to new members of the National Academy of Science Éva Tardos and Avi Wigderson!
History Repeats Itself in the Notices of AMS
Before Communications of ACM became cool again, I’ve been a regular reader of the Notices of American Mathematical Society. I still check it out occasionally to keep tabs on the mathematical community.
This month’s issue featured a lengthy article with a lofty title “Mathematical Methods in the Study of Historical Chronology”. It covers the work of Russian mathematician Anatoly Fomenko and his frequent co-author Gleb Nosovsky, where they re-examine traditional chronology of the history of the ancient world based on astronomical records and statistics.
Fomenko’s starting point is that much of traditional dating methods are circular (events and archaeological artifacts are dated relative to each other with very few absolute references). However, intervals between noteworthy astronomical phenomena, such as solar and lunar eclipses, are sufficiently unique within the period covered by recorded history, so that they can be used to anchor chronicles mentioning these events. Zodiacs and star atlases can also be dated independently based on the perceived movement of stars, some of which have shifted considerably over the two millennia. According to Fomenko and his co-authors, doing so systematically reveals stark inconsistencies in traditional chronology, which has to be revised if not completely rewritten.
This narrative is appealing and captivating: A lonesome hero, maverick and iconoclast, rides into town and wielding nothing but a Magnum and Matlab cuts through the fog of conventional wisdom, “Moneyball” style, chasing out the charlatans and humanitarians. I am mixing my metaphors here but as we shall see it is quite fitting Fomenko’s theories. We love to root for the underdog, especially if the little guy is one of our own: David against Goliath, Frodo against Sauron, Nate Silver against the GOP.
The Notices article describes several Fomenko’s methods but oddly enough does not offer any examples of their applications. It is indeed strange, since some of these methods are at least thirty years in the making. Surely, they must have led to something interesting? I will fill the gap by consulting the magnum opus of Fomenko et al. “History: Fiction or Science?” spread across four volumes (in English; there are several more in the original Russian).
Thucydides in his monumental history of the Peloponnesian War describes three eclipses, two solar and one lunar, 7 and 11 years apart. Traditional chronology places these events in the years 431, 424, and 413 BC, which disagree in some details with Thucydides’ account (such as whether the first eclipse was total or partial). To find a solution fully consistent with his description one has to fast forward to 1039 AD.
Rather than to admit that either the original text or its surviving copies are not fully accurate, as traditional historians do, Fomenko concludes that the Athenian general actually did live in 11 century AD, less than 1,000 years ago. After assembling a few more examples of similar inconsistencies, he declares the entire history of the ancient world a fabrication done in the Renaissance.
Was it a fraud or a giant mistake perpetrated by generations of historians, building upon each other’s mistakes? Fomenko and Nosovsky identify certain patterns to the history that never was. Namely, they assert, that most of what passes for historical accounts are copies of actual events, shifted and disguised by changing or translating proper names and places. The Notices article vaguely mentions it in the context of the Bible authorship. In reality, exercises in identifying parts of history that are duplicates of each other constitute the bulk of Fomenko’s work.
This is where Fomenko and his comrades abandon the scientific method for good and stand with both feet firmly planted in quackery. Everything goes to establish equivalence between events, locales, or historical figures: a fleeting phonetic or graphical similarity (often present only if one spells the name backwards and in Russian), parallelisms in some biographical or geographical details, or just a hunch. A couple of quotes from volume IV, which covers what is traditionally thought of as history of Britain, illustrate this approach nicely (page 614):
Galfridus proceeds to tell us that the city of New Troy, or London, has been founded on River Thames. We believe the name to have been a reference to the Bosporus initially, which is where we find Constantinople. This strait is very long and relatively narrow; it does look like a river on maps [...]Let us also take a closer look [at] the word Thames. Bearing in mind the Oriental manner of reading words from the right to the left and the word “sound”, a synonym of the word “strait”. Reversed and unvocalized, it looks as “DNS” – possibly, a version of TMS (Thames).
I mentioned that according to Fomenko, much of ancient history (actually, all the way to the XVI century) was not entirely made of the whole cloth but was obtained by shifts, translations, and superposition. Where did the originals take place, and when? It turns out, astonishingly, that most of the world’s history happened somewhere in the steppes of the Eastern Europe and the Middle East. Again, from the same text (page 626):
The fact that we managed to identify the Scots of the XII-XV century as the Scythians must also imply that the term Irish had been synonymous to the term “Russian” in the epoch in question (RSS or RSH = Russia sans vocalization); the name Ireland may also have referred to Russia once.
I stop taxing your patience here (similar free associations fill thousands of pages). Hopefully it is enough to convince you that Fomenko makes “The Bible Code” look as well argued as Bourbaki.
Fomenko has quite a following in Russia, with his books widely available and debated there (I counted 65 books co-authored by Fomenko currently in print, not including his textbooks on topology and differential equations). It is not truly surprising given that Russia is known as a country with unpredictable past (Orwell’s “He who controls the present controls the past” rings as true today as ever), and its national pride has been badly hurt by the post-Cold War world order. In such an environment, Fomenko’s version of history as a world-wide forgery that ascribes to other nations the events of Byzantine and Russian past is simultaneously believable and appealing.
Promoting these views from the pages of the Notices of AMS is a different matter. Most troubling is that this is not the first time in recent memory that we are having a similar conversation. The last time the Notices got a lot of coverage in CS blogs, it was in reaction to Neal Koblitz’s article alleging that modern cryptography is a fraud wrapped in bad mathematics.
Mathematicians are known (as are men and women of other professions) to occasionally adopt extreme or radical positions. Most outrageous examples include Theodore Kaczynski (the Unabomber) or Serge Lang who was an outspoken AIDS denialist. By themselves, mathematics credentials are not a guarantee of inerrant judgement in matters remote from one’s area of expertise. Nothing I’ve seen suggested that mathematicians are particularly prone to be taken to conspiracy theories, but neither are they immune to them.
I do find it disturbing that the Notices offered their pages to views antithetical to mathematics (Neal Koblitz lamented the use of proofs by cryptographers) and to promotion of phony scholarship. The trajectory is downhill: the article of Neal Koblitz (and a follow-up, joint with Alfred Menezes) merited a serious response, while Fomenko and Nosovsky’s writings are studied as expressions of the nation’s psyche. In departure from standards of scientific publications, controversial viewpoints are not accompanied by rebuttals, other than by much delayed and necessarily curt letters to the editor. (The only source critical of Fomenko, which is acknowledged by the Notices article, appears in the references but never cited in the text.)
The most charitable explanation I could fathom was that the article was an elaborate April Fools’ joke (even though the issue became available weeks before April 1st). If this is so, consider me punk’d. Otherwise, it is the editorial board and the readership of the Notices that got duped.