Assume you have learned that the winning numbers in the state lottery are 1, 2, 3, 4, 5, and 6. Would you suspect that the drawing is faulty? You very well may, but why? After all, the probability of this sequence is no smaller than the probability of 9, 15, 21, 40, 54 and 11 (the winning numbers in California’s draw #723 from May 25). In fact, all sequences are equally unlikely, yet every time, one of these unlikely sequences manages to be drawn. Now assume that, in addition, the winning ticket is held by a barber. Would this raise any suspicion of foul play? Ridiculous; if barbers win, do they not deserve their winnings? But if the winner has long been the barber and friend of the person overseeing the lottery, this *could* be grounds for suspicion. Is this fair? Should it be legal grounds to sustain a fraud indictment? And if once is not convincing enough, what if it happens 20 times? How could we convincingly argue that this event (which is no less likely than the event that some other sequence of 20 people wins) suggests that the drawings were not truly random? Isn’t the problem with randomness that you can never be sure?

Rigged lotteries and other related scenarios are the topic of a thought-provoking paper: Impugning Randomness, Convincingly by Yuri Gurevich and Grant Olney Passmore. Let us return to the lottery where the winning numbers are 1, 2, 3, 4, 5 and 6. How can we argue that a fixed sequence of numbers is not “random enough”? Some readers may guess at this point that a possible approach is to invoke Kolmogorov Complexity, which is indeed what the aforementioned paper does. Intuitively, the sequence 1, 2, 3, 4, 5 and 6 has a short description (and thus a small Kolmogorov Complexity). Since only relatively few sequences can have a short description, the probability that a uniform drawing produces any short-description sequence is very small, so seeing one is strong evidence that the drawing was not uniform. Why aren’t we done? The main remaining challenge if we take this route is that the description length of a string depends on the description language (alternatively, on the universal Turing machine with respect to which we measure the Kolmogorov Complexity). This challenge is the main focus of the paper, and is the reason we cannot view the problem as solved.
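To make the counting step concrete, here is a minimal sketch of the bound behind "only relatively few sequences can have a short description". It assumes descriptions are binary strings in some fixed language chosen before the drawing, and it uses a hypothetical 6-of-49 lottery format (the post does not say which format the lottery uses). The function name `short_description_bound` is made up for this illustration.

```python
from math import comb


def short_description_bound(num_outcomes: int, max_bits: int) -> float:
    """Upper bound on the probability that a uniformly drawn outcome
    has a description shorter than max_bits bits.

    There are 2^0 + 2^1 + ... + 2^(max_bits - 1) = 2^max_bits - 1 binary
    strings shorter than max_bits bits, and each one describes at most
    one outcome, whatever the description language is.
    """
    num_short_descriptions = 2 ** max_bits - 1
    return min(1.0, num_short_descriptions / num_outcomes)


# Assumption: a 6-of-49 lottery where the order of the numbers is ignored.
outcomes = comb(49, 6)
bound = short_description_bound(outcomes, 10)
print(f"{outcomes} outcomes, P(description < 10 bits) <= {bound:.2e}")
```

The bound itself holds for any fixed description language. What changes with the language is which outcomes get the short descriptions, and that is exactly the difficulty described above.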

When reading this interesting paper I was thrown back to a memory from my high school years. A long time ago, my high school grade was subjected to a lecture by a missionary Rabbi. He talked about Bible Codes: these are unexpected patterns that were found in the Bible. For example, it is claimed that by taking every 50th letter of the Book of Genesis starting with the first taw, the Hebrew word “torah” (Bible) is spelled out. The lecturer, attempting to be subtle, claimed that he does not submit these codes as a proof for the existence of god, but that scientists have determined that they prove that the author of the Bible had an IQ of 5000 (or some other meaningless number). I stood up and argued against this (passionately, but unfortunately neither very eloquently nor very effectively). What I intuitively understood was that there are so many possible surprising sequences that the existence of some could be a matter of luck. I was guessing that such sequences exist in any large enough text, of either heavenly or earthly origin.

Bible codes became the center of additional controversy when Witztum, Rips and Rosenberg (WRR) described in a paper the results of two experiments. Similar Bible codes matched the names of famous Rabbis who lived long after the Bible was written, and the appearances of these names were argued to be statistically significant (that is, unlikely to be explained by sheer luck). As someone who always viewed Bible codes (when taken too seriously) as the realm of missionaries and quacks, I was dismayed to see them gain credence from a mathematician of such high caliber. Fortunately, the opponents of WRR had a much more able voice than in my high-school story. I will not get too deeply into the details of this debate, but will just mention very informally that the opponents argued that the data selection was biased (specifically, the choices of the exact spelling of the Rabbis’ names among the possible spellings) and that, even assuming the phenomenon WRR tried to demonstrate, the results are simply too good to be true. In particular, the confidence levels of the two experiments were too close to each other. I will demonstrate this argument with a story due to Ehud Friedgut which appeared here that ties back quite nicely with the start of this post.

A man claims to be able to hit a globe hanging 200 meters away with a bow and arrow while blindfolded. An experiment is set; he shoots two arrows (a few minutes apart) and then sends his son to fetch the globe (which is too far away for the other observers to see). The son returns with a globe and two arrows stuck into it as close to each other as physically possible. This level of accuracy makes it even harder to believe in the integrity of the experiment, but we cannot yet prove our suspicion. But now assume that we learn that while the arrows were shot, the globe was rapidly spinning around its axis (without the father and son’s knowledge). This means that, regardless of the father’s archery skills, the longitudes of the two arrows should be distributed uniformly. Therefore, while it is still possible for the two arrows to end up adjacent, this happens with extremely low probability, and we can therefore view their position as a proof of a probabilistic nature that the experiment was rigged.
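The "extremely low probability" can be put in numbers. A minimal sketch, assuming the two longitudes are independent and uniform on the circle (which is what the spinning globe is meant to guarantee): the second arrow must land within `delta_deg` degrees on either side of the first, so the probability is 2 · delta_deg / 360. The function names and the seeded simulation are illustrative only and are not part of Friedgut's story.

```python
import random


def prob_longitudes_within(delta_deg: float) -> float:
    """P(two independent uniform longitudes differ by at most delta_deg)."""
    if not 0 <= delta_deg <= 180:
        raise ValueError("delta_deg must be between 0 and 180")
    return delta_deg / 180.0


def angular_gap(a_deg: float, b_deg: float) -> float:
    """Shortest angular distance between two longitudes, in degrees."""
    d = abs(a_deg - b_deg) % 360.0
    return min(d, 360.0 - d)


def simulate(delta_deg: float, trials: int = 200_000, seed: int = 0) -> float:
    """Monte Carlo estimate of the same probability, as a sanity check."""
    rng = random.Random(seed)
    hits = sum(
        angular_gap(rng.uniform(0, 360), rng.uniform(0, 360)) <= delta_deg
        for _ in range(trials)
    )
    return hits / trials


print(prob_longitudes_within(1.0))  # arrows within one degree of longitude
print(simulate(9.0))                # should be close to 9 / 180 = 0.05
```

For arrows that are "as close as physically possible", `delta_deg` is a small fraction of a degree, so the chance of this happening on a spinning globe is correspondingly tiny.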

There is much more to discuss on what can be learned about a distribution from one or a few samples and about the integrity of scientific exploration. I hope to revisit these topics in the future.

I’d like to comment on the goodness of the decision procedure for the arrow-and-globe example. In your example, you describe a decision procedure that is supposed to tell whether the experiment was rigged or not: if the arrows are very close together longitude-wise, reject the experiment. Otherwise, accept the experiment.

How can we evaluate whether the decision procedure is any good? Thinking as a complexity theorist/cryptographer, this decision rule is not good in the adversarial case.

It’s good in the completeness case: if this experiment were not rigged (i.e. the longitudes of the arrows are uniformly distributed), then the decision rule will accept with high probability.

It’s not good in the soundness case, because the adversary (the son) can simply place the arrows on antipodal points, which is definitely not random, but this passes the decision rule.

Yet somehow we have the sense that, from this example, the adversary isn’t just motivated to cheat you in any old fashion; he is extremely vain and is motivated to prove to you that he’s godlike at shooting arrows (and hence will try to place the arrows as close together as possible). In this “goal-oriented adversary” setting, the decision rule actually makes sense, because we can rule out strategies like placing arrows at antipodal points — we know that the adversary won’t care for doing things like that.

This suggests there could be a new way of analyzing adversarial behavior in TCS, where the space of cheating strategies can be much smaller than usual. This reminds me of Micali and Azar’s recent “Rational Proofs” paradigm. Is this the right perspective to use when analyzing situations such as the lottery drawing or Bible codes?

Indeed, the moral of this story seems to be that even when you cheat you should try not to overdo it (something many high-school D students already know). As you point out, this is not really a test (as it is easy to fool).

One of the archetypal stories is that of the rogue or overambitious scientist who, even though his intentions are good, ends up creating a catastrophe (Frankenstein, Dr. Jekyll and many many more). The WRR paper shows that in real life the capacity of mathematicians to do damage is just as large.

Dear Udi, I don’t see how WRR’s paper shows anything of that kind.

Of course the underlying assumption is that believing in bible codes, and other such stuff, is not only wrong but is harmful. I’m aware of the irony that this is an attempt to give a ‘proof’ for this argument, so one can argue that it does not undermine the importance of rationality as a guiding principle. I don’t buy this argument though and I think in reality the credence it gave to the bible code industry and its missionaries is very damaging.

I believe we already have tools to argue that our suspicion of foul play when the lottery president’s wife wins the lottery is well-founded. I am thinking about Bayes’ rule.

What is the prior of the lottery president’s being a crook? Currently, about 3% of the US population is either in prison, on probation, or on parole. Hopefully, lottery presidents are selected from a more law-abiding demographic. Let’s give the guy the benefit of the doubt, and say that the prior is 1/10,000, which is much smaller than, for instance, the number of congressmen convicted of crimes in the last decades (see here http://www.freerepublic.com/focus/news/1590201/posts) as a proportion of the total number of people who have served in the US Congress.

Now, the math is relatively straightforward. If someone from the lottery president’s immediate family (consisting, say, of four people) wins the jackpot, the chances of this happening by pure chance would have been 1 in a million. If this event did happen, the posterior probability of the president’s being the crook is more than 99%.
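The calculation can be sketched as follows. The prior (1/10,000) and the chance probability (1 in a million) are the figures above. The likelihood that the family wins when the president *is* a crook is not stated, so the value 1.0 used below is an assumption made to reproduce the "more than 99%" figure, and the function name `posterior_crook` is made up for this illustration.

```python
def posterior_crook(prior: float, p_win_if_crook: float, p_win_if_honest: float) -> float:
    """Bayes' rule: P(crook | a family member wins the jackpot)."""
    evidence = p_win_if_crook * prior + p_win_if_honest * (1.0 - prior)
    return p_win_if_crook * prior / evidence


prior = 1 / 10_000            # prior probability of a crooked president
p_chance = 1 / 1_000_000      # family wins by pure chance

# Assumption: a crooked president always steers the jackpot to his family.
print(posterior_crook(prior, 1.0, p_chance))

# Assumption: a crooked president only manages it one time in ten.
print(posterior_crook(prior, 0.1, p_chance))
```

Under the first assumption the posterior is just above 0.99. Under the second it drops to roughly 0.91, which shows how much the conclusion leans on the likelihood under the "crook" hypothesis.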

The connection between Kolmogorov complexity and Bayesian reasoning is well known; I just think that it helps to call it out explicitly in this example.

Thanks Ilya. This is a very good point, and in your example it makes a lot of sense (as we can give some reasonable lower bounds on the probability of corruption).

Taking it to other examples: If I want to use this approach to argue that the sequence 1, 2, 3, 4, 5, 6 reflects a faulty process then I need to give evidence on how the process can produce this sequence due to a fault with much higher probability than by chance. Right? Is it always possible to do in such scenarios?

Finally, sometimes assigning a prior already asserts what we would like to argue. For example, what is the probability that the Bible predicts the future in a “non-accidental” way? Some may claim it is essentially zero, but this is exactly what’s under discussion.

One of the interesting issues that this old story raised is whether statistics can be used (after the fact) to raise suspicions, or even to prove these suspicions, about the integrity of a statistical experiment, especially in connection with the familiar problem of biased data selection. (Here we talk about statistical evidence which is convincing even if we accept the research hypothesis, although it can be valuable just to find evidence which gives an alternative explanation to the research hypothesis.)

We considered several methods: The first was to trace naive statistical expectations behind the researchers’ outcomes. Too close replication is a familiar phenomenon in such cases. The reason this is effective is that it is often quite difficult for the scientist who makes the experiments to know in advance how close the replication is expected to be. The second method, which is quite cute, was to detect the opposite phenomenon on data that was left out. A third method was to study the experimental effect after applying variations to the measurement tools. Still another thing to examine is how the raw experimental data goes along with the precise research hypothesis.

Overall, I tend to agree with the spirit of Henry’s comment, and I think that statistical tests after the experiment is conducted have rather small power in examining the integrity of a statistical experiment.

It is an interesting question whether, in the design of statistical experiments (in cases where replication is very expensive), we can add as part of the design of the experiment some ingredients which will make it possible to examine after the fact the integrity of the experiment itself. So the “prover” who runs the experiment will not know in advance what the “verifier” who makes the statistical “integrity tests” after the experiment will choose to do.

Two remarks regarding my old paper are that my initial purpose (which was very very naive) was to put forward in front of the more senior researcher ‘R’ some evidence that could serve as a “red light” for him about the integrity of the data selection which he had no part in. The other remark is that according to Wikipedia, the other researcher ‘W’ “applied an uncommon technique in scientific debates, when he offered to pay a million dollars” for evidence convincing to him that an experiment based on a list prepared by experts will not succeed. 🙂

Gil Kalai just pointed out to me an extremely related link that appeared in Gowers’s blog post from today. I guess there is no need to invent examples, they are simply out there up for grabs 🙂