# Speaking about unspeakables

*[Below is the transcript of a talk I gave at the Harvard Lectures that Last event (see video). This was a talk for a general (non technical) audience, but I still thought some people might find it interesting. For some more technical discussion of the notion of “computational Bayesian probabilities” I hint at toward the end, see the following talk and blog post.]*

Today I want to talk about one of humanities’ greatest inventions. This is an invention you encountered already when you were 5 or 6. No, I’m not talking about the wheel or fire or even the iPad, but about the *place-value number system*.

Because we use the place-value decimal system, we only need six digits to express number of miles from here to the moon. But if you wanted to write this distance using roman numerals, it will take more than 200 letters. The distance to the sun would fill a small book.

This means that for cultures without place-value systems, these quantities were not merely *unknown* — they were **unspeakable** . This is why the place-value system is such a great invention. It did more than simply help us do better calculations- it expanded mankind’s imagination and gave us access to concepts that we simply couldn’t comprehend otherwise.

What does an ancient Babylonian concept has to do with Computer Science? Computers, like people, need to be *taught* to do even simple things like multiply numbers.

And it matters *how* they do it. For example, according to my unscientific and non-IRB approved experiments, a 9 year old will take about an hour to multiply two 30-digit numbers using the place-value algorithm she learned in elementary school. In contrast, even the fastest supercomputer on earth will take more than a *century* to do this computation if it follows the naive roman-numeral way of repeatedly adding the numbers to one another. So you see, even with all the technological advances of the last 3000 years, we still need those old Babylonian insights.

In fact, Computer Science has about as much to do with computers as English has to do with printing technology. The point of computer science is not about mechanizing some calculation but about finding a novel way to express a concept we couldn’t grasp before. Finding such a way enriches us whether or not it ends up being programmed into a computer, since it lets us access what was heretofore *unspeakable*.

Some of the phenomena computer scientists investigate arise from modern challenges; whether it is finding a new language to express social interactions among millions of people or ways to speak about biochemical interactions among thousands of genes. But we also study age-old riddles that have been around since the dawn of civilization. For example, it turns out that the basic algorithm we all learn in elementary school is *not* the best way to multiply two numbers. Computer scientists found new and much better ways in the 20th century and the currently best known algorithm was only discovered in 2007.

Computer science has had amazing advances that expanded our horizons both materially and intellectually. But I am personally even more fascinated by the discovery that some notions are in fact **inherently unspeakable**. That is, no matter what technological advances or new inventions humanity comes up with, some tasks will be forever out of our reach. These limitations tell us about the fundamental laws of computation, in the same way that the physical laws of nature prohibit a perpetual motion machine or faster-than-light communication. Much of computer science is about dealing with these inherent limitations— how can we talk about what we can not talk about?

Let me give an example loosely related to my own work. Can we really grasp probabilities? We talk about them all the time. You hear in the news that with probability 50\% the global sea level will rise by half a meter before 2100, or that Hillary Clinton has a 54% chance of becoming president in 2016. These are important events that have serious consequences. But what do these numbers actually mean?

Economists have long thought they can answer this question. These probabilities are supposed to reflect the predictions of a *perfectly rational person* that has accumulated all available evidence. This is the same hypothetical rational person that makes free markets work. The only problem is that this person doesn’t exist. And this is not just because we humans are biased, lazy, or stupid. Rather, it turns out that these probabilities are often **inherently unspeakable**. For example, to truly compute the probability that Hillary Clinton becomes president, you would need to know the probability that she wins Florida, the probability that she wins Ohio, as well as the probability that she wins both Florida and Ohio, and so on and so forth. Unfortunately, there are literally *quadrillion* such combinations of the 50 states. If you considered all combinations of the 3000+ counties you get a number so large that it doesn’t even have a name. There is no way for any human *or* computer to compute all these probabilities. But yet this is what needed to obtain such perfectly rational predictions.

So, how *do* we talk about what we can not talk about? This is a topic for a longer lecture, but one approach involves borrowing insights from Voltaire, Ralph Waldo Emerson as well as quantum mechanics. To make these probabilities “speakable” we need to realize that the “perfect is the enemy of the good” and give up on the “hobgoblin of consistency”. In fact we sometimes even need to consider probabilities that are *negative* numbers. If this sounds mysterious, I hope it entices you to learn some more about computer science.

Boaz, I’m confused. Are you simply saying that real numbers are unspeakable because there are too many? You argument about a probability being unspeakable because the obvious way of computing it would take a really long time seems to have a lot in common with most “proofs” that P != NP.

For this audience I thought that it’s fine to treat “P!=NP” as a law of nature, and so assume it has already been proven (and I think it has, to the same extent which it was proven that there is no faster than light communication).

Also, there is an issue that beyond *computation* the naive *representation* for a Bayesian model on an event with n underlying variables takes 2^n numbers. But of course all the examples above are that sometimes in CS we find non naive representations (and indeed in the examples above I deliberately didn’t distinguish between computation and representation, or algorithms and data structures).

Very nice lecture! Boaz, do you regard the disability to compute the probability that Clinton will win the 2016 election (or, to compute now the probability, say, that a republican will win the 2056 US election) mainly an issue of computational complexity or mainly an issue of (lack of) information?

Our inability to *perfectly predict* this event is a matter of lack of information.

Our inability to *assign a probability* to this event that is consistent with the probabilities we assign to correlated events is a matter of computation. The probability is after all defined based on whatever information is available to us.

On Tue, Feb 16, 2016, 10:39 PM Windows On Theory wrote:

>

We disagree here. Of course, the informational and computational issues are intertwined, but I think that the lack of perfect information and the sensitivity of the outcomes to noise is the crucial obstacle to *assign a probability*. Namely, you will not be able to assign a probability *even* with an unlimited computational power.

I don’t think this is an issue of information and computation but something else.

A pure Bayesian would say that you can always assign a *probability*. Indeed, you can assign a probability to the chance that the Massachussets lottery would be a particular set of numbers in the first week of 2056, even though you don’t have information about this. The issue here is that “you know what you don’t know” or that there no “unkown unkowns”. I don’t think the question of whether you can assign probabilities is about information vs. computation but it’s more a philosophical question of whether you treat all uncertainties as coming from some prior or you declare that for some of them no prior can be meaningful.

I don’t know if I have a coherent philosophy of probability, but would tend to agree with you that for some events, whether it is the probability that a republican wins in 2056 or that some super artificial intelligence destroys the world, we cannot really assign meaningful probabilities.

But there are many other events, including the 2016 election or whether a particular stock will go up or down, where people can and do assign probabilities to them. Whether those probabilities can be *consistent* is often a matter of computation. In practice people often model events as independent even though they know full well they are dependent simply because keeping track of the dependencies becomes too hairy in terms of computation and memory cost.