Today I want to talk about one of humanity’s greatest inventions. This is an invention you encountered already when you were 5 or 6. No, I’m not talking about the wheel or fire or even the iPad, but about the place-value number system.
Because we use the place-value decimal system, we only need six digits to express the number of miles from here to the moon. But if you wanted to write this distance using Roman numerals, it would take more than 200 letters. The distance to the sun would fill a small book.
This means that for cultures without place-value systems, these quantities were not merely unknown; they were unspeakable. This is why the place-value system is such a great invention. It did more than simply help us do better calculations: it expanded mankind’s imagination and gave us access to concepts that we simply couldn’t comprehend otherwise.
What does an ancient Babylonian concept have to do with Computer Science? Computers, like people, need to be taught to do even simple things like multiply numbers.
And it matters how they do it. For example, according to my unscientific and non-IRB-approved experiments, a 9-year-old will take about an hour to multiply two 30-digit numbers using the place-value algorithm she learned in elementary school. In contrast, even the fastest supercomputer on earth will take more than a century to do this computation if it follows the naive Roman-numeral way of repeatedly adding the numbers to one another. So you see, even with all the technological advances of the last 3000 years, we still need those old Babylonian insights.
In fact, Computer Science has about as much to do with computers as English has to do with printing technology. The point of computer science is not about mechanizing some calculation but about finding a novel way to express a concept we couldn’t grasp before. Finding such a way enriches us whether or not it ends up being programmed into a computer, since it lets us access what was heretofore unspeakable.
Some of the phenomena computer scientists investigate arise from modern challenges, whether it is finding a new language to express social interactions among millions of people or ways to speak about biochemical interactions among thousands of genes. But we also study age-old riddles that have been around since the dawn of civilization. For example, it turns out that the basic algorithm we all learn in elementary school is not the best way to multiply two numbers. Computer scientists found new and much better ways in the 20th century, and the currently best known algorithm was only discovered in 2007.
Computer science has had amazing advances that expanded our horizons both materially and intellectually. But I am personally even more fascinated by the discovery that some notions are in fact inherently unspeakable. That is, no matter what technological advances or new inventions humanity comes up with, some tasks will be forever out of our reach. These limitations tell us about the fundamental laws of computation, in the same way that the physical laws of nature prohibit a perpetual motion machine or faster-than-light communication. Much of computer science is about dealing with these inherent limitations: how can we talk about what we cannot talk about?
Let me give an example loosely related to my own work. Can we really grasp probabilities? We talk about them all the time. You hear in the news that with probability 50% the global sea level will rise by half a meter before 2100, or that Hillary Clinton has a 54% chance of becoming president in 2016. These are important events that have serious consequences. But what do these numbers actually mean?
Economists have long thought they can answer this question. These probabilities are supposed to reflect the predictions of a perfectly rational person who has accumulated all available evidence. This is the same hypothetical rational person that makes free markets work. The only problem is that this person doesn’t exist. And this is not just because we humans are biased, lazy, or stupid. Rather, it turns out that these probabilities are often inherently unspeakable. For example, to truly compute the probability that Hillary Clinton becomes president, you would need to know the probability that she wins Florida, the probability that she wins Ohio, as well as the probability that she wins both Florida and Ohio, and so on and so forth. Unfortunately, there are literally a quadrillion such combinations of the 50 states. If you considered all combinations of the 3000+ counties you would get a number so large that it doesn’t even have a name. There is no way for any human or computer to compute all these probabilities. Yet this is what is needed to obtain such perfectly rational predictions.
So, how do we talk about what we cannot talk about? This is a topic for a longer lecture, but one approach involves borrowing insights from Voltaire and Ralph Waldo Emerson, as well as quantum mechanics. To make these probabilities “speakable” we need to realize that the “perfect is the enemy of the good” and give up on the “hobgoblin of consistency”. In fact we sometimes even need to consider probabilities that are negative numbers. If this sounds mysterious, I hope it entices you to learn some more about computer science.
The slow but steady progress towards constructing quantum computers has caused the NSA to announce that users should transition away from RSA, Diffie-Hellman and Elliptic Curves (which can be broken by Shor’s algorithm) in the “not too distant future”. That, coupled with the fact that most of the exciting recent theoretical advances (such as fully homomorphic encryption and indistinguishability obfuscators) have used lattice-based cryptography, means that I will shift focus away from the number-theoretic constructions and toward lattice-based crypto.
Also, while the security of the assumptions underlying the recent obfuscation constructions remains murky, and they are still too complicated to be covered with the full gory details, I am going to talk about obfuscation, and in particular use it as a pedagogical tool. One thing that has always bothered me about teaching crypto is that we teach all these wonderfully amazing notions such as public key crypto, zero knowledge proofs, secure multiparty computation, fully homomorphic encryption, etc. as if they “fell from the sky”. Each one of them comes as a huge surprise that it could exist, and the students don’t get a sense of why people were hopeful enough to look for a construction in the first place. Obfuscation gives, at least at some intuitive level, an explanation as to why someone who is not on drugs might be bold enough to imagine that such concepts could exist, and I hope to convey this intuition to the students.
Also, I strongly believe that despite the need to cover the fundamentals, an undergraduate cryptography course should not consist mostly of a “dry” collection of basic theorems, definitions, and reductions, and I do want to talk about some more exciting notions such as bitcoin, quantum computing, fully homomorphic encryption, and more. This of course requires some tradeoffs, and my typical way out of this is that I simply expect the students to work harder…
If you are not around Harvard and are interested in the course, you can always take it as part of Harvard’s extension school, or you can simply follow the lecture notes that I will post on the course web page.
Below are my draft notes for Tuesday’s lecture. I won’t post future lecture notes here, so follow the website for more. I am not going to actually cover all of this in class: I am doing a “quasi flipped” class where I expect the students to read the lecture notes before each class, so I can talk about high level intuitions and answer questions, rather than going through all details of the proof.
CS 127: Cryptography / Boaz Barak
Optional additional reading: Chapters 1 and 2 of the Katz-Lindell book.^{1}
Ever since people started to communicate, there were some messages that they wanted kept secret. Thus cryptography has an old though arguably undistinguished history. For a long time cryptography shared similar features with alchemy as a domain in which many otherwise smart people would be drawn into making fatal mistakes. The definitive text on the history of cryptography is David Kahn’s “The Codebreakers”, whose title already hints at the ultimate fate of most cryptosystems.^{2} (See also “The Code Book” by Simon Singh.) We now recount just a few stories to get a feel for this field. But, before we do so, we should introduce the cast of characters. The basic setting of “encryption” or “secret writing” is the following: one person, whom we will call Alice, wishes to send another person, whom we will call Bob, a secret message. Since Alice and Bob are not in the same room (perhaps because Alice is imprisoned in a castle by her cousin the queen of England), they cannot communicate directly and need to send their message in writing. Alas, there is a third person, whom we will call Eve, who can see their message. Therefore Alice needs to find a way to encode or encrypt the message so that only Bob (and not Eve) will be able to understand it.
In 1587, Mary the queen of Scots, and the heir to the throne of England, wanted to arrange the assassination of her cousin, Queen Elizabeth I of England, so that she could ascend to the throne and finally escape the house arrest under which she had been for the last 18 years. As part of this complicated plot, she sent a coded letter to Sir Anthony Babington. It is what’s known as a substitution cipher where each letter is transformed into a different symbol, and so the resulting letter looks something like the following:
At a first look, such a letter might seem rather inscrutable: a meaningless sequence of strange symbols. However, after some thought, one might recognize that these symbols repeat several times and moreover that different symbols repeat with different frequencies. Now it doesn’t take a large leap of faith to assume that perhaps each symbol corresponds to a different letter and the more frequent symbols correspond to letters that occur in the language with higher frequency. From this observation, there is a short gap to completely breaking the cipher, which was in fact done by Queen Elizabeth’s spies, who used the decoded letters to learn of all the co-conspirators and to convict Queen Mary of treason, a crime for which she was executed.
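To make the frequency observation concrete, here is a minimal Python sketch. It is only an illustration under my own assumptions: the random substitution key, the sample sentence, and the helper names are all made up and have nothing to do with Mary’s actual cipher. It shows that a substitution cipher merely relabels letters, so the frequency profile survives encryption.

```python
import random
import string
from collections import Counter

def random_substitution_key(rng):
    """Return a dict mapping each lowercase letter to a distinct letter."""
    shuffled = list(string.ascii_lowercase)
    rng.shuffle(shuffled)
    return dict(zip(string.ascii_lowercase, shuffled))

def substitute(text, key):
    """Apply the substitution to letters; leave everything else unchanged."""
    return "".join(key.get(ch, ch) for ch in text)

def letter_counts(text):
    """Count only the lowercase letters of text."""
    return Counter(ch for ch in text if ch in string.ascii_lowercase)

rng = random.Random(1587)  # fixed seed so the demo is repeatable
key = random_substitution_key(rng)
plaintext = "the more frequent symbols correspond to the more frequent letters"
ciphertext = substitute(plaintext, key)

plain_counts = letter_counts(plaintext)
cipher_counts = letter_counts(ciphertext)

# The most common ciphertext symbol is the image of the most common plaintext letter.
top_plain = plain_counts.most_common(1)[0][0]
top_cipher = cipher_counts.most_common(1)[0][0]
print(top_plain, "->", top_cipher)
```

An attacker who knows that “e” is the most common letter in English would start by guessing that the most common ciphertext symbol stands for “e”, and continue from there.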
Trusting in superficial security measures (such as using “inscrutable” symbols) is a trap that users of cryptography have been falling into again and again over the years. As in many things, this is the subject of a great XKCD cartoon:
The Vigenère cipher is named after Blaise de Vigenère who described it in a book in 1586 (though it was invented earlier by Bellaso). The idea is to use a collection of substitution ciphers: if there are n different ciphers then the first letter of the plaintext is encoded with the first cipher, the second with the second cipher, the nth with the nth cipher, and then the (n+1)st letter is again encoded with the first cipher. The key is usually a word or a phrase of n letters, and the ith substitution cipher is obtained by shifting each letter $latex k_i$ positions in the alphabet, where $latex k_i$ is the position of the ith key letter. This “flattens” the frequencies and makes it much harder to do frequency analysis, which is why this cipher was considered “unbreakable” for 300+ years and got the nickname “le chiffre indéchiffrable” (“the unbreakable cipher”). Charles Babbage cracked the Vigenère cipher in 1854 but did not publish it. In 1863 Friedrich Kasiski broke the cipher and published the result. The idea is that once you guess the length of the key, you can reduce the task to breaking a simple substitution cipher, which can be done via frequency analysis (can you see why?). Confederate generals used Vigenère regularly during the civil war, and their messages were routinely cryptanalyzed by Union officers.
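Here is a short Python sketch of the cipher as described above, restricted (as an assumption of mine) to uppercase A to Z with no spaces. The function names are hypothetical. The second helper shows the observation behind the Babbage/Kasiski attack: once the key length n is guessed, the letters at positions i, i+n, i+2n, … were all shifted by the same amount.

```python
import string

ALPHABET = string.ascii_uppercase

def vigenere(text, key, decrypt=False):
    """Shift the i-th letter of text by the (i mod n)-th letter of key."""
    sign = -1 if decrypt else 1
    out = []
    for i, ch in enumerate(text):
        shift = ALPHABET.index(key[i % len(key)])
        out.append(ALPHABET[(ALPHABET.index(ch) + sign * shift) % 26])
    return "".join(out)

def split_by_key_position(ciphertext, n):
    """Group ciphertext letters by position mod n; each group is a single shift cipher."""
    return [ciphertext[i::n] for i in range(n)]

ciphertext = vigenere("ATTACKATDAWN", "LEMON")
groups = split_by_key_position(ciphertext, 5)
print(ciphertext)   # LXFOPVEFRNHR
print(groups[0])    # LVH: the letters A, K, W each shifted by L (11 positions)
```

Each group can then be attacked separately with the frequency analysis that breaks a simple shift cipher, given a long enough ciphertext.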
The story of the Enigma cipher has been told many times, and you can get some information on it from Kahn’s book as well as Andrew Hodges’ biography of Alan Turing. This was a mechanical cipher (looking like a typewriter) where each letter typed would get mapped into a different letter depending on the (rather complicated) key and current state of the machine, which had several rotors that rotated at different paces. An identically wired machine at the other end could be used to decrypt. Just like many ciphers in history, this one was also believed by the Germans to be “impossible to break”, and even quite late in the war they refused to believe it was broken despite mounting evidence to that effect. (In fact, some German generals refused to believe it was broken even after the war.) Breaking Enigma was a heroic effort which was initiated by the Poles and then completed by the British at Bletchley Park; as part of this effort they built arguably the world’s first large scale mechanical computation devices (though they looked more similar to washing machines than to iPhones). They were also helped along the way by some quirks and errors of the German operators. For example, the fact that their messages ended with “Heil Hitler” turned out to be quite useful. Here is one entertaining anecdote: the Enigma machine would never map a letter to itself. In March 1941, Mavis Batey, a cryptanalyst at Bletchley Park, received a very long message that she tried to decrypt. She then noticed a curious property: the message did not contain the letter “L”.^{3} She realized that it must be the case that the operator, perhaps to test the machine, had simply sent out a message where he repeatedly pressed the letter “L”. This observation helped her decode the next message, which helped inform of a planned Italian attack and secure a resounding British victory in what became known as “the Battle of Cape Matapan”.
Mavis also helped break another Enigma machine, which helped in the effort of feeding the Germans the false information that the main Allied invasion would take place in Pas de Calais rather than in Normandy. See this interview with Sir Harry Hinsley for more on the effect of breaking the Enigma on the war. General Eisenhower said that the intelligence from Bletchley Park was of “priceless value” and made a “very decisive contribution to the Allied war effort”.
We now turn to actually defining what is an encryption scheme. Clearly we can encode every message as a string of bits, i.e., an element of $latex \{0,1\}^\ell$ for some $latex \ell$. Similarly, we can encode the key as a string of bits as well, i.e., an element of $latex \{0,1\}^n$ for some n. Thus, we can think of an encryption scheme as composed of two functions. The encryption function E maps a secret key $latex k \in \{0,1\}^n$ and a message (known also as plaintext) $latex m \in \{0,1\}^\ell$ into a ciphertext $latex c \in \{0,1\}^o$ for some o. We write this as $latex c=E_k(m)$. The decryption function D does the reverse operation, mapping the secret key k and the ciphertext c back into the plaintext message m, which we write as $latex m=D_k(c)$. The basic equation is that if we use the same key for encryption and decryption, then we should get the same message back. That is, for every $latex k \in \{0,1\}^n$ and $latex m \in \{0,1\}^\ell$, $latex D_k(E_k(m)) = m$.
A note on notation: We will always use i,j,ℓ,n,o to denote natural numbers. n will often denote the length of our secret key, and ℓ the length of the message, sometimes also known as “block length” since longer messages are simply chopped into “blocks” of length ℓ and also appropriately padded. We will use k to denote the secret key, m to denote the secret plaintext message, and c to denote the encrypted ciphertext. Note that c,m and k are bit strings of lengths o,ℓ and n respectively. The length of the secret key is often known as the “security parameter” and in other texts it is often denoted by k or κ. We use n to correspond with the standard algorithmic notation for input length (as in O(n) time algorithms).
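For tiny parameters, the requirement that decrypting with the same key returns the original message (that is, $latex D_k(E_k(m)) = m$ for every k and m) can be checked by brute force. The Python sketch below is only an illustration: bit strings are represented as tuples, and both example schemes (an “identity” scheme and a deliberately broken “lossy” one) are hypothetical toys of mine.

```python
from itertools import product

def bitstrings(length):
    """All bit strings of the given length, as tuples of 0/1."""
    return list(product((0, 1), repeat=length))

def is_valid_scheme(E, D, n, ell):
    """Check D_k(E_k(m)) == m for every key k in {0,1}^n and message m in {0,1}^ell."""
    return all(D(k, E(k, m)) == m for k in bitstrings(n) for m in bitstrings(ell))

# Trivial scheme: ignore the key and output the plaintext as is.
def identity_E(k, m):
    return m

def identity_D(k, c):
    return c

# A broken scheme for contrast: encryption drops the last bit, so D cannot recover it.
def lossy_E(k, m):
    return m[:-1]

def lossy_D(k, c):
    return c + (0,)

print(is_valid_scheme(identity_E, identity_D, n=2, ell=3))  # True
print(is_valid_scheme(lossy_E, lossy_D, n=2, ell=3))        # False
```

The identity scheme passes this check even though it hides nothing, so this condition is about correctness only and says nothing about secrecy.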
Note that this definition so far says nothing about security and does not rule out trivial “encryption” schemes such as the scheme that simply outputs the plaintext as is. Defining security is tricky, and we’ll take it one step at a time, but let’s start by pondering what is secret and what is not. A priori we are thinking of an attacker Eve that simply sees the ciphertext c and does not know anything about how it was generated. So, she does not know the details of E and D, and certainly does not know the secret key k. However, many of the troubles past cryptosystems went through were caused by them relying on “security through obscurity”: trusting that the fact that their methods are not known to their enemy will protect them from being broken. This is a faulty assumption: if you reuse a method again and again (even with a different key each time) then eventually your adversaries will figure out what you are doing. And if Alice and Bob meet frequently in a secure location to decide on a new method, they might as well take the opportunity to exchange their secrets. These considerations led Kerckhoffs to state the following principle:
A cryptosystem should be secure even if everything about the system, except the key, is public knowledge. (Auguste Kerckhoffs, 1883)
(The actual quote is “Il faut qu’il n’exige pas le secret, et qu’il puisse sans inconvénient tomber entre les mains de l’ennemi” loosely translated as “The system must not require secrecy and can be stolen by the enemy without causing trouble”. According to Steve Bellovin the NSA version is “assume that the first copy of any device we make is shipped to the Kremlin”.)
Why is it OK to assume the key is secret and not the algorithm? Because we can always choose a fresh key. But of course if we choose our key to be “1234” or “passw0rd!” then that is not exactly secure. In fact, if you use any deterministic algorithm to choose the key then eventually your adversary will figure it out. Therefore for security we must choose the key at random. Thus the following can be thought of as a restatement of Kerckhoffs’s principle:
There is no secrecy without randomness
This is such a crucial point that it is worth repeating:
There is no secrecy without randomness
At the heart of every cryptographic scheme there is a secret key, and the secret key is always chosen at random. A corollary of that is that to understand cryptography, you need to know some probability theory. Fortunately, we don’t need much probability theory: only probability over finite spaces, and basic notions such as expectation, variance, concentration and the union bound suffice for most of what we need. In fact, understanding the following two statements will already get you much of what you need for cryptography:
The handout on mathematical background contains some of the probability and discrete mathematics that we’ll need, and this will also be reviewed in the sections.
How do we actually get random bits in actual systems? The main idea is to use a two-stage approach. First we need to get some data that is unpredictable from the point of view of an attacker on our system. Some sources for this could be measuring latency on the network or hard drives (getting harder with solid state disks), user keyboard and mouse movement patterns (problematic when you need fresh randomness at boot time), clock drift, and more; there are some other sources including audio, video, and network. All of these can be problematic, especially for servers or virtual machines, and so hardware-based random number generators based on phenomena such as thermal noise or nuclear decay are becoming more popular. Once we have some data X that is unpredictable, we need to estimate the entropy in it. You can roughly imagine that X has k bits of entropy if the probability that an attacker can guess X is at most $latex 2^{-k}$. People then use a hash function (an object we’ll talk about more later) to map X into a string of length k which is then hopefully distributed (close to) uniformly at random. All of this process, and especially understanding the amount of information an attacker may have on the entropy sources, is a bit of a dark art, and indeed a number of attacks on cryptographic systems were actually enabled by weak generation of randomness. Here are a few examples.
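The two-stage approach can be sketched in a few lines of Python. This is a toy illustration and not how any real operating system does it: the choice of sources, the helper names `gather_raw_samples` and `extract`, and the use of SHA-256 as the hash function are all my assumptions.

```python
import hashlib
import os
import time

def gather_raw_samples():
    """Collect a few hard-to-predict values (illustrative sources only)."""
    samples = [
        time.perf_counter_ns().to_bytes(8, "big"),  # fine-grained timing jitter
        os.getpid().to_bytes(4, "big"),              # low entropy on its own
        os.urandom(32),                              # the OS's own entropy pool
    ]
    return b"".join(samples)

def extract(raw, k_bits):
    """Hash the raw data X down to k_bits bits (a multiple of 8, at most 256)."""
    if k_bits % 8 or not 0 < k_bits <= 256:
        raise ValueError("k_bits must be a multiple of 8 in (0, 256]")
    return hashlib.sha256(raw).digest()[: k_bits // 8]

seed = extract(gather_raw_samples(), 128)
print(seed.hex())
```

Hashing does not create entropy: if an attacker can guess the raw data, she can recompute the output. In actual Python programs one would normally skip all of this and ask the operating system directly, for example via `secrets.token_bytes(16)`.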
One of the first attacks was on the SSL implementation of Netscape (the browser at the time). Netscape used the following “unpredictable” information: the time of day and a process ID, both of which turned out to be quite predictable (who knew attackers have clocks too?). Netscape tried to protect its security through “security through obscurity” by not releasing the source code for their pseudorandom generator, but it was reverse engineered by Ian Goldberg and David Wagner (Ph.D. students at the time) who demonstrated this attack.
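To see why a clock and a process ID make a poor seed, here is a toy Python sketch. I stress that `weak_keygen` is a hypothetical stand-in and not Netscape’s actual generator; the way the seed is formed, the clock skew, and the range of process IDs are all assumptions chosen to keep the search small.

```python
import random

def weak_keygen(unix_time, pid):
    """Hypothetical weak generator: seeds a PRNG from the clock and process ID only."""
    rng = random.Random(unix_time * 100_000 + pid)
    return rng.getrandbits(128)

def recover_seed(observed_key, approx_time, max_skew=5, max_pid=4096):
    """Try every (time, pid) pair near the attacker's own clock reading."""
    for t in range(approx_time - max_skew, approx_time + max_skew + 1):
        for pid in range(max_pid):
            if weak_keygen(t, pid) == observed_key:
                return t, pid
    return None

victim_key = weak_keygen(1_000_000_003, 1234)          # the victim's "random" 128-bit key
print(recover_seed(victim_key, approx_time=1_000_000_000))
```

The key is 128 bits long, but the attacker only has to search a few tens of thousands of seeds, so the effective security is nowhere near 128 bits.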
In 2006 a programmer removed a line of code from the procedure to generate entropy in the OpenSSL package distributed by Debian since it caused a warning in some automatic verification code. As a result, for two years (until this was discovered) all the randomness generated by this procedure used only the process ID as an “unpredictable” source. This means that all communication done by users in that period is fairly easily breakable (and in particular, if some entities recorded that communication they could break it also retroactively). This caused a huge headache and a worldwide regeneration of keys, though it is believed that many of the weak keys are still used. See XKCD’s take on that incident.
In 2012 two separate teams of researchers scanned a large number of RSA keys on the web and found out that about 4% of them are easy to break. The main issue was devices such as routers, internet-connected printers and such. These devices sometimes run variants of Linux (a desktop operating system) but without a hard drive, mouse or keyboard, they don’t have access to many of the entropy sources that desktops have. Coupled with some good old fashioned ignorance of cryptography and software bugs, this led to many keys that are downright trivial to break; see this blog post and this web page for more details.
After the entropy is collected and then “purified” or “extracted” to a uniformly random string that is, say, a few hundred bits long, we often need to “expand” it into a longer string that is also uniform (or at least looks like that for all practical purposes). We will discuss how to go about that in the next lecture. This step has its weaknesses too, and in particular the Snowden documents, combined with observations of Shumow and Ferguson, strongly suggest that the NSA has deliberately inserted a trapdoor in one of the pseudorandom generators published by the National Institute of Standards and Technology (NIST). Fortunately, this generator wasn’t widely adopted, but apparently the NSA did pay $10M to RSA Security so the latter would make this generator the default option in their products.
Defining the secrecy requirement for an encryption is not simple. Over the course of history, many smart people got it wrong and convinced themselves that ciphers were impossible to break. The first person to truly ask the question in a rigorous way was Claude Shannon in 1949. Simply by asking this question, he made an enormous contribution to the science of cryptography and practical security. We now will try to examine how one might answer it. Let me warn you ahead of time that we are going to insist on a mathematically precise definition of security. That means that the definition must capture security in all cases, and the existence of a single counterexample, no matter how “silly”, would make us rule out a candidate definition. This exercise of coming up with “silly” counterexamples might seem, well, silly. But in fact it is this method that led Shannon to formulate his theory of secrecy, which (after much followup work) eventually revolutionized cryptography, and brought this science to a new age where Poe’s maxim no longer holds, and we are able to design ciphers which human (or even nonhuman) ingenuity cannot break.
The most natural way to attack an encryption is for Eve to guess all possible keys. In many encryption schemes this number is enormous and this attack is completely infeasible. For example, the theoretical number of possibilities in the Enigma cipher was about $latex 10^{113}$, which roughly means that even if we filled the Milky Way galaxy with computers operating at light speed, the sun would still die out before they finished examining all the possibilities.^{4} One can understand why the Germans thought it was impossible to break. (Note that despite the number of possibilities being so enormous, such a key can still be easily specified and shared between Alice and Bob by writing down 113 digits on a piece of paper.) Ray Miller from the NSA had calculated that, in the way the Germans used the machine, the number of possibilities was “only” $latex 10^{23}$, which still would mean that it would take about a year to exhaust using the fastest supercomputer of 2015, and this was at a time when digital computers were not yet invented. Clearly, it is sometimes possible to break an encryption without trying all possibilities, and so having a huge number of key combinations does not guarantee security, as an attacker might find a shortcut (as the Allies did) and recover the key without trying all options.
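The gap between “easy to write down” and “easy to search” is just arithmetic, and can be sketched in Python. The key-space size $latex 10^{113}$ is the one corresponding to the 113 digits mentioned above; the guessing rate is a purely hypothetical number of mine, and any other plausible rate leads to the same conclusion.

```python
SECONDS_PER_YEAR = 365 * 24 * 3600

def digits_needed(num_keys):
    """Decimal digits needed to write down any key index in range(num_keys)."""
    return len(str(num_keys - 1))

def years_to_exhaust(num_keys, keys_per_second):
    """Whole years needed to try every key at the given (assumed) rate."""
    return num_keys // (keys_per_second * SECONDS_PER_YEAR)

theoretical = 10 ** 113      # key-space size corresponding to 113 decimal digits
assumed_rate = 10 ** 18      # hypothetical: a billion billion keys per second

print(digits_needed(theoretical))                  # 113
print(years_to_exhaust(theoretical, assumed_rate))
```

The cost of specifying a key grows with the number of digits, while the cost of exhaustive search grows with the number of keys, which is exponential in the number of digits.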
But perhaps we can simply define security as requiring the key to be unrecoverable except with tiny probability, no matter what method? Here is an attempt at such a definition:
Security Definition (First Attempt): An encryption scheme is secure if no matter what method Eve employs, the probability that she can recover the true key from the ciphertext is at most $latex 2^{-n}$.
You might wonder if this definition is not too strong to make sense; after all, how are we ever going to prove that Eve cannot recover the secret key no matter what she does? Edgar Allan Poe would say that there can always be a method that we overlooked. However, in fact this definition is too weak! Consider the following encryption: the secret key $latex k$ is chosen at random in $latex \{0,1\}^n$ but our encryption scheme simply ignores it and lets $latex E_k(m) = m$ and $latex D_k(c) = c$. This is a valid encryption, but of course completely insecure as we are simply outputting the plaintext in the clear. Yet, no matter what Eve does, if she only sees $latex c$ and not $latex k$, there is no way she can guess the true value of $latex k$ with probability better than $latex 2^{-n}$, since it was chosen completely at random and she gets no information about it. Formally, one can prove the following result:
Theorem: Let $latex (E,D)$ be the encryption scheme above. For every function $latex Eve:\{0,1\}^\ell \to \{0,1\}^n$ and for every $latex m \in \{0,1\}^\ell$, the probability that $latex Eve(E_k(m)) = k$ is exactly $latex 2^{-n}$.
Proof: This follows because $latex E_k(m) = m$ and hence $latex Eve(E_k(m)) = Eve(m)$, which is some fixed value $latex k' \in \{0,1\}^n$ that is independent of $latex k$. Hence the probability that $latex k = k'$ is $latex 2^{-n}$. QED
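For very small parameters the statement of the theorem can also be confirmed by exhaustive enumeration. The Python sketch below is a sanity check and not a replacement for the proof; the parameters (3-bit keys, 2-bit messages) and the helper names are arbitrary choices of mine. It runs over every deterministic Eve for the scheme that ignores its key and outputs the plaintext.

```python
from fractions import Fraction
from itertools import product

def bitstrings(length):
    """All bit strings of the given length, as tuples of 0/1."""
    return list(product((0, 1), repeat=length))

def key_recovery_probability(eve, m, n):
    """Pr over uniform k in {0,1}^n that eve(E_k(m)) == k, where E_k(m) = m."""
    keys = bitstrings(n)
    hits = sum(1 for k in keys if eve(m) == k)  # the ciphertext is just m
    return Fraction(hits, len(keys))

n = 3
# Every deterministic Eve on 2-bit ciphertexts: a table from ciphertext to guessed key.
ciphertexts = bitstrings(2)
all_eves = [dict(zip(ciphertexts, guesses))
            for guesses in product(bitstrings(n), repeat=len(ciphertexts))]
probabilities = {key_recovery_probability(table.__getitem__, m, n)
                 for table in all_eves for m in ciphertexts}
print(probabilities)  # a single value, 1/8
```

All 4096 strategies recover the key with probability exactly 1/8, even though every one of them is looking at the plaintext itself.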
The math behind the above argument is very simple, yet I urge you to read and reread the last two paragraphs until you are sure that you completely understand why this encryption is in fact secure according to the above definition. This is a “toy example” of the kind of reasoning that we will be employing constantly throughout this course, and you want to make sure that you follow it.
So, the above “Theorem” is true, but one might question its meaning. Clearly this silly example was not what we meant when stating this definition. However, as mentioned above, we are not willing to ignore even silly examples and must amend the definition to rule them out. One obvious objection is that we don’t care about hiding the key; it is the message that we are trying to keep secret. This suggests the next attempt:
Security Definition (Second Attempt): An encryption scheme $latex (E,D)$ is $latex n$-secure if for every message m, no matter what method Eve employs, the probability that she can recover m from the ciphertext is at most $latex 2^{-n}$.
Now this seems like it captures our intended meaning. But remember that we are being anal, and truly insist that the definition holds as stated, namely that for every plaintext message $latex m$ and every function $latex Eve:\{0,1\}^o \to \{0,1\}^\ell$, the probability over the choice of $latex k$ that $latex Eve(E_k(m)) = m$ is at most $latex 2^{-n}$. But now we see that this is clearly impossible. After all, this is supposed to work for every message $latex m$ and every function $latex Eve$, but clearly if $latex m$ is the all-zeroes message $latex 0^\ell$ and $latex Eve$ is the function that ignores its input and simply outputs $latex 0^\ell$, then it will hold that $latex Eve(E_k(m)) = m$ with probability one.
So, if before the definition was too weak, the new definition is too strong and is impossible to achieve. The problem is that of course we could guess a fixed message with probability one, so perhaps we could try to consider a definition with a random message. That is:
Security Definition (Third Attempt): An encryption scheme $latex (E,D)$ is $latex n$-secure if no matter what method Eve employs, if m is chosen at random from $latex \{0,1\}^\ell$, the probability that she can recover m from the ciphertext is at most $latex 2^{-n}$.
This weakened definition can in fact be achieved, but we have again weakened it too much. Consider an encryption that hides the last ℓ/2 bits of the message, but completely reveals the first ℓ/2 bits. The probability of guessing a random message is $latex 2^{-\ell/2}$ (so the scheme is $latex n$-secure if $latex \ell \geq 2n$), but this is still a scheme that would be completely insecure in practice. The point is that in practice we don’t encrypt random messages: our messages might be in English, might have common headers, and might have even more structure based on the context. In fact, it may be that the message is either “Yes” or “No” (or perhaps either “Attack today” or “Attack tomorrow”) but we want to make sure Eve doesn’t learn which one it is.
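The following Python sketch works through a toy instance of such a half-revealing scheme with ℓ = 4. The particular way the second half is hidden (XOR with a 2-bit key), Eve’s strategy, and the two sample messages are all hypothetical choices of mine for illustration.

```python
from fractions import Fraction
from itertools import product

ELL = 4                      # message length
HALF = ELL // 2              # key length in this toy scheme

def bitstrings(length):
    """All bit strings of the given length, as tuples of 0/1."""
    return list(product((0, 1), repeat=length))

def half_hiding_E(k, m):
    """Reveal the first half of m; mask the second half by XOR with the key."""
    masked = tuple(a ^ b for a, b in zip(m[HALF:], k))
    return m[:HALF] + masked

def eve_guess_message(c):
    """Copy the revealed half and guess zeros for the hidden half."""
    return c[:HALF] + (0,) * HALF

def success_probability(messages):
    """Pr over uniform m in messages and uniform k that Eve outputs m."""
    trials = [(k, m) for k in bitstrings(HALF) for m in messages]
    hits = sum(1 for k, m in trials if eve_guess_message(half_hiding_E(k, m)) == m)
    return Fraction(hits, len(trials))

random_message = success_probability(bitstrings(ELL))
yes, no = (0, 1, 0, 0), (1, 0, 0, 0)   # two messages that differ in the revealed half
yes_or_no = success_probability([yes, no])
print(random_message, yes_or_no)        # 1/4 and 1
```

On a uniformly random message Eve succeeds with probability only $latex 2^{-\ell/2} = 1/4$, yet when the message is one of two known candidates she identifies it with certainty.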
So far all of our attempts at definitions oscillated between being too strong (and hence impossible) or too weak (and hence not guaranteeing actual security). The key insight of Shannon was that in a secure encryption scheme the ciphertext should not reveal any additional information about the plaintext. So, if for example it was a priori possible for Eve to guess the plaintext with some probability 1/t (e.g., because there were only t possibilities for it) then she should not be able to guess it with higher probability after seeing the ciphertext. This is formalized as follows:
Security Definition (Perfect Secrecy): An encryption scheme (E,D) is perfectly secret if for every set $latex M \subseteq \{0,1\}^\ell$ of plaintexts, and for every strategy used by Eve, if we choose at random m∈M and a random key $latex k \in \{0,1\}^n$, then the probability that Eve guesses m after seeing $latex E_k(m)$ is at most 1/|M|.
In particular, if we encrypt either “Yes” or “No” with probability 1/2, then Eve won’t be able to guess which one it is with probability better than half. In fact, that turns out to be the heart of the matter:
Two to Many Theorem: An encryption scheme (E,D) is perfectly secret if and only if for every two distinct plaintexts $latex \{m_0, m_1\}$ and every strategy used by Eve, if we choose at random b∈{0,1} and a random key $latex k \in \{0,1\}^n$, then the probability that Eve guesses $latex m_b$ after seeing $latex E_k(m_b)$ is at most 1/2.
Proof: The “only if” direction is obvious— this condition is a special case of the perfect secrecy condition for a set M of size 2.
The “if” direction is trickier. We need to show that if there is some set M (of size possibly much larger than 2) and some strategy for Eve to guess (based on the ciphertext) a plaintext chosen from M with probability larger than 1/M, then there is also some setM′ of size two and a strategy Eve′ for Eve to guess a plaintext chosen from M′ with probability larger than 1/2.
Let’s fix the message for example to be the all zeroes message. Since $latex Eve(E_k(m_0))$ is a fixed string, if we pick a random from M then it holds that while under our assumption, on average over $m_1$, .
Thus in particular, due to linearity of expectation, there exists some satisfying
But this can be turned into an attacker Eve′ such that the probability that $latex Eve'(E_k(m_b))=m_b$ is larger than 1/2. Indeed, we can define Eve′(c) to output if and otherwise output a random message in . The probability that Eve′(c) equals is higher when than when and since Eve′ outputs either or , this means that the probability that $latex Eve′(E_k(m_b))=m_b$ is larger than 1/2 (Can you see why?) QED.
Another equivalent condition for perfect secrecy is the following: (E,D) is perfectly secret if for every plaintexts the two random variables and (for randomly and uniformly chosen keys k and k′) have precisely the same distribution.
So, perfect secrecy is a natural condition, and does not seem to be too weak for applications, but can it actually be achieved? After all, the condition that two different plaintexts are mapped to the same distribution seems somewhat at odds with the condition that Bob would succeed in decrypting the ciphertexts and find out if the plaintext was in fact m or m′. It turns out the answer is yes! For example, the table below details a perfectly secret encryption for two bits.
Plain:            00  01  10  11
Cipher, key 00:   00  01  10  11
Cipher, key 01:   01  00  11  10
Cipher, key 10:   10  11  00  01
Cipher, key 11:   11  10  01  00
In fact, this can be generalized to any number of bits:
Theorem (One time pad, Vernam 1917): For every n, there is a perfectly secret encryption (E,D) with plaintexts of n bits, where the key size and the ciphertext size are also n.
Proof: The encryption scheme is actually very simple – to encrypt a message $m \in \{0,1\}^n$ with key $k \in \{0,1\}^n$, we output $E_k(m) = m \oplus k$ where $\oplus$ is the exclusive or (XOR) operation. That is, $m \oplus k$ is a vector in $\{0,1\}^n$ such that $(m \oplus k)_i = m_i + k_i \pmod 2$. Decryption works identically – $D_k(c) = c \oplus k$. It is not hard to use the associativity of addition (and in particular XOR) to verify that
$D_k(E_k(m)) = (m \oplus k) \oplus k = m \oplus (k \oplus k) = m$ where the last equality follows from $k \oplus k = 0^n$ (can you see why?). Now we claim that for every message $m \in \{0,1\}^n$, the distribution $E_k(m)$ for a random k is the uniform distribution on $\{0,1\}^n$. By the equivalent condition above, this implies that the scheme is perfectly secret, since for every two messages m, m′ the distributions $E_k(m)$ and $E_{k'}(m')$ will both be equal to the uniform distribution. To prove the claim we need to show that for every $y \in \{0,1\}^n$, $\Pr[m \oplus k = y] = 2^{-n}$, where this probability is taken over the choice of a random $k \in \{0,1\}^n$. Now note that $m \oplus k = y$ if and only if $m \oplus k \oplus m = y \oplus m$ or, equivalently, $k = m \oplus y$. Since k is chosen uniformly at random in $\{0,1\}^n$, the probability that it equals $m \oplus y$ is exactly $2^{-n}$. QED.
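To make the scheme concrete, here is a minimal Python sketch of the one time pad over bytes rather than single bits. The function names are hypothetical, and drawing the key from Python's `secrets` module is my assumption about a reasonable source of random bytes; the proof above only requires that the key be uniform and as long as the message.

```python
import secrets


def generate_key(length):
    """Return `length` random bytes to use as a one-time key."""
    return secrets.token_bytes(length)


def xor_bytes(data, key):
    """XOR two byte strings of equal length."""
    if len(data) != len(key):
        raise ValueError("key must be exactly as long as the message")
    return bytes(d ^ k for d, k in zip(data, key))


# Encryption and decryption are the same operation: XOR with the key.
encrypt = xor_bytes
decrypt = xor_bytes

message = b"attack at dawn"
key = generate_key(len(message))
ciphertext = encrypt(message, key)
recovered = decrypt(ciphertext, key)
print(recovered == message)  # True
```

The length check is there on purpose: silently truncating or repeating a short key would give up exactly the property the proof relies on.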
Note: Importance of using the one time pad only once:
The “one time pad” is a name analogous to the “point away from yourself gun”: the name suggests the fatal mistake people often end up making. Perhaps the most dramatic example of the dangers of “key reuse” is the Venona Project. The Soviets had used the one-time pad for their confidential communication since before the 1940’s, and in fact, even before Shannon, U.S. intelligence apparently already knew in 1941 that it is in principle “unbreakable” (see page 32 in the Venona document). However, it turned out that the hassle of manufacturing so many keys for all the communication took its toll on the Soviets and they ended up reusing the same keys for more than one message, though they tried to use them for completely different receivers in the (false) hope that this wouldn’t be detected. The Venona project of the U.S. Army was founded in February 1943 by Gene Grabeel, a former high school teacher from Madison Heights, Virginia, and Lt. Leonard Zubko. In October 1943, they had their breakthrough when it was discovered that the Russians were reusing their keys (credit for this discovery is shared by Lt. Richard Hallock, Carrie Berry, Frank Lewis, and Lt. Karl Elmquist, see page 27 in the document). In the 37 years of its existence, the project resulted in a treasure chest of intelligence, exposing hundreds of KGB agents and Russian spies in the U.S. and other countries, including Julius Rosenberg, Harry Gold, Klaus Fuchs, Alger Hiss, Harry Dexter White and many others.
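Here is a small Python illustration of why reusing a pad is fatal. It is only a sketch of the underlying algebra, not a reconstruction of the actual Venona cryptanalysis; the key and both messages are made up for the example.

```python
def xor_bytes(a, b):
    """XOR two byte strings of equal length."""
    return bytes(x ^ y for x, y in zip(a, b))


key = bytes.fromhex("8f1c3ad25b7e9904a6d1")  # hypothetical 10-byte pad, used twice
message_1 = b"MEET AT 9 "
message_2 = b"SEND MONEY"

ciphertext_1 = xor_bytes(message_1, key)
ciphertext_2 = xor_bytes(message_2, key)

# The key cancels: c1 XOR c2 == m1 XOR m2, which no longer depends on the key.
leak = xor_bytes(ciphertext_1, ciphertext_2)
print(leak == xor_bytes(message_1, message_2))  # True

# If Eve knows or guesses one message, she recovers the other outright.
print(xor_bytes(leak, message_1))  # b'SEND MONEY'
```

Even without knowing either message, the XOR of two natural-language plaintexts is far from uniform, which is what gives an eavesdropper a foothold.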
The one time pad requires a key the size of the message, which means that if you plan to communicate with x people, you are going to have to maintain (securely!) x huge files that are each as long as the length of the maximum total communication you expect with that person. Imagine that every time you opened an account with Amazon, Google, or any other service, they would need to send you in the mail a DVD full of random numbers, and every time you suspected a virus, you would need to ask all these services for a fresh DVD. This doesn’t sound so appealing. Ideally, one could think that Alice and Bob only share a key that is long enough to be unguessable, e.g., 128 bits, and use that for all their communication. Unfortunately this is impossible to achieve with perfect secrecy:
Theorem: If E is a perfectly secret system with key of length n and messages of length ℓ then ℓ≤n.
Proof: Suppose, towards the sake of contradiction, that there was a perfectly secret system with a key of length $n$ but messages of length $\ell > n$. Then consider the following adversary strategy for Eve: given a ciphertext c, guess a random key k′ and output $D_{k'}(c)$. The probability that Eve is successful is at least $2^{-n}$, since with this probability she guesses the key correctly. But by perfect secrecy, if the message is chosen at random, she should have been successful with probability at most $2^{-\ell} < 2^{-n}$. QED
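The guess-a-random-key attack from this proof can be evaluated exactly on a toy example. The sketch below assumes a made-up scheme with a 1-bit key and 2-bit messages (the key bit is simply repeated and XORed in); the parameters and function names are hypothetical and chosen only to keep the enumeration tiny.

```python
from fractions import Fraction
from itertools import product

KEY_BITS = 1       # n, hypothetical toy parameter
MESSAGE_BITS = 2   # l > n

keys = list(product((0, 1), repeat=KEY_BITS))
messages = list(product((0, 1), repeat=MESSAGE_BITS))


def encrypt(key, message):
    """Toy scheme: stretch the short key by repetition, then XOR."""
    return tuple(m ^ key[i % KEY_BITS] for i, m in enumerate(message))


decrypt = encrypt  # XOR with the same stretched key undoes the encryption


def guess_success_probability():
    """Exact probability that Eve's output D_{k'}(c) equals m, over uniform m, k, k'."""
    total = 0
    wins = 0
    for message, key, guessed_key in product(messages, keys, keys):
        ciphertext = encrypt(key, message)
        wins += decrypt(guessed_key, ciphertext) == message
        total += 1
    return Fraction(wins, total)


success = guess_success_probability()
perfect_secrecy_bound = Fraction(1, len(messages))
print(success, perfect_secrecy_bound)  # 1/2 1/4
```

Eve succeeds with probability 1/2, which is the probability of guessing the key, while perfect secrecy over four equally likely messages would cap her at 1/4.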
This proof might not be fully convincing – after all, an attack that succeeds with probability $2^{-n}$ is not very worrying. But this violation of the security definition can be significantly boosted:
Theorem: If (E,D) is an encryption with key of length $n$ and messages of length $n+1$ then there exist two messages $m_0, m_1$ and a strategy for Eve so that given an encryption $c = E_k(m_b)$ for random k and b, Eve can output $m_b$ with probability at least 0.75.
Proof: Suppose that we choose two messages $m_0, m_1$ at random, encrypt $m_1$ to obtain a ciphertext $c = E_k(m_1)$, and ask what is the probability that there exists some key k′ such that $c = E_{k'}(m_0)$. Now, let’s fix the choice of $m_0$ and consider the set $S = \{E_{k'}(m_0) : k' \in \{0,1\}^n\}$.
The size of this set is at most $2^n$. Now for every choice of the key k, the map $m \mapsto E_k(m)$ is one to one and so the image of this map is some set $I_k$ of size $2^{n+1}$ (i.e., there are exactly $2^{n+1}$ ciphertexts that are the encryption under k of some m). If we pick $m_1$ at random then $c = E_k(m_1)$ is chosen at random from the set $I_k$ and hence the probability that c falls into S is at most $2^n/2^{n+1} = 1/2$. Hence in particular, there must be some choice of $m_1$ such that if Eve decides given c to output $m_0$ if $c \in S$ and output $m_1$ otherwise, then she will be successful with probability at least 0.75: she is always right when b=0, and right with probability at least 1/2 when b=1. QED
Note: The above proof is short but subtle. I suggest you try to read it very carefully and make sure you understand it, since it is a prototype for future probabilistic arguments that we will be making regularly. It might help for you to consider a “baby case” when there are, say, 10 possible messages and 4 possible keys, and try to prove in this case that you can always find a pair of messages $m_0, m_1$ such that you can tell with probability at least 60% whether an encryption was of $m_0$ or $m_1$.
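If you want to experiment with the baby case before proving it, here is a rough Python sketch. It builds an arbitrary scheme with 10 messages and 4 keys (each key is a random one-to-one map into a ciphertext space whose size, 12, is my own arbitrary choice) and searches for the best pair of messages for the strategy used in the proof. All names and parameters are hypothetical.

```python
import random
from fractions import Fraction

NUM_MESSAGES = 10
NUM_KEYS = 4
NUM_CIPHERTEXTS = 12  # hypothetical; any value >= NUM_MESSAGES works

rng = random.Random(0)
# table[k][m] is the ciphertext E_k(m); each row is one-to-one, so decryption is possible.
table = [rng.sample(range(NUM_CIPHERTEXTS), NUM_MESSAGES) for _ in range(NUM_KEYS)]


def distinguishing_probability(m0, m1):
    """Eve's exact success probability on the pair (m0, m1) for uniform key and bit.

    Eve outputs m0 when the ciphertext is a possible encryption of m0, else m1.
    """
    possible_for_m0 = {table[k][m0] for k in range(NUM_KEYS)}
    # When b = 0 the ciphertext always lies in the set, so Eve is always right.
    right_when_m0 = Fraction(1)
    # When b = 1 Eve is right exactly when the ciphertext falls outside the set.
    outside = sum(table[k][m1] not in possible_for_m0 for k in range(NUM_KEYS))
    right_when_m1 = Fraction(outside, NUM_KEYS)
    return (right_when_m0 + right_when_m1) / 2


best_pair = max(
    ((m0, m1) for m0 in range(NUM_MESSAGES) for m1 in range(NUM_MESSAGES) if m0 != m1),
    key=lambda pair: distinguishing_probability(*pair),
)
best = distinguishing_probability(*best_pair)
print(best_pair, best)
```

Changing the seed changes the scheme, but the counting argument in the proof says the best pair should never drop below 60%, whichever one-to-one maps are drawn.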
Advanced comment: Adding probability into the picture
There is a sense in which both our secrecy and our impossibility results might not be fully convincing, and that is that we did not explicitly consider algorithms that use randomness. For example, maybe Eve can break a perfectly secret encryption if she is not modeled as a deterministic function but rather as a probabilistic process. Similarly, maybe the encryption and decryption functions could be probabilistic processes as well. It turns out that none of those matter. For the former, note that a probabilistic process can be thought of as a distribution over functions, in the sense that we have a collection of functions $f_1,\ldots,f_N$ mapping ciphertexts to plaintexts and some probabilities $p_1,\ldots,p_N$ (nonnegative numbers summing to 1), so we now think of Eve as selecting the function $f_i$ with probability $p_i$. But if none of those functions can succeed with probability better than 1/2, then neither can this collection. A similar (though more involved) argument shows that the impossibility result showing that the key must be at least as long as the message still holds even if the encryption and decryption algorithms are allowed to be probabilistic processes as well (working this out is a great exercise).
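The averaging step for a randomized Eve can be written out in one line. Here $f_1,\ldots,f_N$ are the deterministic strategies and $p_1,\ldots,p_N$ the probabilities with which Eve picks them (these names are chosen just for this illustration), and the game is the two-message one from the Two to Many Theorem:

```latex
\Pr[\text{Eve guesses } m_b]
  = \sum_{i=1}^{N} p_i \cdot \Pr_{k,b}\bigl[f_i(E_k(m_b)) = m_b\bigr]
  \le \sum_{i=1}^{N} p_i \cdot \tfrac{1}{2}
  = \tfrac{1}{2}.
```

In words, a weighted average of numbers that are each at most 1/2 is itself at most 1/2.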
I enjoyed it and learned quite a bit, even if I was rather strict towards the poor presenters with the timer… Here is a summary of some of the talks based on my faulty memory and the slides. I apologize in advance for the (probably numerous) errors and omissions, and encourage you to take a look at the web pages of the presenters for much better information.
Ron Rothblum talked about new results on proving that computation is correct, including some wonderful connections between cryptography and quantum information, as well as an exciting new result with his brother Guy and Omer Reingold showing a (at the moment quantitatively suboptimal) variant of the classical IP=PSPACE result with an efficient prover.
Jayadev Acharya talked about computational versions of classical questions in statistics such as hypothesis testing, and testing distributions for monotonicity and independence, in particular getting the first linear time algorithms for several classical tasks.
Mohsen Ghaffari talked about his work on distributed algorithms. He’s had a great number of impressive results but focused on his recent SODA 16 paper on distributed maximal independent set, which is the first that matches the 2006 lower bound of Kuhn et al. in some range of parameters.
Nima Haghpanah talked about his work on incentives and computation and in particular about a new result with Hartline showing that despite some daunting hardness results, in several natural settings, it is possible to efficiently maximize revenue even when users have multidimensional preferences (e.g. think of users of an Internet Service Provider where some care more about the price and others care more about the bandwidth).
Z. Steven Wu gave an excellent short talk on differential privacy, a subject on which he had a number of recent results, both giving new differentially private mechanisms as well as using it for other applications in mechanism design and learning.
Clément Canonne had probably the most interesting set of slides I have seen. He gave a high level talk about his work in learning, property and distributional testing.
Aviad Rubinstein is a non-graduating student who has done several interesting works on the complexity of problems arising in computational economics, but talked about his recent thought-provoking work on the fundamental “Sparse PCA” problem, talking about the relation between approximation ratios and success on actual data and (apparently Berkeley inspired) “best case analysis”.
Laura Florescu couldn’t make it to the event but her slides describe some interesting results on the stochastic block model that has attracted much attention from people in statistics, learning, statistical physics, and more, as well as Rotor walks, that are a deterministic variant of some natural physical processes.
Bo Waggoner opened his talk with a joke that probably resonated with many of the students: when a graduating student is asked what he’s working on, he says “last year I worked on analysis of Boolean functions and lower bounds, but this year I’m graduating, so my area is now data science”. He generally worked in algorithmic game theory and has recently studied markets for information as opposed to physical goods.
Ilya Volkovich spoke about the notorious polynomial identity testing (PIT) problem. Most of us run away in fear from it, but he actually managed to solve it for some restricted circuit classes, as well as show connections between it and other open problems.
Elad Haramaty talked about algebraic property testing. Here the canonical example is the low degree test (i.e., property testing of the Reed–Muller code) that was the technical heart of the original PCP Theorem; he’s had some results in this area and also gave generalizations of the Reed–Muller code that have various applications.
Alexander Golovnev’s talk was inspiring: apparently he missed the memo that general circuit lower bounds are too difficult and should not be attempted by anyone, let alone a student. He and his co-authors have obtained the strongest known lower bound for general Boolean circuits and he talked about a program to obtain stronger bounds via improved constructions of randomness dispersers.
Finally Mark Bun missed Bo’s joke, and despite graduating, was not afraid to declare his love for lower bounds. Like Steven Wu, he too works on differential privacy (which for some reason is highly correlated with Star Wars imagery on one’s slides). He worked on the lower bound side, with works answering the fundamental question of whether we need to pay a price in accuracy or efficiency if we want privacy.
If you are interested, please send me your information and presentation (see here for precise details on how to do so). There is no deadline per se, but the presentations will be scheduled in the order of submissions so it’s “first come, first served”. I will also maintain a website with the photos and slides of all presenters.
Happy new year to everyone and hope to see you at ITCS!
The Brascamp–Lieb inequality is actually a shorthand for a broad family of inequalities generalizing the Hölder inequality, Young’s inequality, the Loomis–Whitney inequality and many more. As Zeev put it, it probably generalizes any inequality you (or at least I) know about. Barthe gave an (alternative) proof of these inequalities, in the course of which he produced a magical lemma that turns out to be useful in many varying theoretical CS scenarios, from communication complexity to bounds for locally decodable codes, as well as for obtaining generalizations of the Sylvester–Gallai theorem in geometry.
One way to think about the Brascamp–Lieb inequality (as shown by Carlen and Cordero-Erausquin) is as generalizing the subadditivity of entropy. We all know that given a random variable X over ℝ^{d}, H(X) ≤ H(X_{1}) + ⋯ + H(X_{d})
where X_{i} is the i^{th} coordinate of X. Now suppose that we let Y_{j} denote the j^{th} coordinate of X in some different basis. Then by the same reasoning we know that H(X) ≤ H(Y_{1}) + ⋯ + H(Y_{d}).
The Brascamp–Lieb inequality can be thought of as asking for the most general form of such inequalities. Specifically, given an arbitrary linear map F : ℝ^{d} → ℝ^{n}, we can ask for what vector of numbers γ = (γ_{1}, …, γ_{n}) it holds that
H(X) ≤ γ_{1}H(F(X)_{1}) + ⋯ + γ_{n}H(F(X)_{n}) + C
for all random variables X where C is a constant independent of X (though can depend on F, γ).
The answer of Brascamp–Lieb is that this is the case if (and essentially only if) the vector γ can be written as a convex combination of vectors of the form 1_{S} where S ⊆ [n] is a d-sized set such that the functions {F_{i}}_{i ∈ S} are linearly independent. You can see that the two cases above are a special case of this condition (where it also turns out that C = 0). (There is an even more general form where each F_{i} can map into a subspace of dimension higher than one; see the papers for more.)
The heart of the proof (if I remember Zeev’s explanations correctly) is to show that the worst case for such an inequality is always when the variables F(X)_{1}, …, F(X)_{n} are a Gaussian process, that is X is some kind of a multivariate Gaussian distribution. Once this is shown, one can phrase computing the supremum of C as a convex optimization problem over the parameters of this Gaussian distribution and then prove some bound on it.
For proving Brascamp–Lieb it is sufficient to simply bound C, but it turns out that the parameters that achieve the optimum C as a function of F and γ have a very interesting property. Since the derivative of the objective function at this point must be zero, some algebraic manipulations show that one can obtain from them an invertible linear transformation G such that
∑γ_{i}G(F_{i})G(F_{i})^{⊤}/∥G(F_{i})∥^{2} = I (*)
where I is the identity map on ℝ^{d} and F_{i} is the d-dimensional vector corresponding to the i^{th} coordinate of F. The condition (*) looks somewhat mysterious, but one way to think about it is that it means that after a change of basis and rescaling so that each F_{i} is a unit vector, we can make the vectors F_{1}, …, F_{n} be “evenly spread out” in ℝ^{d} in the sense that no direction is favored over any other one.
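To get a feel for what “evenly spread out” means, here is a small Python sketch that checks the condition ∑γ_{i}u_{i}u_{i}^{⊤} = I numerically for unit vectors u_{i} in the plane. The two configurations and the helper names are my own illustrative choices, with all weights set to γ_{i} = d/n.

```python
import math


def weighted_outer_sum(vectors, weights):
    """Return sum_i w_i * v_i v_i^T as a nested list, for unit vectors v_i."""
    d = len(vectors[0])
    total = [[0.0] * d for _ in range(d)]
    for v, w in zip(vectors, weights):
        for r in range(d):
            for c in range(d):
                total[r][c] += w * v[r] * v[c]
    return total


def is_identity(matrix, tol=1e-9):
    """Check that a square matrix equals the identity up to `tol`."""
    return all(
        abs(entry - (1.0 if r == c else 0.0)) <= tol
        for r, row in enumerate(matrix)
        for c, entry in enumerate(row)
    )


n, d = 5, 2
# n unit vectors in the plane at angles 0, pi/n, 2*pi/n, ...: evenly spread directions.
spread = [(math.cos(math.pi * j / n), math.sin(math.pi * j / n)) for j in range(n)]
gamma = [d / n] * n
print(is_identity(weighted_outer_sum(spread, gamma)))   # True

# Vectors bunched near one direction favor it, so (*) fails for the same weights.
bunched = [(math.cos(0.1 * j), math.sin(0.1 * j)) for j in range(n)]
print(is_identity(weighted_outer_sum(bunched, gamma)))  # False
```

The content of the lemma is that a bunched configuration like the second one can be brought into a position like the first by a suitable invertible map G followed by renormalization, as long as γ satisfies the convex combination condition.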
This turns out to be useful in some surprising contexts. For example, the notion of sign rank of a matrix is very important in several communication complexity applications. The sign rank of a matrix A ∈ { ± 1}^{n × n} is the minimum rank of a matrix B such that sign(B_{i, j}) = A_{i, j} for every i, j. Clearly the sign rank might be much smaller than the rank, and proving that a matrix has large sign rank is a nontrivial matter. Nevertheless Forster managed to prove the following theorem:
Forster’s Theorem: The sign rank of A is at least n/∥A∥ where ∥A∥ is the spectral norm of A.
Which in particular gave a highly nontrivial lower bound of √n on the sign rank of the n × n Hadamard matrix.
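For the Hadamard matrix the bound is easy to evaluate: its rows are orthogonal, so its spectral norm is √n and Forster’s theorem gives sign rank at least n/√n = √n. The Python sketch below checks this for the Sylvester construction, which is one standard way (chosen here for convenience) of building Hadamard matrices of size 2^k; the function names are hypothetical.

```python
import math


def sylvester_hadamard(k):
    """Build the 2^k x 2^k Sylvester Hadamard matrix with entries +1/-1."""
    matrix = [[1]]
    for _ in range(k):
        top = [row + row for row in matrix]
        bottom = [row + [-x for x in row] for row in matrix]
        matrix = top + bottom
    return matrix


def gram(matrix):
    """Return the matrix times its transpose."""
    return [[sum(a * b for a, b in zip(r1, r2)) for r2 in matrix] for r1 in matrix]


k = 4
H = sylvester_hadamard(k)
n = len(H)

# Rows are orthogonal with squared length n, so H H^T = n I and every
# singular value of H equals sqrt(n); in particular the spectral norm is sqrt(n).
assert gram(H) == [[n if i == j else 0 for j in range(n)] for i in range(n)]
spectral_norm = math.sqrt(n)
forster_bound = n / spectral_norm
print(n, forster_bound)  # 16 4.0
```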
The proof is easy given the above corollary (*). (Forster wasn’t aware of Barthe’s work, and as far as I know, the connection between the two works was first discovered by Moritz Hardt.) Suppose that there exists B of rank d such that sign(B_{i, j}) = A_{i, j} for every i, j. Then we can write B_{i, j} = 〈u_{i}, v_{j}〉 for some vectors u_{1}, …, u_{n}, v_{1}, …, v_{n} ∈ ℝ^{d}. By making a tiny perturbation, we can assume that for every subset S ⊆ [n] of size d the vectors {u_{i}}_{i ∈ S} are linearly independent, and hence in particular the vector γ = (d/n, …, d/n) can be expressed as a convex combination of the vectors of the form 1_{S}. We therefore get that after applying some change of basis and rescaling (which will not affect the sign of 〈u_{i}, v_{j}〉) we can assume without loss of generality that every u_{i}, v_{j} is a unit vector and moreover ∑_{i}u_{i}u_{i}^{⊤} = (n/d)I.
Now note that under our assumption B_{i, j}A_{i, j} = |B_{i, j}| and hence
∑_{i, j}|B_{i, j}| = ∑_{i, j}A_{i, j}B_{i, j} = A ⋅ B ≤ ∥A∥√d√(Tr(BB^{⊤})).
The last inequality follows from the fact that for every two matrices A, B, A ⋅ B ≤ ∑λ_{i}σ_{i} ≤ max{λ_{i}}∑σ_{i}, where the λ_{i}’s and σ_{i}’s are the singular values of A and B respectively. We then use the fact that ∥A∥ = max{λ_{i}} and that in B at most d of the σ_{i}’s are nonzero, and hence by Cauchy–Schwarz ∑σ_{i} ≤ √d√(∑σ_{i}^{2}) = √d√(Tr(BB^{⊤})). However by the condition (*) Tr(BB^{⊤}) = ∑_{i}∑_{j}〈u_{i}, v_{j}〉^{2} = n^{2}/d, and on the other hand (since every entry of B is a dot product of unit vectors and hence at most one in absolute value) ∑_{i, j}|B_{i, j}| = ∑_{i, j}|〈u_{i}, v_{j}〉| ≥ ∑_{i, j}〈u_{i}, v_{j}〉^{2} = n^{2}/d. So we get
n^{2}/d ≤ ∥A∥√d√(n^{2}/d) = n∥A∥
or n/∥A∥ ≤ d as we wanted.
Aided by some very generous gifts, Computer Science is on a growth streak at Harvard, and in particular there are some new opportunities in Theoretical Computer Science. We have positions at all levels, including graduate, postdocs, faculty and visitors/sabbaticals, see http://toc.seas.harvard.edu/positions
As in every year, we encourage students interested in theoretical computer science to apply for graduate studies at Harvard. Starting this year, we also have new postdoc positions. Both students and postdocs have the opportunity to work with the strong (and growing) theory group at Harvard, and to take advantage of the unique intellectual environment of the university as a whole and the larger Boston/Cambridge area.
We have several postdoc opportunities at Harvard Computer Science and affiliated institutions. In addition to those we are happy to announce the inaugural Michael O. Rabin postdoctoral fellowship in Theoretical Computer Science. Rabin fellows will receive a generous salary as well as an annual allocation for research and travel expenses, and are free to pursue their own research agenda with no strings attached.
Candidates can apply to all postdoc positions via a unified application process at https://academicpositions.harvard.edu/postings/6477
For full consideration, the complete application, including reference letters, should be submitted by December 1, 2015. Email theorypostdocapply (at) seas.harvard.edu for any questions.
Boaz Barak, Yiling Chen, Harry Lewis, Michael Mitzenmacher, Jelani Nelson, David Parkes, Yaron Singer, Madhu Sudan, Salil Vadhan and Leslie Valiant.
p.s. Harvard is an equal opportunity employer and all qualified applicants will receive consideration for employment without regard to race, color, religion, sex, sexual orientation, gender identity, national origin, disability status, protected veteran status, or any other characteristic protected by law.
Tip #1: Don’t get advice from the Internet.
Grad school, and more generally research life, can differ so widely based on the field, the university, and the personalities involved, that what is a sound strategy for one person can be a terrible idea for another. It’s always better to get advice from a wise and experienced person who knows and cares about you.
Given this advice, I wouldn’t blame you if you stopped reading this blog post at this point. In fact, you probably should. But I will offer some more tips just in case, though keep in mind that they may not be applicable to your situation.
Tip #2: Remember you have other options.
Much of the advice you see online (and maybe even get in person) will include a long list of do’s and don’ts and hoops to jump through on the way to the final goal of becoming a tenured professor. Now, from my personal experience, being a tenured professor is pretty nice, but if you are even considering a career in theoretical computer science, chances are you have the skills and talent for many alternative career paths that have a number of significant advantages over the academic life. The main benefit of the latter is that you get to set your own goals and not jump through other people’s hoops. Don’t lose track of that.
Tip #3: Research is hard.
Research is very hard, or at least it’s very hard for me. Doing research is often not so much about solving a fixed problem given to us, but about setting out goals and revising them as we learn more. As we do so, the pendulum often swings between the states where the current problem we’re trying to solve is trivial, impossible, already known, or not well formed. Most of the time though is spent staring at a brick wall, trying to think of some way to bypass it. (Metaphorically speaking, of course; what you’ll actually be staring at is an empty notepad and a cup of coffee.) An added difficulty is that all this hard work is often invisible, so you get mostly to observe other people’s successes and feel that you are the only one that is having such a hard time.
One of the common fallacies of beginning students is thinking that it’s all about innate talent. When you see people solve mathematical problems in seemingly no time, or hear stories about those geniuses that solved the main question of someone’s Ph.D. thesis in a day, you may feel that success in research is out of your control. I have met many highly successful researchers over the years, and while some are insanely quick, others can take time to do even the simplest calculations. Success in research comes in a great many forms and people can have very different styles of work, personalities, types of questions they are interested in, and more. The only common denominator is that, as a rule, successful researchers are passionate about what they do and even those for whom success seems to come “easily” actually work very hard at it.
Tip #4: You are your own boss.
As an undergraduate, you are used to getting feedback on how well you’re doing in the form of grades. Many corporations have periodic reviews and evaluations for employees. In the academic world, feedback is quite rare, and can come in varying forms. Part of this is because professors don’t like to have awkward conversations. But fundamentally it is because you are truly the measure of your own success. As a theoretical computer scientist, I don’t need my students to run experiments for me, write code, or even prove theorems. I view my role as truly an advisor to the student as they find out what they are passionate about and great at. The best ones eventually find their own “research compass” and set and pursue their own goals in a unique way.
Tip #5: Be a good boss.
Since you are managing yourself, you should try to do a good job at it. Here I cannot really give any general advice. Some students should be tougher on themselves, and push harder. Others are too hard on themselves, and anxiety and/or depression are quite common in graduate students (and beyond that). As I said, being a researcher, at any career stage, is a hard job. In theoretical studies, maintaining motivation is especially challenging, since one can spend days, months or more with no visible progress – there are no lines of code being written, no data being collected. I would also caution against measuring progress by publications. My philosophy is that any day in which you learned something is a good day. Such learning can take many forms, including thinking about a problem, learning some new or not so new cool ideas by attending a seminar or talking to a colleague, reinventing old ideas, reading a paper, figuring out why some approach doesn’t work, and more. I’ve heard a talk by Sanjay Gupta in which he gave the following advice: “try to do every day one thing that scares you”. I think this applies to research as well.
Just try to spend a good amount of your time on the things that truly matter to you, and less time worrying about the job market, who got published where, or reading advice blogs from some random professors on the Internet.