Windows On Theory

Avi wins the Turing, TCS for all

Boaz Barak — Thu, 11 Apr 2024 18:41:04 +0000

Avi Wigderson has won the Turing award. I just can’t find the words on how happy this makes me, so won’t even try to write a post, beyond what I wrote on Twitter. Many have posted on Avi’s contributions, see Quanta, New Scientist, Scott’s blog, Lance’s blog, but to me, perhaps this tweet of Michal Feldman resonated the most. One of the first lessons I learned from Avi is one that I still remember. This was at the IAS, I think when I was visiting there in 2001. We were talking about a technical question, and I mentioned that an upcoming visitor is an expert on that topic, and so it might be good to collaborate. Avi said, “Yes, and he’s a great person, which is even more important”. With the passage of time, I realized how right Avi was in these priorities. And of course, Avi himself is the proof that being an amazing human and friend only helps in becoming a great scientist.

In the spirit of the humanity of science, let me relay the announcement of the workshop “TCS for all”, sent to me by Elena Grigorescu:

You are cordially invited to our TCS for All Spotlight Workshop! The workshop will be held as part of the 56th Symposium on Theory of Computing (STOC), in Vancouver, Canada! The workshop is open to all.

More information about the workshop is available here: https://sigact.org/tcsforall/. In particular, we would like to highlight the TCS for All Travel Scholarships (deadline April 28th) and a call for nominations for Rising Stars talks at the workshop (deadline April 28th). More information on those are below.

Hope to see you in Vancouver!

TCS for All Travel Scholarship:

TCS for All Travel Scholarships are intended for researchers at the beginning of their career. This scholarship is being made available for minorities in TCS, and anyone who identifies as such is welcome to apply; this scholarship is open to both US and international students. Preference will be given to students at the beginning of their studies. If we have sufficient funding, we will give awards to more senior students and possibly even postdocs.

To apply, you will need to fill out the following form by April 28th, 2024 (11:59 pm PDT) in which you provide basic information about yourself, an estimate of your expenses, and a brief statement:

Apply for a travel grant here.

In addition, you will need to have your advisor (or department head or other faculty mentor if you do not yet have an advisor) send a letter of support to tcswomen@gmail.com by April 28th. Your advisor’s letter should also describe the availability of other travel funds. Note for advisors: Specifics about alternative funding are very helpful. Statements like “funding is tight” are not very helpful. This letter should be sent with subject line “support letter for [your name]”. This is very important. Your application is not complete without this letter.

Late applications (after April 28th) will not be accepted. You will be notified about your status by May 5th, which is prior to the STOC early registration deadline and hotel cut-off deadline.

Notes: Receipts will be required for all travel awards, and reimbursements will be made after the conference. Food or visa expenses will not be reimbursed.

Nominations for Rising Star talks:

We invite nomination for speakers in our Rising Star talks at the TCS for All Spotlight Workshop at STOC 2024. To be eligible, your nominee has to be a senior PhD student with expected graduation no later than August 2025, or a postdoc in theoretical computer science (all topics represented at STOC are welcome), an underrepresented minority, and not a speaker at a previous TCS Women/TCS for All Spotlight Workshop. Preference will be given to speakers who are currently on the job market for postdoctoral/faculty positions, or who expect to be on the job market in Fall 2024.

You can make your nomination by filling this form by April 28^th.

STOC/Theory fest 2024

Boaz Barak — Fri, 29 Mar 2024 02:26:36 +0000

STOC/TheoryFest 2024 is in Vancouver, British Columbia this year, during the week of June 24-28. The registration website is now open at http://acm-stoc.org/stoc2024/registration.html

In addition to the STOC 2024 paper talks, the program features keynote talks by Michal Feldman, Jakub Pachocki, and Tim Roughgarden, and workshops on Algorithmic Problems in Modern LLMs, Extremal Combinatorics, Length-Constrained Expanders, Online Resource Allocation, and a special workshop on TCS Mentoring, Diversity, and Outreach.

Emergent abilities and grokking: Fundamental, Mirage, or both?

Boaz Barak — Sat, 23 Dec 2023 01:41:22 +0000

One of the lessons we have seen in language modeling is the power of scale. The original GPT paper of Radford et al. noted that at some point during training, the model “acquired” the ability to do sentiment analysis of a sentence X by predicting whether it is more likely to be followed by “very negative” or “very positive.” As they trained the model, at some point, the zero-shot performance “jumped” significantly. Of course, as Radford et al. noted, their approach required “an expensive pre-training step—1 month on 8 GPUs.”

Figure 1: Figure adapted from OpenAI GPT blog post (2018).

The GPT-2 paper showed the benefits of scale. As they say, GPT-2 was a scaled-up version of GPT with about ten times the parameters and ten times the data. This scaling up led to qualitatively new capabilities. In particular, not just better “zero-shot” performance but also the ability to generate long coherent texts (see below). And, of course, GPT-3 and GPT-4 showed even more the benefits of scale.

Figure 2: GPT-2 completion of text from the Open-AI blog post.

Figure 3: GPT 3.5-turbo-instruct completion of the same text

Figure 4: GPT-4 response to a prompt to write the same speech.

Wei et al. showed that such “emergent capabilities at scale” are common. The form of such “emergent capabilities” is that:

If we plot performance as a function of training compute (on log scale), then there is a regime where performance is no better than trivial, and then at some point, it sharply increases.
We don’t know how to predict in advance where that point will take place.

Figure 5: Figure 2 from the survey on “predictability and surprise” by Ganguli et al. The left panel is based on GPT-3 paper, the middle panel is based on Gopher paper, and the right panel is based on Nye et al, 21.

The “Grokking paper” of Power et al, also showed a dramatic improvement from chance to perfect on a synthetic setting of an algebraic task.

Figure 6: A deep network learning modular division displays a relatively sharp increase from trivial to perfect accuracy. Moreover, this transition happens far later for test accuracy than training accuracy. See Figure 1 of Power et al.

Are emergent abilities a mirage?

In a recent paper, Schaffer, Miranda, and Koyejo gave evidence that there may be a sense in which emergent abilities are “just a mirage” in the following sense. If we use a different metric for the task performance, then instead of an abrupt and unpredictable “jump,” we could get a smooth and predictable improvement.

Figure 7: Performance of adding two four-digit numbers as a function of the length of the output, as measured both by (left) accuracy: getting the answer precisely correct, and (right) edit distance: distance from the correct solution. The “soft” metric on the right panel shows a more gradual and predictable improvement than the “hard” metric on the left panel. Taken from Figure 3 of Schaeffer et al, 2023.

Our own paper (with Edelman, Goel, Kakade, Malach, and Zhang) showed something similar too for the task of learning parities, while Nanda et al used mechanistic interpretability to give a “progress measures” for the grokking example.

Figure 8: Adaptation of Figure 3 from Barak et al. 22. While error and loss drop abruptly, the weights on the relevant features in the parity learning problem show more gradual progress.

One way to think about this is with the following analogy. Consider someone trying to learn how to jump over a 1-meter hurdle.

Figure 9: Cartoon by ChatGPT (DALL-E) of an aspiring hurdle jumper

As the aspiring athlete trains, their jump height will likely increase continuously. However, if we measure their performance by their probability of passing the hurdle, we will see the “sharp transition” / “emerging ability” type of curve.

Figure 10: The height an athlete can jump grows smoothly with the amount of training, but the probability of jumping above a 1-meter hurdle increases sharply.

Indeed, in their beautiful paper, Schaffer, Miranda, and Koyejo show that many such “abrupt curves” disappear when changing the notion of metrics. Does this indicate that all emerging abilities are a mirage? I think the answer is no.

The reason is that for many real-world tasks, and in particular ones that involve reasoning, we need to solve multiple tasks successfully. Indeed, when generating a long “chain of thought,” we need to solve such tasks sequentially, and getting one of them wrong could get us completely off track. Schaffer et al. (Section 2) observe that the probability of success curve becomes much sharper when it corresponds to the AND of multiple events.

Figure 11: As the probability of a single coin to come out “heads” increases, the probability that all k coins come out “heads” sharply increases. (Top: X axis is single coin probability, Bottom: X axis is log[single coin probability] (= negative cross-entropy loss of (p,1-p) vs (1,0) ))

In many real-world tasks, we may not know how to “decompose” the complex task into a combination of simpler predictable components, especially not before we solve the task. So, even if we can precisely predict the loss of a model trained using N flops, we may not be able to predict which tasks it would solve beyond those solvable by an N/10 flop model.

Letter to the Harvard Corporation re Harvard president

Boaz Barak — Mon, 11 Dec 2023 02:49:53 +0000

I wrote this letter earlier today to the Harvard Corporation. I did not intend to make it public but was asked by some people to do so. I figured there is no harm in that. Many people that I respect are deeply disappointed at Harvard in particular and U.S. Universities at large, and are calling for the resignation of our president. I hope this letter explains why I do not share this view. See also my previous tweet. This is not because I think Harvard has no issues with anti-semitism, nor because I think it is a bastion of freedom of speech. I have been critical of Harvard’s administration before. I also do not approve of the “lawyerly” answers that the presidents of Harvard, MIT, and UPenn gave in Congress to what should have been an obvious question.

Reasonable people can disagree with me on what follows. As far as I remember, I have met President Gay only once and talked with her for less than a minute. I have no special “inside knowledge.” The following is my opinion, based on what I know as a member of this community, as to what is best for this University at this point in time.

Dear members of the Harvard Corporation,

I am a professor of Computer Science at SEAS and an Israeli-American. Over the last two months, I led efforts to make the campus safer for Jewish and Israeli students and promote understanding. These included leading an open letter and hosting a delegation of families and friends of the hostages to Harvard and the Boston area.

As America’s oldest and most prominent university, Harvard can be both a beacon of light and a lightning rod. Unfortunately, lately, we have been serving the latter role. This is partly due to our prominence and partly to our own failings. Harvard got the worst ranking in the Foundation for Individual Rights and Expression (FIRE) report for 2023. We have also had long-standing issues of anti-semitism on our campus, exacerbated since October 7th.

Three months into her tenure, President Gay was thrust into one of the most challenging situations in our University’s history. I have criticized her administration several times, but we should also acknowledge the positive aspects. The administration has been moving in the right direction, even though it is yet unclear if its efforts will succeed. Harvard has not seen some of the levels of disruptions and violence that have occurred in other institutions.

The American people’s confidence in higher education has been falling. They do not trust that we protect all speech, regardless of political valence, nor do they trust that we protect all groups. Against this backdrop, the words of university presidents claiming a commitment to free speech as a reason to avoid disciplining anti-semitic speech rang hollow.

And yet, university presidents are judged first and foremost on campus and not in Congress. President Gay faces significant challenges, rising from deep-seated issues that far precede her tenure. She should be given the opportunity to address these. I have faith that Harvard can be a beacon of light. We can show it is possible to offer robust protections for freedom of speech without allowing a “heckler’s veto” or bullying. We can show that our commitment to both freedom and inclusion is not ideological and does not discriminate between liberals and conservatives, nor between different groups.

I do not think the right move at this point is to call for President Gay’s resignation. Instead, I hope the corporation offers the president the support and the chance to address the significant challenges ahead and show to this campus and society at large that we can live up to our motto of “veritas.”

Best,

Boaz Barak

Director of Undergraduate Studies, Computer Science
Gordon McKay Professor of Computer Science,
Harvard John A. Paulson School of Engineering and Applied Sciences

Harvard, we have a problem

Boaz Barak — Fri, 27 Oct 2023 16:10:53 +0000

[Oct 27, 2023: I was hoping for this piece to be posted as an op-ed in the Crimson, since I really want to reach students that are well-intentioned but may not realize they are harmful. However, it was rejected and so I am posting this here].

[Update Jan 3, 2024: In light of my new op-ed on The Crimson, I decided to also post here a file of some of the anti-semitic social media posts of Harvard students with names redacted, as well as a video of Harvard pro-Palestinian protestors calling out the Arabic version of “from the river to the sea” which ends with “Palestine will be Arab”, with blurred faces. See also this X thread and my Dec 10 letter to the Harvard Corporation]

The events of October 7th, 2023, hit many Israeli-Americans personally. My immediate family was thankfully spared, but I have several friends who were not so lucky. At Harvard, where I teach, one student lost four members of her family; three generations: grandmother, father, and two kids were murdered. The Idan family were the next-door neighbors of another student’s family. Parents Roee and Smadar were murdered. Their children, Michael (age 9) and Amalya (age 6), survived by hiding in a closet for 14 hours. Little three-year-old Avigail ran out and is now a hostage in Gaza. As is well known, Hamas terrorists killed more than 1,400 Israelis that day. They gunned down hundreds of innocent partiers, raped women, tortured and mutilated their victims. Entire families were slaughtered in their homes. Hamas also took over 200 hostages, including mothers, babies, and the elderly.

The same night, more than 30 Harvard student groups issued a statement holding Israel “entirely responsible for all unfolding violence” without mentioning Hamas at all. The statement claimed that “massacres in Gaza” are taking place. The attack was still ongoing, with Israeli forces rescuing citizens held hostages in their homes, all under a barrage of thousands of rockets. This statement was equivalent to condemning the U.S. as entirely responsible for the 9/11 attacks and decrying the response while firefighters were still in the building. Not to be outdone, at Tufts, Students for Justice in Palestine praised Hamas’ “creativity,” and even added emojis of the paragliders used by the terrorists.

The Harvard statement justly drew criticism, including in a letter I co-authored and signed by more than 400 faculty members. An anonymous resolution condemning our letter was proposed to the Harvard Graduate Student Union and emailed to all writers (myself included). The resolution states that the goals of some (though not all) in the “Free Palestine movement” is the “total reconquest, total evacuation, or total eradication of the Jews,” and some pray for Israel’s destruction by an Iranian nuclear bomb. While the destruction of Israel and the death of all its Jews are also advocated by the Hamas charter, at least Hamas doesn’t have the gall to call itself a “human rights movement.”

If public statements were not bad enough, Israeli and Jewish students soon saw on social media that fellow students were not just condoning the murder of their family and friends but even celebrating it, often using classical anti-semitic tropes. One Harvard student posted how the “innocent festival goers line does nothing for me,” and they couldn’t “give a f**k” about them. I know four different Harvard students who lost friends in the Nova Festival massacre— can you get more callous than that?

On Sidechat, students posted “Let ‘em cook” and “gotta get them all” with the Palestinian flags; one even posted an emoji of a baby with a severed head. Another student wrote about “how much power the Jewish population has over the media.” One student posted that the “two Israeli defeats” in October (1973 and 2023) make it a “great month.” October 2023 might have been a great month for American keyboard warriors, but it has been and continues to be an awful month for both Israelis and Palestinians. While Israel has no choice but to defend itself against Hamas, the images from Gaza are heartbreaking. I am glad that (long overdue and too little) humanitarian aid is now going in. Israel would be wise to heed President Biden’s warning that it should follow the laws of war. But such criticism is best received when it is not a knee-jerk reaction from people who seek your country’s destruction.

Let me be clear: not all criticism of Israel is anti-semitism. You can harshly criticize Israel’s policies and even call for sanctions or boycotts without being anti-semitic. But you cannot condone, let alone celebrate, the slaughter of Jewish babies, mothers, and grandparents without being anti-semitic. While I would hope that basic human decency should be enough, the failure to condemn Hamas also hurts the cause of a free Palestinian state alongside Israel, a cause that I personally support. Furthermore, like the Taliban-controlled Afghanistan, a Hamas-controlled Palestine would not be truly “free,” especially for women, sexual and gender minorities, or anyone who does not conform to Hamas’ narrow definition of Islam.

These offensive social media posts are reminiscent of the famous 2017 Harvard “meme scandal,” but the current situation is much worse. These are not high-school students but supposedly adult college students. They are not making hypothetical jokes in a private group chat but celebrating the murder of actual people on a platform shared by the friends and relatives of the victims. Of course, there is another difference. While in 2017, Harvard had no compunction rescinding admittance for the students, now it is setting up a task force to protect students from any consequences to their careers. I have repeatedly voiced my disapproval of “shaming” students via (often inaccurate) blacklists or trucks. But the hypocrisy is palpable. How is it that on a campus so intent on promoting equality and inclusion, condoning the murder of people due to their ethnicity is even up for debate? Why do Harvard students, ever sensitive to the slightest micro-aggressions, feel OK to repeat anti-semitic stereotypes? How is it that some of the smartest and most educated people on Earth seem to have lost the ability to distinguish between supporting a cause and supporting all actions done ostensibly in the name of this cause?

Petition by CS & Math Laureates: Freedom for kidnapped children

Boaz Barak — Fri, 27 Oct 2023 05:57:01 +0000

[Guest post by Shafi Goldwasser, See also PDF version of document]

On the morning of Saturday, October 7, 2023, Hamas launched an attack near the Israel/Gaza border. In villages and towns near the border they went from door-to-door annihilating whole families. They killed children in front of their parents and siblings. They abused women. In a music concert in the desert, they massacred hundreds of young people and wounded thousands of others. And they took captive children, infants, elderly in wheelchairs, women and men, to be used as human shields, and worse. The attacks on civilians were planned, purposeful and documented by Hamas.

While we individually have different views about the Israeli- Palestinian conflict, we all emphatically condemn the murder, kidnapping, rape and torture of civilians. Throughout history, humanity has consistently drawn the line between acts of war and war crimes. No war should ever condone mass atrocities. No war should normalize acts of rape and torture. No war should permit the captivity of innocent young children.

The vulnerability of children in captivity is immense and unfathomable, even under the assumption that Hamas is providing them with elementary necessities. It is our duty to rally to help the innocent and the vulnerable and bring them home. We join the petition of the Nobel Laureates and call out urgently and without condition:

FREE ALL CHILDREN HOSTAGES NOW

TIME IS OF THE ESSENCE

COMPUTER SCIENCE AND MATHEMATICS PRIZE WINNERS’ SIGNATURES

Scott Aaronson (ACM Prize in Computing Laureate)
Manindra Agrawal (Clay research award, Fulkerson Prize).
Alfred Aho (Turing Award Laureate)
Sanjeev Arora (ACM Prize in Computing Laureate)
Artur Avila (Fields Medal Laureate)
Yoshua Bengio (Turing Award Laureate)
Charles H. Bennett (Claude E. Shannon Award, Breakthrough Prize in Fundamental Physics)
David Meir Blei (ACM Prize in Computing Laureate)
Manuel Blum (Turing Award Laureate)
Dan Boneh (ACM Prize in Computing Laureate)
Gilles Brassard (Breakthrough Prize in Fundamental Physics)
Mark Braverman (Abacus Medal Laureate)
Eric Brewer (ACM Prize in Computing Laureate)
Ed Catmull (Turing Award Laureate)
Vint Cerf (Turing Award Laureate)
Alain Connes (Fields Medal Laureate)
Jeff Dean (ACM Prize in Computing Laureate and IEEE John von Neumann Medal)
Simon Donaldson (Fields Medal Laureate and Breakthrough Prize in Mathematics)
Jack Dongarra (Turing Award Laureate)
Alexei Efros (ACM Prize in Computing Laureate)
Alessio Figalli (Fields Medal Laureate)
Michael Freedman (Fields Medal Laureate)
Hillel Furstenberg (Abel Prize Laureate)
Sanjay Ghemawat (ACM Prize in Computing)
Shafi Goldwasser (Turing Award Laureate)
Martin Hairer (Fields Medal Laureate)
John Hennesey (Turing Award Laureate)
Geoffrey Hinton (Turing Award Laureate)
June E. Huh (Fields Medal Laureate)
Frans Kaashoek (ACM Prize in Computing Laureate)
Yael Kalai (ACM Prize in Computing Laureate)
Alan Kay (Turing Award Laureate)
Richard Karp (Turing Award Laureate)
Daphna Koller (ACM Prize in Computing Laureate, ACM AAAI Allen Newell Award, MacArthur Fellowship)
Butler Lampson (Turing Award Laureate)
Robert Langlands (Abel Prize, Companion of the Order of Canada)
Yann LeCun (Turing Award Laureate)
Elon Lindenstrauss (Fields Medal Laureate)
Pierre-Louis Lions (Fields Medal Laureate)
Barbara Liskov (Turing Award Laureate and IEEE John von Neumann Medal)
László Lovász (Abel Prize Laureate)
Grigory Margulis (Fields Medal Laureate, Abel Priza Laureate)
James Maynard (Fields Medal Laureate)
Robert Metcalfe (Turing Award Laureate)
Silvio Micali (Turing Award Laureate)
Shigefumi Mori (Fields Medal Laureate)
Andrei Okounkov (Fields Medal Laureate)
David Patterson (Turing Award Laureate)
Shwetak Patel (ACM Prize in Computing Laureate)
Judea Pearl (Turing Award Laureate)
Michael Rabin (Turing Award Laureate)
Alexander Razborov (Nevanlinna Prize Laureate)
Ron Rivest (Turing Award Laureate)
Amit Sahai (National Academy of Sciences Held Prize)
Dana Scott (Turing Award Laureate)
Peter Scholze (Fields Medal Laureate)
Adi Shamir (Turing Award Laureate)
Dan Spielman (Nevanlinna Prize Laureate, Breakthrough Prize in Mathematics)
Endre Szemerédi (Abel Prize)
Terence Tao (Fields Medal Laureate, Breakthrough Prize in Mathematics)
Robert Tarjan (Turing Award Laureate, Nevanlinna Prize Laureate)
Ken Thompson (Turing Award)
Karen Uhlenbeck (Abel Prize)
Leslie Valiant (Turing Award Laureate and Nevanlinna Prize Laureate)
Maryna Viazovska (Fields Medal Laureate)
Cedric Villani (Fields Medal Laureate)
Avi Wigderson (Nevanlinna Prize Laureate, Abel Prize Laureate)
Efim Zelmanov (Fields Medal Laureate)

Open letter to the Harvard community

Boaz Barak — Wed, 11 Oct 2023 13:15:42 +0000

October 12, 2023

To President Claudine Gay and the Harvard University leadership

We attach an open letter for your consideration written on October 8, 2023 and signed by more than 350 faculty members at Harvard. The drafters of the letter (listed below) welcome President Gay’s statement unequivocally condemning Hamas’s terrorism and distancing the university from the abhorrent statement approving it which had been issued by some student groups. We hope that all the academic leaders at the university will echo President Gay’s clear condemnation of mass murder. The dismaying statement today by the leaders of Harvard’s Religion and Public Life program in the Divinity School shows that there is still work to be done.

We believe the attached letter sends an important message to the Harvard community and beyond that the university rejects any support of terrorism or false equivalences between the targeted killing of civilians and collective self-defense. Only after terrorism is rejected can dialogue on the best path forward begin.

Sincerely,

Boaz Barak, Gordon McKay Professor of Computer Science, SEAS
Jeffrey Flier, Professor of Medicine and former Dean, Harvard Medical School
Barbara Grosz, Higgins Research Professor of Natural Sciences, SEAS
Gabriel Kreiman, Professor, Children’s Hospital, Harvard Medical School
Steven Pinker, Johnstone Family Professor of Psychology, FAS
Rabbi David Wolpe – Divinity School

See also https://bit.ly/harvard-against-terrorism for a live version with the list of all signatories.

Open-letter-harvard.pdf Download

—-

We are faculty at Harvard who are deeply concerned about the events in the Middle East, as well as the safety of our students here on campus. On October 7th, Hamas launched a premeditated attack on the Israeli population. Hundreds of terrorists infiltrated Israeli towns and houses. Children were killed in front of their parents; entire families were executed. Grandmothers, mothers, and their babies were kidnapped. All in all, more than 900 Israelis were killed in a single day and the death toll is continuing to grow. There have also been deaths on the Palestinian side, including hundreds of terrorists and, tragically, civilians as well.

Every innocent death is a tragedy. Yet, this should not mislead us to create false equivalencies between the actions leading to this loss. Hamas planned and executed the murder and kidnapping of civilians, particularly women, children, and the elderly, with no military or other specific objective. This meets the definition of a war crime. The Israeli security forces were engaging in self-defense against this attack while dealing with numerous hostage situations and a barrage of thousands of rockets hidden deliberately in dense urban settings.

The leaders of the major democratic countries united in saying that “the terrorist actions of Hamas have no justification, no legitimacy, and must be universally condemned” and that Israel should be supported “in its efforts to defend itself and its people against such atrocities.“ In contrast, while terrorists were still killing Israelis in their homes, 35 Harvard student organizations wrote that they hold “the Israeli regime entirely responsible for all unfolding violence,” with not a single word denouncing the horrific acts by Hamas. In the context of the unfolding events, this statement can be seen as nothing less than condoning the mass murder of civilians based only on their nationality. We’ve heard reports of even worse instances, with Harvard students celebrating the “victory” or “resistance” on social media.

As a University aimed at educating future leaders, this could have been a teaching moment and an opportunity to remind our students that beyond our political debates, some acts such as war crimes are simply wrong. However, the statement by Harvard’s administration fell short of this goal. While justly denouncing Hamas, it still contributed to the false equivalency between attacks on noncombatants and self-defense against those atrocities. Furthermore, the statement failed to condemn the justifications for violence that come from our own campus, nor to make it clear to the world that the statement endorsed by these organizations does not represent the values of the Harvard community. How can Jewish and Israeli students feel safe on a campus in which it is considered acceptable to justify and even celebrate the deaths of Jewish children and families?

We recognize that Harvard has students and community members from all regions, including from the Gaza Strip. These are not easy times, and we pray for the safety of all our members and their families. The Israeli-Palestinian conflict has a long and complex history. We hold varying opinions, but none of us endorses all of Israel’s past actions. However, the events of this week are not complicated. Sometimes there is such a thing as evil, and it is incumbent upon educators and leaders to call it out, as they have with school shootings and terrorist attacks. It is imperative that our academic leadership, whose good faith we do not doubt, state this clearly and unequivocally. Further, while individuals’ free speech should be protected, our leaders should make it clear that our community rejects any statements that excuse terrorist acts.

We stand with any member of the Harvard community who feels unsafe or alone and pledge to do what we can individually and collectively. We hold our hope for better days in the future. To quote President Obama, “As we support Israel’s right to defend itself against terror, we must keep striving for a just and lasting peace for Israelis and Palestinians alike.”

Letter to Harvard President Claudine Gay

Boaz Barak — Mon, 09 Oct 2023 20:45:27 +0000

[Update 10/11: I have not received a response to this letter, but On the night of October 9th, Harvard’s leadership released a statement, and president Gay added her own statement on October 10th; you can read both here. These partially address what I raised in the letter, but still fall short of condemning, rather than only distancing from, the outrageous statement by 35 student groups.]

Dear President Gay,

Recent days have been harrowing for both me and every other Israeli or Jewish person on the Harvard campus. On October 7th, Hamas launched a premeditated attack on the Israeli population. Hundreds of terrorists infiltrated Israeli towns and houses. Children were killed in front of their parents; entire families were executed. Grandmothers, mothers, and their babies were kidnapped. All in all, more than 700 Israelis were killed in a single day and the death toll is continuing to grow. There have also been deaths on the Palestinian side, including hundreds of terrorists and, unfortunately, some civilians, none of whom were purposely targeted.

Every innocent death is a tragedy, period. Yet, this should not mislead us to create false equivalencies between the actions leading to this loss. Hamas planned and executed a terror attack that systematically targeted civilians and focused on women, children, and the elderly. The Israeli security forces were engaging in self-defense against this attack while dealing with numerous hostage situations and a barrage of thousands of rockets hidden deliberately in dense urban settings.

Against this backdrop, I was heartbroken to see a statement signed by 35 Harvard student organizations. Published while terrorists were still killing Israelis in their homes, the statement holds “the Israeli regime entirely responsible for all unfolding violence,” with not a single word denouncing the horrific acts by Hamas. In the context of the unfolding events, this statement can be seen as nothing less than condoning hate crimes against civilians based only on their nationality. Freedom of speech is an important principle, but at the very least, I would expect Harvard’s leadership to denounce this statement–just as they should denounce any statement condoning hate crimes by Harvard-affiliated organizations.

I was also dismayed by official communication from Harvard. In an email to the Harvard Graduate School of Education community, Dean Long wrote “to express concern in the aftermath of the deadly attacks in Israel and the Gaza Strip where it has been reported that more than 1,100 people have been killed and hundreds more wounded by the actions taken by Hamas and the Israeli government. “ Once again, this is a false equivalency between the terrorist action of Hamas and Israeli self-defense.

I recognize that Harvard has students and community members from all regions, including from the Gaza Strip. These are not easy times for anyone, and I pray for the safety of all our members and their families. We do not know yet how the situation will unfold. Israel has committed wrongdoings in the past, and I have publicly criticized its government for doing so. It may also deserve criticism on some future day. But today is not that day.

Boaz Barak

Gordon McKay Professor of Computer Science,
Co-director of Undergraduate Studies, Computer Science
Harvard John A. Paulson School of Engineering and Applied Sciences

Reflections on “Making the Atomic Bomb”

Boaz Barak — Thu, 17 Aug 2023 02:47:58 +0000

[Cross posted on lesswrong; see here for my prior writings ; update: 8/25/23: added a paragraph on secrecy]

[it appears almost certain that in the immediate future, it would be] possible to set up a nuclear chain reaction in a large mass of uranium by which vast amounts of power and large quantities of new radium-like elements would be generated.
Letter from Albert Einstein (prepared by Leo Szilard) to F.D. Roosevelt, August 1939

Do you know, Josef Vassarionovich, what main argument has been advanced against uranium? “It would be too good if the problem could be solved. Nature seldom proves favorable to man.”
Letter from Georgi Flerov to Joseph Stalin, April 1942.

I’ve heard great things about Richard Rhodes’ “The Making of the Atomic Bomb.” Finally, on vacation, I managed to read it. (Pro-tip: buy the Kindle version – the hard copy is far too big to lug around.) It’s as great as people say. Can’t recommend it enough. I can’t remember when, if ever, I’ve read a book that combines so well popular science and history. Indeed, the Atomic bomb is the one setting where the precise details of the smallest particles have profoundly impacted human history.

Here are some quick thoughts after reading the book. (Warning: spoilers below for people who don’t know how WWII ended.)

The level of investment in the Manhattan Project was truly staggering.

I knew it but didn’t fully grasp this. This is not just the numbers ($2B, which was almost 1 percent of GDP at the time) but also the project’s sheer size, employing more than 100,000 people, and the massive construction of buildings, factories, and roads at multiple sites. As just one example, when they didn’t have enough copper, the treasury department lent the project 15,000 tons of silver to be used in the electromagnetic separation plant (to be later melted and returned after the war).

Much of this cost was due to the compressed schedule.

The staggering cost was mainly due to the need to get the bomb done in time to use in the war. Time and again, whenever the project faced a choice between approaches A, B, or C, they chose to pursue all three in parallel, so if two failed, they could still go ahead. Whenever there was a choice between saving money or time, they opted for the latter. The fact that the cost was primarily due to time is also evidenced by the fact that, following the war, many countries could set up their own atomic bomb programs or reach the threshold of doing so at a much lower cost.

This seems to be a general principle in technological innovation: the cost of achieving a new advance decreases exponentially in time. Thus, achieving X transitions over time from being impossible to being inevitable. This is related to Bill Gates’ famous quote that in technology, we tend to overestimate progress in two years and underestimate progress in ten years. (See plot above)

The Manhattan Project was trying to achieve the Atomic bomb just at the cusp of it being possible. The project got going when General Groves was appointed (September 1942), and it took a little less than three years until the successful test (July 1945). Of course, they could have started much earlier: Einstein and Szilard sent their famous letter to Roosevelt in August 1939. The “impossible vs. inevitable” phenomenon is manifested in another way. The U.S. drastically underestimated how long it would take for the Soviet Union to achieve the bomb (even considering the Soviet advantages due to spying, which the Americans should at least have partially anticipated as well).

The government fully trusted the scientists on the science.

The project was authorized primarily based on pen and paper calculations. At the time the project was approved, no chain reaction had been demonstrated, and the total quantity of Uranium 235 and Plutonium was minuscule, with no proof that they could be separated at scale. The U.S. trusted the scientists when they said it was possible, even though they were also very honest about the uncertainties and limitations. (The government didn’t trust the same scientists on their politics and allegiances and continually spied on the very people that it placed at the helm of its most secret and expensive venture.)

Many scientists independently realized the possibility of an atomic bomb.

I didn’t know how many different scientists in Germany, France, Russia, Japan, Britain, and the U.S. all realized the possibility of a fission bomb once Otto Hahn and Fritz Strassmann observed fission experimentally in 1938, and it was analyzed theoretically by Lise Meitner and Otto Frisch. It truly was inevitable.

The project was secret to the public but public to the US’ adversaries.

While the Manhattan Project was highly classified, physicists in many countries, including Germany, Russia, and Japan, noticed the sudden disappearance of their American colleagues from the physics literature and deduced that the U.S. had a major Atomic fission project. The Russians in particular, had detailed information on the project from several spies, chief among them Klaus Fuchs. That said, despite Stalin knowing about the project, and authorizing a Soviet project already in 1942, seeing the Bomb used in Hiroshima had a profound effect on him, and caused him to massively increase resources and intensity.

Several scientists aware of the project, and in particular Niels Bohr, pushed on the one hand for increased openness in sharing general information about Atomic research between countries, and on the other hand for avoiding using the weapon except as a last resort. In retrospect, they were right on both counts. Secrecy on high-level information did not prevent the arms race, but the psychological effect of using the bomb spurred it on. More specific details, for example, the phenomena of “reactor poisioning” and the implosion mechanism, were very useful for Russia and by Rhodes’ estimates, probably sped up their project by two years. However, despite all the secrecy measures, including tailing Oppenheimer and other top scientists, US intelligence failed to realize the extensive spying that was going on under their noses.

Following World War II, secrecy probably had the effect of accelerating the arms race. Over-inflated estimates of Soviet capabilities were used by those who successfully pushed for the U.S. to undertake an “all out” program to develop the Hydrogen bomb.

How long was the road from theoretical possibility to actually building the bomb.

Saying the bomb was “inevitable” doesn’t mean that it was all just “engineering details.” The book details the many unexpected obstacles that the project encountered and the brilliant ideas they needed to overcome them. Just as a fissionable material can undergo a chain reaction once it is dense enough beyond its “critical mass,” so was the Manhattan Project’s success based on creating a density of talent that enabled a “critical mass” of exchanging ideas.

How terrible of a weapon the bomb is.

It is easy to be numbed by fatality numbers. Still, the book’s descriptions of eyewitness accounts from Hiroshima bring out the sheer destructiveness of the bomb, both immediately and in the aftermath. It is purely a device of death.

How little sense the H bomb makes.

Reading the accounts of the destructiveness at Hiroshima, the last question that comes to mind is whether we could build a bomb that is 100 or 1000 times more powerful. And yet, sadly, this is what humanity chose to do. This also brings home the point that, like it or not, apparently any war-making device that can be built will be built, even if its only possible use is the self-destruction of our race. To understand better how we’ve reached this point, I’m now reading Rhodes’ other book, “Dark Sun.”

Are there lessons for AI?

We are currently in another time of great scientific advances that promise to change human society. What are the similarities and differences between the Atomic bomb and Artificial Intelligence?

AI is not the bomb 1: AI is a dual-use technology

The uncontrolled release of vast quantities of energy cannot serve any purpose other than destruction. Thus the atomic bomb is an instrument of death. The more powerful the bomb is, the more dangerous it is. (Nuclear fission generally is, of course, a dual-use technology.) In contrast, AI is a general technology that can be used for many purposes. In fact, it’s hard to think of a goal that AI cannot benefit. This implies a fundamental difference in the risk profile of the bomb and AI. Unlike what is sometimes suggested, AI safety and capabilities are not fundamentally opposed to one another. In some cases, such as self-driving cars, it is clear that making AI systems more capable also means making them safer. In other settings, the relation is more complicated, where AI capabilities improvements help both the “attacker” and “defender,” and it is unclear which side is helped the most:

In cyber warfare, hackers can use AI systems to find vulnerabilities and hack software systems. But, by the same token, software vendors can use AI to patch vulnerabilities in advance before they are discovered and exploited. It is unclear which side is intrinsically more favored (assuming they both have access to AIs of equal strength). One major advantage AI offers for the defender side is coverage and timeliness. Currently, software vendors can spare limited effort toward securing systems, and this effort is often proportional to the amount that the system or individual component is used. Thus often, it is the more esoteric features (e.g., TLS heartbeat extension) that lack attention from developers and can have security vulnerabilities survive for an extended time.
A similar dynamic exists in biowarfare: AI can be used to design bioweapons, but also to prepare in advance vaccines, as well as accelerate vaccine and therapeutic developments for both natural and designed pandemics.
Disinformation is another setting in which AI can help both those who create or spread disinformation as well as those who try to prevent it. AI can make “deep fakes” and can also create highly personalized misleading messages. However, AI can also be used by social media companies to monitor misleading messages, whether on servers or on the client’s device. Hence, AI can detect deepfakes (whether by watermarking or using cryptography to establish a “chain of custody” and authenticity of the media) and warn users of influence attempts. Once again, it is unclear which side will be favored, and regulations that ensure the right set of incentives might play a considerable role.

Eliezer Yudkowsky likes to say that “every eighteen months, the minimum IQ necessary to destroy the world drops by one point.” I believe this quote is wrong on several levels. First, it fetishizes intelligence. People have been destroying large parts of the world since the days of Genghis Khan. The Manhattan Project did indeed require the work of highly intelligent scientists, but they were only one component of the project and were successful not just because of their intelligence. (One passage in Rhodes’ book describes how Oppenheimer was so successful as a lab director also because he was attuned to the emotional needs of the scientists, with one example being him taking the time for weekly meetings with Teller.) As the infamous Oppenheimer-Truman meeting showed (not to mention the actions of Hitler and Stalin), ultimately, it was not the most intelligent people responsible for the most destruction. Second, as I discuss above, computational capabilities have a dual impact, and it is unclear, to say the least, if technological improvements make the world safer or more dangerous.

AI is not the bomb 2: It is much more complex.

One of the striking facts in the book is the contrast between how hard it was to build the atomic bomb and how (reasonably) easy it was to predict it in advance and get order-of-magnitude estimates of its capabilities. Multiple physicists made pen and paper calculations of the critical mass of U235 and the size of a bomb. Before Plutonium was ever created, physicists could roughly predict its potential for fission. In contrast, our ability to predict the future capabilities of AI systems is very restricted. (See this post for my crude attempts at back-of-envelope calculations, sticking with qualitative predictions only.) We can predict near-term advances on particular benchmarks fairly reliably. But extending these predictions further out or trying to map out what these benchmarks will mean for real-world impact is still highly problematic. One worrying trend is that people sometimes replace principled analysis (as we had with fission) with opinion polls of scientists or other dubious calculations, which can give a false impression of quantifiability (“there is X% likelihood of outcome Y”) but are really “cargo cult science.”

AI is like the bomb: the impossible vs. inevitable phenomenon.

Like the case of the bomb, I think building AI systems, including AGI, will become significantly easier with time, and the plot above will also hold for AI. Hardware, engineering, and algorithmic insights will make AIs more capable and cheaper to build and deploy. In particular, I believe that we will discover more efficient ways to train AI systems than the current “brute force” approach of using O(N²) operations to train an N-sized model on O(N) pieces of data; since each data point, on average, contributes O(1) bits of information to the model, it should not require Ω(N) time to process it.

Being the first mover will require significant expenses but might also confer significant advantages. This does mean that probably any type of “pauses” or “delays” are unlikely to make a long-term difference but may create more of an “overhang” effect, where multiple parties (companies/countries) can achieve similar capabilities at about the same time. For example, if the Manhattan Project didn’t exist, the Atomic bomb would have been delayed by several years, but when it was built, it would have likely been done by multiple countries. Whether such a “multipolar” scenario is good or bad is hard to predict in advance. Ultimately what can be built will be built, and I believe regulations and policies can have a minimal impact on the capabilities of AI systems in the long run. But like in the nuclear case, the decisions of first-movers, regulations, and research investments we make today can profoundly impact humanity’s future trajectory. It was not pre-ordained that we would live in a world with more than 3,000 thermonuclear missiles ready to launch at a second’s notice. It was also not pre-ordained that 75 years after Hiroshima and Nagasaki, only nine countries would have nuclear weapons, and no such weapon has been used in war since. Technology might be inevitable, but the way we use it isn’t.

Cartesian Cafe podcast interviews me on cryptography

Boaz Barak — Sun, 06 Aug 2023 09:29:39 +0000

[Unrelated announcement: Yael Kalai, Ran Raz, Salil Vadhan, Nisheeth Vishnoi and I recently completed our survey of Avi Wigderson’s work for the volume on Abel prize winners. Given the breadth and depth of Avi’s work, our survey could only cover a small sample of it, but we still hope it can. be a useful resource for students or researchers in computational complexity.]

Tim Nguyen has a wonderful podcast called the “Cartesian Cafe“, where he interviews for an extended time people on a technical topic. Some people he interviewed include Scott Aaronson, John Baez, Tai-Danae Bradley, Sean Carroll, Richard Easther, Alex Kontorovich, Po-Shen Loh, Grant Sanderson, Daniel Schroeder, Ethan Siegel, Greg Yang, and John Urschel.

Recently, Tim interviewed me about cryptography. You can find the discussion on youtube, apple podcasts, or spotify. (In the video version, we used an ipad as a whiteboard, so that may be the most useful one.) A lot of it is based on my notes at intensecrypto.org.

Since I at least prefer reading to listening, I thought I would share a transcript (cleaned up using GPT-4) below. (The timestamps link to the approximate time in the youtube video.)

Boaz: 00:00:00 GPT and others have not touched cryptography at all. AI can do a lot of things that humans can do, but humans cannot break encryption schemes like AES and neither can AI.

Tim: 00:00:17 Welcome everyone to the Cartesian Cafe. I am your host, Tim Nguyen. Today, we’re very lucky to have Boaz Barak here with us. Boaz is a professor of computer science at Harvard University, having previously been a principal researcher at Microsoft Research and a professor at Princeton University. His research interests span many areas of theoretical computer science, including cryptography, computational complexity, and the foundations of machine learning. Boaz serves on the Scientific Advisory Boards for Quanta Magazine and the Simons Institute for the Theory of Computing and he was selected for Foreign Policy Magazine’s list of 100 leading global thinkers for 2014. Welcome, Boaz. How are you doing today?

Boaz: 00:00:59 Great. Thank you for having me.

Tim: 00:01:02 Thank you for coming here to have this conversation. Maybe we can start out by you giving us a little bit of background on your career trajectory. I’ve interviewed people who are researchers in both academia and industry, but you’re the first person who seems to have navigated back and forth between the two. So I’m curious about how that came to be and what perspectives you have about research in academia versus industry.

Boaz: 00:01:29 I didn’t intend to go to industry. I started my undergraduate studies doing math and philosophy and at some point I started with computer science, mostly because I thought it would be a good career. But then I fell in love with the more theoretical aspects of computer science. I also realized that working for a living might not be the best thing for me. I’m not that good at having a boss and taking orders. I was doing internships at IBM research and my mentor there suggested to me that maybe I should go to get a master’s degree. In Israel, a master’s degree was research focused and funded so you get a stipend. So I went to the Weitzman Institute and I think the first course I took there was cryptography by Oded Goldreich. I really fell in love with the subject and Oded became my PhD advisor. I transitioned to the PhD program and basically went on the academic route. I didn’t intend to move to industry, but for both personal reasons and because living in Cambridge was better for me than Princeton. I was also quite excited by the new lab that Microsoft Research was opening, the new Microsoft Research New England, which has a very academic feeling. It still does. So I was tempted by that and then enjoyed Microsoft Research where I liked it quite a lot in the sense that you get a mixture of people that normally would be in different departments in a university and were sitting one next to each other. So that lab had Economists, anthropologists and computer scientists. It was a lot of fun, but then at some point I did miss students and decided to go back to academia.

Tim: 00:04:10 I never thought of students being the main reason for going. But I guess there are many reasons. I figured maybe the intellectual freedom in academia and maybe the less freedom in industry might be the more motivating factor.

Boaz: 00:04:27 In Microsoft research, I have to say that they, I think also now, but definitely at the time that I was there, I had a lot of intellectual freedom. So I not, as far as I know, almost nothing that I did really helped Microsoft’s bottom line, and there wasn’t really, I was given a lot of freedom to do research on the topics that I cared about.

Tim: 00:04:56 Yeah. I actually, on that note, I feel like there are ways to mentor people in industry through internships and whatnot. But I guess it’s not quite the same as a formal supervisor over the span of five years and the close relationship, right?

Boaz: 00:05:11 Yes, exactly. So you kind of get a taste of that, but you don’t really get to see people grow. The nice thing about university is that you get to see. You have a lot of impact on people at the maybe most important stages of their career and life and both on the undergraduate and the graduate level. And it’s, and you also learn from the students that I find that, I feel like it keeps me younger. So I like this environment.

Tim: 00:05:40 If you had the same level of supervising ability at Microsoft research, would you have stayed at Microsoft research? Was that really the main breaking factor?

Boaz: 00:06:01 I think, maybe something that, I, I do, I think there is a nice thing about. So at Microsoft Research, I really wasn’t pressured to contribute to Microsoft, but I felt a little bit guilty about that because in some sense, I am part of this company and I don’t feel like I’m contributing as much to this company as someone that is shipping software, because that’s really what’s the main mission of the company is. And although my manager was very good at shielding me for that, just personally, I felt a little bit, you know, like I’m not as contributing when, while here, the nice thing in the university is that the thing that I care about, which is research and education are the goals of this university. And if there is a mismatch in the university is not behaving according to these goals of promoting research and education, they should feel guilty. Not me.

Tim: 00:06:59 I see.

Boaz: 00:06:59 I see. That’s right.

Tim: 00:07:02 Yeah. That’s interesting. You said, because I guess, you know, if you’re in industry, if you’re in the right situation, you can be aloof and do whatever you want. But I guess what you’re saying is that even if that’s the case, if there’s any sign of misalignment between the company mission and what you’re doing, that creates enough of a psychological cognitive dissonance that you’re better off being in an aligned situation.

Boaz: 00:07:26 Yeah, I think so. maybe just me, but I feel like, you know, here, I kind of like the fact that this is basically, you know, who knows not, not in no organization ever behaves exactly based on following its stated mission. But at least I like the fact that I’m aligned with the stated mission.

Tim: 00:07:45 What about this sort of myth slash fact about professors nevertheless being beholden to chasing grants and things like that. So that there’s limitations in terms of freedom for that purposes, whereas inside a company, you’re much more well equipped with resources, at least if you’re aligned with the company, right. Do you have any thoughts on that?

Boaz: 00:08:08 I think, definitely professors. I don’t view, grants. Yeah. Like it’s not fun. I actually have a grant that I need to, I have a grant proposal I’m working on when I’m going to submit in four days. I couldn’t say that it’s my favorite thing to do, but, actually like the, the type of grants I tend to pursue are from organizations where that, like the National Science Foundations or Simon’s Institute, which I’m very grateful for because they are, they are relatively flexible. They want to kind of a broad outlines of the projects, but don’t ask you for, you know, deliverables. And they understand that when you’re doing basic science and curiosity driven research, then you might not do exactly the thing, the thing that you come and produce at the end might not be exactly the thing that you planned, maybe even better, but it’s often could be different. And so, while it’s never fun writing grant proposals or grant reports, I don’t view this as the main disadvantage. like the one thing is that somehow, you know, the students, is a blessing and a curse because in a sense it’s a blessing because I really like working with them, but it does take a lot of more of your time and I feel like I’m, you know, probably much busier today than I was in Microsoft Research, because they’re students, you have to write them recommendation letters, you, you know, you serve on committees and, and, and all of those, things. So, but, yeah, but I, I don’t, yeah, maybe if I had like a very large, the type of research that they do is more like in small groups. If I had like a, a huge lab with, you know, 40 students and five postdocs, then probably I would have had to invest much more in, constantly writing grants.

Tim: 00:10:07 I see. Okay, great. Maybe let me one more, biographical question. So you’re, you’re an Israeli, right. And, therefore, you had to do some military service as required, for, for males, in Israel. Okay. Would you like to comment on that? What was, what was the experience like, and what effect did it have on you in terms of your outlook on life, both inside and outside of science?

Boaz: 00:10:30 So, I, I was very much not into, the military as a kid. I’m. And basically, any kind of, in Israel in the Israeli military to do things that are kind of in the intelligence more advanced that take kind of you take advantage of your academic skills or sometimes that you could study in a university before you go to the military. But all of those things require you to sign up to do more military time than, than the three years, which was already much more than I would have liked to do. So I didn’t sign up for any of that. And because of my bad eyesight, I also wasn’t, eligible to be a combat soldier or something like that. So it was basically a clerk and, and, and yeah, I guess. I kind of really learned about myself again that I’m not so good at taking orders and that so probably the military is not for me and, and that I, by the time these three years were over, I was really, really, ready to do something intellectual and actually use my brain.

Tim: 00:11:52 Okay, so you couldn’t pull a, pull an Einstein and be a, a, a clerk with, with, you know, working on the foundation of physics. Uh, you know, in your drawer or

Boaz: 00:12:01 something like that. It was more like, basically the military human resources things. But sometimes you’re like, you’re a clerk, but you bring this clerkship in there. In a tent in the middle of the desert.

Tim: 00:12:12 Okay, you know, because there are these rare stories where people did creative work in the military, right? Like, like, for example, Schwartzschild, Schwartzschild physicist found the first solution to Einstein’s equations during his military service, right?

Boaz: 00:12:27 I guess there are smarter people than me. I also, I didn’t really do any kind of academic courses or something before. The military. So, you know, I had the high school education, but I really got in my main learning about science, etc. University level courses was only done at university after the military.
Tim: 00:12:49 I see. Okay, great. Well, thanks for sharing the details about that. Anyway, let’s get started with the main conversation of today. Boaz, you’re a world expert in cryptography. That’s what we’ll be discussing today. I’ve never studied the subject until preparing for this podcast. Fortunately, your book on the subject helped me prepare. It’s a fascinating subject. Can you give us your overall thoughts? A bird’s eye view of the subject before we dive in. What’s interesting about it to you? How do you think about it?

Boaz: 00:13:32 There are several things that make cryptography different from other branches of science. One of the things is that cryptography has a long history of being broken. People tried to design encryption systems and didn’t realize there were ways to break them. Those systems eventually got broken. But once we moved to base cryptography on a more mathematical basis, the history of cryptography became something else. We could do amazing things, not just design encryption schemes that no one knows how to break, which we did, but also do things like public key encryptions, fully homomorphic encryption, zero knowledge proofs. These are things that people didn’t even imagine.

In my book, I write that if you’re talking about going to the moon or a heavier than air flight, these are things that appeared in science fiction, and people dreamt about before they could do this. But public key cryptography seemed so crazy that I think no one even dreamt that they could do it. The same goes for fully homomorphic encryption, zero knowledge proofs, and other things in cryptography. A lot of the time, what we learn in cryptography is to dare to imagine that something is possible and to believe that if something is not proven impossible, then it might actually be possible.

It’s a little bit like what’s going on now with deep learning. It used to be the case that most things don’t work, and the default assumption is that nothing works. Now suddenly we have switched into the case that everything works. Everything surprises us for the better. The main way to fail is a failure of imagination, not daring to do something because you don’t believe that it will work.

Tim: 00:15:56 I see. So overall, cryptography is an optimistic field?

Boaz: 00:16:02 To some extent. It’s a combination because you have to be optimistic in the things that you can construct. The base assumption of cryptographers is that if something is not proven impossible, then it’s possible. But on the other hand, you have to have a very pessimistic mindset and think of the adversary and rigorously define what it means to be secure and how your systems can be attacked. So cryptographers oscillate between imagining amazing things and trying to break them and finding very subtle ways to break them.

Tim: 00:16:41 Yeah. It’s very much like a game theoretic way of thinking. Why don’t we start out by actually giving my take of a bird’s eye view, and then you could comment on that. We could start writing down a summary of what we’re going to talk about and see how far we get with that.

I think the setting we’re in is we’re concerned about privacy and security, specifically in the context of communication. Privacy and security can be thought of in other ways. If you’re a hermit and talk to nobody, then you’ll have privacy. And for security, you can hire a bunch of bodyguards. But that’s not the kind of security and privacy we’re interested in. We’re trying to communicate messages.

Boaz: 00:17:41 I would say it’s now beyond just communication, but also computation.

Tim: 00:17:45 Okay. We’ll see why that’s a qualification. Privacy is secure communication. And then I would think of security as protection from observation, theft, attack, etc. Anything that would compromise that communication. That’s the setting in which we’re working.

Boaz: 00:18:15 You might want to change “secure” to “secret” because “secure” seems like “security”.

Tim: 00:18:19 Confidential. And the reason why this is important, even if it’s obvious, is that in the modern world, we’re one large communication network. We need to preserve the integrity of the system because there are so many interactions that occur. That’s why cryptography is so important. Do you have anything to add to my amateur summary here?

Boaz: 00:18:43 No, I think you’re right. One thing that we should really emphasize is that cryptography is really important, even if you have nothing to hide. Even if you say, “I don’t understand, my life is an open book. I don’t do any crimes. I don’t do anything that needs cryptography.” But even if you say that, you definitely care about, say, your bank account and you would like to keep its content to yourself rather than someone else. You’re using various devices that you don’t want to have malware or ransomware installed on them. So you want that when you get an over-the-air update, then it will be authenticated. So you’re assured that it actually came from the company that you trust. When you browse the web and you go to a website like CNN and you see the news, you want to be sure that you actually see the right news rather than someone sitting on the router between you and the CNN server modifying the webpage and showing you something else. So all of us, even people that have no secrets, need cryptography in today’s world.

Tim: 00:20:31 Right. And I guess that falls into what I wrote here in terms of security, because you’re preserving the integrity of the communication channels. Okay, great. So why don’t we start writing an outline of what we hope to cover. It’s pretty ambitious, but that’s always good to be ambitious. I think the first thing we’ll talk about is a warmup and we’ll play with some examples just to whet people’s appetite and get a feel for why cryptography is hard. Then we’ll go into the first major subject in cryptography, which is private key cryptography. Here, the essential feature is that there are shared keys. Just like an ordinary key to a door, there’s one key that works, whether it’s you or me. From having studied your book, I think the main result that enables cryptography here and elsewhere is something called the cipher conjecture.

Boaz: 00:22:06 The most standard name for this is the existence of one-way functions. This is known to be equivalent to the existence of pseudo-random generators.

Tim: 00:22:26 I see. Okay. And I guess we’ll get there, but I think the key thing here is that this result and many others that are either equivalent or consequences thereof fundamentally rely on the status of whether P is equal to NP. That’s the whole business kind of hinges on this. And then from there, most of the heavy lifting is done once you’ve studied private key cryptography and then public key cryptography is, it’s of course, very maybe not intuitive or surprising, but a lot of the work has already been done once we’ve done private key cryptography. Is my sort of sense. I don’t know if you have any thoughts on that, but this is where now the keys are asymmetric, let’s say. Anything to add to the outline so far?

Boaz: 00:23:31 That’s good. I think we’ll mention the one-time pad when we talk about private key cryptography. I agree that in terms of definitional work and understanding the notions of security, a lot of the work is done in private key cryptography before you get to public key. In terms of the actual mathematical constructions, they tend to be quite different and they use different types of math.

Tim: 00:23:59 Yeah, but in terms of just the fundamental notions of security, it’s sort of done with the private key setting. And then to wrap it all up, we can talk about applications. Things like Bitcoin, authentication that you already mentioned briefly, and possibly applications to machine learning. I think we have a lot of topics here. Let’s see how far we get. Okay. This is now our warmup here. I could talk about it, but I guess maybe you want to go ahead and talk about what this contraption is all about and how it’s used to encrypt and decrypt messages.
Boaz: 00:24:41 Sure, let’s start with the basics, such as substitution ciphers. This is perhaps the first thing people think of when trying to encrypt messages. It’s essentially a table where each letter gets mapped to another letter. For example, A might get mapped to H, B to I, and so on. This was likely the first type of cipher invented and reinvented throughout history. Both parties would have a table like this, sometimes using a short phrase or a number offset to remember it. They would then translate their message according to the table, replacing each letter with its corresponding one. However, this cipher can be broken by frequency analysis. For instance, in English, the most common letter is E. So, if you see a lot of Ls in a long enough text encrypted by a substitution cipher, you might guess that L corresponds to E. You can then use other statistics and pretty soon, you can break the cipher and recover the message.

One historical example is the sad story of Mary, Queen of Scots, who used a substitution cipher in a plot to free herself and assassinate her cousin. The cipher was broken by frequency analysis, leading to her execution.

The Vigenere cipher is a more complex version of a substitution cipher. It’s like having a collection of substitution ciphers. The first letter uses one table, the second letter might use a different table, and so on. However, you can’t have an infinite number of tables, so at some point, you reuse the tables. If you have five tables, then every fifth letter uses the same table. This can be enough to erase the statistics, making frequency analysis much harder.

Tim: 00:29:32 What year was this?

Boaz: 00:29:34 Babbage cracked the Vigenere cipher in 1854 but didn’t publish it. In 1863, Kasiski broke the Vigenere cipher and did publish it. The Confederates used this cipher during the Civil War, thinking it was unbreakable. However, it can be broken by guessing the number of tables or the length of the key, then doing frequency analysis on every Nth letter where N is the length of the key. This method was used by Union officers to routinely break Confederate communications. A similar story occurred 80 years later between the Nazis and the British during World War II with the Enigma machine.

Tim: 00:31:32 Can you say a word about that? Are you saying that the same method of statistical analysis allows you to reverse engineer the encryption protocol?

Boaz: 00:31:49 The Enigma was much more complicated. It was like a typewriter where every time you press a key, a set of rotors move that determines the permutation between inputs and outputs. The number of permutations is immense, so you really had to understand the structure of the Enigma to break it. The Polish intelligence made the first breakthroughs, and then the British took it to the next level, mechanizing the process at Bletchley Park with Alan Turing heavily involved.

Tim: 00:33:19 Did they actually have to obtain a physical copy of the machine or was it purely mathematical analysis?

Boaz: 00:33:31 At some point, they did obtain some physical copies. The Germans switched machines or settings during the war, making it more complex. The Enigma was based on a cipher constructed before the war by some Germans, so the Allies had some idea of its operation.

Tim: 00:34:14 That’s a very interesting story.

Boaz: 00:34:18 If you’re interested in the history, there’s a great book called “The Codebreakers” that covers the early history of cryptography.

Tim: 00:34:31 Now that we have this warmup example of an attempt at a secure way of sending messages, we can see how difficult it is to come up with a secure encryption system. Like many things in mathematics, one of the hardest steps is to find a good definition. So let’s walk through the logic of trying to find a good definition of security.

Boaz: 00:35:21 Before we talk about security, let’s define what a mathematical object an encryption scheme is, at least a private key encryption scheme. We have two objects: the encryption algorithm and the decryption algorithm. The encryption algorithm takes a plaintext and a key and transforms it into a ciphertext. The decryption algorithm takes a ciphertext and a key and maps it back into the plaintext. We require this notion of validity, which means that if we encrypt a message with a key and then decrypt it with the same key, we get the same message back.

Tim: 00:38:43 Right, and just to clarify, in the case of the Vigenere cipher, the key is the collection of tables used for encryption. It’s necessary that the keys remain private because as soon as the key is exposed, the algorithm is exposed.

Boaz: 00:39:08 Exactly. Now that we have a definition of what an encryption scheme is, a pair of functions where encryption maps a plaintext and a key into a ciphertext and decryption maps a ciphertext and the key back into a plaintext, we can start talking about what it means to be secure.

Tim: 00:39:33 Sure, and just to reiterate, this is private key cryptography. It’s clear from our recurring example, the Vigenere cipher, that you want to keep the key private. As soon as the key is exposed, you’ve exposed the algorithm. So it’s absolutely necessary that the keys remain private.
Boaz: 00:39:58 This might be a good point to mention what parts of this scheme are secret and what parts are public. The only thing that’s secret is the key. This is a principle in cryptography, which basically says that we cannot hope for security by obscurity. If we are going to use encryption for an extended period of time, the only thing that we can refresh is the key. We should not assume that our adversary doesn’t know the details of our algorithm, because eventually the adversary will learn the details of our algorithm. I’ve been told that one version of this was that in the NSA, they had this saying that basically you can assume that whenever we build an encryption device, the first copy of it is shipped to the Kremlin. So, the system should be secure under the assumption that the only part of it that is not known to our adversary, the one that’s trying to break it, is the key. The key is chosen at random. We fix everything else. We fix the encryptions, we fix the decryption, we are given the message that we want to transmit, but the key is chosen completely at random and then shared by the two parties because this is a private

Tim: 00:41:54 key encryption. Right. I guess the analogy would be like a password, right? You don’t want your password to be a function of some obvious attributes of you or just some globally known constant like password, right? It should be random.

Boaz: 00:42:08 The password should be random. And you don’t want the security of your system to depend on the fact that your adversary doesn’t know if you’re using Chrome or Safari. You should assume that whoever is trying to attack you knows which computer you have, knows which browser you have, maybe knows what time of day you usually log in into your account. So those things are not secret. The only thing that protects you is the password.

Tim: 00:42:41 Right. And going back to this optimistic versus pessimistic, you know, I think that the thing with cryptography is that you basically have to always steel man your opponent, right? And that’s something that outside of cryptography, that’s something people aren’t typically good at, especially in politics. But in cryptography, the key is to steel man your adversary. And in this case, it basically means they are as smart as you because they have exactly full knowledge of your algorithm. The only thing they don’t have is ownership of the key. So basically it’s privacy through ownership of who owns the key.

Boaz: 00:43:13 Yes, yes. And in fact, in cryptography, we often kind of super steel man your opponent in the sense that you assume that your opponent has much more computational power than you, because if you think about actually using cryptography in practice, every time we go on the web, we are using encryption and decryption. So this has to be very computationally efficient operation. Encryption decryption itself has to be done very efficiently because we do it all the time. But the adversary might only be interested in a very particular message so they can invest a lot of computational resources into breaking that message. So we often want breaking the encryption to be much, much more difficult than using the

Tim: 00:44:08 encryption. That’s a good point. Yeah, you are super steel manning them. Okay. All right. Well, I guess we’ll see that soon enough. Okay, great. I think we do have the foundational definitions down now. I guess maybe we can start going into the notion of security that you alluded to, right?

Boaz: 00:44:24 Okay. Great. So. Just so we remember the notation and now maybe I’ll ask you, how would you define security?

Tim: 00:44:35 Let’s see. So I did look at your books. I’m not a completely naive person, but I know in your book, you basically went through something like three putative Definitions of security, and we could just kind of go through that and I could play sort of like the ignoramus going through each one of those. So I think I’ll call this like first attempt. Maybe

Boaz: 00:44:56 actually like, even before that, maybe we should introduce our cast of characters, Alice, right? So, we’ll have Alice, Bob and Eve and Alice and Bob share a key and Alice has a message she wants to send to Bob. So Alice is going to send the encryption with the key of X. So Alice knows X, and Alice going to send a message, encryption with the key, of x and the goal is that Bob can of course decrypt it, but the goal is that Eve doesn’t learn about the message. Doesn’t cover the message. So this is basically our cast of characters and now we can try to define what it means to be

Tim: 00:45:52 secure, right? So basically Eve can see everything, right? So Eve can see why the encrypted message, but, but nothing else, right? And also Eve has full access to the function capital E as well.

Boaz: 00:46:06 Eve knows E, knows D. And n y?

Tim: 00:46:15 Yes. Yes. Okay, great. I like the naming. So like, you know, I think, I actually know the physicist also, like Alison and Bob, right. A and B. So maybe that’s inspired from that and then Eve like eavesdropping, right. So it’s, it’s very, it’s, it’s a good naming. Okay. Anyways, so, okay, so let’s, let’s do the first attempt, according to your book, at least. So the first attempt was something like the following. It’s hard to write the full thing in all mathematical notation, but hard to recover the key, right? So in other words, basically the probability of, you know, guess the key should be basically two to the minus N basically there, so let’s back up. So, so the keys we said were always in, this, the set of N, possible bits. So there’s two to the N, strings here. And our first attempt is to say, your, your scheme is secure if. You have, you can’t do better than random guessing to figure out the key. Right. I think that was your, your first definition. Yeah. Uh huh. Um, now let’s take a look at this definition. Um, I think that, I think the, the peculiar thing about this definition is that it says nothing about recovering the message. Yes, I think like the original intent of security was to hide the message, of course, you want to hide the key to so I’m just sort of like looking at this as you know, with attempted fresh eyes, there’s something a little bit funny about this definition in terms of keys and messages. I don’t know if you want to follow up with that, but that’s like sort of the first funny thing about this definition.

Boaz: 00:48:10 Yes. Yes. So, so in one thing, like the definition seems like, you know, seems like basically the precise number two to the minus seven, but somehow it does seem like some kind of a necessary condition for security in a sense that, because we know that if you do recover the, the, the key. Then you can decrypt the and recover the plaintext, but it’s a necessary condition. It’s not a sufficient one. And in some sense, this is a kind of a counter example would be the following a think of a encryption with a key K of X. As just being X and decryption with a K with the K, the key K of Y just being Y. So just the identity function. So this is a valid encryption and decryption because whenever you encrypt and decrypt you get back the same thing. And it satisfies this definition because you give no information at all about the key you so so if would not be able to guess it’s better than random chance, but it is obviously completely in the key insecure. You’re just sending the message in the clear this is the least secure encryption possible. So, so that shows that this is, this is maybe a necessary condition, but it’s not a sufficient one.

Tim: 00:49:33 Indeed. Indeed. Um, yeah, I, how do I say, this is a very common thing in mathematics. You, you, you try to write a reasonable definition and then you find the, the, the most trivial counter example, right? The, the edge case in this case, and this is like a very extreme example, but it does highlight that. The reason why this is a bad definition, even if it might not be obvious at first, right? Yes. Great. Great. Okay. So now we know that this first attempt is no good. Now from this lesson of the failure at the first attempt, let’s try to do a second attempt. So the second attempt is basically, let’s just, replace the, the key in the first attempt with the message, right? So it should be hard to recover, any message. Is has L bits and so I guess, what the probability that Eve can guess a message, is what basically two to the minus L, right? Random guessing.

Boaz: 00:50:48 Oh, let’s say even, we can even be, we, we, we see like we can even say two to the minus n, would, would still be… Or should

Tim: 00:51:18 we, or should it be, maybe, are you saying it should be the minimum of n and L? With that, is that what

Boaz: 00:51:26 you’re trying to say? Yeah, but let’s,

Tim: 00:51:29 Yes. Yeah. that’s sort of reasonable that L is like, intuitively the set of keys, your key should be shorter than your, than your message. So

Boaz: 00:51:37 one observation can always, guess message. With probability 2 to the minus n, and that is basically by guessing the key and then decrypting. Ah,

Tim: 00:51:56 that’s right. Basically, this is try all possible keys.
Boaz: 00:51:59 Yes, trying all possible keys or selecting a key at random is the best we can hope for, with a probability of 2 to the power of -n. This is acceptable, even if it was 2 to the power of -n/2, because these are exponentially small probabilities. We have no choice but to be comfortable with this. Even if the key is moderately long, like 128, the chance that a key is say 2 to the power of -128 is larger than the number of milliseconds that have passed since the Big Bang. And in one of those milliseconds, there was the meteor that destroyed the dinosaurs. So in some sense, this is a high probability.

The first definition was too weak. Now, this definition is too strong or impossible. Let me explain why it’s impossible. There is something that’s always useful in mathematics, but really important in cryptography, which is whenever you write a probability statement, you have to know what is the probability taken over? What is the randomized experiment or the sample space that this probability is taken over?

And here, this basically has to be probability over choice of key. That’s the random part. But the message itself, we didn’t describe any probability distribution over messages. So it kind of doesn’t make sense. What if the message was a random English sentence? Then, the probability that Eve can guess a random English sentence will not be two to the power of -128, right? It’ll be much higher.

Tim: 00:55:07 Okay. I see. If you have a bound on the message and it’s not too large, then it’ll be higher.

Boaz: 00:55:14 To make things formal, we need to do two things. One is to define a prior distribution on the messages, because otherwise the probability doesn’t make sense. And the second thing is to replace 2 to the power of -n with something that has to do with the prior.

Tim: 00:55:48 So you’re saying that the problem with this second attempt is it has nothing to say about the distribution of messages.

Boaz: 00:56:00 Yes. What we want is some kind of experiment like the following: X is a sample from some probability distribution M on messages. And then, we could hope that a probability over both the choice of X from M and the choice of key from 0 to 1 to the power of n that Eve guesses X is at most the best possible guess for Eve. This is the definition that Shannon came up with.

Tim: 00:57:09 I think it might be helpful to make this a little bit more concrete.

Boaz: 00:57:40 Here is a special case. Where I say M is just two messages, just the uniform distribution on two messages, X0, X1. What we want is that the probability that X, when it’s chosen at random between X0 and X1, and the key is chosen at random in Z1 to the N, that given encryption K of X, B equals XB, so B guesses the right thing. Maybe X is at most half. So basically if Eve knew that the message is either attack or retreat with probability half, then the posterior probability of Eve after seeing the encryption has not been changed.

Tim: 00:58:58 Just one technical point, should this be equal to a half just by symmetry?

Boaz: 00:59:02 Yes, you can always guess half. So, for any reasonable Eve it would be exactly half.

Tim: 00:59:19 Got it. This is just to allow for any suboptimal strategy. Now I’m recalling why that previous definition was bad. The problem there was that we didn’t quantify over the space of messages. And as you explained in your book, you could just get lucky with your decryption scheme. The equivalent of this would be like, you have a password and I could just guess your password and it has nothing to do with the keys or how hard the encryption or decryption scheme is. If my strategy is, I’m going to guess that your password is password. If you happen to have the password password, then I win. And there’s nothing else to say about that, right? So there has to be some notion of randomness in the messages.

Boaz: 01:00:26 Yes. And it’s also important in the sense that we have a universal quantifier over the probability distribution of messages. That basically means that we don’t know what is the prior information that our adversary has maybe through some other means about the messages that we may send. So, if the encryption scheme had perfect secrecy, then any prior knowledge would be useless. You would not, no matter what prior knowledge you had about the distribution of messages that Alice might send to Bob, seeing the ciphertext would not allow you to update your knowledge.

Tim: 01:01:47 I think it is slightly more general, but it’s equivalent. In my book, I have this theorem showing that two messages is equivalent to any uniform distribution. And you can also generalize the same proof to show that it’s also equivalent to any distribution. So you can prove that the special case that if an encryption scheme satisfies the special case that I put below, but for every pair of messages. Then it has to satisfy also the condition that I put above.

Boaz: 01:02:12 Yes, you can prove that the special case that if an encryption scheme satisfies the special case that I put below, but for every pair of messages. Then it has to satisfy also the condition that I put above. And I think it actually might not be useful to go over the full formal proof, but I can give you the intuition why this is true. And the intuition is the following. If there is any difference at all, so maybe the way to think about it is, so basically for every fixed x, you look at the distribution on that is induced by basically taking, encrypting X with a random key. So this is a distribution over cipher text. Now, suppose that they existed two Xs X zero and X one, were the two distributions were not identical to each other. Then basically you could, you could show that, you could get probability better than half as an adversary because, there is some element that is a little bit more likely under one of them than on the other. And if you see that you will guess it came from X zero and otherwise you’d say it came from X one. So basically the condition, the special case condition that, for every pair of messages, you cannot guess better than half implies that for every message X. For every message X zero and X one, the distribution induced by X. And the distribution, the distribution induced by x0 and the distribution induced by x1 is identical. And if the distribution on y is identical, irrespective of what x was, then basically Eve doesn’t learn anything. That’s right. Yeah. So this basically corresponds to Eve basically learning nothing about the, about the message, given the cipher text that you didn’t know before.

Tim: 01:04:35 I, as I was thinking about this, I really liked this way of thinking. the analogy I have in my mind is basically no matter how biased or how idiosyncratic or how quirky you are with choosing your password, I don’t gain anything more. Then knowing that quirkiness. So for example, if I knew your password was like a food item and a number, like pizza seven, okay, so that definitely reduces the space of all possible passwords to, you know, an order of like, I don’t know, let’s say like a hundred thousand. Cause there’s like, like roughly 10, 000 words in the English language. Some of them are going to be food and then times a digit, which is like 10 possibilities. Right. And I could list all the combinations of food and number. And then even if knowing that. Your encryption scheme, I can’t learn anything more from that than still that you chose some, some, you know, food or, or number, right? As opposed to, you know, if somehow your encryption scheme was such that when I, you know, say your encryption scheme returned, a string of letters, if I knew, for example, that, Chinese food was encrypted with more ease than Italian food than if I saw more ease. That by guessing that your, your password had a Chinese food in it, I would, I would, have a higher probability than just chance, right?
Boaz: 01:05:56 There’s an interesting story about the Enigma machine. The Enigma machine is essentially a set of permutations on letters. Every time you type a letter, the encryption of that would be a different letter and the permutation would change. However, it had the property that it would never map a letter to itself. This might seem like it makes it more secure, but in reality, it’s a bias. In March 1941, Mavis Batey, a crypto analyst at Bletchley Park, received a very long message and noticed that the message did not have any single ‘L’. She realized that this couldn’t happen by chance. The only way that could happen is if the plaintext was all ‘L’s. She deduced that this was probably a bored operator or someone testing the machine and just sent out a message that was all ‘L’s.

Tim: 01:07:35 I wouldn’t have guessed that because it requires the further assumption that the permutation always changes.

Boaz: 01:07:49 They knew that, right? They knew a lot of properties about the Enigma machine.

Tim: 01:07:54 Oh, so they already knew that one.

Boaz: 01:07:56 They knew that property. They knew the property that the machine never outputs the same letter itself. So then she figured out that if the whole message is so long and there is no ‘L’, it has to be the case that there were tons of ‘L’s in the beginning. She realized that probably a bored operator just typed ‘L’ repeatedly to test the machine. Because of some other weaknesses of the Enigma, once you knew the plaintext, it could really help you learn the daily key. This helped to learn about a planned attack by the Italians and helped the British achieve a major maritime victory.

Tim: 01:08:50 That’s amazing. One thing I noticed about this definition, and I’m curious which inspired what, but it looks very similar to the notion of differential privacy. Differential privacy says if you remove or add an element, you gain a negligible amount of information. It’s not quite the same, but it’s very similar.

Boaz: 01:09:23 Definitely. The people that invented differential privacy are cryptographers. They come from this field and were very inspired by cryptography. Cynthia Dwork, my colleague at Harvard, did very strong work in cryptography and was very much inspired by cryptography when they came up with this privacy definition.

Tim: 01:09:53 I’m glad you brought up this Enigma example. At first, it seems like an artificial example. But the point is that if you have any edge, that can be fatal because you could somehow bootstrap and magnify it if you amplify it in the correct way.

Boaz: 01:10:17 Yes. Especially in today’s world where there’s lots of communication and data, even a minor edge could eventually accumulate and be fatal. You really want to make sure that you have no edge or at least the edge is so exponentially small that it’s completely negligible.

Tim: 01:10:54 Sure. Maybe that’s a good point to get into. You mentioned already earlier that there’s a computational caveat to all of this. So I guess there’s a computational modification to this, right?

Boaz: 01:11:10 Right. At this point, this definition can hold and we can show that there are encryption schemes that satisfy this and they don’t need to make any computational assumptions. We can talk about the one-time pad that satisfies this definition. It is possible to satisfy this definition as long as you’re willing to allow the secret key to be as large as the message.

Tim: 01:11:44 Sure. Maybe we could just look at that example because this is quite an abstract definition.

Boaz: 01:11:51 We can do the one-time pad. This goes back to the early 1920s, it’s also called the Vernam cipher. It was invented by Vernam in 1917. Shannon was the first one that really gave a formal definition of security and showed that the one-time pad satisfies this. The easiest way to think of the one-time pad is the very simple case where our message is only one bit and the key is only one bit also.

Tim: 01:12:47 Basically, I’m trying to guess your coin, you flip a coin and I’m trying to guess whether it’s heads or tails.

Boaz: 01:12:51 Right. And then encryption in this case, like encryption of K with X, is simply the XOR of K and X. So let’s write the truth table of the XOR function. If you have A and B, and A XOR B. Another way to think about it is that A XOR B is like A plus B modulo two. So if there is a single one, then it’s one, and if there are two zeros or two ones, then it’s zero. And you now let’s define encryptions with KK of X as X XOR K and encryption with KK of Y as also Y XOR K. Now, if we want to make sure that if we take the encryption of K of X, and then we decrypt it, that would equal to X XOR K XOR K. Because this is really addition modulo two, we have the same properties of associativity. So it’s like X XOR X of X XOR K XOR K. And now you can see that this is zero and X XOR zero is always X. So this is a valid encryption scheme. And now we can make a claim that the one-time pad is perfectly secret. For this one-bit version, we can just basically prove it directly here. There are only two messages. So we basically just want to look at the probability distribution that is induced. The message is either zero or one. And so this is the plaintext space and the ciphertext space is also zero one. And then if the key is zero, then zero is encrypted as zero and one is encrypted as one. And if the key is one, then zero is encrypted as one and one is encrypted as zero. And what we can see is that the probability distribution that if we look at the encryption of K of zero, then this is just the uniform distribution on zero one, where K is random. And this is the same as the encryption of one. Whoever the plaintext is zero one. For a random key, we get the uniform distribution. So you can see that basically from the point of view of an observer, she learns nothing. If the key is random and you observe why then you learn nothing. Now, if you were to encrypt a second bit using the same key, then this is very different. So let’s say suppose we use a one and then we have a two-bit message. So let’s say the message was either zero, zero, or maybe zero one. Then it’ll get mapped to zero XOR K, zero XOR K, or zero XOR K one XOR K. And this is basically what get mapped to KK. In this case, it would make KK to the negation of K. So basically if by seeing if the first bit is the same as the second bit, she can learn if the message was zero, zero or zero one. Indeed. So if you use the same, if you reuse the same key for a longer message, then you break all of the security. And this is why I like to say that the one-time pad is basically it’s like the name which is like point away from your gun because it tells you never never use the key more than once like every bit of the key should only be used for one bit of the message and you cannot reuse it.

Comment

Tim: 01:18:45 Sorry, what was the analogy you used? Point away from your gun, like, Oh, don’t shoot yourself. Basically.

Boaz: 01:18:53 Like using, like it’s in the name. It tells you just use it one time. And now it might look at everything. It might look you like a toy. Like basically, we are talking about encrypting one beats, but you can generalize this. Basically, by just, you can define the one-time pad. You can just use it if you want to encrypt longer messages. You can just have the key be K1 till KN. And the message be X one till X n and you can just encrypt a the encryption of a with the key K one till K n of the message X one till X n is just a you, you, you XOR each beats a. Each bit on its own.

Tim: 01:19:56 So it’s just the end copies of the one-time pad.
Boaz: 01:20:00 Exactly, end copies. But as long as you don’t, you keep the one-time pad, use it only one time. But the problem is that it becomes very cumbersome to have, like think of the modern world. We are encrypting files that are maybe gigabytes or terabytes. We’ll need to exchange ahead of time, gigabytes or terabytes of keys. And keep them in a secure way that they’re not going to get leaked.

In the 1940s and 50s, the Soviets were using one-time pad for communication. And the way they were sharing these keys is by code books. The logistics of exchanging so many code books were very complicated. So they started reusing the key. They were thinking, okay, we’ll use it maybe for some different KGB stations in different countries or something like that, and maybe it will all turn out. But then, US intelligence figured this out, that they’re reusing the keys and they started this project. In my book, I mentioned Gene Gabi was one of the founders of this project. They managed to break a lot of Soviet communication because they didn’t follow the instruction on the box and they used the one-time pad more than once. Many KGB agents were exposed because of this project.

Tim: 01:21:43 Let me clarify a bit. So you said only use the one-time pad once. So I interpret that to mean that the K1 through KN here are all chosen independently. Right. You don’t want to duplicate any of those. Although I think maybe I misunderstood what you said. You can use the key more than once to encrypt as many messages as you want, right?

Boaz: 01:22:12 No, you don’t. Once you’ve used K one, you can never use it again.

Tim: 01:22:17 What I meant is that you have K one through K N once you had that tuple that’s well selected. That can be your key forever, right?

Boaz: 01:22:24 No, you can transmit n bits. You can transmit a message of length n. And that’s it.

Tim: 01:22:33 Oh, so you have to regenerate a key for every new message.

Boaz: 01:22:39 Yes, because if you think of the example that I said before, like say, you have this key of length n, and you repeat it for two distinct messages. And suppose that your adversary knows that there is some probability that the message, you’re sending the same message twice, or you’re sending a different message. Then the adversary could figure out by looking at whether you’re sending two identical blocks or two different blocks to figure out which one of those was the case. So the adversary learned something from the ciphertext, which should never happen.

Tim: 01:23:22 Wait, I’m a little confused now because I thought the point of perfect secrecy was that you don’t learn anything new. So you pick the key and you have two distinct messages. It doesn’t matter whether you sent it yesterday or tomorrow.

Boaz: 01:23:34 You can only use the one-time pad as long as you only encrypt one message with it, which is the length at most the length of the key or if it’s a sum of messages, you encrypt them with different pieces of the key. But once you run out of pad, you run out of pad. You never use it again. So the definition of perfect secrecy only says no matter which message you choose. But it’s only one message. You cannot use it again. We have other definitions in cryptography that talks about multiple message security, but they are more complex.

Tim: 01:24:16 Oh, I see. So you’re saying there is a, basically you’re saying you’re only allowed to sample the message once, whatever your prior distribution is. And as soon as you sample it a second time, you better generate a new key. Yes. For the one-time padding. I see. Now. Okay. I didn’t realize that, but then there must be an extension of this where you like, because for example, I, as far as I know, my keys aren’t updating all the time. I sort of a key once for all time.

Boaz: 01:24:47 That’s why we need the cipher conjecture. This is basically why all of cryptography relies on P difference and then P, because one-time pad is extremely restricted. You have to generate a new key for every message. So it’s extremely restricted. And that’s why when the caregiver used it, they eventually did reuse the key. That’s why it’s so tempting to reuse the key and then all of your security guarantees go out the window. This is exactly what the KGB did. They reused the keys and then hundreds of KGB agents got caught because of that. And then, there is an impossibility result. There is a theorem that says that if you want perfect secrecy without making any computational assumptions, then this is a necessary condition. Then the one-time pad is optimal. You cannot reuse keys and the total size of your messages, you can also generalize it to basically the total entropy of the messages needs to be smaller or equal than the total entropy of the key.

Tim: 01:26:06 Okay. I think I see what’s going on now. And it’s obviously clear for this very explicit example. The problem with sampling multiple times is that essentially, you can figure out the key coordinates in this case, just from the statistics of each entry, right?

Boaz: 01:26:29 So, exactly because the messages itself, like we said, there is some known prior distribution on it. So suppose you knew that the messages are 60% likely to be zeros and 40% likely to be one. Once you get multiple times that you’re using coordinate one in the key, then you’re basically getting a lot of statistics because you know that 60% of the time this is zero. So 60% of the time you’re going to see some bias towards the actual value of this coordinate. That’s right. So this is why one-time pad is such a pain and it’s not really used because you’re right. This is not what we do now is, you know, we exchange a key that is maybe 256 bits and we are using it to encrypt many, many messages. And because of that, we really have to move to a computational definition.

Tim: 01:27:31 I see. Okay. Yeah, let’s see. So, I guess the next thing is to do the computational modification of this.

Boaz: 01:27:47 Yes, the one-time pad is not practical because it really needs a fresh key for every message. And the total number of length of keys is as big as the message. We could hope to improve it, but then there is this impossibility theorem that Shannon also proved that there does not exist a perfectly secret version of encryption with a key pen and a message length L, which is more than N plus one, let’s say, or generally larger.

Tim: 01:28:53 Shannon’s most famous work is in information theory in terms of introducing Shannon entropy and things like that. This work in cryptography was that after, before, around the same time? Do you know?

Boaz: 01:29:04 It’s around the same time. And some of it, you can kind of phrase these things also as a. It’s kind of related. So you can phrase these things also as some entropy conditions. So you can also give a kind of more general statement that the entropy of the key must be at least as large as the entropy of the message.

Tim: 01:29:29 Ah, okay. Okay. All right. Okay. Great. All right.

Boaz: 01:29:32 Yes. And in some sense, if you know information theory, then you can prove this with information theory. By basically saying there must be some non-zero mutual information between the ciphertext and the messages. Ah, I see. Okay. So you can use information theory. It will be something like, I think this is basically what we could say is right. So we look at the mutual information between why the ciphertext and the, and, and, and the, say X, the message, right? And that basically, there are like three sources of randomness. There are say two sources of randomness here, the message and the key. Right. And we know that, because you can decode, we know that this has to be at least the entropy of X. Okay? Because, because you can decrypt, right? So y has to contain all the information about x. And the only source of randomness after you condition on X is the key. So this has to be at most the entropy of the key. Okay. So this has to be a, so if the entropy of the key is less than the entropy of X, then this is positive. Ah, so there is mutual information between the ciphertext and the message. Which breaks the notion of secrecy.

Tim: 01:32:31 I see. Oh, nice. Okay. Beautiful. All right.
Boaz: 01:32:37 Yes, this is essentially the information theoretic proof of Shannon’s theorem. This kind of impossibility result suggests that the notion of perfect secrecy might be too strong. To bypass this, we move to the computational definition. We could potentially have an encryption with a key smaller than a message, but as long as an adversary would need to expend a significant amount of computation to gain an advantage, we can achieve computational secrecy.

Perfect secrecy is defined for every pair of messages X0 and X1. For every function from 0 to 1, the probability that if we choose B at random from 0 to 1, if we choose the key at random from 0 to 1 to the N, that Eve, let’s call it Eve, if the encryption eve of e k of x b, the probability that is at most half. I wrote you the definition. I wrote it slightly differently, but it’s in an equivalent way. So basically, I’m thinking of the game as you choose, there are two messages, x0, x1. You choose a B to be at random either 0 or 1. You give Eve an encryption of either x0 or x1, and then Eve needs to guess whether it’s 0 or 1. This is the definition of perfect secrecy. And now we want to change to a definition of computational secrecy.

The definition of computational secrecy is basically the following thing. It’s computational. We’re saying not for every function, but for every efficient function, and now you get into computation complexity. What does it mean to be efficient? And let’s just say for the purposes of this podcast is that, say this is at most, two to the n over, I don’t know, ten, Boolean operations. So if is some, can be implemented by some circuit, with say, and, or, and not gates. And the, at most, say two to the n over 10, operations or something like that.

Tim: 01:36:23 So n is the key parameter, right?

Boaz: 01:36:28 Yes. No, the 10, like just may, let’s say two to the, I don’t know.

Tim: 01:36:35 But this is still somehow exponential in n. I thought I’d see some polynomial condition. I’m a little confused.

Boaz: 01:36:42 In theory, there is a difference between theoretical cryptography and practical cryptography. For this podcast, I’m thinking more of the practical version. In the practical version, when you’re saying something like, your key gives you 128 bits of security. You really want two to the 128 operations because polynomial is not exactly well defined. What do you mean? Like 128 to what power? 3, 5, 10. So you kind of want to make things concrete.

Tim: 01:37:23 I see. So you want something to supersede all polynomial complexity, but still be sub exponential. With regard to the two to the, like the true exponential, right?

Boaz: 01:37:35 Maybe, you would have loved if you could get better than two to the n, but you can very easily show by brute force more than two to the n you cannot hope for, because you can always do brute force attack and enumerate over all possible keys. So you would like to get as close to that as possible. So the n over two here is just kind of. right. I put your N over two. You could have put your 0.9 N or something like that, but it’s some kind of.

Tim: 01:38:02 Right. Basically I just, you just want some function such that when you divide it by two to the n it goes to zero As, as n goes to infinity, something like that.

Boaz: 01:38:12 You want it to grow very fast to go, maybe in particular, but yeah, that if n was 2 56, then it would be big enough that you don’t have to worry about it as, as the user of the encryption.

Tim: 01:38:25 Yes, I see. Yeah, you want it as large as possible without being too.

Boaz: 01:38:30 Without the definition being impossible to achieve.

Tim: 01:38:35 Yeah, that’s right. I see. Oh, I see. This is more about practicality than any notions of negligible. It’s really more about, I just want something that grows very large, but not too large. So it’s not two to the end.

Boaz: 01:38:47 Yes. And it turns out that you also have to add like some kind of, again, just put some number there, but sure. So you need to allow for the, there is always an exponential chance, there are always two types of attacks. One is that you run in time two to the end to just enumerate over all possible keys, or you just guess a key and you have like some two to the minus and advantage over the trivial. So we need to allow for both of those things. But it turns out that under some computation complexity assumptions, once we allow for both of those things, then, as far as we know, it is possible to achieve an encryption where the key is much, much shorter than.

Tim: 01:39:45 Yes. I see. So basically it just like, just to reiterate what you, what you just wrote here. So this is, you could think of this as like the edge you would get from just brute force or, or lucky guessing, right?

Boaz: 01:39:59 Yes. Maybe this is lucky guessing like the, the, the, the, the, this is lucky guessing. And the thing on the left is the brute force.

Tim: 01:40:07 Okay. Sure. So this, this is, that one.

Boaz: 01:40:13 Right. So if we know that if we had no computational constraints, then we could never hope to achieve an encryption scheme with a key shorter than the message. And you can show that basically if the key is shorter than the message, then you can always get some advantage by lucky guessing and or a very big advantage by brute force.

Tim: 01:40:43 Then, yeah. And the name of the game is that you allow this relaxation to get around this impossibility theorem, but where you win is because, for sufficiently large N it becomes infeasible to break the encryption system.

Boaz: 01:41:04 And here is also like the magic of exponentials that for kind of moderate N, so you could have a key that is just in the hundreds of bits and you can get something that is secure that as far as we know, even nation states cannot break.

Tim: 01:41:19 Okay, right, right, right. Well, conditional on P versus NP.

Boaz: 01:41:27 Yes, conditional on conjectures that are even stronger than P different from NP. So maybe we should try to say what’s the goal. So this is a relaxation of perfect secrecy. But this relaxation turns out that under assumptions it buys us a lot. And it also kind of introduces us to the way things always are done in cryptography, where there is always a security parameter, which is the length of the key. And security is always quantified as a function that improves with the length of the key. And generally there is a trade off where basically the bigger the key, the more expensive it is for the honest users to encrypt and decrypt. But also the more expensive the more infeasibilities for the adversary to break the encryption. But the beauty of exponentials is that the gap is huge. So you could have potentially an encryption scheme where the honest users can do it in microseconds, but the adversary would require more than the age of the universe to break. So we had the definition of perfect secrecy. So this was a definition. We have a construction, a one time pad, we have a problem that the key, the length of the key needs to be larger than the message. And another way to say it is also that we cannot really reuse the same key for many messages. And this is a big problem for using it. And we have a theorem that basically says the problem is inherent, that you cannot avoid this problem as long as you have the definition of perfect secrecy. So now we move to the computational domain. So we have a relaxation of the definition, which is computational secrecy, and then we’ll see that we there is a construction which we can think of as a de randomized one time pad, and in practice, it’s also also known as just a stream cipher, we can see that before this construction, we can have the key to be much, much smaller than the message. And generally, in practice, keys can be on, like, on the order of 128, 256 bits, and the message could be in the gigabytes or terabytes. And we have a, and there can be a theorem that if there exists secure, pseudo random generators, then, you can have a secure, de randomized one time pad. So this, by moving to the computational secrecy world, we need to, if we are willing to live with the computational secrecy, which is really as good for all practical purposes, because no one can even nation states cannot do two to 128, in operations. And even like, let’s say the number of flops in training, GPT four is, I think no, no one knows exactly, but it’s at most, 10 to the 25, which is less than, I guess, I don’t know, two to the 70 or something like that. So, two to the 128 is way beyond anything that Moore’s law will ever get us. And then not to mention two to the 256. And so, computational secrecy is not really a problem. And now the, the thing that we, so we’re, so giving up on the on perfect secrecy to computational, computational secrecy is not really an issue. The other thing that we give up on is on, Shannon’s theorem was unconditional, the one time part is perfectly secret, period. Here, we’ll have to make an assumption. We’ll have to make a computational assumption that pseudo random generators exist. And that’s life because, we need to make at least the, the conjecture that, as you mentioned, that P is different than NP, and for all of this to work.

Tim: 01:46:36 Yeah, maybe just, let’s unpack this a bit. So, you know, the writing the precise definition of a pseudorandom generators is a little bit involved, but it’s easier to kind of say in words intuitively. basically, what a pseudorandom generator is doing is it’s generating numbers that are computationally indistinguishable from a random.
Boaz: 01:46:55 I’ll explain the concept of a pseudorandom generator using a visual representation. A pseudorandom generator is a function that maps 01 to the end to 01 to the L, where L is larger than n. We refer to the input as a seed, and the output is just the output. If we visualize a pseudorandom generator, we can think of it as G, which takes a random seed S, or maybe we’ll call it K, and then there is this output, G of K, of length L.

The security definition is that if we have a distinguishing algorithm, with a 50% chance it gets G of K for a random K and a 50% chance it gets a completely random Z from 01 to the L. Then, it cannot guess which is the case with a probability better than half plus something negligible or exponentially small.

We need to assume that the distinguishing algorithm has less than, say, two to the over 10 operations or some number. So, the definition of a pseudorandom generator is that you output something that for every circuit that runs in less than exponential size, they cannot get better than an inverse exponential advantage in guessing whether the output was completely uniform or pseudorandom.

Tim: 01:49:34 It’s surprising that this definition works because the target space is larger than the source space. You’re trying to have an injective map. The uniform distribution on L bit strings would be supported on all those strings, but you can’t achieve that using the map G.

Boaz: 01:50:01 Indeed, you have to make a computational assumption. The space of all possible outputs, 01 to the L, and the space of the outputs of the pseudorandom generator, g of 01 to the n, is a tiny fraction of the space. But the point is that it’s generally distributed all over the place.

Tim: 01:50:47 Right. It’s like this dust that’s uniformly everywhere. It’s as uniform as it can be.

Boaz: 01:50:53 The idea is that computationally simple classifiers like this D, the fraction of the overall space that D outputs one on, in the fraction of the blue points, these G of 01 to the end points that D outputs one will be the same.

Tim: 01:51:20 Basically, D doesn’t have enough resolution so that even though you have this dust, it’s spread about so uniformly that D just doesn’t have the resolution to distinguish between that dust and really sampling everything.

Boaz: 01:51:32 Yes, that’s a nice way to think about it. Statistically, these are very far. The pseudorandom output and the uniform output are very different. But for computationally bounded observers, they look the same.

Tim: 01:51:55 Okay, great. It’s surprising that you could actually satisfy this property.

Boaz: 01:52:16 This is indeed why we don’t know how to prove that P is different from NP.

Tim: 01:52:30 Is this equivalent to NP complete, NP hard?

Boaz: 01:52:39 I think the right way to think about it’s actually NP easy in the following sense. If you had an algorithm for solving if P equals NP, then every problem for which you can certify a solution, you can also decide it. So, given a point Z in 01 to the L, you can certify that it is in the support of the pseudorandom generator by giving the K such that Z equals G to the K. So, if P equals NP, then there does not exist pseudorandom generators.

Tim: 01:53:47 Okay. I was trying to figure out if you can prove this PRG conjecture, could you say something about P versus NP?

Boaz: 01:53:57 Just the contrapositive of what I just wrote. So if P equals NP and they do not exist PRG, so therefore the contrapositive, if they do exist, then P is not equal to NP.

Tim: 01:54:07 I was just trying to figure out to what extent is there an equivalence here?

Boaz: 01:54:19 There is a wonderful survey by Russell Impagliazzo that he wrote in 1995 where he constructed five different worlds of the status of computation complexity problems.

Tim: 01:54:36 Ah, this is the source of your recent blog post with Scott.

Boaz: 01:54:40 Yes, these are increasingly stronger conjectures. You can have one world where P equals NP. But you could also have this world where P is different from NP, but all average case problems are easy. And in this world, there would be no pseudorandom generators. And then the world where pseudorandom generators exist, he called it MiniCrypt. And then you could still have that maybe pseudorandom generators exist and public key cryptography doesn’t exist. He gave the name for the world where public key cryptography exists, Cryptomania.

Tim: 01:55:30 I see. I’ll need to look at this paper.

Boaz: 01:55:36 There was a somewhat recent Quanta article about these five worlds.

Tim: 01:55:47 Great. So we have this PRG and the point is, this PRG enables one to have the computational secure encryption scheme. It sort of allows you to generalize the one-time pad idea.

Boaz: 01:56:03 Exactly. So if we have a key, K, we can feed it into a PRG G and we get this Z, which is G of K. And then if you have a message X, we can take this message X, XOR it and now this will be the encryption. So the encryption with K of X will be G of K XOR X instead of just K XOR X.

In practice, they have these pseudorandom generators that can give you pieces of the output. So rather than thinking about it as if you get the seed and then you get the huge output, you rather think that you get the seed and then you can generate parts of the output on demand as much as you want. And so you basically have an endless stream of randomness. And whenever you want to send a message, you take the unused part of the stream and send to the receiver, which is why in practice, they call it a stream cipher.

Tim: 01:58:04 Ah, so this solves the problem of reusing keys, right? Because you’re basically updating your seed every time you go back and forth.

Boaz: 01:58:11 So you can think of it as updating your seed or you can think of it as never reusing the output. Every time you take a fresh part of the output. Now in practice, the way they typically do it is you have an initial seed. And every time you want to generate an output, you generate two things. One is the output and the other is an update to the seed. And now you can basically use it as much as you want.

Tim: 01:58:43 Yeah. And this is pretty amazing because if you think about it, it’s a compression result because basically you’ve compressed this really long string that you’re doing the one-time padding into this very small key.
Boaz: 01:59:02 Yes. Sometimes one way to think about it is this content of the theorem that we have in the notes. Once you’ve used computation to evade the laws of statistics, like generating a distribution of entropy N minus one that behaves as if it has the entropy N, you can reuse it and break the laws of statistics even more. Suddenly, you can take a 256-bit long key and use it to generate pads of length of gigabytes that behave as if they’re uniform, even though they have almost no entropy compared to the length, but it’s still indistinguishable.

Tim: 01:59:50 Okay. This is, we can go into many more details, but maybe in the interest of more breadth instead of depth. I think this pretty much summarizes the key highlights of private key cryptography. We defined what a valid encryption scheme was. We eventually found a good definition, I guess, due to Shannon. And now we’ve validated that this whole set of works provided this PRG conjecture holds, which also assumes P versus P is not equal to NP, correct? Yeah, maybe what’s next then is public key cryptography, right?

Boaz: 02:00:36 The main problem with private key cryptography is that you still somehow need to share the key. I don’t know about you, but I’ve never been to Amazon headquarters to go and exchange a key with Amazon. The notion of exchanging a key might work for spies that get the key when they go off to the mission, but it doesn’t really work on the internet today, where we often want to communicate with someone that we’ve never met in person before. We don’t have this out of band channel of communication to exchange the key. It was kind of obvious to everyone that you cannot have secrecy if you didn’t exchange a key until a few people started to think about it harder and realized that there isn’t actually a mathematical proof that this is inherent, that this is impossible. Several different people had the same thoughts. James Ellis in British intelligence agency GCHQ thought about this and realized there isn’t any impossibility result. Diffie and Hellman thought about this and Ralph Merkel, who was an undergraduate, also thought about this. He wrote a project proposal with this idea and his professor said, no, this is obviously nonsense. Try to do something else. The notion of public key seemed like a contradiction in terms, the whole notion of a key is that it’s secret. How can it be public? So let’s try to see what does it mean to be a public key encryption. We have again, Alice and Bob, Alice has a message X that she wants to send to Bob. People think it’s kind of obvious that the only way she can send the message to Bob is they have to share the secret key. But in public encryption, we’d say, okay, actually, Alice is going to use the encryption algorithm. Bob is going to use the decryption algorithm. What they thought is what if the encryption algorithm and the decryption algorithm don’t actually need to use the same key? So the idea is that Bob can generate a pair of keys, E and D. E is the encryption key, and D is the decryption key. And now, Bob can send to Alice the encryption key E. Eve also will learn E.

Tim: 02:04:09 So yeah. This is the public key by definition. ’cause that’s the one

Boaz: 02:04:12 you’re selling. Yes. E is the public key because you are sending it in the clear. Eve also sees that. And now, Alice is going to send an encryption with the key E and Bob will decrypt with the key D. So now the condition is that, for every X, if you decrypt what you encrypted, then you get back x. And now the security condition is basically the same as before, you still want computational security, we still want that for every pair of messages, Eve cannot distinguish if the message was X0 or X1, where now the twist is that Eve cannot distinguish even when she’s given the public key, because it is public.

Tim: 02:05:23 Right, right, yeah, and D, of course, is therefore the private key. Yes. Right. And just to state this in words, basically this allows anyone to communicate to Bob, but only Bob can read those messages. Exactly. One of the nice things about your book is you give a lot of intuition and colorful commentary. I didn’t quite catch the intuition behind why people thought this was possible. Cause when you look at this, it’s just not at all obvious why this should be true. It’s like a door that works with two different keys or something, which is like, I don’t, I don’t know if outdoors, but that doesn’t seem, I don’t, I’ve never seen a door that works that way. Right. So, can you give some intuition, like why you should even think this is possible?

Boaz: 02:06:06 Right. So I think that, first of all, intuitively it should be impossible. I think part of it was people trying to first think about it and then try to prove that it’s impossible and then realize that they cannot prove that it’s impossible. And then maybe they reshaped their intuition and started adjusting their intuition because it’s definitely not something that people guessed. It’s not like a heavier than air flight or going to the moon. It’s not something that people said, here is a dream that we should do, and we just don’t know how to do it. Like most people just thought it’s obviously impossible. It’s like, so there is no intuition why it should be possible. Once you start thinking about it, maybe the way to think about it is one of those locks that you click shut. So there are these locks that you don’t need the key to lock them. You only need the key to open them.

Tim: 02:07:24 Ah, yeah. Oh, you know, actually it’s a great example. It’s sort of like the post office drop box, those blue boxes outside, like you open the hatch, you put your box in and you close it and you can’t open it again because it’s too far down and only the male person can open it.

Boaz: 02:07:42 Yes. Exactly. So there are like these kind of things where one way, where like there is a trap to do it in. So there is an easy way to do something in one direction, but to do it in the other direction, you need something else. Like, so you need maybe the secret, the key of the male person, or if you have these locks that you push to click to shut, then yeah. So once you, once you think about it, you see, okay, the encryption and decryption, they don’t have to be symmetric. It doesn’t have to be the

Tim: 02:08:13 same operation. That makes a lot of sense now because we’ve already, the whole name of the game with P versus NP is it’s hard to do things one way and easy the other. Right. And the easy should be the encryption and the hard should be the decryption. So phrase that way now it’s like more plausible.

Boaz: 02:08:27 Yes. So I think eventually like you, the first intuition is that maybe it’s impossible, but then you shape your intuition and eventually you find a different metaphor and you figure out, Oh, maybe it is actually possible, but it is much more subtle than private key cryptography in particular. There are like a bunch of constructions for pseudo random generators and in general, they kind of look like you just put a bunch of logical operations on top of it and you put enough of them and you scramble things enough and eventually, you get something that’s probably secure to the random generator, unless you kind of made some mistake along the way, but with public key, you really do need to use some math and there have been very few actual constructions to achieve this idea.

Tim: 02:09:25 At least the simplest version of an instantiation of public key cryptography. You know, it’s pretty basic number theory has to do with exponentiation.

Boaz: 02:09:37 Maybe I’ll just say like at the high level, what people have used. So generally there have been two main themes, like there has been a number theory construction, which use either integer factorization or a discreet log. So basically like every other encryption, you still need some kind of problem that is computationally hard. So integer factorization, there is this problem that you take two primes p and q and you map them to n, which is p times q. So it’s easy to do this side, but reversing could be hard. And again, we have to be precise what is meaning it hard. Like we all actually learned how to factorize numbers in grade school, but the algorithms that we’ve learned. It’s basically scales. So if you have a number and it’s kind of you, you do a for loop that goes all the way to n or maybe if you’re a little bit clever, square root n. But if n is a thousand bits, then square root n is like 2 to the 500. And that’s not something that you can do. And there are better algorithms, but generally we don’t, like the algorithms take time that is exponential, not in the number itself, but in the number of bits to describe number and discreet log is again, like something like you, you take some number G and some number A and you look at g to the a, model P and again, this part is easy, but reversing this is hard. Say, given G and G to the A, we don’t know how to log, and get A back. So these are two problems that have been one source for public encryption. So this is Diffie Hellman and RSA, and there is elliptic curve cryptography, which is basically a version of discrete logarithm. And then the other problems were based on error correcting codes or lattices. And generally, these are things basically of the form, say, you take X and some code A. And you encode this and noise it up, so this part is easy and under some assumptions, over finite fields over certain versions. Decoding could be hard. So given A and AX plus E, when you’re over some finite field, then it could be hard to decode to remove the errors.

Tim: 02:13:07 Okay. Okay. I guess the reason I wanted to gloss over this is because it’d be nice to talk about some concrete applications. We’ve been talking very theoretically, but also you can go and Wikipedia look up the algorithm, you can write some formulas based on arithmetic. It’s not particularly deep, it’s just Yes. You, you know, you can see it. Right. So, so, but, but yeah.
Boaz: 02:13:28 The thing I should mention, especially since you talked to Scott, is that if scalable quantum computers get built, the left side is done, broken, and we are only left with the right side.

Tim: 02:13:47 I see. So Shor’s algorithm would nullify the left-hand implementation of public key cryptography, but we’re still left with the right side. So it’s not that much of a security breach, we have a backdoor to this.

Boaz: 02:14:02 This thing on the right is what people call post-quantum cryptography. We’re lucky we have it, but it is a little uncomfortable in the sense that if someone discovers an algorithm, either classical or quant for this decoding problem, then that’s essentially it. We don’t really have other candidates. People have been trying to propose some things, but most of them have been broken. We don’t have a very reliable candidate apart from this.

Tim: 02:14:39 I think it’d be good to take stock of what we’ve accomplished, which is a lot. Let’s talk about some applications. We could just be very cartoonish about it because we’ve gone into a lot of detail. I mentioned a few things and feel free to talk about them with any level of granularity, but Bitcoin is a topic of the day. Maybe any of your thoughts on deep fakes and authenticity, things like that and whatever interests you in machine learning.

Boaz: 02:15:15 Sure. So maybe we can chat about those things and maybe at a higher level. The general idea of Bitcoin is currency, which is decentralized. So there isn’t one authority that says who owns what. Then you need to solve a problem that in computer science is known as the consensus problem. You need to agree on who owns what so you can have some kind of currency or any type of shared data structure. In distributed systems, there have been several solutions to this problem. But generally, the way these solutions worked is they said, okay, we have n agents and as long as say a majority of them are honest, then we can achieve some consensus.

And that’s great, but if you have a decentralized system and everyone can sign up and open a million accounts, how do you decide what is majority? The really nice idea in Bitcoin is they said, you know, what we’ll do is we’ll say a majority of the computational cycles.

So in order to vote, you need to spend some computational cycles. But how do you prove you spent some computation cycles? This is where crypto comes in. You can set up tasks where you have some parameter n such that the effort to break the task is 2 to the n, and typically we set up this n so that it’s so big that no one would ever break it. But in Bitcoin, the idea is rather than set n to be huge, set it exactly at the right level.

So the idea of Bitcoin is that you basically set these computational puzzles where you scale up the n to the right level so that solving this puzzle will be exactly hard enough. And then this is basically how you vote on what is the ground truth or in Bitcoin, what block is added to the blockchain.

Now there are a lot of additional complications and Bitcoin uses signature schemes. Part of this, we didn’t talk about digital signatures, but basically this is your identity in all of these cryptocurrencies. If there are no identities, what does it mean that I am the owner of a certain coin? I am the person that has a secret key that corresponds to a certain coin. And typically it’s a signature key, not an encryption key.

So basically, these cryptocurrencies use cryptography and particularly computational puzzles to get around the paradox that they’re trying to solve, which is how do you do a decentralized mechanism where you need a majority, but a majority of what, and then there are these two variants and both of them use cryptography.

So one variant is a proof of work where basically every cycle gets one vote, but of course it has a lot of problems because you basically have to burn a lot of cycles for no useful reason. And so it’s not good for the environment. It’s also not good for the efficiency of the system.

And the Bitcoin system is famously inefficient. And I forget the numbers, but probably Visa can process probably 10 million times as many transactions per second as the core network or something like that.

Tim: 02:19:57 To make this concrete, how does this concept of proof of work solve the double spending problem?

Boaz: 02:20:05 You reduce double spending to the problem of consensus on a ledger. Let’s imagine that every coin, the smallest units, they call them Satoshis, had its own identity. And there was a ledger that said, Alice owns coin number 7,8,9,5. Then Alice gives Bob coin number 7,8,9,5. Now, if Bob comes to you and wants to pay you with coin number 7,8,9,5, you can go and look at the ledger or the blockchain and say, okay, yes, Bob really does own this coin.

So there is no double spending. You reduce the problem of double spending to the problem of having an online ledger. But everyone, and again, it would be trivial if there was some website that we all trusted and would have this ledger in it. But the whole point of Bitcoin is that there isn’t.

Tim: 02:21:31 So what happens if, say, Alice tries to give Bob coin 7,8,9,5, and Alice tries to give Eve coin 7,8,9,5?

Boaz: 02:21:43 As long as Eve and Bob both agree on the copy of the blockchain, then they can both see who owns this coin. The moment Alice gives it to Bob, the coin hasn’t been received until that transaction has entered into the blockchain and was registered. So it basically is a little bit like the way we sell houses. If you want to sell me an apartment in San Francisco, there is a deed in the San Francisco city office that tells who owns the apartment and to sell it to me, eventually the deed has to be transferred. And then I’m registered as the owner in the San Francisco office. So this works well when there is a centralized office that says who owns what.

And the idea of Bitcoin is we replace the centralized office with a decentralized blockchain. Conceptually you can think that when someone proposes which block to add to the blockchain, then we make an election and we vote on it. Where each computational cycle gives you one vote.

The way it actually works is a little bit more probabilistic, where basically each cycle kind of gives you like a lottery ticket. And the more cycles you have, the better chance you have to win and the winner of the lottery ticket is the one that gets to choose which block to add to the blockchain.

Every time you apply a hash function, it’s like buying a ticket. And if you get an output that starts with many, many zeros. Then you won.

Tim: 02:23:55 I think we said enough about Bitcoin. Why don’t we maybe wrap this up with maybe some interesting machine learning thoughts you might have. I was much more optimistic about the future having learned about cryptography. I was thinking like many other people who might not be well informed, like, you know, especially for me in cryptography, which is that, Oh, machine learning the future.

We’re going to be flooded with deep fakes because look how much better they’re getting and how are we going to trust anything we see. And with the digital signatures, which is a way of giving authenticity through public key cryptography, does that basically solve the issue of deep fakes? Because in a world where everyone has to sign things, you might want to watermark those to know that they’re fake, but there could be a future in which the reverse is true in which you want to authenticate everything which is authentic.

And then everything which is not authentic is fake just because now the ease at which it is to make fake things is so feasible. Does that seem like a reasonable future to you? And is that like the solution you might think of in terms of how do we combat misinformation in the future? Just digitally sign everything.
Boaz: 02:25:19 Cryptography can significantly aid in establishing provenance. We’re already familiar with tools like Photoshop, but as we encounter more and more digital forgeries, provenance becomes increasingly important. It’s similar to the art world. If someone wants to sell you a Rembrandt, you want provenance. You want to know exactly where this painting was and how it got there. You don’t just look at it and say, “Oh, wow, it looks really good” because people have learned the hard way that you can forge these things. This could be the case with images and videos as well. We want provenance. We want to know where this photo was taken and how it got to the internet.

Cryptography can also help with this. There is this whole notion of zero-knowledge proofs, where you might be able to prove that a photo was taken by a certain device, but without revealing the identity of the device. Suppose you’re taking a photo of police violence and the photo is authentic, but you don’t want to sign it under your identity that could be traced to your phone. There could be potential ways in which you could authenticate that it was taken at a certain time and place, without revealing who it was that took it. There could be some very interesting cryptographic solutions to the problem of how do you simultaneously certify some facts about this without giving everything away.

I think cryptography can play a role in that. In fact, it already does in some cases. For example, in Australia, there was a case where judges tossed out some speeding camera tickets because they used the wrong hash function. So, in some sense, speeding cameras sometimes already use some cryptography to authenticate what they took. If we don’t rely on our eyes to tell whether something is authentic or not, then cryptography can help us with provenance.

Tim: 02:28:50 So, once you know enough cryptography, is it reasonable to be optimistic that society won’t collapse because of deep fakes? Assuming P does not equal NP?

Boaz: 02:29:07 Yes, it’s important to note that all of these advances in deep learning have not touched cryptography in terms of breaking it. Deep learning has had zero impact on that as far as we know, and there is no reason to doubt the security of our encryption algorithms based on any advances in deep learning.

Maybe it’ll change in the future, but so far, as far as we know, AI can do a lot of things that humans can do, but humans cannot break encryption schemes and neither can deep nets. Implementing new cryptography is never simple, but I do think that it could play a big role. If we move to the mode where everyone is suspicious about some text, images, or videos that appear on the internet if they don’t have the provenance for it, then cryptography can really help with certifying provenance.

Tim: 02:30:31 That’s great. So, it sounds like the real thing you should be scared about if you’re a doomsday person is whether P is equal to NP or not, right? That’s the real scare. If P is equal to NP, I feel like that’s the doomsday scenario.

Boaz: 02:30:49 Yes, if P equals NP and someone bad discovers that, whether that someone bad is an out of control AI or a dictator, that’s very bad because a lot of our systems’ security relies on cryptography. Just think about over-the-air updates. Suppose that you could forge the digital signature of Apple and send people over-the-air updates to their iPhones that would make it into a listening device and send all of the information there to someone else. You could wreak such havoc.

Tim: 02:31:55 People are impressed about AI being human-level intelligent or mimicking humans. This security issue has nothing to do with that. It’s just like, I can’t even authenticate anything or do any transactions securely. This is nothing about human-level intelligence, right? So it’s just even more basic. Do you have any final thoughts or anything else you wanted to talk about that might be interesting?

Boaz: 02:32:19 If people are interested in the history of crypto, there are some great resources. For the older history, there’s “The Codebreakers”. For something a bit newer, there’s “Crypto” by Steven Levy. For more technical stuff, there’s my book and a nice book by Katz and Lindell. There are also some nice Coursera courses by Dan Boneh. So, there’s lots of room to learn more for people that want that.

Tim: 02:33:17 Okay, great. Well, thank you so much, Boaz. This was really fun. And thank you so much for your time. It was great learning so much about this very exciting subject with you.

Boaz: 02:33:27 Thank you. It was fun.