## Cryptography course projects

This spring I taught Cryptography at Harvard (as usual my lecture notes are online ).

Teaching it was a great fun because of the fantastic group of students that took the course. They were genuinely interested in cryptography, and kept asking me extremely interesting questions and had excellent insights. Thanks also to Yael Kalai and Bruce Schneier for giving guest lectures in this course.

Though I haven’t done so in a while, I decided to do course projects this term. I was very impressed on what these students (most of whom are undergraduates) managed to achieve in one term. So, since a blog is not much use if you can’t show off your students, without further ado here are the projects in the course:

- A Cryptanalysis of IOTA’s Curl Hash Function / Michael Colavita and Garrett Tanzer. The IOTA crypto currency initially used a tailor made hash function known as “Curl” or “Curl-P”. Unfortunately, a collision was found by Heilman et al. However, Heilman did not make their attack code available. In this project, Colavita and Tanzer managed to reproduce their result, and provide a complete description, as well as online available code for the collision-finding algorithm. Even if you are not interested in IOTA, reading this project is a great way to get familiar with some cryptanalytic techniques. Like any good cryptanalytic work, they even got flamed on Reddit.

- DeCERT: A Decentralized Certificate Authority / Leo Hentschker. This project describes what is a novel (to me at least) approach to deal with the issue of trust in certificates: decentralize the decision of what certificate to trust by voting on a block chain. Leo built the website http://decert.io/ where one can issue a certificate as well as voting on it using tokens based on the Ethereum currency. This was much more than I could have expected from someone to do as a course project!

- OpAwesome: The Good, the Bad, and the Fuzzy in the Secure Database Landscape: Sophie Hilgard and Wilson Qin. The project does a thorough exploration of the design space for encrypted data bases. It seems that every few months we hear of another break where millions of people’s personal information was compromised, and a theoretical cryptographer might wonder why don’t people simply encrypt all the databases with fully homomorphic encryption? Of course, even I know that this is not yet realistic, and generally there are tradeoffs between the levels of security and efficiency (and not less important, backward compatibility with legacy systems). The project covers the various solutions in this space, including some that are relatively easy to use as a “plug in replacement” for non encrypted data bases, but whose security is also limited, and others that have stronger notions of security. They also discuss their preliminary work on
`opAwsome`

: a secure database that allows for smoother tradeoffs between communication and client-side computation on the one hand, and security on the other hand.

- Fully Homomorphic Image Processing in the Cloud: William Fu, Raymond Lin, Daniel Inge. Fully homomorphic encryption is notoriously ineffieicnt, but
*partially homomorphic encryption*might sometimes be good enough. This project implements homomorphic compression of images. This can be useful for example to run a machine learning classifier on an image (such classifiers often work on compressed images). I was surpised this can actually be done, but they even have code (using Microsoft’s SEAL library) available on this repository.

- Introduction to Encrypted Voting Protocols / Eric Bornstein:
*Voting*underlies many of the most important decisions we as a society have to undertake. The importance of the security of (and public confidence in) voting procedures cannot be overstated, and yet many current implementations leave something to be desired. This comprehensive survey discusses the differnt goals we wish voting protocols to achieve, including accuracy, privacy, and freedom from coercion, and various cryptographic, as well as paper based (and combinations thereof) ways to achieve them. In particular it covers the ideas of*Mix Nets*(for anonymity),*Visual Cryptography*(for usablity) , and*Vote/Anti-vote pairs*(for receipt freeness). If you have any interest in voting, you’d probably want to take a look.

- Proof of sequential work / Meena Jagadeesan, Alec Sun, and Alex Wei:
*Proofs of work*have recently been of great interest to cryptographers, as they underlie the Bitcoin blockchain. However, the type of proofs used by Bitcoin and its ilk are inherently parallelizable, which gives some limits to their usability. For example, one cannot use them to measure elapsed time- only total cost.*Proofs of sequential work*adress this by providing a “puzzle” that can only be solved by $N$ sequential steps. The project surveys some of the older constructions as well as the very recent one of Cohen and Pietrzak, and discusses ideas for addressing some of the open problems in this area.

- Analysis of vulnerabilities and mitigations in Multiparty Computation / Leor Fishman: This project analyzed some subtle but important issues in multiparty secure computation. One of the interesting challenges in the area of multiparty computation is the idea of
*covert computation*: how can you hide not just*what*is being computed but even the mere fact that*something*is being computed? The project considers a strong (but well motivated!) notion of security where even some of the parties to the protocol would not know the identity of others, and shows some negative results. It then also studies issues of communication patterns in MPC, and in particular protocols that divide the communications into smaller groups or “quorums”. The project shows that the effect of such optimization on security is stronger than what one might have thought initially.

- A secure zero knowledge two factor authentication protocol / Albert Chalom, Nathan Wolfe: This project studies how one achieves the intersection of several goals in authentication: handling
*fuzzy*(and potentially low entropy) authentication data, such as fingerprint etc.., but while also making sure that (as is done with passwords) this data is not stored in the clear in the server. They show a “proof of concept” using two party secure computation.

- Shor’s Algorithm: Andrei Laurentiu Ciupan. Shor’s algorithm is of course a fundamental result that has energized much research on both quantum computing and cryptography over the last two decades. This project is a short (8 page) but almost complete (except for some results on continued fractions) description of the algorithm and its analysis.

- Query and Communication Lower Bounds for Key-Exchange Protocols: a Survey / Noah Golowich: This project touches a topic that’s dear to my heart – understanding the possiblity of implementing key exchange based only on an idealized random oracle, without making hardness of assumptions on
*structured*problems such as lattices, factoring, etc.. It surveys some of this highly technical literature, and mentions a very recent result on communication tradeoffs and some open problems.

- A survey on quantum secure cryptographic systems / Tomoka Kan: This projects the landscape of cryptographic schemes whose security was analyzed in the setting where not only
*computation*but also*communication*is quantum. This setting allows the adversary some extra powers, as they can make “super position queries” to the other parties, and the issue of security is quite subtle and interesting.

As I mentioned before, I am teaching CS 121 at Harvard, and have written my own text, with the (not very original) title “Introduction to Theoretical Computer Science” . I am hoping for this text to turn into a published textbook in the next year or two. Toward this end, I would be grateful for any comments or suggestions on the text, as I will be working on it over the summer.

You can use the issues page on the GitHub repository to make any suggestions, corrections, reports on bugs and typos, and so on.

I also hope that others instructors would consider using this text in their courses, and would welcome suggestions as to what I can do to help in that. In particular, over the summer I plan to add more exercises (including some solved ones) and more examples, as well as more explanations and “proof ideas” for many of the theorems.

The main differences between this text and “classical approaches” to intro theory courses such as Sipser’s are the following:

**Circuits / straightline programs as the basic model instead of automata**

I do not start with *finite automata* as the basic computational model, but rather with *straight-line programs* (or, equivalently *Boolean circuits*) in an extremely simple programming language which I call the “NAND programming language” since its only operation is assigning to one variable the NAND of two others.

Automata are discussed later in the course, after Turing machines and undecidability, as an example for a restricted computational model where problems such as halting are effectively solvable. This actually corresponds to the historical ordering: Boolean algebra goes back to Boole’s work in the 1850’s, Turing machines and undecidability were of course discovered in the 1930’s, while finite automata were introduced in the 1943 work of McCulloch and Pitts but only really understood in the seminal 1959 work of Rabin and Scott. More importantly, the main current practical motivations for restricted models such as regular and context free languages (whether it is for parsing, for analyzing liveness and safety, or even for routing on software defined networks) are precisely because these are tractable models in which semantic questions can be effectively answered. This motivation can be better appreciated *after* students see the undecidability of semantic properties of general computing models.

Moreover, the Boolean circuit / straightline programs model is extremely simple to both describe and analyze, and some of the main lessons of the theory of computation, including the notions of the *duality between code and data*, and the idea of *universality*, can already be seen in this context. Nonuniform computation also gives essential background for some exciting topics I like to cover later in the course. For example, cryptography (making sense of notions such as “256 bits of security” or difficulty of bitcoin mining), pseudorandom generators and the conjecture that BPP=P, and quantum computing. Circuits of course also yield the simplest proof of the Cook Levin theorem.

**A programming language instead of Turing machines for modeling uniform computation**

Instead of Turing Machines, I introduce uniform computation using an equivalent model obtained by extending the straightline NAND programming language mentioned above to include loops and arrays (I call the resulting programming language “NAND++”). I do define Turing machines and show the equivalence to NAND++ program, so the students can connect this both to the historical context as well to other literature they may encounter.

I believe that the programming language formalism makes this model more concrete and familiar to the students than Turing machines. For examples, a result such as the equivalence of Turing machines with two dimensional tapes and one dimensional tapes is now described as showing that a language with two dimensional arrays can be “transpiled” into a language with one dimensional arrays only (i.e., two dimensional arrays can be implemented via “syntactic sugar”).

Moreover, because the NAND++ programming language is extremely simple and fully specified, it is easy to show its formal equivalence with TM’s. In fact, since NAND++ is essentially obtained by adding a “feedback loop” to Boolean circuits, this makes the Cook Levin theorem easier to prove. I even implemented the Cook Levin reduction in a Python notebook, and so students can see how one can transform a NAND++ program into a graph and a number such that has a cut of size if and only if the program had an input that makes it output . Alas, these graphs tend to be quite large:

I also introduce yet another extension of NAND++ that allows indirect access to arrays, and hence is essentially equivalent to the standard RAM machine model used (implicitly) in algorithms courses. (I call this language NAND<<.) Using a RAM based model makes the distinction between notions such as and time more meaningful, and makes the time complexity classes correspond to the informal definitions of linear and quadratic time that students encountered in their algorithms lectures (or their whiteboard coding interviews..).

**More advanced topics**

Reducing the time dedicated to automata (and eliminating context free languages, though I do touch on them in the text in an optional section) allows to spend more time on topics such randomness and computation, the interactions between *proofs* and *programs* (including Gödel’s incompleteness, interactive proof systems, and even a bit on the -calculus and the Curry-Howard correspondence), cryptography, and quantum computing.

The book is still work in progress, but I think is already quite usable as a textbook. Indeed, much of the feedback I got on the course was that people were happy with the text (though perhaps they merely preferred it to the lectures due to my teaching abilities 🙂 ). One student even got me this coffee mug:

In any case, I hope to get more comments as I work on it over the summer. Regardless of whether or not it’s eventually published as a book, I intend to keep a version of it freely available online.

There is an old joke that an economist would not lift a $100 bill lying on the sidewalk, since in equilibrium it should not be there. But economist Bryan Caplan from George Mason University believes the world is leaving trillion dollars or so on the sidewalk. The culprit in his mind is education, which developed countries tend to invest about 5-6% of their GDP on:

Correspondingly, the number of years people spend in school has been growing:

Most people would think this is a good thing. First of all, an educated population is a good in itself. Second, this seems correlated with growth in general:

However, Caplan believes education in high school and above is mostly a waste of time and money. (He is also suspicious about K-8 education, which he calls a “daycare center for kids”.) So in his mind, the GDP growth of advanced countries has been **in spite** of their investment in education. Who knows how far the U.S. could have come if the average years in school would have stayed at 8 or 10.

Caplan’s theory is that people’s productivity is just a function of innate skill, that is not changed at all by education. They only go to high school, college, and beyond due to an “arms race” of credentials aided by ease of access to education. If you own a business and believe Caplan’s theory, you’d probably want to set up shop as far away from universities as possible, where you could get quality workers on the cheap that did not waste time on education. Perhaps Amazon should hire Caplan as a consultant, they seem to be doing the exact opposite.

Caplan also wrote a book. Scott Aaronson is not impressed.

## Some reading recommendations

Quanta magazine has an excellent article by Erica Klarreich on the recent progress on the 2 to 2 conjecture, which I have blogged about before. The article does not go into the technical details, but gives a good perspective on what’s been done and what are the challenges ahead.

Eric Posner and Glen Weyl have a new book on “Radical Markets” which I view as to some extent taking a “computer sciency approach” to some of the basic problems of society, economics and democracy. It’s “computer sciency” in the sense that their approach doesn’t place too high of a premium on “backwards compatibility” and also in the sense that, while they talk about transforming basic notions of property rights and even human rights in physical countries, their ideas might end up being more applicable to “digital countries” such as cryptocurrencies and online communities. Indeed, as Eli Ben-Sasson wrote here, many of the unanswered questions in cryptocurrencies are less about crypto and more about governance. (A third more technical sense is that it seems that some of their ideas, such as quadratic voting, correspond to replacing the 1 norm with the 2 norm, which is of course a theme we have seen before. )

Avi Wigderson posted a self contained version of the epilogue of his book on mathematics and computation. This epilogue gives a panoramic view of TCS’s past, present, and future, as well as its connections with its near and not so near neighbors. Regardless of your background, you will learn something from this book.

Finally, I’ve recently started to be interested in connections between physics and algorithms, and in particular the relations between statistical physics and algorithmic difficulty. I highly recommend these lecture notes by Afonso Bandiera, Amelia Perry, and Alex Wein. In a tough year for our community, Amelia was another brilliant student that we lost much much too soon, and these notes are dedicated in her memory.

## Childcare at STOC 2018 “TheoryFest”

*(Announcement from Ilias Diakonikolas and David Kempe)*

We are pleased to announce that we will provide pooled, subsidized

child care at STOC 2018. The cost will be $40 per day per child for

regular conference attendees, and $20 per day per child for students.

For more detailed information, including how to register for STOC 2018

childcare, see http://acm-stoc.org/stoc2018/childcare.html

Ilias Diakonikolas and David Kempe (local arrangements chairs)

## Lecture notes on DKKMS

Mitali Bafna, Chi-Ning Chou, and Zhao Song wrote scribe notes for my lectures on the Dinur et al proof of the 2 to 2 conjecture (see the DKKMS and KMS papers, though this presentation follows a different, and in my view simpler, approach.)

“Scribe notes” is really an understatement. In a heroic work, Mitali, Chi-Ning and Zhao supplied many more details and proofs (and much better exposition, including figures and comments) than I actually did in my talks. (One proof is still missing, but hopefully it will be updated in a few weeks or so.)

The notes assume no background in prior works on PCP and unique games, and can be very useful for anyone interested in these recent breakthroughs. As I mentioned before, the reduction itself can be described in about 5 lines. The soundness analysis is unfortunately more complicated…

## Consider signing a pledge for inclusiveness in TCS

Edith Cohen, Vitaly Feldman, Omer Reingold and Ronitt Rubinfeld wrote a pledge for inclusiveness in TCS. I think it is mostly common sense: saying that not just our universities, but also our conferences and workshops, are part of our workplace, and that we should strive for them to be free of harassment. But sometimes it is very useful to explicitly state common sense statements, as they set the right level of expectations.

So, if you agree with the pledge’s statement and are a member of the TCS community, regardless of whether you are a student, postdoc, faculty, in industry, or any other member, please do consider signing it here.