Rethinking the “Intro Theory” course

TL;DR: New notes on introduction to theoretical computer science are available at

This fall I will be teaching CS 121 at Harvard: Introduction to Theoretical Computer Science.
This type of “intro theory” course is taught at many universities, sometimes under the name “introduction to the theory of computation” or “computability and automata”, typically using Sipser’s excellent book.
Sipser’s book is incredibly well-written and liked by students, but there have been many developments in theoretical computer science and CS at large since it was written, both in terms of new results, techniques, and applications, as well as new emphases and points of view on this area.

So what should an “intro TCS” course contain these days?
My sense is that this course is first and foremost about studying computation in its own right, as opposed to being a tool to solve other problems.
This is not just about computability and complexity, but also about looking at different ways to measure the “goodness” of algorithms, especially given the drastic change in the way algorithms interact with the world these days.
So, for example, students in such a course should get at least some awareness of questions such as privacy or incentive-compatibility, even if doing these topics justice requires dedicated courses.

Another point that should be “hammered in” more is of hardness as a resource.
This of course arises in derandomization and cryptography, where recently cryptocurrencies have been quite explicit about equating computational difficulty with “mining” precious metals.
The interaction of computation with the physical world need to also be covered, and these days an “intro TCS” course cannot skip talking about quantum computing, which is our current best approach for modelling physically feasible computation.

But more than anything, considering computation as an object in its own right forces students to change their way of thinking, and truly come to terms with questions such as how do we model computation, how code is also data, and all the self-referential “paradoxes” and complications this entails.
This has always been present in “intro TOC” courses, but students can sometimes lose the big perspective while working on the n-th proof of non-regularity using the pumping lemma, or the m-th NP hardness reduction.

I am of course not alone in thinking of the need for a “more modern” intro TCS course.
See for example, Scott Aaronson’s MIT course. Also CMU’s renowned (or is it infamous? 🙂 ) Great Ideas in TCS has been following such an approach for quite a while, and the folks at CMU have been very kind with advice and suggestions.

There are also books such as Aaronson’s Quantum Computing Since Democritus and Moore and Mertens’ The nature of computation that take a similar perspective, though neither is quite appropriate as an introductory undergraduate textbook.

Thus, I am working on writing a new set of lecture notes, which are available on
These notes are very much a work in progress, and I welcome comments and suggestions.
In particular, the source for these notes is posted on a github repository and people are very much invited to use the issues and pull requests features there to post suggestions, comments, and point out (or even fix!) typos and bugs.

Some of the differences between these notes and the typical treatment include the following:

  • Very little automata: We will not start with automata. I find regular and context free languages to be important topics, but believe that they should be introduced as an example of a restricted model after one learns about general models and undecidability. For example, once you’ve seen the impossiblity of the halting problem, you will understand why operating systems typically allow you to search for files using regular expressions and not by specifying an arbitrary function, as you would not want the shell to enter into an infinite loop.
    Generally, automata play a very small role in this version of this course.
  • Start with straightline programs: Rather than talk about Turing machines, the first computational model is an extremely simple programming language that has only a single operation: foo := bar NAND baz assigns to the variable foo the result of the NAND of the variables bar and baz. There are no loops, if statements, or subroutines, so this corresponds to straightline programs that are of course the same as Boolean circuits. Since students are already familiar with programming, I believe that starting with such an ultra-simple programming language makes things more concrete. Moreover, thanks to the efforts of my wonderful teaching fellow Juan Esteller, there is an implementation of the NAND, NAND++, and NAND<< programming languages that are taught in this course.
  • De-emphasize Turing machines: Rather than Turing machines, our universal computational model is an enhancement of the NAND programming language to the NAND++ programming language that allows loops by simply stipulating that the program execution goes back to the first line if the loop variable has the value 1. We also introduce an index variable i so that we can use foo_i to access a potentially unbounded memory. Of course, this is equivalent to Turing machines, and we show this equivalence (as well as equivalence to the lambda calculus and other models) . We also give a further enhancement of the NAND++ language that is equivalent to RAM programs, and so allows us to measure running time in the same it is measured in algorithms courses.
  • Algorithms: While a theory of computation course is traditionally about the impossibility results or “lower bounds” part of TCS, it cannot afford to ignore algorithms completely. You can’t understand the challenge of trying to prove hardness before you’ve seen non-trivial algorithms. So, we will mention results such as the Karatsuba algorithm for multiplication, and how very often there is a fine line separating problems such as minimum cut and maximum cut, 2SAT and 3SAT, determinant and permanent, etc..
  • Move fast and break things: Because we skip automata and don’t delve into the intricacies of Turing machines (and because our NAND model is essentially designed to make the proof of the Cook-Levin Theorem easy), we will (hopefully) be done with NP completeness before the mid-point of this course. This will allow us to talk about some advanced topics, see below.
  • Randomness and cryptography: Randomness has become essential to computer science theory and practice, both as a computational resource and as a way to model the world. We talk about probabilistic algorithms, give some examples, and talk about derandomization (based on strong average-case assumptions), which is a nice seque into cryptography. In a cryptography lecture in an intro TCS course, people often show the RSA algorithm. We might do so too, but only as a way to motivate quantum computing later on. I want to convey the notion of how fantastic is the whole idea of public key encryption in the first place, and mention some of the even more fantastical notions such as fully homomorphic encryption. We will also see a public-key cryptosystem based on the hardness of solving noisy linear equations, which, unlike RSA, is probably going to be more similar to the commonly used public key systems 5-10 years from now. (Since I teach a whole course on cryptography, I might make the treatment of crypto in the TCS course briefer.)
  • Algorithms and society: The TCS angle can and has provided useful insights on how we design algorithms where the input is provided by humans and their output effects humans. Students should at least know that this field exists, and mention issues such as incentive-compatibility, differential privacy and issues of governance in cryptocurrencies (perhaps tying this into the consensus problem in distributed computing).
  • Entropy, coding, and compression: An understanding of entropy has become essential for both theoretical and applied computer science, whether it is to bound the complexity of a password, the number of samples required to learn a concept, or the length of a representation, entropy is all around us, and we will cover some basic information theory as well.
  • Quantum computing: As mentioned, a theory of computer science has to talk about our realistic physical models for computation, and some of the surprising outcomes of quantum computation. While a full description of Shor’s algorithm may require too much lecture time, I find that Simon’s algorithm conveys the heart of its power and is much simpler to describe.


It is definitely an ambitious plan that will not make for an easy course to take or to teach.
Some might say that it’s a “recipe for disaster” but the saving grace is that, with the help of Salil Vadhan, we have assembled a wonderful teaching staff that should be a great help for the students in keeping up.
I will report back on how realistic it was at the end of the term.
Even with our fast pace, there are many areas of theory that are not mentioned above, but students should be exposed to. I hope to cover some of these topics in this course, or at least write lecture notes about them so that curious advanced students could read about on their own.


Once again, much of this is not novel, and courses such as the ones mentioned above have taken similar approaches. However, I still hope these notes can serve a useful resource for other people who teach similar courses.


Acknowledgements: These lecture notes are constantly evolving, and I am getting input from several people, for which I am deeply grateful.
Thanks to Jarosław Błasiok, Chi-Ning Chou, Juan Esteller, Alex Lombardi, Thomas Orton, Juan Perdomo, and Ryan Williams for helpful feedback.
Thanks to Anil Ada, Venkat Guruswami, and Ryan O’Donnell for helpful tips from their experience in teaching CMU 15-251.
Thanks to Juan Esteller for his work on implementing the NAND* languages.
Thanks to David Steurer for writing the scripts that we used for notes on sum of squares and that I am now repurposing to produce these notes and for his endless patience with me as I attempt to learn concepts such as git and makefiles. David’s code itself is based on several other packages, including pandoc, LateX, and the Tufte LaTeX and CSS packages.

14 thoughts on “Rethinking the “Intro Theory” course

  1. Finite state machines, regular expressions, and CFGs are too important in a practical sense to wait for the intro to theory of computation class. We teach the basics of the first two and introduce CFGs in our Foundations course which is one of the first classes after CS1 and CS2. FSMs (not just using the narrow view of them as language recognizers) are useful as models for design especially in hardware. The material is mostly a number of constructions and algorithms. Minimization is an especially broadly useful algorithm. We do cover limits of FSMs and basic undecidability of the halting problem there also – the later using idealized high level programming languages. See, e.g.,

    This gives a lot more freedom for what to do in the (optional) Intro to Theory class.

  2. Thanks Sushant and Jelani – I updated the link though hopefully will also fix the original one soon.

    Paul: Thanks for the link to UW’s foundations course. Harvard’s CS 121 (like CMU’s 15-251) is a required course, and so in that sense it’s similar to this class, except that we already assume discrete math background, which students get by either the (optional) discrete math course or through other means.

  3. Great post! A few notes though: you forgot to include “latex” between your dollar signs. And you wrote “as” twice in your paragraph on deemphasising Turing machines.

  4. Sounds great!

    I was a TA for the “old” CS121 and I do remember thinking “for most students this is their one exposure to theory; is automata theory really the aspect of theory that we want to emphasize?”

    Using NAND++ as a definition instead of Turing machines is very interesting. I think the biggest downside of the TM formalism is that they’re hard to program. (We asked the students to create a TM for a certain task in one problem set. It was a disaster.) NAND++ might fix that shortcoming.

    NAND++ might also be more “efficient” in the sense that time complexities are closer to the RAM model. TMs are not a great formalism if you care about n^2 vs n^3 time.

    1. Thank you! NAND++ itself is an oblivious model and hence not very efficient, but we also introduce a related model NAND<< that is closely related to the RAM model

  5. I downloaded the lecture notes. I like the approach, the organization of the content and the presentation style.
    I have read the “Mathematical Background” section, which reads very well. I found a couple typos:
    p.33 is only defines —> defined
    p.36 there are example where —> examples

Leave a Reply

Fill in your details below or click an icon to log in: Logo

You are commenting using your account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s