
Outreach on Fairness, Privacy and Data Analysis

August 12, 2015
  1. A lovely interview with Cynthia Dwork in the New York Times on bias in computations. In particular, discussing our work (joint with Moritz Hardt, Toni Pitassi and Rich Zemel) on Fairness through Awareness.
  2. Our Science article, The reusable holdout: Preserving validity in adaptive data analysis (joint with Cynthia Dwork, Vitaly Feldman, Moritz Hardt, Toniann Pitassi and Aaron Roth), is out.


update (8/16/2015): On Privacy and Anonymisation (with Cynthia Dwork and Salil Vadhan) in The Economist

Sanjeev Arora on rethinking the graduate algorithms course

August 2, 2015

[Below is a guest post from Sanjeev Arora on his redesign of the traditional graduate algorithms course to be a better match for today’s students. –Boaz]

For the last two years I have tried new ideas in teaching algorithms at the graduate level. The course is directed at first year CS grads, but is also taken by grads from related disciplines, and many advanced undergrads. (Links to course homepage, and single file with all course materials.)

The course may be interesting to you if, like me, you are rethinking the traditional choice of topics. The following were my thoughts behind the redesign:

  • The environment for algorithms design and use has greatly changed since the 1980s. Problems tend to be less cleanly stated (as opposed to “bipartite matching” or “maximum flow”) and often involve high-dimensional and/or noisy inputs. Continuous optimization is increasingly important.
  • As the last theory course my students (grad or undergrad) might take for the rest of their lives, it should somewhat fill in holes in their undergraduate CS education: information/coding theory, economic utility and game theory, decision-making under uncertainty, cryptography (anything beyond the RSA cryptosystem), etc.
  • Programming assignments need to be brought back! CS students like hands-on learning: an algorithm becomes real only once they see it run on real data. Also, computer scientists today, whether in industry or academia, rely on subroutine libraries and scripting languages. A few lines in Matlab or Scipy can be written in minutes and run on datasets of millions or billions of numbers. No Java or C++ needed! Algorithms education should weave in such powerful tools. It is beneficial even for theory students to play with them.

Sample programming assignments: (a) (compression via SVD) Given a 512 x 512 grayscale image, treat it as a matrix and take its rank-k approximation via SVD, for k = 15, 30, 45, 60. Use mat2gray in Matlab to render this new matrix as a grayscale image and see what k suffices for realistic recovery. (b) You are given S&P stock price data for 10 years. Run online gradient descent to manage a portfolio (Lecture 16), and report what returns you get with various parameter settings.
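For readers who prefer Python to Matlab, here is a minimal sketch of assignment (a) using NumPy and Matplotlib. The filename is a placeholder, and rescaling to [0, 1] stands in for Matlab's mat2gray; this is an illustration, not the official course code.

```python
# Rank-k image compression via SVD (sketch of assignment (a)).
import numpy as np
import matplotlib.pyplot as plt

def rank_k_approx(A, k):
    """Best rank-k approximation of A in the Frobenius norm (Eckart-Young)."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    return U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]

img = plt.imread("image512.png")   # placeholder: any 512 x 512 grayscale image
if img.ndim == 3:                  # drop color channels if the file has them
    img = img.mean(axis=2)

for k in (15, 30, 45, 60):
    approx = rank_k_approx(img, k)
    # rescale to [0, 1], a rough analogue of Matlab's mat2gray
    approx = (approx - approx.min()) / (approx.max() - approx.min())
    plt.imsave(f"rank_{k}.png", approx, cmap="gray")
```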

Students are allowed to do a final project in lieu of a final, and many choose to apply algorithms to some real world problem they are interested in. Sample projects are also listed on the course page.

I welcome your comments, suggestions, and links to other relevant course materials on the web!

Joining the new Harvard

July 16, 2015
After five fun and stimulating years in the wonderful Microsoft Research New England, I have decided to move on. I will be joining Harvard University as a professor of Computer Science in spring 2016. Moreover, I am thrilled to say that Madhu Sudan will also be joining Harvard. Harvard’s Computer Science is on a growth streak and so stay tuned for more news.
Indeed, throughout my discussions with Harvard I have been consistently impressed by the commitment at all levels, from the President down, to apply all the considerable material and intellectual resources of Harvard to make it one of the absolute top places for Theoretical Computer Science and Computer Science at large. This is not just great for Harvard, but also for our field, and I am truly excited to play a role in this endeavor.

Indistinguishability Obfuscation and Multi-linear Maps: A Brave New World – Guest Post by Ran Canetti

July 6, 2015

A bunch of us hapless cryptographers got the following boilerplate comment from the FOCS’15 PC:

“Overall, submissions related to multi-linear maps and indistinguishability obfuscation were held to a somewhat higher standard. The PC expressed some concern with the recent flurry of activities pertaining to multi-linear maps and indistinguishability obfuscation, given how little we understand and can say and *prove* about the underlying hardness assumptions”.

This comment was clearly written with the best of intentions, to explain views expressed in the PC deliberations. And I am thankful for it, mainly because it made the underlying misconceptions so explicit that it mandated a response. So, after discussing and commiserating with colleagues here at Simons, and after amusing ourselves with some analogues of the above statement (e.g., "results on NP-completeness are held to a higher standard given how little we understand and can say and *prove* about the hardness of solving SAT in polynomial time"), I decided to write an (obviously subjective) account of the recent developments in multilinear maps and indistinguishability obfuscation (IO), and of why this exciting research should be embraced and highlighted rather than "held to a somewhat higher standard", in spite of how little we understand about the underlying assumptions. The account is aimed at the general CS theorist.

Let me start by giving rough definitions of the concepts involved.  An Indistinguishability Obfuscator (IO) is a randomized algorithm O that takes as input a circuit  C and outputs a (distribution over) circuits O(C) with the properties that:

  1. C and O(C) have the same functionality,
  2. O(C) is only polynomially larger than C, and
  3. for any two same-size, functionally equivalent circuits C and C’ we have that O(C) ~ O(C’) (i.e., the distributions over strings representing O(C) and O(C’) are computationally indistinguishable; a formal statement of this property appears right after the list).
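For concreteness, the third property can be written as follows. This is one standard formulation; the notation, with λ the security parameter and negl a negligible function, is mine and does not appear in the PC comment or in any specific paper cited here.

```latex
% Indistinguishability requirement of an IO scheme O: for every pair of
% same-size circuits C_1, C_2 computing the same function, and every
% polynomial-size distinguisher D,
\[
  \bigl|\Pr[D(O(C_1)) = 1] - \Pr[D(O(C_2)) = 1]\bigr| \le \mathrm{negl}(\lambda)
  \qquad \text{whenever } C_1 \equiv C_2 \text{ and } |C_1| = |C_2|.
\]
```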

IO was proposed as a notion of obfuscation in 2000 (Hada; Barak-Goldreich-Impagliazzo-Sahai-Vadhan-Yang). Indeed, it is arguably a clean and appealing notion, in some sense the natural extension of semantic security of standard encryption to “functionality-preserving encryption of programs”. However, it was largely viewed as too weak to be of real applicability or interest. (There were also no candidate polytime IO schemes, but this in my eyes is a secondary point; see below.)

Things changed dramatically in 2013 when Sahai and Waters demonstrated how IO schemes can be ingeniously combined with other rather “mundane” cryptographic constructs to do some amazing things. Since then, dozens of papers have appeared that extend the SW techniques and apply them to obtain even more amazing things, which by now have transcended crypto and spilled over to other areas (e.g.: deniable encryption, succinct delegation, succinct multi-party computation with hardly any interaction, one-message succinct witness-hiding and witness-indistinguishable proofs, hash functions with random-oracle-like properties, hardness results for PPAD, and many more). In fact, think of a result in your area that assumes that some computation is done inside a black box; most probably IO can replace that assumption in one way or another…

Still, my (subjective but distinct) feeling is that we are far from understanding the limits and full power of IO.  Furthermore, the study of IO has brought with it a whole new toolbox of techniques that are intriguing in their own right, and teach us about the power and limitations of working with “encrypted computations”.

So far I have not mentioned any candidate constructions of IO – and indeed the above study is arguably valuable as a pure study of this amazing concept, even without any candidate constructions.  (Paraphrasing Levin on quantum computers, one can take the viewpoint that the above is the study of impossibility results for IO…)

However, unlike quantum computers, here we also have candidate constructions. This is where multilinear maps come to play.

Multi-linear maps are this cool new technical tool (or set of tools) that was recently put forth. (The general concept was proposed by Boneh and Silverberg around 2000, and the first candidate construction of one of the current variants was presented in 2012 by Garg, Gentry and Halevi.)   Essentially, a multilinear map scheme is a fully homomorphic encryption scheme where the public key provides, in addition to the ability to encrypt elements and perform homomorphic operations on ciphertexts, also the ability to partially decrypt ciphertexts under certain restrictions. There are many incomparable variants of this general paradigm, which differ both in the functionality provided and in the security guarantees. Indeed, variants appear to be closely tied to candidate constructions. Furthermore, our understanding of what’s possible here has been evolving considerably, with multiple new constructions, attacks, and fixes reported.
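One common way to make this concrete is the “graded encoding” abstraction of Garg, Gentry and Halevi: one can publicly encode ring elements, add encodings at the same level, multiply encodings (levels add, up to some multilinearity level κ), and test whether a top-level encoding is zero. The toy Python sketch below illustrates only this interface; the “encodings” are plain modular arithmetic, provide no security whatsoever, and all names are mine.

```python
# Toy illustration of the graded-encoding interface (NOT a secure scheme).
class ToyGradedEncoding:
    def __init__(self, kappa, modulus=2**61 - 1):
        self.kappa = kappa   # multilinearity level (the "top" level)
        self.q = modulus     # toy plaintext modulus

    def encode(self, x, level=1):
        # a real scheme would output a noisy, randomized encoding of x
        return (x % self.q, level)

    def add(self, a, b):
        (xa, la), (xb, lb) = a, b
        assert la == lb, "can only add encodings at the same level"
        return ((xa + xb) % self.q, la)

    def mult(self, a, b):
        (xa, la), (xb, lb) = a, b
        assert la + lb <= self.kappa, "levels add under multiplication"
        return ((xa * xb) % self.q, la + lb)

    def is_zero(self, a):
        x, level = a
        assert level == self.kappa, "zero-testing is only allowed at the top level"
        return x == 0

# Usage: verify that enc(2) * enc(3) - enc(6) encodes zero at the top level.
ge = ToyGradedEncoding(kappa=2)
lhs = ge.mult(ge.encode(2), ge.encode(3))     # level-2 encoding of 6
diff = ge.add(lhs, ge.encode(-6, level=2))    # subtract a level-2 encoding of 6
print(ge.is_zero(diff))                       # True
```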

Still, the number and variety of applications of multi-linear maps make it clear that this “family of primitives” is extremely powerful and well worth studying, at the level of candidate constructions, at the level of finding the “right” computational abstractions, and at the level of applications. In a sense, we are here back in the 70’s: we are faced with this new set of algebraic and number theoretic tools, and are struggling to find good ways to use them and abstract them.

Indeed, some of the most powerful applications of multilinear maps are candidate constructions of IO schemes. The first such candidate construction (by Garg, Gentry, Halevi, Raykova, Sahai and Waters in 2013) came with only heuristic arguments for security; however, more rigorous analyses of this and other constructions, based on well-defined formulations of multi-linear map variants, soon followed suit. Some of these analyses have since been “broken”, in the sense that we currently do not have candidate constructions that satisfy the properties they assume. Still, other analyses remain valid. Indeed, there are no known attacks against the basic IO scheme of Garg et al.

The fact that the only current candidate constructions of IO need to assume the existence of some variant of multi-linear maps at some point or another may make it seem as if the two concepts are somehow tied together. However, there is no reason to believe that this is the case. For all we know, multi-linear maps are just the first path uncovered to IO, and other paths may well be found. Similarly, even if IO turns out to be unobtainable for some reason, the study of multilinear maps and their power will remain very relevant.

So, to sum up this long-winded account:

  • IO is a natural and fascinating computational concept. Studying its consequences (both within and outside cryptography) is a worthwhile endeavor.
  • Studying new candidate constructions of IO and/or new analyses of their security is another worthwhile endeavor.
  • Multilinear maps are an intriguing and powerful set of techniques and tools. Finding better candidate constructions and abstractions is of central importance to cryptography. Finding new cool uses of these maps is another intriguing challenge.
  • The three should be treated as separate (although touching and potentially interleaving) research efforts.


I’d like to thank Guy Rothblum and Vinod Vaikuntanathan for great comments that significantly improved this post.

Popularizing TOC

July 1, 2015

It is hard to overestimate the impact of popular science books such as “A Brief History of Time” and “Chaos: Making a New Science” on scientific research. The indirect impact of popularizing science and science education often surpasses the direct contribution that most scientists can hope to achieve in their lifetime. For this reason, many of the greatest scientists (including in our field) choose to invest considerable time in this blessed endeavor. I personally believe that the Theory of Computing deserves more popularization than it gets (and I hope to someday contribute my share). Nevertheless, this post is meant as a tribute to our colleagues who have already made such wonderful contributions. I will continuously edit this post with popular TOC books and educational resources (based on my own knowledge and on suggestions in the comments).

Popular TOC books:

Scott Aaronson, Quantum Computing since Democritus

Martin Davis, Engines of Logic: Mathematicians and the Origin of the Computer

A. K. Dewdney, The New Turing Omnibus: Sixty-Six Excursions in Computer Science

David Harel, Computers Ltd.: What They Really Can’t Do

David Harel with Yishai Feldman, Algorithmics: The Spirit of Computing

Douglas Hofstadter: Gödel, Escher, Bach: An Eternal Golden Braid

Lance Fortnow, The Golden Ticket: P, NP, and the Search for the Impossible

Cristopher Moore and Stephan Mertens, The Nature of Computation

Dennis Shasha and Cathy Lazere, Out of Their Minds: The Lives and Discoveries of 15 Great Computer Scientists

Leslie Valiant, Probably Approximately Correct: Nature’s Algorithms for Learning and Prospering in a Complex World

Leslie Valiant, Circuits of the Mind

Noson S. Yanofsky, The Outer Limits of Reason: What Science, Mathematics, and Logic Cannot Tell Us

Hector Zenil, Randomness Through Computation: Some Answers, More Questions


Apostolos Doxiadis and Christos Papadimitriou, Logicomix: An epic search for truth

Christos H. Papadimitriou, Turing (A Novel about Computation)

Other Resources:

CS Unplugged (including a book)


Doing a 180 and still spinning

June 24, 2015

I taught my first class last quarter and it was an enjoyable and eye-opening experience on many levels. First, some background. The class was undergraduate algorithms, or as it is popularly known at UCLA, CS180. There were 129 students (kind of like jumping into the deep end to test the waters). As in most other CS curricula, it is a core required course, and as I later heard from the students, the class can have a significant impact on where you intern or even where you eventually get employed (all software companies want to know how you did in this course).

This post is meant to record some of my observations.

How I felt: The first two weeks felt a bit stressful and burdensome. But once I got used to it, I started enjoying the lectures, and it was indeed quite pleasing to hear (and in some cases see) that a good fraction of the students liked the material, and to see them participating in class.

Hindsight: The most significant point was the level of the assignments. Here I erred, mainly due to a mismatch in expectations. On the first assignment the median was 100%, so I increased the difficulty. On the next one the median was 77%, which still felt high and not challenging enough for the students. At this point I consciously made 50% of each assignment moderately easy problems (directly based on class work) and had the remaining 50% range from not-so-easy to problems requiring at least one new idea. While the concept was perhaps right, the proportions were off from what the students expected. An 80-20 or so split would have been much better in hindsight. I got it almost right for the final, with the median being 75%.

There were no real surprises in the syllabus, with most topics being in common with other similar classes (you can compare here: Harvard, MIT 1, MIT 2, MIT 3, CMU 1, CMU 2, Stanford 1, Stanford 2, Coursera-Stanford). However, it did feel a little ambitious in the end, and the content needs some pruning. For instance, I spent one lecture each on three somewhat non-standard topics: analyzing sampling methods, contention resolution, and cuckoo hashing. Next time, covering just one of them, or even none, would perhaps be better.

A few people asked to include a programming component in the course. This makes perfect sense; I indeed considered it seriously at the beginning, and thought about doing something like what Jelani Nelson did at Harvard. But it was plainly infeasible to have programming components in the assignments with the available resources (Jelani tells me he had 10 TAs for a class of about 180). Perhaps next time around I can suggest problems for students to play with, even if they won't be graded.

One other request was for practice midterm/final questions. I am still undecided about this one.

Proofs: I spent a lot of time in class proving that various (in some cases extremely simple) algorithms work. This is not unique to this course, but seems to be true for most similar courses (check the syllabi: Harvard, MIT 1, MIT 2, MIT 3, CMU 1, CMU 2, Stanford 1, Stanford 2, Coursera-Stanford).

So, as a few students asked, why so much emphasis on proofs in an algorithms class? There are two separate issues here. The first is perhaps my not-so-clear presentation (this is the first run after all). Let us separate that from the second, probably more pressing one: if the goal of an algorithms course is to develop algorithmic thinking and/or to prepare students mainly for a career in software engineering, why should we (by “we” I mean all algorithms courses across universities) emphasize proofs?

First, which proofs did I spend a lot of time doing? Well, there were 1) BFS/DFS, 2) FFT, 3) minimum spanning trees, 4) sampling, 5) Quicksort, and 6) hashing.

BFS/DFS we can explain as they serve as examples to illustrate induction, invariants etc. For FFT, the algorithm and the proof are one and the same – you can’t quite come up with the algorithm without the proof. But how about the others?

Take MST: with the right questions, you can motivate students to come up with the algorithm themselves, as it is indeed quite natural and simple. But shouldn't that be the end of developing algorithmic thinking? The same goes for Quicksort and hashing: randomized divide & conquer makes intuitive sense, and so does making random choices when in doubt. Why go deeply into probability and linearity of expectation to analyze these? Here are two worthwhile reasons (among many) I can think of.

First, speed is not everything: we need to be sure that the algorithm works. At the end of the day, even when you just want to build something hands-on, in many cases you need to be absolutely sure that what you have actually works. For example, it is easy to come up with instances where greedy fails; in class I did do such an example (knapsack). However, looking back, I should have emphasized it more and drawn a parallel with other examples where greedy fails.
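To make this concrete, here is a small illustrative instance (my own toy example, not one taken from the class) where the natural greedy-by-ratio strategy for 0/1 knapsack loses to an exact answer:

```python
# Greedy by value/weight ratio vs. brute force on a tiny 0/1 knapsack instance.
from itertools import combinations

items = [("A", 6, 60), ("B", 5, 45), ("C", 5, 45)]   # (name, weight, value)
capacity = 10

def greedy_value(items, capacity):
    total, remaining = 0, capacity
    for _, w, v in sorted(items, key=lambda t: t[2] / t[1], reverse=True):
        if w <= remaining:
            total, remaining = total + v, remaining - w
    return total

def optimal_value(items, capacity):
    best = 0
    for r in range(len(items) + 1):
        for subset in combinations(items, r):
            if sum(w for _, w, _ in subset) <= capacity:
                best = max(best, sum(v for _, _, v in subset))
    return best

print(greedy_value(items, capacity))   # 60: greedy grabs A, then nothing else fits
print(optimal_value(items, capacity))  # 90: taking B and C is strictly better
```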

Second, the goal of the course is not just to help with programming faster code but also to serve as a launching pad for a path to computer science (with emphasis on the ‘science’). Even in the former case, thinking about algorithms in a principled way and being able to analyze them will eventually help in designing new algorithms; especially so when you have to tweak existing algorithms for new applications. Looking back, including more examples to demonstrate this concept would have been quite helpful.

Future: I look forward to teaching the class again, keeping the above points in mind. It should only get better for me, and hopefully for the students too.

Historical Papers in Cryptography: Umesh Vazirani on Quantum and Post-Quantum Cryptography

June 22, 2015

The cryptography semester at the Simons Institute is well on its way. Last week we had a fascinating workshop on securing computation: thanks to Hugo Krawczyk and Amit Sahai for organizing. You can find the program and video links here (covering, among many other topics, everything you always wanted to know about obfuscation but were afraid to ask). Beyond the tremendous energy and excitement about cryptography research, participants have also been keeping busy with regular movie nights, swing dancing lessons, playback theater, volleyball and hiking adventures.

This week, the lecture series on historical papers in cryptography continues, now complete with its own webpage and video links. From Vinod: “we will hear about the love affair between quantum computing and cryptography through the words of the inimitable Umesh Vazirani. Everyone’s invited”.

If you’re in the greater Berkeley area, please do drop by. Details below.

Quantum and Post-Quantum Cryptography
Speaker: Umesh Vazirani (UC Berkeley)
Date: Monday June 22, 2-3:30pm
Location: Calvin Lab Auditorium

This talk will trace the fundamental impact of quantum computation on cryptography, including the breaking of classical cryptosystems such as RSA by quantum algorithms and, remarkably, the use of quantum algorithms to design and establish security of other classical cryptosystems. I will also describe how novel features of quantum states have been exploited to create quantum cryptographic primitives, and the challenges in defining and establishing security of such primitives. The talk is aimed at a general audience and will not assume any background in quantum computation.

