Different areas of mathematics, and theoretical computer science, freely borrow tools and ideas from each other, and these interactions, like barter, can make both parties richer. In fact it’s even better than barter: the objects being ideas, you don’t have to actually give them up in an exchange.

And so it is no surprise that differential privacy has used tools from several other fields, including complexity, cryptography, learning theory and high-dimensional geometry. Today I want to talk about a little giving back. A small interest payment, if you will. Below I will describe how differential privacy tools helped us resolve a question of Alon and Kalai.

Discrepancy

This particular instance deals with discrepancy theory, more specifically with the Erdős Discrepancy Problem (EDP), the subject of the Polymath5 project (discussed here).

In 1932, Erdős conjectured:

Conjecture [Problem 9 here]: For any constant ${C}$, there is an ${n}$ such that the following holds. For any function ${f:[n] \rightarrow \{-1, +1\}}$, there exist ${a}$ and ${k \leq n/a}$ such that

$\displaystyle |\sum_{i = 1}^{k}{f(ia)}| > C.$

For any ${a,k}$, the set ${S_{a,k} = \{ia: 0\leq i \leq k\}}$ is an arithmetic progression containing ${0}$; we call such a set a Homogeneous Arithmetic Progression (HAP). The conjecture above says that for any red-blue coloring of ${[n]}$, there is some HAP which has many more red elements than blue (or vice versa).

In modern language, this is a question about discrepancy of HAPs. So let me define discrepancy first. Let ${U}$ be a universe and let ${\mathcal{S}}$ denote a family of subsets of ${U}$. A coloring ${f}$ of ${U}$ is an assignment ${f: U \rightarrow \{+1,-1\}}$. The discrepancy of a set ${S_i}$ under the coloring is simply ${| \sum_{j \in S_i} f(j) |}$; the imbalance of the set. The discrepancy of a set system under a coloring is the maximum discrepancy of any of the sets in the family. The minimum of this quantity over all colorings is then defined to be the discrepancy of the set system. We want a coloring in which all sets in ${\mathcal{S}}$ are as close to balanced as possible. In other words, if ${A}$ denotes the set-element incidence matrix of the set system, then ${\mathsf{disc}(A) = \min_{x \in \{-1,1\}^n} |Ax|_{\infty}}$.
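These definitions are easy to check directly for tiny instances; here is a minimal brute-force sketch (exponential time, function names are mine):

```python
from itertools import product

def disc(sets, universe):
    """Brute-force discrepancy: minimize, over all 2^n colorings f,
    the largest imbalance |sum_{j in S} f(j)| of any set S."""
    best = float("inf")
    for signs in product((-1, 1), repeat=len(universe)):
        f = dict(zip(universe, signs))
        best = min(best, max(abs(sum(f[j] for j in S)) for S in sets))
    return best

# Three pairwise-overlapping pairs: every 2-coloring of {1,2,3} makes
# some pair monochromatic, so the discrepancy is 2.
print(disc([{1, 2}, {2, 3}, {1, 3}], [1, 2, 3]))  # → 2
```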

Thus the conjecture says that the discrepancy of HAPs over ${[n]}$ grows with ${n}$.
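For small ${n}$ one can compute the discrepancy of the HAP system by exhaustive search; a sketch (exponential in ${n}$, toy sizes only):

```python
from itertools import product

def hap_disc(n):
    """Discrepancy of HAPs over [n]: min over colorings f of the
    maximum |f(a) + f(2a) + ... + f(ka)| over all a, k with ka <= n."""
    best = n
    for signs in product((-1, 1), repeat=n):
        worst = 0
        for a in range(1, n + 1):
            s = 0
            for k in range(1, n // a + 1):
                s += signs[k * a - 1]  # signs[i-1] is the color of i
                worst = max(worst, abs(s))
        best = min(best, worst)
    return best
```

Such a search confirms the classical small case: the longest ${\pm 1}$ sequence all of whose HAP sums are at most ${1}$ in absolute value has length ${11}$, so the discrepancy first reaches ${2}$ at ${n = 12}$.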

This post deals with the related concept of hereditary discrepancy. The discrepancy can be small by accident: a set system can have discrepancy zero even though it is complex enough to contain a high-discrepancy set system inside it. The hereditary discrepancy measures the maximum discrepancy of any set system “contained” in ${(U,\mathcal{S})}$.

Formally, given a set system ${(U,\mathcal{S})}$, and a subset ${V \subseteq U}$, the restriction of ${(U,\mathcal{S})}$ to ${V}$ is another set system ${(V,\mathcal{S}_{|V})}$, where ${\mathcal{S}_{|V} = \{ S_i \cap V : S_i \in \mathcal{S}\}}$. If we think of the set system as a hypergraph on ${U}$, then the restriction is the induced hypergraph on ${V}$. The hereditary discrepancy ${\mathsf{herdisc}(U,\mathcal{S})}$ is the maximum discrepancy of any of its restrictions. In matrix language, ${\mathsf{herdisc}(A)}$ is simply ${\max_{A'} \mathsf{disc}(A')}$ where the maximum is taken over all submatrices of ${A}$.
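In the same brute-force spirit, hereditary discrepancy can be computed for toy instances by maximizing over all restrictions (doubly exponential; illustration only, names mine):

```python
from itertools import combinations, product

def disc(sets, universe):
    """Brute-force discrepancy of a set system over `universe`."""
    return min(
        max((abs(sum(f[j] for j in S)) for S in sets), default=0)
        for f in (dict(zip(universe, signs))
                  for signs in product((-1, 1), repeat=len(universe))))

def herdisc(sets, universe):
    """Max discrepancy over all restrictions (V, S|V) of the system."""
    return max(
        disc([S & set(V) for S in sets], V)
        for r in range(len(universe) + 1)
        for V in combinations(universe, r))

# A single pair has discrepancy 0 (color its endpoints oppositely),
# but restricting to one endpoint leaves an unbalanced singleton.
print(disc([{1, 2}], (1, 2)), herdisc([{1, 2}], (1, 2)))  # → 0 1
```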

Some examples. Let ${n}$ denote ${|U|}$. A totally unimodular matrix ${A}$ gives us a set system with hereditary discrepancy at most ${1}$. An arbitrary collection of ${n}$ sets has discrepancy at most ${O(\sqrt{n})}$; note that a random coloring only gives discrepancy about ${\sqrt{n \log n}}$. In a famous paper, Spencer showed the ${O(\sqrt{n})}$ upper bound, and Bansal, and recently Lovett and Meka, gave constructive versions of this result.

Privacy

Given a vector ${y \in \Re^n}$ and a ${0}$-${1}$ matrix ${A}$, consider the problem of outputting a vector ${z}$ such that ${|z - Ay|_{\infty}}$ is small, and yet the distribution of ${z}$ is differentially private, i.e. for ${y}$ and ${y'}$ that are close, the distributions of the corresponding ${z}$’s are close. If you are a regular reader of this blog, you are no stranger to differential privacy. For the purposes of this post, a mechanism ${M : \Re^n \rightarrow \Re^m}$ satisfies differential privacy if for any ${y,y' \in \Re^n}$ such that ${|y-y'|_1 \leq 1}$, and any (measurable) ${S \subseteq \Re^m}$:

$\displaystyle \Pr[M(y) \in S] \leq 2 \Pr[M(y') \in S] .$

Thus if ${y,y'}$ are close in ${\ell_1}$, the distributions ${M(y),M(y')}$ are close in ${L_{\infty}}$.
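With ${\epsilon = \ln 2}$, the factor-${2}$ guarantee above is met by the classical Laplace mechanism, sketched here for the linear map ${y \mapsto Ay}$ (a generic sketch, not the error-optimal mechanism for any particular ${A}$):

```python
import numpy as np

def laplace_mechanism(A, y, eps=np.log(2), rng=None):
    """Release z = Ay + Laplace noise.  The noise scale is the L1
    sensitivity of y -> Ay (the largest column sum of |A|; at most the
    number of rows for a 0-1 matrix) divided by eps, which yields
    eps-differential privacy for neighbors with |y - y'|_1 <= 1."""
    rng = np.random.default_rng() if rng is None else rng
    sensitivity = np.abs(A).sum(axis=0).max()
    noise = rng.laplace(scale=sensitivity / eps, size=A.shape[0])
    return A @ y + noise
```

The expected error of this generic mechanism scales with the sensitivity; the point of the connection described below is to lower bound how much better any mechanism can possibly do.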

Researchers have studied the question of designing good mechanisms for specific matrices ${A}$. Here by good we mean that the expected error, say ${|z-Ay|_\infty}$ or ${|z-Ay|_2}$, is as small as possible. It is natural to also prove lower bounds on the error needed to answer specific queries of interest.

A particular set of queries of interest is the following. The coordinates of the vector ${y}$ are associated with the hypercube ${\{0,1\}^d}$. Think of ${d}$ binary attributes people may have, and for ${v \in \{0,1\}^d}$, ${y_{v}}$ denotes the number of people with ${v}$ as their attribute vector. For each subcube, defined by fixing ${k}$ of the ${d}$ bits, we look at the counting query corresponding to that subcube: i.e. ${\sum_{v \in \mbox{ subcube}} y_v}$. This corresponds to dot product with a vector ${a}$ with ${a_v}$ being one on the subcube, and zero elsewhere. Thus we may want to ask how many people have a ${0}$ in the first and second attribute, and a ${1}$ in the fourth. Consider the matrix ${A}$ defined by all the possible subcube queries.
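Concretely, this query matrix can be generated as follows (a small sketch; rows are indexed by patterns in ${\{0,1,\star\}^d}$, columns by points of the hypercube; names are mine):

```python
from itertools import product

def subcube_matrix(d):
    """Rows: all 3^d subcube queries q in {0,1,'*'}^d (fixing any
    subset of the d bits); columns: the 2^d points v.  The (q, v)
    entry is 1 iff v matches q on every fixed coordinate."""
    points = list(product((0, 1), repeat=d))
    queries = list(product((0, 1, "*"), repeat=d))
    A = [[int(all(qi == "*" or qi == vi for qi, vi in zip(q, v)))
          for v in points]
         for q in queries]
    return A, queries, points
```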

Subcubes defined by fixing ${k}$ bits are ${k}$-juntas to some people, contingency table queries to others. These queries are important from a statistical point of view, and Kasiviswanathan, Rudelson, Smith and Ullman showed lower bounds on the amount of error any differentially private mechanism must add, for any constant ${k}$. When ${k}$ is ${\Omega(d)}$, their work suggests that one should get a lower bound of ${2^{\Omega(d)}}$.

The connection

So what has discrepancy got to do with privacy? Muthukrishnan and Nikolov showed that if ${A}$ has large hereditary discrepancy, then any differentially private mechanism for ${A}$ must incur large expected squared error. In fact, one can go back and check that nearly all known lower bounds for differentially private mechanisms are really hereditary discrepancy lower bounds in disguise. Thus there is a deep connection between ${\mathsf{herdisc}(A)}$ and the minimum achievable error for ${A}$.

For the EDP, it is natural to ask how large the hereditary discrepancy is. Alon and Kalai showed that it is ${\tilde{\Omega}(\sqrt{\log n})}$ and at most ${n^{O(\frac{1}{\log\log n})}}$. They also showed that for any constant ${\epsilon}$, it is possible to delete an ${\epsilon}$ fraction of the integers in ${[n]}$ so that the remaining set system has hereditary discrepancy at most polylogarithmic in ${n}$. Gil guessed that the truth is closer to the lower bound.

Alex Nikolov and I managed to show that this is not the case. Since hereditary discrepancy is, well, hereditary, a lower bound on the hereditary discrepancy of a submatrix is also a lower bound on the hereditary discrepancy of the whole matrix. In the EDP matrix on ${[n]}$, we first find the subcubes-juntas-contingency-tables matrix ${A}$ above as a submatrix, one for which ${d}$ is ${\Theta(\frac{\log n}{\log \log n})}$. Having done that, it remains to prove a lower bound on ${\mathsf{herdisc}(A)}$ itself.

The first step is done as follows: associate each dimension ${i \leq d}$ with the ${2i}$th and the ${(2i+1)}$th primes. A point ${v = (v_1,v_2,\ldots,v_d)}$ in the hypercube is naturally associated with the integer ${f(v)=\Pi_{i} p_{2i+v_i}}$. A subcube query can be specified by a vector ${q \in \{0,1,\star\}^d}$: ${q}$ is set to the appropriate ${0}$-${1}$ value for the coordinates that we fix, and to ${\star}$ for the unconstrained coordinates. We associate the subcube ${q}$ with the integer ${f(q)=\Pi_{i: q_i \neq \star} p_{2i+q_i}}$. It is easy to see that ${v}$ is in the subcube corresponding to ${q}$ if and only if ${f(q)}$ divides ${f(v)}$. Thus if we restrict ourselves to the integers ${\{f(v): v \in \{0,1\}^d\}}$ and the HAPs corresponding to ${\{f(q): q \in \{0,1,\star\}^d\}}$, we have found a submatrix of the EDP matrix that looks exactly like our contingency tables matrix ${A}$. Thus the hereditary discrepancy of the EDP matrix is at least as large as that of this submatrix.
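The divisibility criterion is easy to verify mechanically; a sketch (0-indexed dimensions here, so dimension ${i}$ uses the primes in positions ${2i}$ and ${2i+1}$ of the prime list; helper names are mine):

```python
from itertools import product
from math import prod

def first_primes(m):
    """The first m primes, by trial division (fine for small m)."""
    primes, c = [], 2
    while len(primes) < m:
        if all(c % p for p in primes):
            primes.append(c)
        c += 1
    return primes

def f_point(v, p):
    """f(v) = product over all i of p_{2i + v_i}."""
    return prod(p[2 * i + vi] for i, vi in enumerate(v))

def f_query(q, p):
    """f(q) = product over the fixed coordinates i of p_{2i + q_i}."""
    return prod(p[2 * i + qi] for i, qi in enumerate(q) if qi != "*")

# Check: v lies in the subcube q  iff  f(q) divides f(v).
d = 3
p = first_primes(2 * d)
for v in product((0, 1), repeat=d):
    for q in product((0, 1, "*"), repeat=d):
        in_cube = all(qi == "*" or qi == vi for qi, vi in zip(q, v))
        assert in_cube == (f_point(v, p) % f_query(q, p) == 0)
```

Since each prime occurs with exponent at most one, ${f(q)}$ divides ${f(v)}$ exactly when every fixed coordinate of ${q}$ agrees with ${v}$, which is the check above.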

Lower bounds for private mechanisms for ${A}$ can be derived in many ways. For constant ${k}$, the results of Kasiviswanathan et al. referred to above suffice, and it is likely that they can be pushed to get the ${2^{\Omega(d)}}$ lower bound we are shooting for. A very different approach, based on extracting randomness from weak random sources, also implies such a lower bound. It is likely that from these, one could get a lower bound on ${\mathsf{herdisc}}$ of the kind we need.

However, given what we know about ${A}$, we can in fact remove the privacy scaffolding and give a simpler, direct proof of the ${2^{\Omega(d)}}$ lower bound on ${\mathsf{herdisc}(A)}$, without any mention of privacy. This implies that the hereditary discrepancy for the EDP is at least ${2^{\Omega(d)} = n^{\Omega(\frac{1}{\log\log n})}}$, which matches the upper bound up to a constant in the exponent. A brief note with the proof is here.

Of course the EDP itself is wide open. Head on here to help settle the conjecture.

Many thanks to Alex Nikolov for his contribution to the content and the writing of this post.

1. September 6, 2012 12:57 am

Great post. Reminds me of a talk Avi once gave titled something like “why it’s worthwhile sitting in talks in different fields.” I should add that sometimes fields don’t only borrow ideas but also borrow the researchers from other fields. So beware guys … 🙂

2. September 11, 2012 12:19 pm

Where did you define EDP?

September 11, 2012 3:31 pm

My bad. EDP is the Erdős discrepancy problem that we started the post with. Now fixed.

September 11, 2012 7:04 pm

Great! Just saw the post.

4. September 13, 2012 5:18 pm

Very nice post! I have always thought about constructing DP lower bounds as equivalent to showing the existence of large packings around the optimal value. For example, our sample complexity bounds for learning in [1] and statistics [2] are all based on this concept of packings. Do you know if there’s a formal relationship between the two concepts?

[1] Sample Complexity Bounds for Differentially Private Learning
Kamalika Chaudhuri and Daniel Hsu, COLT 2011
[2] Convergence Rates for Differentially Private Statistical Estimation
Kamalika Chaudhuri and Daniel Hsu, ICML 2012

September 14, 2012 5:56 pm

Thanks for the pointers.
For (eps,0)-differential privacy, pretty much all lower bounds I know of are based on packing arguments. For linear queries, Moritz and I showed[1] that this is tight up to a polylog factor. We can go a little further than packing based lower bounds[2] by”adding up” packing-based lower bounds over orthogonal subspaces. It is a good question if packing based LBs are nearly tight in some more general setting.
For (eps,delta)-differential privacy, the picture is quite different. Packing based lower bounds only apply when delta is really small, and do not apply for say polynomially small delta. In fact, you can sometimes get (eps,1/poly(n))-DP mechanisms with accuracy significantly better than the packing based lower bounds for (eps,0) [3].
So in most settings, for proving non-trivial lower bounds of (eps,delta), one must go beyond packings. Many of the lower bounds go via lower bounds for some variant of blatant non-privacy. As this post mentions, these are often lower bounds on discrepancy, or on the smallest eigenvalue of an associated matrix.

[1] Hardt and Talwar, The Geometry of Differential Privacy. http://arxiv.org/abs/0907.3754
[2] Bhaskara et al. Unconditional Differentially Private Mechanisms for linear queries. http://www.cs.princeton.edu/~bhaskara/files/privacy.pdf
[3] Anindya De, Lower bounds on Differential Privacy, http://arxiv.org/abs/1107.2183

September 19, 2012 12:59 pm

Another paper from the same period that uses the packing technique:

A. Beimel, S. P. Kasiviswanathan, and K. Nissim. Bounds on the Sample Complexity for Private Learning and Private Data Release. In the Seventh Theory of Cryptography Conference, 2010. http://www.cs.bgu.ac.il/~beimel/Papers/BKN.pdf

September 19, 2012 10:23 pm

The references [2] and [3] were flipped. Just fixed that.
Also, just to clarify, I did not mean to suggest that [1] introduced packing arguments. There are earlier papers with a packing argument, and certainly people in the community knew of it independently well before it appeared in print. What I find surprising is that in many settings, it is tight or nearly tight for (eps,0). Our failure to come up with more clever lower bounding techniques is, in many cases, not our fault. 🙂

October 7, 2012 6:46 am

The definition of EDP given above is not quite correct. The problem is to bound all partial sums of the function along HAPs, not just the sums that go all the way up to $n$.

(Indeed, the conjecture as stated is trivially false: one can take $C=1$ and define $f$ by $f(m) = +1$ if $m \leq n/2$ and $f(m) = -1$ if $m > n/2$.)

(The definition given in the paper is correct …)

October 7, 2012 4:19 pm

Thanks for the correction. Now fixed.

October 16, 2014 6:13 am

Great post.

Are the following known? Any references or pointers would be welcome.

1. Fix $n$. Let $A_k$ be the $C_k^n \times 2^n$ matrix encoding subcube queries with $k$ of the $n$ input variables fixed. What is the (plain, non-hereditary) discrepancy of $A_k$? In the 2-norm? In the infinity norm?

2. What about including all $k$ from 1 to $n$? Hence what is the discrepancy of all subcube queries?

• October 17, 2014 7:40 am

For subcubes of co-dimension $k$, the discrepancy is 0 if $k < n$ and 1 otherwise (in the infinity norm; $\sqrt{n}$ in $\ell_2$). For all subcubes together, it is 1 in the infinity norm and $\sqrt{n}$ in $\ell_2$.

All of these are achieved by the parity coloring: the color of $v \in \{0, 1\}^n$ is $(-1)^{\sum_i v_i}$. It is easy to see this has discrepancy 0 for any subcube of dimension greater than 0 (i.e. any subcube which is not a vertex).
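For what it’s worth, the parity-coloring claim above is easy to confirm by exhaustive check for small $n$; a quick sketch:

```python
from itertools import product

def parity_color(v):
    """The parity coloring: (-1)^{sum of the bits of v}."""
    return (-1) ** sum(v)

def max_subcube_imbalance(d):
    """Max |sum of colors| over all subcubes of dimension > 0 in {0,1}^d."""
    points = list(product((0, 1), repeat=d))
    worst = 0
    for q in product((0, 1, "*"), repeat=d):
        if "*" not in q:
            continue  # skip vertices (dimension-0 subcubes)
        s = sum(parity_color(v) for v in points
                if all(qi == "*" or qi == vi for qi, vi in zip(q, v)))
        worst = max(worst, abs(s))
    return worst

# Flipping any free coordinate pairs each point with one of the
# opposite color, so every positive-dimension subcube is balanced.
print(max_subcube_imbalance(4))  # → 0
```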