In a previous blog post, we saw how ideas from differential privacy can be used to prove a lower bound on the hereditary discrepancy for the Erdös Discrepancy problem. This lower bound happened to be nearly tight.

It turns out that this tightness is no accident. The connection between hereditary discrepancy and privacy is in fact much deeper, and can be used to get an approximation algorithm for hereditary discrepancy of any set system. In this post, I will try to summarize this connection.

First, let us recall the definition of hereditary discrepancy. For a matrix ${A \in \{0,1\}^{m\times n}}$, the discrepancy of ${A}$ is defined as ${\mathsf{Disc}(A) = \min_{x \in \{-1,1\}^n} |Ax|_{\infty}}$. Thus we want to multiply each column of ${A}$ by either ${+1}$ or ${-1}$ so that each row sum is at most ${D}$ in absolute value. The smallest attainable ${D}$ is the discrepancy of ${A}$.

It may be worth mentioning a ${0}$-${1}$ matrix ${A}$ corresponds to a set system with elements corresponding to columns and sets corresponding to rows. The vector ${x}$ then corresponds to a coloring of the elements and the discrepancy measures how balanced all of the sets are. Nevertheless, the definition above, and everything I will say about discrepancy, extends to arbitrary real matrices ${A}$.

Hereditary discrepancy is a more robust variant of this definition: the hereditary discrepancy ${\mathsf{HerDisc}(A)}$ is simply the maximum of ${\mathsf{Disc}(B)}$, taken over all sub-matrices ${B}$ of ${A}$.

To start with, note that one can prove that ${\mathsf{Disc}(A) \leq D}$ by exhibiting a vector ${x}$ attaining such a bound, and this witness can be efficientl verified. Computing the discrepancy itself is NP-hard, even to approximate. In fact Charikar, Newman and Nikolov show that it is hard to distinguish between discrepancy zero and discrepancy ${\sqrt{n}}$. Thus one does not expect to find efficiently verifiable proofs of lower bounds on discrepancy.

Proving a good upper bound on hereditary discrepancy seems harder: how would I convince you that every sub-matrix ${B}$ has a good coloring? It turns out that ${\mathsf{HerDisc}}$ being ${1}$ is equivalent to ${A}$ being totally unimodular, and this can be efficiently checked. For higher ${\mathsf{HerDisc}}$ no such characterization is known, and in fact deciding whether the ${\mathsf{HerDisc}}$ is ${2}$ or ${3}$ can be shown to be NP hard. One would think (and some of us tried to prove) that herdisc would be at least as hard to approximate as discrepancy. However, one would be wrong. Herdisc is in fact approximable to a polylogarithmic factor. This result fell out of our attempts to understand how much noise differentially private mechanisms must add to a given set of linear queries. In this post, I will try to sketch how we get an approximation to ${\mathsf{HerDisc}}$. See the paper for more details (Since the primary focus of the paper is on questions in Differential Privacy, large parts of it do not pertain to the approximation to ${\mathsf{HerDisc}}$. If you are mainly interested in that approximation, I hope this post will help figure out which parts are relevant.)

The Partial Coloring technique

A very useful technique in discrepancy theory is that of partial coloring. Suppose that we can find a ${\{0,+1,-1\}}$ vector ${\chi}$ such that ${|A\chi|_{\infty}}$ is some value ${M}$ and ${\chi}$ has many non-zeroes, say ${|\chi|_{1} \geq \frac{n}{100}}$. Let’s for convenience call the least attainable such ${M}$ as the partial discrepancy of ${A}$, and let ${\mathsf{PHerDisc}(A)}$ denote the analogous hereditary version, i.e. the maximum partial discrepancy over all sub-matrices of ${A}$.

The partial coloring trick basically says that ${\mathsf{HerDisc}(A)\leq O(\log n)\cdot \mathsf{PHerDisc}(A)}$ for any ${A}$: we can find a good coloring by repeatedly finding good partial colorings. Each step reduces the number of uncolored columns by at least a constant factor, so that in ${O(\log n)}$ steps, we get a complete coloring.

${\mathsf{PHerDisc}}$ and Privacy

${\mathsf{PHerDisc}}$, it turns out, is closely related to differential privacy. Muthukrishnan and Nikolov, building on Dinur-Nissim, showed that the any minimally private algorithm to compute an approximation to ${Ax}$, fo r a private database ${x}$, must incur additive error at least as large as the hereditary discrepancy of ${A}$. Let me now explain this statement in a little more detail.

Imagine we have a database ${x}$ which contains private data of individuals. For a start, suppose that ${x}$ is a ${0}$-${1}$ vector, where the ${i}$th coordinate is a private bit of the ${i}$th individual (the results actually apply to a more general setting, but this version should suffice for this blog post). An analyst is interested in learning some set of ${d}$ aggregates, represented by a ${d\times n}$ matrix ${A}$, i.e. she wants to learn ${Ax}$. The analyst is also potentially an attacker, and may know something about the database, possibly from other auxiliary sources. Suppose that she knows some coordinates of ${x}$ already, and our privacy policy requires that we do not enable her to learn the unknown bits when we answer the query.

As you might be thinking, this would be impossible to guarantee if we respond with the correct answer ${Ax}$. So we will in fact give a noisy answer in such a way that privacy is guaranteed in a formal sense, and we try to give as accurate an answer as possible. Let us say we want all of our ${d}$ queries to be accurate within an expected additive error of ${E}$. How small can we make ${E}$ while avoiding a serious privacy violation?

Let us say that a serious privacy violation occurs if an adversary who knows some ${n-k}$ bits of ${x}$, after learning the noisy answer to ${Ax}$, can produce her guess ${x'}$ which differs from ${x}$ in ${k/50}$ bits. In other words, the adversary learns pretty much all of ${x}$. Suppose ${\mathsf{PHerDisc}(A)}$ is ${H}$. Then we will argue that ${E}$ must be at least ${H/4}$ to avoid a serious privacy violation.

Let ${B}$ be a sub-matrix of ${A}$ achieving partial discrepancy ${H}$, and let ${S}$ denote the indices of the columns of ${B}$. Consider an analyst that knows ${x}$ on ${S^c}$ and is trying to learn ${x}$ on ${S}$, and without loss of generality, suppose that ${x}$ is zero on ${S^c}$. Let ${y=Ax}$, and let ${y'}$ be the answer we give in an attempt to avoid a serious privacy violation. By definition of discrepancy, for any ${\chi \in \{0,+1,-1\}^n}$ supported on ${S}$ with ${|S|/100}$ non-zero entries, ${|A\chi|_\infty}$ is at least ${H}$. Suppose that ${E < H/4}$, and let ${x'}$ be any ${0}$-${1}$ vector such that ${|Ax' - y'|_\infty \leq 2E}$. By Markov\rq{}s inequality, ${x}$ satisfies this with probability half, in which case such an ${x\rq{}}$ exists. By triangle inequality, ${|Ax - Ax'|_{\infty} \leq 4E < H}$. Thus ${\chi=x-x'}$ is a vector in ${\{0,+1,-1\}^n}$ satisfying ${|A\chi| < H}$. It must therefore be the case that ${\chi}$ has fewer than ${|S|/100}$ non-zeroes, so that ${x'}$ agrees with ${x}$ on all but ${|S|/100}$ coordinates. This of course is a serious privacy violation, establishing that any minimally private mechanism must add error ${\mathsf{PHerDisc}(A)}$ per entry. We have thus established that error of any private mechanism for ${A}$ is at least ${\mathsf{PHerDisc}(A)}$.

Approximating ${\mathsf{PHerDisc}}$

I will next try to argue how to get from this a polylogarithmic approximation algorithm for ${\mathsf{PHerDisc}}$, and hence for ${\mathsf{HerDisc}}$. Note that an ${\alpha}$ approximation algorithm, given a matrix ${A}$ must for some ${H}$ produce two certificates: one certifying that ${\mathsf{PHerDisc}}$ is at least ${H/\alpha}$, and another certifying that it is at most ${H}$. For example, the first certificate may consist of a sub-matrix ${B}$ and a dual solution for an SDP relaxation for the discrepancy of ${B}$. A simpler witness can be constructed based on a result of Lovász, Spencer and Vesztergombi, who gave a beautiful geometric argument showing that the determinant of any square sub-matrix of ${A}$, appropriately normalized, gives a lower bound on ${\mathsf{HerDisc}(A)}$. Thus a square sub-matrix ${B}$ such that the resulting lower bound ${\mathsf{detLB}(B)}$ is large could form a good witness. Matousek recently showed that there always exists such a witness that is nearly optimal, but left open the question of finding it.

The second certificate seems less obvious: how would I convince you that for every sub-matrix ${B}$, there is a good coloring? The above privacy argument comes to the rescue here. Suppose my witness consisted of the description of a random variable ${E}$ with the following properties:

• The mechanism that outputs ${Ax+ E}$ satisfies a reasonable privacy definition, that rules out a serious privacy violation.
• The expectation of the ${\ell_{\infty}}$ norm of ${E}$ is at most some value ${P}$.

Since a serious privacy violation is ruled out, the expected error must be at least ${\mathsf{PHerDisc}(A)}$, and thus we can conclude that ${\mathsf{PHerDisc}}$ is at most ${P}$.

In our case, the reasonable privacy definition will be ${(\epsilon,\delta)}$-differential privacy. Regular readers of this blog are no doubt familiar with the definition, and for this post, it will suffice that this definition guards against serious privacy violation (and much more). We show that for any ${A}$, we can find a sub-matrix ${B}$ and a random variable ${E}$ which satisfies the above two properties, with ${P}$ bounded by polylog(d) times ${\mathsf{detLB}(B)}$. Thus we get a mechanism that is within ${polylog(d)}$ of the optimal mechanism, and additionally certifies a near-tight upper bound on the hereditary discrepancy.

How do we construct such a mechanism? That is a slightly longer story and I will only sketch the main ideas. First, we move from ${\ell_{\infty}}$ to ${\ell_2^2}$ at a polylogarithmic loss: intuitively, ${\ell_2^2}$ measures an average, and ${\ell_{\infty}}$ measures the max. The minimax theorem or Boosting or Multiplicative weights technique ensures that an algorithm for minimizing the average can be translated to one that minimizes the maximum. This kind of reduction is relatively standard and I will skip it here.

How do we find an ${(\epsilon,\delta)}$-differentially private mechanism minimizing the ${\ell_2^2}$ error? It turns out that if the columns of ${A}$ (i.e. the vectors ${Ae_i}$) all lie in a ball of radius ${R}$ around the origin, then it is enough to add Gaussian noise of magnitude ${cR}$ in each coordinate of ${Ax}$, where ${c = c(\epsilon,\delta)}$ is a constant depending on ${\epsilon}$ and ${\delta}$ alone. This result in fact predates the definition of differential privacy: intuitively, one person entering or leaving the database can cause a perturbation of size ${Ae_i}$ in the value of ${Ax}$, and thus adding Gaussian noise to hide such a perturbation is sufficient to guarantee privacy.

There is of course no reason for this mechanism to be anywhere near optimal, and in general it is not. However, it turns out that if this ball of radius ${R}$ happens to be the Minimum volume Ellipsoid that contains all the points ${\pm Ae_i}$ (also known as the John-Loewner Ellipsoid of this set of points), then we have a lot more structure to work with. A result of John, combined with the restricted invertibility theorem of Bourgain and Tzafriri guarantees that in this case, one can find ${\frac{d}{2}}$ of the columns of ${A}$ that have norm ${R}$ and are nearly orthogonal. Armed with these, and using the rotational invariance of the ${\ell_2}$ version of discrepancy, we can easily come up with a sub-matrix ${B}$ that certifies that the simple Gaussian mechanism is nearly optimal. Intuitively, the nearly orthogonal columns define a simplex with large volume, which gives the determinant lower bound.

Of course in general, we will not be lucky enough to get a ball as the John-Loewner Ellipsoid. Suppose we get ${E}$ as the John-Loewner Ellipsoid. Let ${U}$ be the subspace spanned by the ${\frac{d}{2}}$ longest axes of ${E}$, and let ${V}$ be the complement subspace. Further let ${R}$ be the length of the ${\frac{d}{2}}$th longest axis. It turns out that a generalization of Bourgain-Tzafriri due to Vershynin can still be used to find a good ${B}$ in ${P_UA}$, where ${P_U}$ is the projection matrix for ${U}$. Moreover, since ${P_VA}$ is contained in a ball of radius ${R}$, we can define a mechanism for the projection ${P_VAx}$ that has error within a constant factor of the determinant lower bound that we get from ${B}$. This gives us some progress towards outputting ${Ax}$. We next recursively find a mechanism for ${P_UAx}$. Since we halve the dimension in each iteration, we have at most ${O(\log d)}$ iterations. We can use standard privacy machinery to ensure that the overall mechanism satisfies differential privacy. Since each step of the recursion has error within a constant factor of a lower bound output in that step, our overall ${\ell_2^2}$ error is within a polylog factor of one of the lower bounds. This completes the rough sketch of the mechanism. See the paper for full details.

Thus we now know that hereditary discrepancy is approximable to a polylog factor, and not approximable better than a factor of 1.5. Closing, or even narrowing this gap is an interesting open problem.

Many thanks to Alex Nikolov and Li Zhang for his contribution to the content and the writing of this post.

Edit 2/13: Edited to fix the missing link to the paper.