In this post, I will present what I think is a simple(r) algorithm for estimating the $p$-th frequency moment, or simply the $\ell_p$ norm of a vector for $p > 2$, in the sketching/streaming model. In fact, the algorithm is just a weak embedding of $n$-dimensional $\ell_p$ into $\ell_\infty$ of dimension $m = n^{1-2/p} \cdot \mathrm{poly}(\log n)$ (this viewpoint spares me from describing the streaming model precisely).

What do I mean by a weak embedding? We will get a randomized linear map $\Phi : \mathbb{R}^n \to \mathbb{R}^m$ such that, for any $x \in \mathbb{R}^n$, the image $\|\Phi x\|_\infty$ is a constant-factor approximation to $\|x\|_p$ with 90% probability. Since $m \ll n$, the embedding is actually dimensionality-reducing.
Before I jump into the algorithm, let me mention that the algorithm is essentially based on the approach from a paper joint with Robi Krauthgamer and Krzysztof Onak, and also the paper by Hossein Jowhari, Mert Saglam, and Gabor Tardos. At least our paper was itself inspired by the first (near-)optimal algorithm for frequency moment estimation, due to Piotr Indyk and David Woodruff (later improved by Lakshminath Bhuvanagiri, Sumit Ganguly, Deepanjan Kesh, and Chandan Saha).
The algorithm. Let $x$ be the input vector of dimension $n$. The algorithm just multiplies $x$ entry-wise by some scalars, and then folds the vector into a smaller dimension $m$ using standard hashing. Formally, in step one, we compute $y$ as

$$y_i = x_i / u_i^{1/p},$$

where each random variable $u_i$ is drawn independently from an exponential distribution (say of rate $1$). In step two, we compute $z \in \mathbb{R}^m$ from $y$ using a random hash function $h : [n] \to [m]$ as follows:

$$z_j = \sum_{i:\, h(i) = j} g_i y_i,$$

where the $g_i$ are just random signs in $\{\pm 1\}$. The vector $z$ is the output, that’s it. In matrix notation, $z = PDx$, where $D$ is a diagonal matrix (with entries $1/u_i^{1/p}$) and $P$ is a sparse $m \times n$ “projection” matrix describing the hash function $h$ (and carrying the signs $g_i$).
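To make the two steps completely concrete, here is a minimal Python sketch; the function name, the choice $p = 3$, and the parameter setting $m = n^{1-2/p} \log n$ are just illustrative choices on my part, not taken from the papers.

```python
import numpy as np

def lp_sketch(x, p, m, rng):
    """A minimal, illustrative implementation of the two-step sketch above.

    Step one: y_i = x_i / u_i^(1/p) with u_i ~ Exp(1).
    Step two: z_j = sum_{i: h(i)=j} g_i * y_i for a random hash h and signs g.
    The max-norm ||z||_inf is then a constant-factor estimate of ||x||_p.
    """
    n = len(x)
    u = rng.exponential(scale=1.0, size=n)   # independent exponentials u_i
    y = x / u ** (1.0 / p)                   # step one: entry-wise scaling
    h = rng.integers(0, m, size=n)           # random hash function h: [n] -> [m]
    g = rng.choice([-1.0, 1.0], size=n)      # random signs g_i
    z = np.zeros(m)
    np.add.at(z, h, g * y)                   # step two: fold into m buckets
    return np.max(np.abs(z))                 # the estimate ||z||_inf

# Illustrative usage (p and m are my own choices here):
rng = np.random.default_rng(0)
n, p = 100_000, 3.0
m = int(n ** (1 - 2 / p) * np.log(n))        # m ~ n^{1-2/p} log n buckets
x = rng.standard_normal(n)
print(lp_sketch(x, p, m, rng), np.linalg.norm(x, ord=p))
```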
Of course the fun part is to show that this works (simple algorithm is not necessarily simple analysis). I’ll give essentially the entire analysis below, which shouldn’t be bad.
Analysis. We’ll first see that the infinity norm of $y$ is correct, namely that $\|y\|_\infty$ approximates $\|x\|_p$, and then that the infinity norm of $z$ is correct as well (both up to a constant-factor approximation, with constant probability of success). In other words, step one is an embedding of $\ell_p^n$ into $\ell_\infty^n$, and step two is dimensionality reduction in $\ell_\infty$.
The claim about the infinity norm of $y$ follows from the “stability” property of the exponential distribution: if $u_1, \ldots, u_n$ are independent exponentially distributed random variables, and $a_1, \ldots, a_n > 0$ are scalars, then $\min_i u_i / a_i$ is distributed as $u / a$, where $u$ is also an exponential and $a = \sum_i a_i$.

Now, applying this stability property for $a_i = |x_i|^p$, we get that $\|y\|_\infty = \max_i |x_i| / u_i^{1/p} = \big(\min_i u_i / |x_i|^p\big)^{-1/p}$ is distributed as $\|x\|_p / u^{1/p}$ for an exponential $u$.
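For completeness, the stability property is just a one-line CDF computation: for independent $u_i \sim \mathrm{Exp}(1)$ and scalars $a_i > 0$,

$$\Pr\Big[\min_i \frac{u_i}{a_i} > t\Big] = \prod_{i=1}^n \Pr[u_i > a_i t] = \prod_{i=1}^n e^{-a_i t} = e^{-t \sum_i a_i},$$

so $\min_i u_i / a_i$ is exponential with rate $\sum_i a_i$; plugging in $a_i = |x_i|^p$ and raising to the power $-1/p$ gives the statement about $\|y\|_\infty$.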
Note that we already obtain a weak embedding of $\ell_p^n$ into $\ell_\infty^n$ (i.e., no dimensionality reduction yet): since $u^{1/p}$ is a constant with 90% probability, $\|y\|_\infty$ is a constant-factor approximation to $\|x\|_p$. We proceed to show that the dimension-reducing projection does not hurt.
We will now analyze the max-norm of $z$. The main idea is that the biggest entries of $y$ will go into separate buckets, while the rest of the “stuff” (small entries) will give only a minor contribution to each bucket. Hence, the biggest entry of $y$ will stick out in $z$ as well (and nothing bigger will stick out), preserving the max-norm. For simplicity of notation, let $M = \|x\|_p$, and note that the largest entry of $y$ is about $M$ (as we argued above).
What is big? Let’s say that “big” is an entry of $y$ such that, say, $|y_i| \geq M / \log n$. With good enough probability, there are only at most $O(\log^p n)$ such big items (because of the exponential distribution), so they will all go into different buckets.
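A quick calculation behind this count of big entries (with the threshold as I have set it above): for any $t > 0$,

$$\Pr\big[|y_i| \geq t\big] = \Pr\big[u_i \leq |x_i|^p / t^p\big] = 1 - e^{-|x_i|^p / t^p} \leq \frac{|x_i|^p}{t^p},$$

so the expected number of entries with $|y_i| \geq M / \log n$ is at most $\sum_i |x_i|^p \cdot (\log n / M)^p = \log^p n$, and by Markov there are $O(\log^p n)$ of them with good probability; since $m \gg \log^{2p} n$, a birthday-paradox bound puts them into distinct buckets.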
Now let’s look at the extra “stuff”, the contribution of the small entries. Let $S$ be the set of small entries. Fix some bucket index $j$. We would like to show that the contribution of entries from $S$ that fall into bucket $j$ is small, namely, less than (say) $M/10$, a small fraction of the max entry of $y$.
Let’s look at $\sum_{i \in S:\, h(i) = j} g_i y_i$. A straightforward calculation shows that the expectation of this sum is zero, and the variance is $\|y_S\|_2^2 / m$ (where $y_S$ is $y$ restricted to the small entries). If only we could now show that $\|y_S\|_2^2 / m \ll M^2$, we would be done by Chebyshev’s inequality…
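Writing out that straightforward calculation: condition on $y$, let $\delta_i = \mathbf{1}[h(i) = j]$, and consider $Z = \sum_{i \in S} \delta_i g_i y_i$. Then

$$\mathbb{E}[Z] = \sum_{i \in S} \mathbb{E}[\delta_i]\, \mathbb{E}[g_i]\, y_i = 0, \qquad \mathrm{Var}[Z] = \sum_{i \in S} \mathbb{E}[\delta_i g_i^2]\, y_i^2 = \frac{1}{m} \sum_{i \in S} y_i^2 = \frac{\|y_S\|_2^2}{m},$$

since the signs $g_i$ are independent and mean-zero, and $\Pr[h(i) = j] = 1/m$ (the cross terms vanish because $\mathbb{E}[g_i g_k] = 0$ for $i \neq k$).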
How do we relate $\|y_S\|_2$ to $\|x\|_p$? Here comes the exponential distribution to the rescue again. Note that $\mathbb{E}\big[\|y\|_2^2\big] = \sum_i x_i^2 \cdot \mathbb{E}\big[u_i^{-2/p}\big] = O\big(\|x\|_2^2\big)$, since the expectation $\mathbb{E}\big[u^{-2/p}\big]$ for an exponentially distributed $u$ is constant (here we use $p > 2$). Together with the standard inter-norm inequality $\|x\|_2 \leq n^{1/2 - 1/p} \|x\|_p$, we have that $\mathbb{E}\big[\|y\|_2^2\big] \leq O\big(n^{1 - 2/p}\big) \cdot \|x\|_p^2 = O\big(n^{1-2/p}\big) \cdot M^2$.
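The two facts used here, spelled out: for $u \sim \mathrm{Exp}(1)$ and $p > 2$ (so that $2/p < 1$ and the integral converges),

$$\mathbb{E}\big[u^{-2/p}\big] = \int_0^\infty t^{-2/p} e^{-t}\, dt = \Gamma\big(1 - \tfrac{2}{p}\big) = O(1),$$

and for any $x \in \mathbb{R}^n$ and $p \geq 2$, Hölder’s inequality gives $\|x\|_2^2 \leq n^{1 - 2/p} \|x\|_p^2$.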
Combining, we have that $\|y_S\|_2^2 / m \leq \|y\|_2^2 / m = O\big(n^{1-2/p}\big) M^2 / m \ll M^2$ once $m \gg n^{1-2/p}$ (using Markov on $\|y\|_2^2$), which is enough to apply Chebyshev’s inequality and conclude that the “extra stuff” in bucket $j$ is at most $M/10$ with constant probability.
We are almost done, except for one issue: we would need the above for all buckets $j$, and, in particular, we would like the bound on the extra stuff in a fixed bucket $j$ to hold with high probability (not just constant). To achieve this, we use a stronger concentration inequality, for example the Bernstein inequality, which allows us to exploit the fact that $|y_i| \leq M / \log n$ for $i \in S$ to achieve a high-probability statement; a union bound over the $m$ buckets then finishes the argument.
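For reference, the form of Bernstein’s inequality that suffices here: if $X_1, \ldots, X_N$ are independent and mean-zero, with $|X_i| \leq b$ and $\sum_i \mathrm{Var}[X_i] \leq \sigma^2$, then

$$\Pr\Big[\Big|\sum_i X_i\Big| > \lambda\Big] \leq 2 \exp\Big(-\frac{\lambda^2 / 2}{\sigma^2 + b\lambda/3}\Big).$$

With the parameters as I have set them (target deviation $\lambda$ of order $M$, variance $\sigma^2 = \|y_S\|_2^2 / m$ of order $M^2 / \log n$, and the small-entry bound $b \leq M / \log n$), the failure probability is $n^{-\Omega(1)}$, small enough to union-bound over all $m$ buckets.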
Discussion. The use of “stability” of the exponential distribution is similar to Piotr Indyk’s use of $p$-stable distributions for sketching/streaming the $\ell_p$ norms, $p \in (0, 2]$. Note that $p$-stable distributions do not exist for $p > 2$; so here the notion of “stability” is slightly different. In the former case, one uses the fact that for random variables $X_1, \ldots, X_n$ which are $p$-stable, the combination $\sum_i a_i X_i$ is also $p$-stable, with well-controlled scale, namely $\|a\|_p$. In the latter case, we use the property that the max of several “stable” distributions is another one: $\max_i a_i / u_i^{1/p}$ is distributed as $\|a\|_p / u^{1/p}$ (i.e., $1/u^{1/p}$ is “max-stable”). Note that this is useful for embedding into $\ell_\infty$. As it turns out, such a transformation does not increase the $\ell_2$ norm of the vector by much, allowing us to do the dimensionality reduction in $\ell_\infty$.
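If you want to see the max-stability property in action, here is a small numerical check (again just my own illustration): the empirical quantiles of $\max_i a_i / u_i^{1/p}$ and of $\|a\|_p / u^{1/p}$ should (nearly) coincide.

```python
import numpy as np

# Empirically compare max_i a_i / u_i^{1/p} with ||a||_p / u^{1/p}:
# by max-stability they should have the same distribution.
rng = np.random.default_rng(1)
p, n, trials = 3.0, 50, 100_000
a = np.abs(rng.standard_normal(n))          # arbitrary nonnegative scalars a_i
norm_p = np.linalg.norm(a, ord=p)

u = rng.exponential(size=(trials, n))
lhs = np.max(a / u ** (1 / p), axis=1)                     # max_i a_i / u_i^{1/p}
rhs = norm_p / rng.exponential(size=trials) ** (1 / p)     # ||a||_p / u^{1/p}

for q in (0.25, 0.5, 0.75, 0.9):            # quantiles of the two samples
    print(q, round(np.quantile(lhs, q), 3), round(np.quantile(rhs, q), 3))
```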
For streaming applications, the important aspects are the small final dimension $m$ and the linearity of the map $x \mapsto z$. This dimension of $m = n^{1-2/p} \cdot \mathrm{poly}(\log n)$ is optimal for this algorithm.
UPDATE: a preliminary write-up with formal details is available here.