“Balls and Bins?”, you ask, “Is there anything left to prove there?” Surprisingly, there are really natural questions that are open. Today I want to talk about one such question.
First a quick primer. Balls and Bins processes model randomized allocations processes, used in hashing or more general load balancing schemes. Suppose that I have balls (think items) to be thrown into bins (think hash buckets). I want a simple process that will keep the loads balanced, while allowing quick decentralized lookup.
The simplest randomized process one can think of is what’s called the one choice process: for each ball, we independently and uniformly-at-random pick one of the bins and place it there.
We measure the balance in terms of the additive gap: the difference between the maximum load and the average load. Many of us studied the case in a randomized algorithms class, and we know that the gap is except with negligible probability (for the rest of the post, I will skip this “except with negligible probability” qualifier). What happens when is much larger than ? It can be shown that for large enough , the gap is (see this paper for more precise behaviour of the gap).
Can we do better? Azar, Broder, Karlin and Upfal analyzed the following two choice process: balls come sequentially, and each ball picks two bins independently and uniformly at random (say, with replacement), and goes into the less loaded of the two bins (with ties broken arbitrarily). They showed that the gap of the two choice process when is significantly better: only . What about the case when is much larger than ? Berenbrink et al. showed that the gap stays at , for arbitrary .
While the decrease from to is surprising enough, I find the latter distinction much more striking. When is a large polynomial in , the gap for one choice scheme keeps going up, while that in two choice case remains put. For some quick intuition on why this happens, pause for a minute to think about the case (A hint is in the comments). Then read on.
So now that we understand one and two choice, let’s move on (or rather, dig in between). Yuval Peres, Udi Wieder and I analyzed the -choice process, where we place a ball in a uniformly random bin with probability , and in the lesser loaded to two random bins with probability . The intuition would suggest that this slight bias towards balance would keep the gap from growing with , and this is indeed what we show: for bounded away from , the gap is .
Analyzing the -choice process helps us understand another natural process. Given a regular graph on , one can define the balls and bins process on as follows: balls come sequentially, and each ball picks an edge of at random, and goes to the (bin corrseponding to the) lesser loaded endpoint. Thus when is made up of self loops, we get the one choice process. When is the complete graph (with self loops), we get the two choice process. Using the -choice analysis, it can be shown that whenever is connected, the gap is independent of . And when is an expander, the gap is .
And that brings us to a simple open question: the case of being a cycle. To restate the question: bins on a cycle. Repeatedly pick two adjacent bins, and put a ball in the lesser loaded of the two. How large does the gap get?
We can show the gap is and . Empirically neither of these bounds seems remotely tight. Can you improve them?