The Longest Streak — Estimating Crowd Size from Coin Flips
Counting the number of unique items in a massive dataset sounds straightforward — just keep a set of everything you’ve seen and don’t double-count. But that set can get huge. If you’re tracking a billion unique user IDs, each 64 bytes long, you need roughly 64 GB of RAM just for the set.
There’s a better way. An algorithm called HyperLogLog can count billions of distinct items using roughly 1.5 kilobytes of memory, with about 2% error. Personally, I’ve always found it to be one of the most beautiful tricks in computer science, and it starts with a coin-flipping game.
Coin Flips Are Binary
Every coin flip is a bit:
- Tails = 0
- Heads = 1
So a sequence of T, T, T, H is the binary string 0001. And “how many tails before the first heads” is exactly “how many leading zeros before the first 1.”
In other words, the coin-flipping game we’re about to play is the same as generating a random binary number and counting its leading zeros.
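Here’s a quick way to check that equivalence: a minimal Python sketch (the function names are mine) that plays the game both ways and compares the distributions.

```python
import random
from collections import Counter

def streak_by_flipping():
    """Flip a fair coin until heads; return how many tails came first."""
    tails = 0
    while random.random() < 0.5:  # call this branch "tails"
        tails += 1
    return tails

def streak_by_leading_zeros(bits=32):
    """Draw a random `bits`-bit number; return its count of leading zeros."""
    n = random.getrandbits(bits)
    return bits - n.bit_length()  # leading zeros = width minus significant bits

# Both should follow the same geometric distribution.
for game in (streak_by_flipping, streak_by_leading_zeros):
    counts = Counter(game() for _ in range(100_000))
    print(game.__name__, [counts[k] for k in range(6)])
```

Run it and the two count lists should line up closely: roughly half the results are 0, a quarter are 1, an eighth are 2, and so on.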
The Coin-Flipping Game
Imagine a crowd of people — could be a hundred, could be a million, we don’t know yet. Each person plays a simple game:
- Flip a fair coin repeatedly until you get heads
- Count how many tails you got in a row before that first heads
- Report your number
In other words, if you flip a coin and immediately get heads, you report 0. If you flip tails, tails, heads, you report 2.
If only a few people play, the longest streak anyone reports will probably be short — 0, 1, maybe 2 tails in a row. But if a million people play, someone is almost certainly going to get an impressively long run. 10 tails in a row. Maybe 15. Maybe more.
The probability of getting exactly k tails before the first heads is (1/2)^(k+1), so the probability of a streak of at least k tails is (1/2)^k. Five tails in a row? That’s a 1-in-32 shot. With about 32 people playing, you’d expect roughly one of them to pull it off. Ten tails in a row? One in 1,024, so you’d need about a thousand people before you’d expect to see it.
The key insight: if the longest streak we observe is L, then a rough estimate of the crowd size is 2^L.
| Longest streak (L) | Probability of at least L tails in a row | Estimated crowd size (2^L) |
|---|---|---|
| 0 | 1 in 1 | 1 |
| 1 | 1 in 2 | 2 |
| 2 | 1 in 4 | 4 |
| 3 | 1 in 8 | 8 |
| 4 | 1 in 16 | 16 |
| 5 | 1 in 32 | 32 |
| 6 | 1 in 64 | 64 |
| 7 | 1 in 128 | 128 |
| 8 | 1 in 256 | 256 |
| 9 | 1 in 512 | 512 |
| 10 | 1 in 1,024 | 1,024 |
If someone reports a streak of 7, that’s a 1-in-128 long shot — so there were probably around 128 people playing. A streak of 10? About a thousand.
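To watch the 2^L rule in action, here’s a small simulation sketch (the helper name is mine): for a range of crowd sizes, everyone plays the game and we record the longest streak.

```python
import random

def longest_streak(players):
    """Longest tails-before-heads streak across `players` independent games."""
    best = 0
    for _ in range(players):
        tails = 0
        while random.random() < 0.5:
            tails += 1
        best = max(best, tails)
    return best

for crowd in (10, 100, 1_000, 10_000, 100_000):
    L = longest_streak(crowd)
    print(f"crowd={crowd:>7}  longest streak={L:>2}  estimate 2^L={2 ** L}")
```

On a typical run the estimates land within a factor of a few of the true crowd size, and occasionally much further off. Hold that thought; it becomes important shortly.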
Why This Matters for Real Data
We need every item in our dataset (user IDs, URLs, search queries) to have its own random-looking coin-flip sequence. That’s exactly what a hash function provides: feed in any item, get out a fixed-length string of bits that looks uniformly random.
Crucially, the same item always produces the same hash — so duplicates don’t create new “players.” Each unique item is like one person who walked up, flipped coins, and reported their streak.
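To make that concrete, here’s a minimal sketch using Python’s standard hashlib. SHA-256 is my choice for illustration; real implementations typically use faster non-cryptographic hashes.

```python
import hashlib

def leading_zeros_of_hash(item, bits=64):
    """Hash `item` and count leading zeros in the first `bits` bits of the digest."""
    digest = hashlib.sha256(item.encode()).digest()
    n = int.from_bytes(digest[:bits // 8], "big")
    return bits - n.bit_length()

# The same item always reports the same streak...
print(leading_zeros_of_hash("user-42"))  # identical on every run
print(leading_zeros_of_hash("user-42"))  # ...and on every repeat
# ...while a different item gets an independent-looking streak.
print(leading_zeros_of_hash("user-43"))
```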
But how do hash functions actually produce random-looking bits? And why does this work regardless of what your data looks like — whether it’s sequential integers, random UUIDs, or names? We explore that in Part 2: Hashing — Turning Anything into Random Coin Flips.
For now, the key point is this: the question “how many unique items?” becomes “how many people are playing?” And we already know how to estimate that: look at the longest streak.
The Single-Max Estimator
This gives us a dead-simple algorithm:
- For each item, compute its hash
- Count the leading zeros in the hash
- Track the longest leading-zero run seen so far
- Estimate = 2^max
This is essentially the algorithm Flajolet and Martin proposed in 1985. It uses almost no memory — just a single number.
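Here it is as a minimal Python sketch, reusing the leading-zeros trick from above (the function name is mine):

```python
import hashlib

def single_max_estimate(items, bits=64):
    """Estimate the distinct count as 2 to the longest leading-zero run."""
    max_zeros = 0
    for item in items:
        digest = hashlib.sha256(str(item).encode()).digest()
        n = int.from_bytes(digest[:bits // 8], "big")
        max_zeros = max(max_zeros, bits - n.bit_length())
    return 2 ** max_zeros

# Duplicates hash to the same value, so they can never raise the maximum:
data = ["a", "b", "c", "a", "b"] * 1_000  # 5,000 items, 3 distinct
print(single_max_estimate(data))          # a rough power-of-2 estimate
```

Notice that the only state carried between items is max_zeros: a single small integer.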
But there’s a problem. And it’s a big one.
The Instability Problem
How bad is the single-max estimator in practice? To find out, we need to understand what it means to run the experiment more than once.
Each “trial” below represents a completely independent experiment: the same 1,000 unique items, but hashed with a different hash function each time. Different hash → different coin-flip sequences for everyone → different longest streak → different estimate. It’s as if the same crowd played the game over and over, with fresh coins each time.
Running many trials isn’t something you’d actually do; in practice, you pick one hash function and get one answer. We run many trials here to see the distribution: the range of answers you could get from a single use of this estimator. Think of it as asking: “if I use this algorithm once, how far off might I be?”
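You can simulate those trials with a salted variant of the sketch above; the salt stands in for picking a fresh hash function each time. (The salting trick here is my illustration, not part of any standard implementation.)

```python
import hashlib

def single_max_estimate(items, salt, bits=64):
    """Single-max estimate; `salt` stands in for a choice of hash function."""
    max_zeros = 0
    for item in items:
        digest = hashlib.sha256(f"{salt}:{item}".encode()).digest()
        n = int.from_bytes(digest[:bits // 8], "big")
        max_zeros = max(max_zeros, bits - n.bit_length())
    return 2 ** max_zeros

items = [f"user-{i}" for i in range(1_000)]  # exactly 1,000 unique items
estimates = sorted(single_max_estimate(items, salt) for salt in range(50))
print("lowest:", estimates[0], "middle:", estimates[25], "highest:", estimates[-1])
```

On typical runs the lowest and highest estimates differ by a couple of orders of magnitude, even though every trial counted exactly the same 1,000 items.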
This is the fundamental problem: you get one shot. You pick a hash function, process your data, and get back a single power-of-2 estimate. You can’t tell whether you drew a lucky trial or an unlucky one. And as you can see above, the difference between the two can be enormous.
We need a way to get multiple independent estimates from a single hash function. That’s what Part 3 is about — and the trick is surprisingly elegant.
And if you want to build HLL yourself from scratch, that’s Part 4.