A food delivery app uses Redis to match customers with nearby couriers. The architecture relies on `GEOADD couriers:online ` and `GEOSEARCH` to find matches within a 5km radius. When a courier goes offline, what is the correct way to remove them from the spatial index?

`ZREM couriers:online ` — Redis Geospatial commands do not use a distinct internal data type; they are built entirely on top of Sorted Sets (ZSets). The `GEOADD` command encodes the longitude and latitude into a 52-bit geohash integer and stores it as the member's score in the Sorted Set. Because it is fundamentally a Sorted Set, there is no specialized `GEODEL` command. You use standard Sorted Set commands, specifically `ZREM`, to remove members from the index.

Module F-6·20 min read

Probabilistic cardinality estimation with PFADD/PFCOUNT at constant 12KB, per-user boolean tracking at scale with SETBIT/BITCOUNT, and location-aware radius queries with GEOADD/GEOSEARCH.

F-5 — HyperLogLog, Bitmaps, and Geospatial

Q: A media site needs to display the number of unique daily readers for a trending article. At peak, the article might receive millions of distinct visitors. The exact number is not critical; an estimate within 1% is perfectly acceptable. Which Redis data structure is designed to solve this with minimal memory footprint, and roughly how much memory will it consume?

HyperLogLog, consuming a maximum of roughly 12KB of memory regardless of whether there are 100 or 100 million visitors. — HyperLogLog (HLL) is a probabilistic data structure explicitly designed for estimating the cardinality (number of unique elements) of massive datasets. By accepting a tiny margin of error (standard error of 0.81%), it avoids storing the actual elements. Instead, it maintains a compact mathematical sketch. This allows it to estimate unique counts for billions of items while never exceeding ~12KB of RAM. Using `PFADD` and `PFCOUNT` solves the "unique visitors" problem at scale perfectly.

Q: An engineering team wants to track whether millions of users logged into their app on a specific day using Bitmaps (`SETBIT active_users:2025-01-01 1`). They notice that for some markets, this approach uses almost no memory, but for a newly acquired subsidiary, the memory usage spikes to over 100MB per day even though only a few thousand users logged in. What is the most likely cause?

The subsidiary's user IDs are sequentially assigned but start at a very high number (e.g., 800,000,000). Redis must allocate enough memory to cover the maximum offset requested. — A Redis Bitmap is just a standard string manipulated at the bit level. When you call `SETBIT key offset 1`, Redis must ensure the underlying byte array is large enough to contain that offset. If a user has an ID of 800,000,000, Redis immediately allocates ~100MB of memory for that single key, even if it's the only bit set to 1 in the entire string. Bitmaps are only memory-efficient when the offsets (user IDs) are dense, relatively small integers.

Who this module is for: You know Redis has data types beyond the standard five, but you have never understood when to reach for them or what problems they actually solve. This module covers three specialized types that each solve a specific class of problem with dramatically less memory than naive approaches. HyperLogLog for approximate unique counting, Bitmaps for per-user boolean tracking at scale, and Geospatial for location-aware queries.

HyperLogLog

The Problem: Counting Unique Items at Scale

Counting unique visitors to a page sounds simple. Store a Set, add each user ID, read the cardinality with SCARD. Done.

Until your page gets 10 million unique visitors per day. A Redis Set with 10 million string members consumes roughly 500MB of RAM — and you need a separate Set per page, per day.

If you only need to know approximately how many unique visitors there were (and you do not need to know which visitors), HyperLogLog solves this in under 12KB of RAM regardless of cardinality.

What HyperLogLog Is

HyperLogLog (HLL) is a probabilistic algorithm for cardinality estimation. It does not store the actual elements — it stores a compact sketch of the data that can estimate the number of distinct elements with a standard error of 0.81%.

That means: if the true unique count is 1,000,000, HyperLogLog will return a value between ~993,000 and ~1,007,000. For analytics dashboards, A/B test reporting, and "approximately X unique users" displays, this error margin is completely acceptable.

Memory: Each HLL structure uses at most 12KB regardless of whether it has seen 100 or 100 billion distinct values.

Commands

text

PFADD  key element [element ...]   → add elements to the HLL, return 1 if internal state changed
PFCOUNT key [key ...]              → return estimated cardinality (of union if multiple keys)
PFMERGE destkey sourcekey [...]    → merge multiple HLLs into destkey

text

127.0.0.1:6379> PFADD visitors:2024-01-15 "user:1001" "user:1002" "user:1003"
(integer) 1
127.0.0.1:6379> PFADD visitors:2024-01-15 "user:1001"   ← duplicate
(integer) 0                                               ← no state change
127.0.0.1:6379> PFCOUNT visitors:2024-01-15
(integer) 3

# Unique visitors across two days (union)
127.0.0.1:6379> PFADD visitors:2024-01-16 "user:1002" "user:1004"
(integer) 1
127.0.0.1:6379> PFCOUNT visitors:2024-01-15 visitors:2024-01-16
(integer) 4   ← user:1002 counted once

PFMERGE combines HLLs for multi-period rollups:

text

PFMERGE visitors:2024-01-week1 visitors:2024-01-11 visitors:2024-01-12 visitors:2024-01-13 visitors:2024-01-14 visitors:2024-01-15
PFCOUNT visitors:2024-01-week1   → unique visitors across the week

When to Use HyperLogLog

Unique page views per day/hour
Unique search queries
Unique API callers per endpoint
Unique items processed by a pipeline (dedup tracking at scale)
Any "count distinct" metric where ~1% error is acceptable

Do not use HyperLogLog when:

You need exact counts (use a Set)
You need to know which specific items were seen (use a Set)
Cardinality is small (< 1,000 items) — just use a Set; the memory difference is negligible

Bitmaps

The Problem: Tracking Boolean States per User at Scale

Suppose you want to track daily login streaks. For each user, you need to know: did they log in on day 1? Day 2? Day 365?

Naive approach: one key per user per day. SET login:user:1001:2024-01-15 1. That is 365 keys per user per year. At 1 million users, that is 365 million keys.

Bitmap approach: one key per user per year. Each bit position represents a day. Bit 0 = Jan 1, bit 1 = Jan 2, ..., bit 364 = Dec 31.

At 1 million users and 365 days: 1,000,000 × 365 bits = ~45MB total. The naive approach would consume gigabytes.

What Bitmaps Are

Redis does not have a distinct "Bitmap" type. Bitmaps are a set of bit-manipulation operations on Redis Strings. A String in Redis is a byte array, and Redis lets you address individual bits within that array by offset.

A String of 1 byte can hold 8 bits. A String of 512MB can hold ~4 billion bits.

Commands

text

SETBIT key offset value      → set bit at offset to 0 or 1, return old value
GETBIT key offset            → return the bit at offset (0 if key or bit does not exist)
BITCOUNT key [start end]     → count set bits (optionally in byte range)
BITPOS key bit [start end]   → find first bit set to 0 or 1
BITOP AND|OR|XOR|NOT destkey key [key ...]   → bitwise operation on multiple keys

Tracking Daily Logins

text

# User 1001 logs in on day 14 of the year (0-indexed)
SETBIT logins:user:1001:2024 14 1
(integer) 0   ← previous value was 0

# User 1001 logs in on day 15
SETBIT logins:user:1001:2024 15 1

# How many days did user 1001 log in this year?
BITCOUNT logins:user:1001:2024
(integer) 2

# Did user 1001 log in on day 14?
GETBIT logins:user:1001:2024 14
(integer) 1

# First day user 1001 logged in
BITPOS logins:user:1001:2024 1
(integer) 14

Population-Wide Daily Active Users

Instead of per-user keys, use per-day keys where the offset is the user ID:

text

# All users who logged in on 2024-01-15
SETBIT active:2024-01-15 1001 1   ← user 1001
SETBIT active:2024-01-15 1002 1   ← user 1002
SETBIT active:2024-01-15 5000 1   ← user 5000

# Count DAU
BITCOUNT active:2024-01-15
(integer) 3

# Users active on both Jan 15 AND Jan 16 (intersection)
BITOP AND active:both:2024-01-15-16 active:2024-01-15 active:2024-01-16
BITCOUNT active:both:2024-01-15-16

# Users active on Jan 15 OR Jan 16 (union)
BITOP OR active:any:2024-01-15-16 active:2024-01-15 active:2024-01-16
BITCOUNT active:any:2024-01-15-16

Memory note: A bitmap for 1 million users requires 1,000,000 bits = 125KB. Even with 100 million users, a single DAU bitmap is 12.5MB — manageable for per-day tracking.

Caveat: Bit offsets are allocated based on the maximum offset ever set. If user ID 99,999,999 logs in, Redis allocates a ~12MB string immediately, even if only that one user exists. Numeric IDs must be dense (not UUIDs) for Bitmaps to be memory-efficient.

Feature Flags and A/B Testing

text

# Enable feature "dark-mode" for user 42
SETBIT feature:dark-mode 42 1

# Check if user 42 has the feature
GETBIT feature:dark-mode 42

At millions of users, this is significantly more memory-efficient than a Set of user IDs with the feature enabled.

When to Use Bitmaps

Daily/weekly active user tracking (compact per-day bitfield)
Login streaks and retention cohorts
Feature flag rollouts (user IDs as offsets)
Read receipts and notification tracking at scale
Any boolean-per-user-per-day pattern

Do not use Bitmaps when:

User IDs are not dense integers (UUIDs → wasteful; use a Set or Bloom filter)
The per-item state has more than 2 values (use a Hash or sorted set per user)
You need to iterate over the set of users with a flag enabled (use a Set; BITPOS finds the first but iterating all is awkward)

Geospatial

The Problem: Location-Based Queries

Storing and querying user locations — "find all drivers within 5km of this point" — requires either a spatial index or clever approximation. Redis's Geo commands give you a spatial index backed by a Sorted Set.

What Geospatial Is

Redis Geo is not a separate type — it is a set of commands that operate on a Sorted Set where the score is a 52-bit Geohash encoding of the longitude/latitude. This means every GEO* command is ultimately stored and retrieved via the Sorted Set's skip list, and you can use ZRANGE on a Geo key if needed.

The encoding achieves roughly ±0.0001° precision (about ±0.6 metres at the equator — sufficient for any real-world use case).

Commands

text

GEOADD  key [NX|XX] [CH] lng lat member [lng lat member ...]
         → add one or more locations

GEODIST key member1 member2 [m|km|mi|ft]
         → distance between two members

GEOPOS  key member [member ...]
         → return (longitude, latitude) for members

GEOSEARCH key FROMMEMBER member | FROMLONLAT lng lat
          BYRADIUS radius m|km|mi|ft | BYBOX width height m|km|mi|ft
          [ASC|DESC] [COUNT count [ANY]] [WITHCOORD] [WITHDIST] [WITHSCORE]
         → find members within radius or bounding box (Redis 6.2+)

GEOSEARCHSTORE destkey ...
         → like GEOSEARCH but stores results in a Sorted Set

The older GEORADIUS and GEORADIUSBYMEMBER commands are deprecated in Redis 6.2 in favour of GEOSEARCH. Use GEOSEARCH.

Building a Nearby Drivers Feature

text

# Add drivers to the index
GEOADD drivers:online 77.5946 12.9716 "driver:101"   ← Bangalore
GEOADD drivers:online 77.5950 12.9720 "driver:102"
GEOADD drivers:online 80.2707 13.0827 "driver:103"   ← Chennai (far away)

# Find drivers within 2km of a rider's location
GEOSEARCH drivers:online FROMLONLAT 77.5947 12.9715 BYRADIUS 2 km ASC WITHCOORD WITHDIST COUNT 10
1) 1) "driver:101"
   2) "0.0127"      ← distance in km
   3) 1) "77.59460"
      2) "12.97160"
2) 1) "driver:102"
   2) "0.0523"
   3) 1) "77.59500"
      2) "12.97200"
# driver:103 is not returned (too far)

# Distance between two drivers
GEODIST drivers:online "driver:101" "driver:102" km
"0.0529"

# Get coordinates of a specific driver
GEOPOS drivers:online "driver:101"
1) 1) "77.59460"
   2) "12.97160"

Updating a Driver's Location

Since Geo uses a Sorted Set internally, updating a member's position is the same as re-adding it:

text

# Driver moves — just GEOADD again with new coordinates
GEOADD drivers:online 77.5948 12.9718 "driver:101"

GEOADD with an existing member updates the score (position). No need to delete first.

Removing a Member

ZREM drivers:online "driver:101"   ← removes the member from the underlying Sorted Set

Or use GEOADD ... XX to only update existing members (won't add if missing) or NX to only add new ones (won't update if existing).

Bounding Box Search

BYBOX is useful for map viewport queries:

text

# Find all restaurants within a 10km × 10km box centered at a point
GEOSEARCH restaurants FROMLONLAT 77.5946 12.9716 BYBOX 10 10 km ASC COUNT 50

Precision and Limitations

The Geohash encoding has a precision of ~0.6m. For most location-aware applications (ride-hailing, delivery, nearby businesses), this is more than sufficient. For sub-metre precision (surveying, GPS coordinates with cm accuracy), use a dedicated spatial database.

Geo keys are plain Sorted Sets. This means:

ZCARD drivers:online tells you how many drivers are in the index
ZSCAN iterates over all drivers
There is no built-in TTL per member — to remove a driver who goes offline, call ZREM
For dynamic driver locations (updating every few seconds), consider whether the index size stays manageable

When to Use Geospatial

Ride-hailing / delivery — nearby driver search
Restaurant/store finder — "near me" queries
Delivery radius validation — is this address within our delivery zone?
Location-aware notifications — alert users near an event
Any "find X within Y distance of Z" query

Do not use Redis Geo when:

You need complex spatial queries (polygon containment, routing, map matching) — use PostGIS
Precision below 1 metre is required
You need to store and query trajectory data over time (use a time-series + PostGIS)

Comparing the Three Types

Type	Problem Solved	Memory	Accuracy	Use Case
HyperLogLog	Count distinct elements	≤ 12KB always	~99.2% (0.81% error)	DAU, unique visitors, distinct queries
Bitmap	Boolean per-user tracking	1 bit per user	Exact	Login streaks, DAU, feature flags
Geospatial	Location + radius queries	~88 bytes per member	±0.6m	Nearby search, delivery, ride-hailing

The common thread: each type trades something (exactness, raw flexibility, or simplicity) for a massive reduction in memory or query complexity at scale. Knowing when to make that trade is the mark of a Redis practitioner.

Summary

HyperLogLog — PFADD / PFCOUNT / PFMERGE. Estimates cardinality with ±0.81% error in ≤ 12KB. Use for unique visitor counts, distinct query tracking.
Bitmaps — SETBIT / GETBIT / BITCOUNT / BITOP. Boolean state per dense integer ID, 1 bit per user. Use for login streaks, DAU, feature flags.
Geospatial — GEOADD / GEOSEARCH / GEODIST. Location index backed by a Sorted Set, ~0.6m precision. Use for nearby search, delivery radius, ride-hailing.
All three solve problems that would require dramatically more memory with naive Set or String approaches.

Next: F-6 — Pipelining and the RESP Protocol — how to batch commands to eliminate network round-trips, and why naive await redis.get() in a loop is one of the most common Redis performance mistakes.

Knowledge Check

A media site needs to display the number of unique daily readers for a trending article. At peak, the article might receive millions of distinct visitors. The exact number is not critical; an estimate within 1% is perfectly acceptable. Which Redis data structure is designed to solve this with minimal memory footprint, and roughly how much memory will it consume?

An engineering team wants to track whether millions of users logged into their app on a specific day using Bitmaps (SETBIT active_users:2025-01-01 <user_id> 1). They notice that for some markets, this approach uses almost no memory, but for a newly acquired subsidiary, the memory usage spikes to over 100MB per day even though only a few thousand users logged in. What is the most likely cause?

A food delivery app uses Redis to match customers with nearby couriers. The architecture relies on GEOADD couriers:online <lon> <lat> <courier_id> and GEOSEARCH to find matches within a 5km radius. When a courier goes offline, what is the correct way to remove them from the spatial index?

Test your knowledge with more question sets

PreviousModule F-5: TTL, Expiry, and Eviction Next Module F-7: Pipelining and the RESP Protocol

Discussion

Join the discussion

Loading comments...