Probabilistic cardinality estimation with PFADD/PFCOUNT at constant 12KB, per-user boolean tracking at scale with SETBIT/BITCOUNT, and location-aware radius queries with GEOADD/GEOSEARCH.
F-5 — HyperLogLog, Bitmaps, and Geospatial
Who this module is for: You know Redis has data types beyond the standard five, but you have never understood when to reach for them or what problems they actually solve. This module covers three specialized types that each solve a specific class of problem with dramatically less memory than naive approaches. HyperLogLog for approximate unique counting, Bitmaps for per-user boolean tracking at scale, and Geospatial for location-aware queries.
HyperLogLog
The Problem: Counting Unique Items at Scale
Counting unique visitors to a page sounds simple. Store a Set, add each user ID, read the cardinality with SCARD. Done.
Until your page gets 10 million unique visitors per day. A Redis Set with 10 million string members consumes roughly 500MB of RAM — and you need a separate Set per page, per day.
If you only need to know approximately how many unique visitors there were (and you do not need to know which visitors), HyperLogLog solves this in under 12KB of RAM regardless of cardinality.
What HyperLogLog Is
HyperLogLog (HLL) is a probabilistic algorithm for cardinality estimation. It does not store the actual elements — it stores a compact sketch of the data that can estimate the number of distinct elements with a standard error of 0.81%.
That means: if the true unique count is 1,000,000, HyperLogLog will return a value between ~993,000 and ~1,007,000. For analytics dashboards, A/B test reporting, and "approximately X unique users" displays, this error margin is completely acceptable.
Memory: Each HLL structure uses at most 12KB regardless of whether it has seen 100 or 100 billion distinct values.
Commands
PFADD key element [element ...] → add elements to the HLL, return 1 if internal state changed
PFCOUNT key [key ...] → return estimated cardinality (of union if multiple keys)
PFMERGE destkey sourcekey [...] → merge multiple HLLs into destkey
127.0.0.1:6379> PFADD visitors:2024-01-15 "user:1001" "user:1002" "user:1003"
(integer) 1
127.0.0.1:6379> PFADD visitors:2024-01-15 "user:1001" ← duplicate
(integer) 0 ← no state change
127.0.0.1:6379> PFCOUNT visitors:2024-01-15
(integer) 3
# Unique visitors across two days (union)
127.0.0.1:6379> PFADD visitors:2024-01-16 "user:1002" "user:1004"
(integer) 1
127.0.0.1:6379> PFCOUNT visitors:2024-01-15 visitors:2024-01-16
(integer) 4 ← user:1002 counted once
PFMERGE combines HLLs for multi-period rollups:
PFMERGE visitors:2024-01-week1 visitors:2024-01-11 visitors:2024-01-12 visitors:2024-01-13 visitors:2024-01-14 visitors:2024-01-15
PFCOUNT visitors:2024-01-week1 → unique visitors across the week
When to Use HyperLogLog
- Unique page views per day/hour
- Unique search queries
- Unique API callers per endpoint
- Unique items processed by a pipeline (dedup tracking at scale)
- Any "count distinct" metric where ~1% error is acceptable
Do not use HyperLogLog when:
- You need exact counts (use a Set)
- You need to know which specific items were seen (use a Set)
- Cardinality is small (< 1,000 items) — just use a Set; the memory difference is negligible
Bitmaps
The Problem: Tracking Boolean States per User at Scale
Suppose you want to track daily login streaks. For each user, you need to know: did they log in on day 1? Day 2? Day 365?
Naive approach: one key per user per day. SET login:user:1001:2024-01-15 1. That is 365 keys per user per year. At 1 million users, that is 365 million keys.
Bitmap approach: one key per user per year. Each bit position represents a day. Bit 0 = Jan 1, bit 1 = Jan 2, ..., bit 364 = Dec 31.
At 1 million users and 365 days: 1,000,000 × 365 bits = ~45MB total. The naive approach would consume gigabytes.
What Bitmaps Are
Redis does not have a distinct "Bitmap" type. Bitmaps are a set of bit-manipulation operations on Redis Strings. A String in Redis is a byte array, and Redis lets you address individual bits within that array by offset.
A String of 1 byte can hold 8 bits. A String of 512MB can hold ~4 billion bits.
Commands
SETBIT key offset value → set bit at offset to 0 or 1, return old value
GETBIT key offset → return the bit at offset (0 if key or bit does not exist)
BITCOUNT key [start end] → count set bits (optionally in byte range)
BITPOS key bit [start end] → find first bit set to 0 or 1
BITOP AND|OR|XOR|NOT destkey key [key ...] → bitwise operation on multiple keys
Tracking Daily Logins
# User 1001 logs in on day 14 of the year (0-indexed)
SETBIT logins:user:1001:2024 14 1
(integer) 0 ← previous value was 0
# User 1001 logs in on day 15
SETBIT logins:user:1001:2024 15 1
# How many days did user 1001 log in this year?
BITCOUNT logins:user:1001:2024
(integer) 2
# Did user 1001 log in on day 14?
GETBIT logins:user:1001:2024 14
(integer) 1
# First day user 1001 logged in
BITPOS logins:user:1001:2024 1
(integer) 14
Population-Wide Daily Active Users
Instead of per-user keys, use per-day keys where the offset is the user ID:
# All users who logged in on 2024-01-15
SETBIT active:2024-01-15 1001 1 ← user 1001
SETBIT active:2024-01-15 1002 1 ← user 1002
SETBIT active:2024-01-15 5000 1 ← user 5000
# Count DAU
BITCOUNT active:2024-01-15
(integer) 3
# Users active on both Jan 15 AND Jan 16 (intersection)
BITOP AND active:both:2024-01-15-16 active:2024-01-15 active:2024-01-16
BITCOUNT active:both:2024-01-15-16
# Users active on Jan 15 OR Jan 16 (union)
BITOP OR active:any:2024-01-15-16 active:2024-01-15 active:2024-01-16
BITCOUNT active:any:2024-01-15-16
Memory note: A bitmap for 1 million users requires 1,000,000 bits = 125KB. Even with 100 million users, a single DAU bitmap is 12.5MB — manageable for per-day tracking.
Caveat: Bit offsets are allocated based on the maximum offset ever set. If user ID 99,999,999 logs in, Redis allocates a ~12MB string immediately, even if only that one user exists. Numeric IDs must be dense (not UUIDs) for Bitmaps to be memory-efficient.
Feature Flags and A/B Testing
# Enable feature "dark-mode" for user 42
SETBIT feature:dark-mode 42 1
# Check if user 42 has the feature
GETBIT feature:dark-mode 42
At millions of users, this is significantly more memory-efficient than a Set of user IDs with the feature enabled.
When to Use Bitmaps
- Daily/weekly active user tracking (compact per-day bitfield)
- Login streaks and retention cohorts
- Feature flag rollouts (user IDs as offsets)
- Read receipts and notification tracking at scale
- Any boolean-per-user-per-day pattern
Do not use Bitmaps when:
- User IDs are not dense integers (UUIDs → wasteful; use a Set or Bloom filter)
- The per-item state has more than 2 values (use a Hash or sorted set per user)
- You need to iterate over the set of users with a flag enabled (use a Set; BITPOS finds the first but iterating all is awkward)
Geospatial
The Problem: Location-Based Queries
Storing and querying user locations — "find all drivers within 5km of this point" — requires either a spatial index or clever approximation. Redis's Geo commands give you a spatial index backed by a Sorted Set.
What Geospatial Is
Redis Geo is not a separate type — it is a set of commands that operate on a Sorted Set where the score is a 52-bit Geohash encoding of the longitude/latitude. This means every GEO* command is ultimately stored and retrieved via the Sorted Set's skip list, and you can use ZRANGE on a Geo key if needed.
The encoding achieves roughly ±0.0001° precision (about ±0.6 metres at the equator — sufficient for any real-world use case).
Commands
GEOADD key [NX|XX] [CH] lng lat member [lng lat member ...]
→ add one or more locations
GEODIST key member1 member2 [m|km|mi|ft]
→ distance between two members
GEOPOS key member [member ...]
→ return (longitude, latitude) for members
GEOSEARCH key FROMMEMBER member | FROMLONLAT lng lat
BYRADIUS radius m|km|mi|ft | BYBOX width height m|km|mi|ft
[ASC|DESC] [COUNT count [ANY]] [WITHCOORD] [WITHDIST] [WITHSCORE]
→ find members within radius or bounding box (Redis 6.2+)
GEOSEARCHSTORE destkey ...
→ like GEOSEARCH but stores results in a Sorted Set
The older GEORADIUS and GEORADIUSBYMEMBER commands are deprecated in Redis 6.2 in favour of GEOSEARCH. Use GEOSEARCH.
Building a Nearby Drivers Feature
# Add drivers to the index
GEOADD drivers:online 77.5946 12.9716 "driver:101" ← Bangalore
GEOADD drivers:online 77.5950 12.9720 "driver:102"
GEOADD drivers:online 80.2707 13.0827 "driver:103" ← Chennai (far away)
# Find drivers within 2km of a rider's location
GEOSEARCH drivers:online FROMLONLAT 77.5947 12.9715 BYRADIUS 2 km ASC WITHCOORD WITHDIST COUNT 10
1) 1) "driver:101"
2) "0.0127" ← distance in km
3) 1) "77.59460"
2) "12.97160"
2) 1) "driver:102"
2) "0.0523"
3) 1) "77.59500"
2) "12.97200"
# driver:103 is not returned (too far)
# Distance between two drivers
GEODIST drivers:online "driver:101" "driver:102" km
"0.0529"
# Get coordinates of a specific driver
GEOPOS drivers:online "driver:101"
1) 1) "77.59460"
2) "12.97160"
Updating a Driver's Location
Since Geo uses a Sorted Set internally, updating a member's position is the same as re-adding it:
# Driver moves — just GEOADD again with new coordinates
GEOADD drivers:online 77.5948 12.9718 "driver:101"
GEOADD with an existing member updates the score (position). No need to delete first.
Removing a Member
ZREM drivers:online "driver:101" ← removes the member from the underlying Sorted Set
Or use GEOADD ... XX to only update existing members (won't add if missing) or NX to only add new ones (won't update if existing).
Bounding Box Search
BYBOX is useful for map viewport queries:
# Find all restaurants within a 10km × 10km box centered at a point
GEOSEARCH restaurants FROMLONLAT 77.5946 12.9716 BYBOX 10 10 km ASC COUNT 50
Precision and Limitations
The Geohash encoding has a precision of ~0.6m. For most location-aware applications (ride-hailing, delivery, nearby businesses), this is more than sufficient. For sub-metre precision (surveying, GPS coordinates with cm accuracy), use a dedicated spatial database.
Geo keys are plain Sorted Sets. This means:
ZCARD drivers:onlinetells you how many drivers are in the indexZSCANiterates over all drivers- There is no built-in TTL per member — to remove a driver who goes offline, call
ZREM - For dynamic driver locations (updating every few seconds), consider whether the index size stays manageable
When to Use Geospatial
- Ride-hailing / delivery — nearby driver search
- Restaurant/store finder — "near me" queries
- Delivery radius validation — is this address within our delivery zone?
- Location-aware notifications — alert users near an event
- Any "find X within Y distance of Z" query
Do not use Redis Geo when:
- You need complex spatial queries (polygon containment, routing, map matching) — use PostGIS
- Precision below 1 metre is required
- You need to store and query trajectory data over time (use a time-series + PostGIS)
Comparing the Three Types
| Type | Problem Solved | Memory | Accuracy | Use Case |
|---|---|---|---|---|
| HyperLogLog | Count distinct elements | ≤ 12KB always | ~99.2% (0.81% error) | DAU, unique visitors, distinct queries |
| Bitmap | Boolean per-user tracking | 1 bit per user | Exact | Login streaks, DAU, feature flags |
| Geospatial | Location + radius queries | ~88 bytes per member | ±0.6m | Nearby search, delivery, ride-hailing |
The common thread: each type trades something (exactness, raw flexibility, or simplicity) for a massive reduction in memory or query complexity at scale. Knowing when to make that trade is the mark of a Redis practitioner.
Summary
- HyperLogLog —
PFADD/PFCOUNT/PFMERGE. Estimates cardinality with ±0.81% error in ≤ 12KB. Use for unique visitor counts, distinct query tracking. - Bitmaps —
SETBIT/GETBIT/BITCOUNT/BITOP. Boolean state per dense integer ID, 1 bit per user. Use for login streaks, DAU, feature flags. - Geospatial —
GEOADD/GEOSEARCH/GEODIST. Location index backed by a Sorted Set, ~0.6m precision. Use for nearby search, delivery radius, ride-hailing. - All three solve problems that would require dramatically more memory with naive Set or String approaches.
Next: F-6 — Pipelining and the RESP Protocol — how to batch commands to eliminate network round-trips, and why naive await redis.get() in a loop is one of the most common Redis performance mistakes.