Module P-7 · 20 min read

Three distinct failure modes requiring different solutions — XFetch probabilistic expiry for stampede, TTL jitter for avalanche, Bloom filter pre-gating for penetration. Detection patterns and production mitigations.

P-7 — Cache Stampede, Avalanche, and Penetration

Who this module is for: Your cache is configured and working. Then, under load or at a specific moment, your database CPU spikes to 100%, response times collapse, and the system partially recovers when the cache warms up again. This is a cache failure — and there are three distinct failure modes, each requiring a different fix. Treating all three the same is why most mitigation attempts fail.


The Three Failure Modes

| Failure | Trigger | Symptom | Solution |
| --- | --- | --- | --- |
| Cache stampede | One popular key expires | Sudden DB spike on one query | Probabilistic expiry, mutex lock |
| Cache avalanche | Many keys expire simultaneously | Sustained DB overload | TTL jitter, pre-warming |
| Cache penetration | Requests for non-existent keys | Sustained DB queries with empty results | Null caching, Bloom filter |

Each has a different root cause, different detection signature, and different fix.


Cache Stampede (Thundering Herd)

What It Is

A single popular cache key expires. In the milliseconds before the first request can recompute and repopulate it, dozens or hundreds of concurrent requests see a miss and all race to recompute the same expensive query. The database receives N identical queries simultaneously.

Detection

# In your metrics:
database_query_count → spike to N×normal for ~duration_of_recompute
cache_miss_rate → spike on a specific key

The spike is narrow and short-lived — it resolves when the first request finishes recomputing and populates the cache. The next expiry cycle causes another spike.

Fix 1: Probabilistic Early Expiry (XFetch Algorithm)

Instead of waiting for the key to expire, each request that reads the key may recompute it early, with a probability that rises as the key approaches expiry and as the recompute cost grows. This "warms" the key proactively, so in most cases a fresh value is written before the old one expires and the stampede never gets a chance to start.

```typescript
import Redis from 'ioredis';

// Assumes an ioredis client; the later snippets reuse this `redis` instance.
const redis = new Redis();

async function getWithXFetch<T>(
  key: string,
  ttl: number, // desired TTL in seconds
  compute: () => Promise<T>,
  beta: number = 1.0 // higher = more aggressive early recompute
): Promise<T> {
  const data = await redis.get(key);

  if (data) {
    const { value, delta, expiry } = JSON.parse(data);
    const remainingTTL = expiry - Date.now() / 1000;

    // XFetch: recompute early if -delta * beta * ln(random()) > remainingTTL.
    // The chance rises as expiry approaches and as the recompute cost (delta) grows.
    const shouldRecompute =
      -delta * beta * Math.log(Math.random()) > remainingTTL;

    if (!shouldRecompute) {
      return value as T;
    }
    // Fall through to recompute
  }

  const startTime = Date.now();
  const value = await compute();
  const delta = (Date.now() - startTime) / 1000; // recompute time in seconds

  const payload = JSON.stringify({
    value,
    delta,
    expiry: Date.now() / 1000 + ttl,
  });
  await redis.set(key, payload, 'EX', ttl);
  return value;
}
```

The delta (recompute time) is stored alongside the value. Expensive queries get earlier preemptive recompute because they have a larger delta. The beta parameter controls aggressiveness — beta = 1 is the standard algorithm.
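To build intuition for how delta and beta interact, the XFetch condition can be rearranged into a closed-form probability. Because Math.random() is uniform on [0, 1), the test -delta * beta * ln(u) > remainingTTL holds exactly when u < exp(-remainingTTL / (delta * beta)). The helper below is a minimal sketch of that rearrangement; `earlyRecomputeProbability` is a hypothetical name used only for illustration.

```typescript
// Hypothetical helper: probability that one request triggers an early
// recompute, derived from the XFetch condition
//   -delta * beta * ln(u) > remainingTTL,  u uniform on [0, 1)
// which is equivalent to u < exp(-remainingTTL / (delta * beta)).
function earlyRecomputeProbability(
  delta: number,        // last recompute time in seconds
  beta: number,         // aggressiveness factor
  remainingTTL: number  // seconds until expiry
): number {
  if (remainingTTL <= 0) return 1;        // at or past expiry: always recompute
  if (delta <= 0 || beta <= 0) return 0;  // no recorded cost: never recompute early
  return Math.exp(-remainingTTL / (delta * beta));
}

// Example: a query that takes 0.5 s to recompute, with beta = 1.
for (const remaining of [5, 2, 1, 0.5, 0.1]) {
  const p = earlyRecomputeProbability(0.5, 1, remaining);
  console.log(`remaining=${remaining}s  p=${p.toFixed(5)}`);
}
```

With a 0.5-second recompute and beta = 1, a single request has about a 13.5% chance of refreshing the key one second before expiry and about an 82% chance at 0.1 seconds. On a busy key, many requests roll these dice, so a refresh is very likely to happen before the hard expiry.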

Fix 2: Mutex Lock (Single Flight)

Only one request recomputes at a time. Others wait for the lock to be released, then serve from the (now-populated) cache.

```typescript
import { randomUUID } from 'node:crypto';

// Delete the lock only if it still holds our token, so a holder whose lock
// already expired cannot remove a lock acquired by another process.
const RELEASE_LOCK = `
  if redis.call('GET', KEYS[1]) == ARGV[1] then
    return redis.call('DEL', KEYS[1])
  end
  return 0`;

async function getWithLock<T>(
  key: string,
  ttl: number,
  compute: () => Promise<T>
): Promise<T> {
  const lockKey = `lock:${key}`;

  // Try cache first
  const cached = await redis.get(key);
  if (cached) return JSON.parse(cached);

  // Try to acquire lock (NX = only if not exists, EX = expire in 10s)
  const token = randomUUID();
  const locked = await redis.set(lockKey, token, 'EX', 10, 'NX');

  if (locked === 'OK') {
    try {
      // Double-check cache (another process may have populated it)
      const recheck = await redis.get(key);
      if (recheck) return JSON.parse(recheck);

      const value = await compute();
      await redis.set(key, JSON.stringify(value), 'EX', ttl);
      return value;
    } finally {
      // Release only our own lock (compare token, then delete, atomically)
      await redis.eval(RELEASE_LOCK, 1, lockKey, token);
    }
  } else {
    // Wait for the lock holder to finish, then serve from cache
    await new Promise(resolve => setTimeout(resolve, 50 + Math.random() * 50));
    return getWithLock(key, ttl, compute); // recursive retry
  }
}
```

Trade-off: Lock holders that crash leave the lock in place until EX expires. Set the lock TTL to exceed the maximum expected compute time.
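The Redis lock coordinates between processes. Within a single Node.js process, concurrent callers can also share one in-flight promise, so only one of them ever reaches the lock. The following is a minimal sketch of that in-process coalescing; `singleFlight`, `inFlight`, and `demo` are illustrative names, and the approach complements the Redis lock rather than replacing it.

```typescript
// In-process request coalescing: concurrent callers for the same key share
// one pending promise. `singleFlight` and `inFlight` are illustrative names.
const inFlight = new Map<string, Promise<unknown>>();

function singleFlight<T>(key: string, compute: () => Promise<T>): Promise<T> {
  const pending = inFlight.get(key);
  if (pending) return pending as Promise<T>; // join the call already running

  const promise = compute().finally(() => {
    // Clear the entry on success or failure so later calls recompute.
    inFlight.delete(key);
  });
  inFlight.set(key, promise);
  return promise;
}

// Usage sketch: five concurrent callers, one underlying computation.
async function demo(): Promise<number> {
  let calls = 0;
  const slowQuery = async () => {
    calls += 1;
    await new Promise((resolve) => setTimeout(resolve, 20));
    return "result";
  };
  await Promise.all(
    Array.from({ length: 5 }, () => singleFlight("report:daily", slowQuery))
  );
  return calls; // number of times slowQuery actually ran
}
```

Wrapping the compute callback passed to getWithLock in a helper like this would mean each process contributes at most one lock attempt per key, which reduces polling traffic against Redis during a stampede.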


Cache Avalanche

What It Is

Many cache keys expire at roughly the same time. If all your cached data was loaded at startup (cold start after deployment) with the same TTL, all keys expire together. The database receives a flood of queries across many different data types — not a spike on one query, but a sustained overload across all queries.

Detection

cache_miss_rate → sustained elevation across many different keys/endpoints
database_query_count → broad sustained spike (not narrow like stampede)
application_response_time → slow across all endpoints, not one

The signature: broad, sustained, across all endpoints simultaneously. Typically occurs ~TTL seconds after a cold start or major deployment.

Fix 1: TTL Jitter

Instead of setting all keys to the same TTL, add random jitter to spread expiry times:

```typescript
function jitteredTTL(baseTTL: number, jitterFraction: number = 0.1): number {
  const jitter = baseTTL * jitterFraction;
  return Math.floor(baseTTL + (Math.random() * 2 - 1) * jitter);
}

// 5-minute TTL with ±30 seconds jitter
await redis.set(key, value, 'EX', jitteredTTL(300, 0.1));
// Sets TTL to anywhere from 270 to 329 seconds (Math.floor rounds down)
```

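With 10% jitter on a 300-second TTL, keys set at the same time expire across a 60-second window instead of all at once. The spread scales with the base TTL (a one-hour TTL with 10% jitter spreads expiry over 12 minutes), so the resulting database load is distributed over time.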
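One way to check the spread before rolling it out is to simulate a batch of keys written at the same moment and count how many expire in each 10-second bucket. The sketch below repeats the jitteredTTL formula with an injectable random source; the `rng` parameter and the `expiryHistogram` helper are additions for illustration, not part of the function above.

```typescript
// Same formula as jitteredTTL, with an injectable random source
// (the `rng` parameter is an addition here, for reproducible tests).
function jitteredTTLWith(
  baseTTL: number,
  jitterFraction: number,
  rng: () => number = Math.random
): number {
  const jitter = baseTTL * jitterFraction;
  return Math.floor(baseTTL + (rng() * 2 - 1) * jitter);
}

// Count how many of `count` keys written together expire in each 10-second bucket.
function expiryHistogram(
  count: number,
  baseTTL: number,
  jitterFraction: number
): Map<number, number> {
  const buckets = new Map<number, number>();
  for (let i = 0; i < count; i++) {
    const ttl = jitteredTTLWith(baseTTL, jitterFraction);
    const bucket = Math.floor(ttl / 10) * 10;
    buckets.set(bucket, (buckets.get(bucket) ?? 0) + 1);
  }
  return buckets;
}

const histogram = expiryHistogram(10000, 300, 0.1);
const sorted = Array.from(histogram.entries()).sort((a, b) => a[0] - b[0]);
for (const [bucket, n] of sorted) {
  console.log(`${bucket}-${bucket + 9}s: ${n} keys`);
}
```

For a 300-second base TTL the keys land in six buckets between 270 and 329 seconds, each holding roughly a sixth of the total, instead of all 10,000 expiring in the same second.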

Fix 2: Staggered Pre-Warming

Before traffic arrives (post-deployment, post-restart), warm the cache in batches with delays between batches:

```typescript
async function warmCache(userIds: string[]) {
  const batchSize = 100;

  for (let i = 0; i < userIds.length; i += batchSize) {
    const batch = userIds.slice(i, i + batchSize);

    await Promise.all(batch.map(async (userId) => {
      const data = await db.query('SELECT * FROM users WHERE id = $1', [userId]);
      await redis.set(`user:${userId}`, JSON.stringify(data), 'EX', jitteredTTL(3600));
    }));

    // Delay between batches to avoid overwhelming the database
    if (i + batchSize < userIds.length) {
      await new Promise(resolve => setTimeout(resolve, 100));
    }
  }
}
```

Warm the most-accessed data first (top users, popular products), then less-popular data progressively.
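The ordering step can be sketched as a small planner that sorts IDs by recent access count and splits them into batches for a warmer such as warmCache. The function name `planWarmBatches` and the `accessCounts` map are hypothetical; where the counts come from (analytics, access logs) is an assumption outside this module.

```typescript
// Hypothetical planner: order IDs by recent access count (descending), then
// split them into batches so the hottest data is warmed first.
function planWarmBatches(
  accessCounts: Map<string, number>, // id -> recent hits (assumed to come from analytics)
  batchSize: number
): string[][] {
  if (batchSize < 1) throw new RangeError("batchSize must be at least 1");

  const ordered = Array.from(accessCounts.entries())
    .sort((a, b) => b[1] - a[1]) // hottest first
    .map(([id]) => id);

  const batches: string[][] = [];
  for (let i = 0; i < ordered.length; i += batchSize) {
    batches.push(ordered.slice(i, i + batchSize));
  }
  return batches;
}

const plan = planWarmBatches(
  new Map([["u1", 12], ["u2", 950], ["u3", 4], ["u4", 310], ["u5", 77]]),
  2
);
console.log(JSON.stringify(plan)); // [["u2","u4"],["u5","u1"],["u3"]]
```

Flattening the plan with `plan.flat()` gives an ID list in popularity order that can be passed straight to warmCache.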

Fix 3: Never-Expiry + Background Refresh

For truly critical keys (homepage, global config), use no TTL and refresh in a background job:

```typescript
// Background job (runs every 5 minutes)
async function refreshCriticalCache() {
  const data = await db.query('SELECT * FROM config WHERE active = true');
  await redis.set('config:global', JSON.stringify(data));
  // No EX: the key never expires and is refreshed proactively
}
```

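The key never expires, so expiry-driven misses disappear, and data is at most one refresh interval stale. A key without a TTL can still be evicted if Redis runs with an allkeys-* maxmemory policy, or vanish after a restart without persistence, so keep a fallback read path to the database.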


Cache Penetration

What It Is

Requests arrive for keys that do not exist — and will never exist. For example: requests for user IDs that are not in your database (invalid IDs, enumeration attacks, deleted users). Each request misses the cache and hits the database with a query that returns no rows.

Unlike stampede and avalanche, which subside once the cache is repopulated, penetration never heals on its own: the non-existent keys never get cached (there is nothing to cache), so every request hits the database.

Detection

cache_miss_rate → elevated, but database queries return empty results
database_queries_with_empty_results → elevated
attack_pattern → often a sequence of incrementing or random IDs

The signature: high miss rate, but database is returning empty results (not data). Sustained, not time-bounded.

Fix 1: Null Caching

Cache the "not found" result with a short TTL:

```typescript
async function getUser(userId: string) {
  const key = `user:${userId}`;
  const cached = await redis.get(key);

  if (cached !== null) {
    if (cached === 'NULL') return null; // cached null result
    return JSON.parse(cached);
  }

  // Assumes db.query resolves to the matching row, or null when none exists.
  // With a client that returns a result set (e.g. node-postgres), test
  // rows.length instead, or the null branch below never runs.
  const user = await db.query('SELECT * FROM users WHERE id = $1', [userId]);

  if (user === null) {
    // Cache the null result for 60 seconds (limit repeated DB hits)
    await redis.set(key, 'NULL', 'EX', 60);
    return null;
  }

  await redis.set(key, JSON.stringify(user), 'EX', 3600);
  return user;
}
```

Limitation: An attacker can enumerate many different non-existent IDs, caching 'NULL' for each. This fills Redis with null entries. Mitigate with a short TTL (60 seconds) and rate limiting on the endpoint.

Fix 2: Bloom Filter

A Bloom filter answers "might this key exist?" with a configurable false-positive rate and zero false negatives. Check the Bloom filter before hitting the cache or database:

Request → Bloom filter: "does user:99999 exist?"
         → No → return 404 immediately (never hits cache or DB)
         → Maybe → try cache, then DB

Redis Stack bundles the RedisBloom module, which provides a Bloom filter implementation:

BF.ADD users:bloom 1001    → add user 1001 to the Bloom filter
BF.ADD users:bloom 1002
BF.EXISTS users:bloom 1001 → 1 (may exist; false positives are possible)
BF.EXISTS users:bloom 9999 → 0 (definitely does not exist)

```typescript
async function getUserWithBloom(userId: string) {
  // Check Bloom filter first (very fast, no DB hit on definite miss)
  const mightExist = await redis.call('BF.EXISTS', 'users:bloom', userId);
  if (!mightExist) return null; // definitely not in DB

  // Cache-aside logic
  const key = `user:${userId}`;
  const cached = await redis.get(key);
  if (cached) return JSON.parse(cached);

  const user = await db.query('SELECT * FROM users WHERE id = $1', [userId]);
  if (!user) return null; // false positive from Bloom filter (rare)

  await redis.set(key, JSON.stringify(user), 'EX', 3600);
  return user;
}
```

Bloom filter management: Pre-populate on startup from the database. Add to the filter whenever a new user is created. Standard Bloom filters do not support deletion, so entries for deleted users stay in the filter and show up as false positives; rebuild the filter periodically if many users are deleted.

# Create Bloom filter with 1% false positive rate, expected 10M items
BF.RESERVE users:bloom 0.01 10000000
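For a rough sense of what those two parameters cost, the textbook sizing formulas for a classic Bloom filter are m = -n ln(p) / (ln 2)^2 bits and k = (m / n) ln 2 hash functions. The sketch below evaluates them; `bloomSizing` is a hypothetical helper, and the memory RedisBloom actually allocates may differ from this theoretical estimate.

```typescript
// Textbook sizing for a classic Bloom filter:
//   bits   m = -n * ln(p) / (ln 2)^2
//   hashes k = (m / n) * ln 2
// This is a theoretical estimate, not a statement about RedisBloom internals.
function bloomSizing(expectedItems: number, falsePositiveRate: number) {
  const bits = Math.ceil(
    (-expectedItems * Math.log(falsePositiveRate)) / (Math.LN2 * Math.LN2)
  );
  const hashes = Math.max(1, Math.round((bits / expectedItems) * Math.LN2));
  return { bits, hashes, megabytes: bits / 8 / 1_000_000 };
}

const sizing = bloomSizing(10_000_000, 0.01);
console.log(`${sizing.hashes} hashes, ${sizing.megabytes.toFixed(1)} MB`);
```

For 10 million items at a 1% false-positive rate this works out to roughly 9.6 bits per item, or about 12 MB in total, with 7 hash functions. That is small compared with caching a null entry for every ID an attacker might probe.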

Without Redis Stack, implement a Bloom filter using a Redis Bitmap (manually hash the key N times, set the corresponding bits, check all bits to test membership).
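The bitmap approach can be sketched as follows. To keep the example runnable without a server, the bits live in an in-memory Uint8Array; against Redis, each bit write would become a SETBIT and each bit read a GETBIT on one bitmap key. The names `bloomOffsets` and `BitmapBloom`, and the choice of SHA-256 with double hashing, are assumptions made for this illustration.

```typescript
import { createHash } from "node:crypto";

// Derive `hashes` bit offsets in [0, bits) for a key using double hashing:
// offset_i = (h1 + i * h2) mod bits, with h1 and h2 taken from one SHA-256 digest.
function bloomOffsets(key: string, bits: number, hashes: number): number[] {
  const digest = createHash("sha256").update(key).digest();
  const h1 = digest.readUInt32BE(0);
  const h2 = (digest.readUInt32BE(4) | 1) >>> 0; // odd, unsigned: stride is never zero
  const offsets: number[] = [];
  for (let i = 0; i < hashes; i++) {
    offsets.push((h1 + i * h2) % bits);
  }
  return offsets;
}

// In-memory stand-in for a Redis bitmap.
class BitmapBloom {
  private readonly bytes: Uint8Array;
  private readonly bits: number;
  private readonly hashes: number;

  constructor(bits: number, hashes: number) {
    this.bits = bits;
    this.hashes = hashes;
    this.bytes = new Uint8Array(Math.ceil(bits / 8));
  }

  add(key: string): void {
    for (const o of bloomOffsets(key, this.bits, this.hashes)) {
      this.bytes[o >> 3] |= 1 << (o & 7); // Redis counterpart: SETBIT <bitmap> <o> 1
    }
  }

  mightContain(key: string): boolean {
    // Redis counterpart: GETBIT <bitmap> <o> for every offset; all must be 1.
    return bloomOffsets(key, this.bits, this.hashes).every(
      (o) => (this.bytes[o >> 3] & (1 << (o & 7))) !== 0
    );
  }
}

const filter = new BitmapBloom(1024, 7);
filter.add("1001");
filter.add("1002");
console.log(filter.mightContain("1001")); // true: added keys are never reported absent
console.log(filter.mightContain("9999")); // usually false; a false positive is possible
```

When porting this to Redis, sending the SETBIT or GETBIT calls for one key in a single pipeline keeps the membership check to one network round trip.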


Summary

Cache Stampede:

  • Cause: one popular key expires while under high traffic
  • Detection: narrow spike on one query, short duration
  • Fix: XFetch probabilistic early expiry, or mutex lock with single-flight pattern

Cache Avalanche:

  • Cause: many keys expire simultaneously (same TTL set at same time)
  • Detection: broad sustained spike across all endpoints, typically post-deployment
  • Fix: TTL jitter (spread expiry times), staggered cache warming, never-expiry for critical keys

Cache Penetration:

  • Cause: requests for non-existent data (invalid IDs, deleted records, attacks)
  • Detection: high miss rate with empty DB results, sustained, not time-bounded
  • Fix: null caching with short TTL, Bloom filter pre-gating

Apply all three fixes proactively. Stampede and avalanche are near-certainties for any cache under serious load.

Next: P-8 — Keyspace Notifications and Event-Driven Architectures — using Redis's internal events (key expiry, deletion, write commands) as triggers for application logic.

© 2026 Jatin Jain Saraf (JJS). All rights reserved.