Module P-4 · 20 min read

INFO memory field-by-field, MEMORY USAGE and MEMORY DOCTOR, scanning for oversized keys, encoding threshold tuning, active defragmentation configuration, and a production workflow for diagnosing unexpected memory growth.

P-4 — Memory Profiling and Optimization

Q: An engineering team notices their Redis instance using 4GB of RAM (`used_memory_rss`), but `used_memory` is only 1.2GB. The instance has been running for 6 months with high key churn (constant creation and deletion of keys). What is the primary cause of this discrepancy, and what is the safest way to reclaim the memory without downtime?

The jemalloc allocator is suffering from high fragmentation (`mem_fragmentation_ratio` > 3.0), meaning it cannot reuse freed memory pages efficiently. They should enable `activedefrag yes` and let the background process compact memory. — `used_memory` is what Redis logically requested. `used_memory_rss` is what the OS actually granted. A massive difference between these two indicates memory fragmentation. High key churn often leaves "holes" in jemalloc's memory pages. Because a page cannot be released back to the OS until it is completely empty, heavily fragmented instances hold onto gigabytes of unused RAM. Enabling active defragmentation allows Redis to slowly relocate allocations, freeing up whole pages and returning memory to the OS without requiring a restart.

Q: A developer wants to store 500,000 user profiles. Each profile contains 10 attributes (name, email, age, etc.). They write a script that stores each attribute as a separate top-level string key: `user:1001:name "Alice"`, `user:1001:age "30"`, etc. Why is this structurally inefficient in Redis, and what is the standard optimization?

Every top-level key incurs substantial memory overhead (~80 bytes for the dictionary entry, `robj` object header, and SDS wrapper). Across 500,000 profiles with 10 attributes each, that is 5 million keys, or roughly 400 MB of overhead before any payload. By grouping the 10 attributes into a single Redis Hash (`HSET user:1001 name "Alice" age "30"`), Redis can use the compact `listpack` encoding, drastically reducing the structural overhead. In Redis, the keyspace itself is a dictionary, and every key you add requires pointer overhead. For very small values (like "Alice" or "30"), the metadata overhead often exceeds the actual payload. The "Hash-for-small-objects" pattern leverages the fact that small Redis Hashes are internally encoded as linear `listpack` arrays rather than full hash tables, eliminating the per-field dictionary overhead and yielding memory savings that often range from 5x to 10x.

Q: An application stores product catalogs in Redis Hashes. Each Hash has about 200 fields. The engineering team checks `OBJECT ENCODING catalog:books` and sees it is using `hashtable`. They want to reduce memory usage by converting these Hashes to the more efficient `listpack` encoding. What must they do?

Increase the `hash-max-listpack-entries` configuration value to 256. However, because Redis does not proactively downgrade encodings, they must either recreate the existing keys (e.g., dump and restore) or wait for natural key churn to replace them. — Data structure encoding in Redis is designed to upgrade automatically (from listpack to hashtable when a threshold is breached) but it rarely downgrades proactively because re-encoding large structures is CPU intensive. Changing the `hash-max-listpack-entries` threshold in `redis.conf` only affects new writes. To force existing `hashtable` encodings back to `listpack`, the data must be rewritten or reloaded into Redis.

Who this module is for: Redis is using more RAM than expected and you do not know why. Or you are designing a Redis schema and want to estimate memory costs before deploying. This module covers the full suite of Redis memory inspection tools and the optimizations that consistently recover the most RAM in production.

The Memory Audit Starting Point: INFO memory

Every Redis memory investigation starts here:

INFO memory

text

used_memory:              1528000000   → allocations from Redis's perspective (bytes)
used_memory_human:        1.42G
used_memory_rss:          2097152000   → RSS reported by OS (includes fragmentation)
used_memory_rss_human:    1.95G
used_memory_peak:         1600000000   → peak allocation since server start
used_memory_peak_human:   1.49G
used_memory_peak_perc:    95.50%       → current / peak
used_memory_overhead:     852000       → internal overhead (dicts, expiry table, etc.)
used_memory_startup:      864000       → baseline memory at startup
used_memory_dataset:      1527136000   → data memory (used_memory - overhead)
used_memory_dataset_perc: 99.94%       → dataset / (used_memory - startup)
allocator_allocated:      1528100000   → bytes allocated from jemalloc
allocator_active:         1953300000   → bytes in active jemalloc pages
allocator_resident:       2097100000   → bytes in resident jemalloc pages
total_system_memory:      8589934592   → total RAM on the machine
maxmemory:                2147483648   → configured maxmemory (2GB)
maxmemory_human:          2.00G
maxmemory_policy:         allkeys-lru
mem_fragmentation_ratio:  1.37         → RSS / used_memory
mem_fragmentation_bytes:  569152000    → bytes "lost" to fragmentation
mem_not_counted_for_evict: 0
mem_replication_backlog:  1048576      → replication backlog size
mem_clients_slaves:       20512
mem_clients_normal:       84000
mem_cluster_links:        0
mem_aof_buffer:           8
active_defrag_running:    0
lazyfree_pending_objects: 0
lazyfreed_objects:        42831

Interpreting the Key Fields

mem_fragmentation_ratio = used_memory_rss / used_memory

< 1.0 → part of Redis's memory has likely been swapped out (critical, investigate immediately)
1.0–1.2 → healthy
1.2–1.5 → moderate fragmentation (normal for dynamic workloads)
> 1.5 → high fragmentation; consider activedefrag or restart
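The bands above can be applied mechanically. The following is a minimal sketch, not an official tool: `classify_fragmentation` is a hypothetical helper name, and it assumes it receives `INFO memory` text on stdin. It recomputes the ratio from `used_memory` and `used_memory_rss` and maps it onto the thresholds listed in this section.

```shell
# classify_fragmentation: read `INFO memory` text on stdin and print
# "<ratio> <verdict>" using the bands from this section.
# Hypothetical helper; the band names are this sketch's own labels.
classify_fragmentation() {
  # redis-cli emits CRLF line endings, so strip the carriage returns first
  tr -d '\r' | awk -F: '
    $1 == "used_memory"     { used = $2 + 0 }
    $1 == "used_memory_rss" { rss  = $2 + 0 }
    END {
      if (used <= 0) { print "unknown"; exit 1 }
      ratio = rss / used
      if      (ratio < 1.0)  verdict = "swap-suspected"
      else if (ratio <= 1.2) verdict = "healthy"
      else if (ratio <= 1.5) verdict = "moderate"
      else                   verdict = "high"
      printf "%.2f %s\n", ratio, verdict
    }'
}
```

Usage, assuming redis-cli can reach the instance: redis-cli INFO memory | classify_fragmentation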

used_memory_overhead = memory used by Redis's internal data structures (the global keyspace dict, expiry table, per-client buffers). If this is a large fraction of used_memory, you have very small values (overhead dominates) — consider consolidating keys into Hashes.

mem_clients_normal = memory used by normal (non-replica) client connections, mostly their query and output buffers. If this is large (> 10MB), you may have slow clients that are being sent data faster than they can consume it.

MEMORY USAGE: Per-Key Cost

MEMORY USAGE key [SAMPLES count]

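Returns the number of bytes allocated for a key and its value, including all internal structures (robj, SDS, listpack nodes, etc.). For strings the figure is exact; for large collections it is an estimate extrapolated from a sample of elements.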

text

127.0.0.1:6379> MEMORY USAGE user:1001
(integer) 128

127.0.0.1:6379> MEMORY USAGE large:hash
(integer) 4194304   ← this hash is using 4MB

For collections, SAMPLES controls how many elements are sampled to estimate total cost (default 5). Use SAMPLES 0 for exact measurement on small collections.

Using MEMORY USAGE to find expensive keys:

bash

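#!/bin/bash
# Find the 20 most memory-hungry keys.
# Runs one redis-cli call per key, so expect it to be slow on large keyspaces.
redis-cli --scan | while IFS= read -r key; do
  size=$(redis-cli MEMORY USAGE "$key" 2>/dev/null)
  # Skip keys that expired or were deleted between SCAN and MEMORY USAGE
  [ -n "$size" ] && echo "$size $key"
done | sort -n -r | head -20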

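On a production instance with millions of keys, sample a representative subset. Note that --count only sets the SCAN batch-size hint and does not limit how many keys are returned, so the sampling is done by shuf:

bash

# Measure a random sample of 1,000 keys instead of the whole keyspace
redis-cli --scan --count 1000 | shuf -n 1000 | while IFS= read -r key; do
  size=$(redis-cli MEMORY USAGE "$key" 2>/dev/null)
  [ -n "$size" ] && echo "$size $key"
done | sort -n -r | head -20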
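A top-20 list shows individual outliers, but growth often comes from one key family. As a hedged sketch, the "size key" lines produced by the loops above can be aggregated by key prefix. `sum_by_prefix` is a hypothetical helper name, and it assumes keys follow a `prefix:rest` naming convention as in this module's examples.

```shell
# sum_by_prefix: read "<bytes> <key>" lines on stdin and print
# "<total_bytes> <key_count> <prefix>" per key prefix, largest first.
# The prefix is everything before the first colon (assumed naming convention).
sum_by_prefix() {
  awk '
    NF >= 2 && $1 ~ /^[0-9]+$/ {
      split($2, parts, ":")
      bytes[parts[1]] += $1
      count[parts[1]]++
    }
    END {
      for (p in bytes) printf "%d %d %s\n", bytes[p], count[p], p
    }' | sort -k1,1nr
}
```

Lines without a numeric size (for example, keys that disappeared mid-scan) are ignored rather than counted as zero.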

MEMORY DOCTOR

MEMORY DOCTOR

Returns a human-readable diagnosis. The exact wording depends on the Redis version; the output looks roughly like this:

text

"Sam, I detected a few problems: 
 * High total allocator fragmentation: The RSS reported by the allocator is 
   suspicious. This could be caused by ...
 * High rss overhead: ..."

Or for a healthy instance:

"Sam, I have detected no problems in the server memory subsystem."

Not a substitute for INFO memory, but a quick sanity check.

MEMORY MALLOC-STATS

MEMORY MALLOC-STATS

Dumps the full jemalloc allocator statistics — bin sizes, fragmentation per bin, active vs retained pages. Useful when you suspect allocator-level fragmentation rather than Redis-level issues.

Finding the Memory Culprits

Pattern 1: Large Hashes in hashtable Encoding

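A Hash switches from listpack to hashtable encoding once it has more fields than hash-max-listpack-entries, or any field name or value longer than hash-max-listpack-value bytes. Stock Redis ships with 512 entries and 64 bytes; the examples in this module assume a deployment configured with 128 entries. The memory cost jumps roughly 5x per element. Find them: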

bash

redis-cli --scan --pattern "user:*" | while IFS= read -r key; do
  # Sets also report "hashtable", so confirm the key is a Hash first
  type=$(redis-cli TYPE "$key")
  if [ "$type" = "hash" ]; then
    encoding=$(redis-cli OBJECT ENCODING "$key")
    if [ "$encoding" = "hashtable" ]; then
      len=$(redis-cli HLEN "$key")
      size=$(redis-cli MEMORY USAGE "$key")
      echo "$size $key $len fields"
    fi
  fi
done | sort -n -r | head -20

If you find 1,000 user Hashes in hashtable encoding that should be in listpack encoding (they have < 128 fields), either the configured thresholds are lower than you expect or some field exceeds the value-size limit. Check:

text

CONFIG GET hash-max-listpack-entries
CONFIG GET hash-max-listpack-value

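If the threshold is already 128 but Hashes with 50 fields are in hashtable encoding, at least one field name or value exceeds 64 bytes (or did at some point, since a Hash is not converted back after it shrinks). Identify the long values with HGETALL on a sample key, or check individual fields with HSTRLEN.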
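The two-condition rule can be written down as a small predicate, which helps when reasoning about why a particular Hash ended up in hashtable encoding. This is a sketch only: `expected_hash_encoding` is a hypothetical helper, and its defaults of 128 and 64 mirror the values used in this module's examples, not necessarily your server's settings (pass the values from CONFIG GET instead).

```shell
# expected_hash_encoding FIELDS LONGEST_BYTES [MAX_ENTRIES] [MAX_VALUE]
# Predicts the encoding of a freshly written Hash.
# FIELDS        = number of fields in the Hash
# LONGEST_BYTES = length of the longest field name or value
# Defaults (128 / 64) are this module's example thresholds (assumption).
expected_hash_encoding() {
  fields=$1
  longest=$2
  max_entries=${3:-128}
  max_value=${4:-64}
  if [ "$fields" -le "$max_entries" ] && [ "$longest" -le "$max_value" ]; then
    echo listpack
  else
    echo hashtable
  fi
}
```

For example, expected_hash_encoding 50 80 prints hashtable because one value exceeds 64 bytes, while expected_hash_encoding 200 30 256 64 prints listpack. The prediction applies to a newly written key; a Hash that was already promoted stays in hashtable encoding until it is rewritten.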

Pattern 2: Keys Without TTL (Orphaned Data)

bash

# Count keys that have no TTL (TTL returns -1 for a key without an expiry)
redis-cli --scan | while IFS= read -r key; do
  ttl=$(redis-cli TTL "$key")
  if [ "$ttl" = "-1" ]; then
    echo "no-ttl $key"
  fi
done | wc -l

Or, far cheaper than scanning every key, read the aggregate counts from INFO keyspace:

text

INFO keyspace
→ db0:keys=500000,expires=50000,avg_ttl=3600000

If expires is much less than keys, most of your keys have no TTL. For a cache, this means eviction will eventually clear them — but you are paying for that memory until then. For an application database, this is expected.
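To turn the keys and expires counts into a percentage, a small filter over the INFO keyspace output is enough. The following is a sketch under the assumption that each database line has the `dbN:keys=...,expires=...` shape shown above; `ttl_coverage` is a hypothetical helper name.

```shell
# ttl_coverage: read `INFO keyspace` text on stdin and print, per database,
# the key count and how many keys have no TTL.
ttl_coverage() {
  # Strip CRLF line endings, then split each db line on ':', '=' and ','
  tr -d '\r' | awk -F'[:=,]' '
    /^db[0-9]+:keys=/ {
      keys = $3 + 0
      expires = $5 + 0
      pct = (keys > 0) ? 100 * (keys - expires) / keys : 0
      printf "%s keys=%d no_ttl=%d (%.1f%%)\n", $1, keys, keys - expires, pct
    }'
}
```

Usage, assuming redis-cli can reach the instance: redis-cli INFO keyspace | ttl_coverage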

Pattern 3: String Keys Storing JSON When Hashes Would Be Better

OBJECT ENCODING tells you whether a String key is in raw encoding (a string longer than 44 bytes; shorter strings use embstr, and integer values use int). If it is storing a JSON blob, consider whether you update individual fields. If so, a Hash is more efficient and enables atomic partial updates.

bash

redis-cli --scan --pattern "user:*" | while IFS= read -r key; do
  # "raw" is only reported for String keys, so no TYPE check is needed
  encoding=$(redis-cli OBJECT ENCODING "$key")
  if [ "$encoding" = "raw" ]; then
    size=$(redis-cli MEMORY USAGE "$key")
    echo "$size $key raw-string"
  fi
done | sort -n -r | head -20

Pattern 4: Sorted Set Keys in skiplist Encoding

Sorted Sets with more than 128 members (zset-max-listpack-entries), or with any member longer than 64 bytes (zset-max-listpack-value), use skiplist encoding. A skiplist + hashtable for 1,000 members uses ~250KB; listpack for the same 1,000 members uses ~55KB. If you have many small-to-medium sorted sets exceeding the listpack threshold by a few members, raise the threshold:

CONFIG SET zset-max-listpack-entries 256   → raise threshold if members are ≤ 64 bytes

Active Defragmentation

When mem_fragmentation_ratio > 1.5, enable active defragmentation:

text

CONFIG SET activedefrag yes
CONFIG SET active-defrag-ignore-bytes 100mb    → don't defrag if fragmentation bytes < 100MB
CONFIG SET active-defrag-threshold-lower 10    → start at 10% fragmentation
CONFIG SET active-defrag-threshold-upper 100   → max effort at 100%
CONFIG SET active-defrag-cycle-min 1           → min CPU % for defrag
CONFIG SET active-defrag-cycle-max 25          → max CPU % for defrag

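Active defragmentation works incrementally on the main thread, in short time slices, moving allocations into less fragmented jemalloc pages so that whole pages can be released. The cycle-min and cycle-max settings bound its CPU budget to between 1% and 25%, and it can recover significant memory without restarting Redis. It requires a Redis build that uses jemalloc.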
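The ignore-bytes and threshold-lower settings act together as a start condition. Below is a rough sketch of that check, computed from the allocator_allocated and allocator_active fields of INFO memory. `defrag_would_start` is a hypothetical helper, and the exact internal metric Redis uses differs between versions, so treat the result as an approximation.

```shell
# defrag_would_start [LOWER_PCT] [IGNORE_BYTES]
# Reads `INFO memory` text on stdin and estimates whether active defrag
# would begin working. Defaults: 10 percent and 100 MB (104857600 bytes).
defrag_would_start() {
  lower_pct=${1:-10}
  ignore_bytes=${2:-104857600}
  tr -d '\r' | awk -F: -v lower="$lower_pct" -v ignore="$ignore_bytes" '
    $1 == "allocator_allocated" { allocated = $2 + 0 }
    $1 == "allocator_active"    { active = $2 + 0 }
    END {
      if (allocated <= 0) { print "unknown"; exit 2 }
      frag_bytes = active - allocated
      frag_pct = 100 * frag_bytes / allocated
      # Both conditions must hold: enough wasted bytes AND enough percent
      verdict = (frag_pct >= lower && frag_bytes >= ignore) ? "yes" : "no"
      printf "%s frag_pct=%.1f frag_bytes=%d\n", verdict, frag_pct, frag_bytes
    }'
}
```

With the sample INFO memory output from earlier in this module (allocator_allocated 1528100000, allocator_active 1953300000), this reports about 27.8% fragmentation and 425200000 wasted bytes, which clears both default thresholds.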

When it is not enough: If mem_fragmentation_ratio > 2.0 and the instance has been running for months with heavy churn, active defragmentation may be slow to converge. A Redis restart (graceful shutdown → dump.rdb / AOF flush → restart → reload) resets memory layout and eliminates fragmentation instantly. A restart means downtime unless you first fail over to a replica, so plan it during a low-traffic window.

Encoding Threshold Tuning

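Ensuring that data structures stay in compact encodings is usually the single most impactful memory optimization. The trade-off is CPU: listpack operations scan linearly, so very high thresholds make each access slower.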

Hash Thresholds

text

hash-max-listpack-entries 128   → listpack if ≤ 128 fields
hash-max-listpack-value 64      → listpack if all field names and values ≤ 64 bytes

If your user Hashes have up to 200 fields with values averaging 30 bytes: raise entries to 256 and leave value at 64 to keep them in listpack. Memory reduction: roughly 5x per hash.

Sorted Set Thresholds

text

zset-max-listpack-entries 128
zset-max-listpack-value 64

If your leaderboard sorted sets have up to 200 members under 40 bytes each: raise to entries 256.

Set Thresholds (integer sets)

set-max-intset-entries 512    → intset (sorted integer array) if all members are integers

If you are storing user IDs (integers) in Sets: intset is the most compact encoding. A Set keeps it only while every member is an integer and the member count stays at or below set-max-intset-entries.

Testing Threshold Changes

After changing thresholds:

New keys will use the new thresholds
Existing keys will NOT automatically convert (they were already promoted to the larger encoding)
To convert existing keys: DUMP + RESTORE or use redis-cli --pipe to reload the data

The safest approach for large deployments: change thresholds and let natural key churn (TTL expiry + re-creation) gradually adopt the new encoding.

The Hash-for-Small-Objects Pattern

For reference, here is the memory comparison that drives this pattern:

text

# 100 users stored as individual String keys with JSON
user:1     → raw string (80 bytes JSON) → ~160 bytes in Redis
user:2     → raw string (80 bytes JSON) → ~160 bytes in Redis
...
user:100   → raw string (80 bytes JSON) → ~160 bytes in Redis
Total: ~16,000 bytes (16KB)

# Same 100 users stored as fields in a single Hash
users      → listpack Hash → ~20 bytes overhead + 11 bytes/field-pair
           → 100 * (name + value + overhead) → ~2,200 bytes
Total: ~2.2KB — 7x more efficient

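The per-key overhead (robj, SDS for key, hash table entry) adds ~80 bytes per key. When values are small, this overhead dominates. Grouping small objects under a single Hash key eliminates most of it. Treat the figures above as illustrative: the Hash row assumes short field values (around 11 bytes per field-value pair), because an 80-byte value would exceed hash-max-listpack-value (64) and force hashtable encoding. Measure your own data with MEMORY USAGE before committing to a schema.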
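The arithmetic behind the comparison can be reproduced as a back-of-envelope estimate. This is a sketch using this module's rough figures (about 80 bytes of overhead per top-level key, about 20 bytes of listpack header, about 11 bytes of overhead per field-value pair); both function names are hypothetical and the constants are approximations, not measurements.

```shell
# estimate_separate_keys KEYS PAYLOAD_BYTES [OVERHEAD_BYTES]
# Total bytes for KEYS top-level String keys, each carrying PAYLOAD_BYTES.
# OVERHEAD_BYTES defaults to the module's rough ~80 bytes per key.
estimate_separate_keys() {
  echo $(( $1 * ($2 + ${3:-80}) ))
}

# estimate_listpack_hash FIELDS PAIR_BYTES [PER_PAIR_OVERHEAD] [HEADER_BYTES]
# Total bytes for one listpack-encoded Hash with FIELDS field-value pairs,
# where PAIR_BYTES is the combined length of one field name and its value.
estimate_listpack_hash() {
  echo $(( ${4:-20} + $1 * ($2 + ${3:-11}) ))
}
```

With the numbers from the comparison, estimate_separate_keys 100 80 gives 16000 and estimate_listpack_hash 100 11 gives 2220, matching the ~16KB and ~2.2KB totals. Verify any estimate against MEMORY USAGE on real keys.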

Implementation: Instead of SET user:1001:name "Jatin", use HSET users 1001:name "Jatin" (or one Hash per user: HSET user:1001 name "Jatin" email "...").

This pattern has limits: before Redis 7.4 a Hash cannot have per-field TTLs, and you cannot atomically query across multiple user Hashes. For most use cases, the memory savings outweigh these constraints.

Memory Optimization Checklist

Run INFO memory — check mem_fragmentation_ratio, used_memory_overhead
Find top 20 keys by memory: MEMORY USAGE + SCAN
Check encoding of large collections: OBJECT ENCODING
Identify Hashes in hashtable encoding that should be listpack: HLEN + OBJECT ENCODING
Check for keys without TTL in a cache context: INFO keyspace (expires vs keys count)
Review encoding thresholds: CONFIG GET hash-max-listpack-*, zset-max-listpack-*
Consider active defragmentation if mem_fragmentation_ratio > 1.5
Evaluate Hash-for-small-objects pattern for high-cardinality small-value datasets
Set maxmemory and maxmemory-policy if not set (do not let Redis use unbounded RAM)

Summary

INFO memory is the starting point: check mem_fragmentation_ratio, used_memory_dataset, mem_clients_normal
MEMORY USAGE key gives the per-key RAM cost including all internal structures (sampled for large collections unless you pass SAMPLES 0)
MEMORY DOCTOR for a quick health check; MEMORY MALLOC-STATS for allocator-level detail
Encoding inspection: OBJECT ENCODING reveals whether a key is in compact (listpack, intset) or large (hashtable, skiplist) encoding
Tuning encoding thresholds (hash-max-listpack-entries, zset-max-listpack-entries) is the highest-leverage memory optimization
Active defragmentation (activedefrag yes) recovers memory from fragmentation without restarting
The Hash-for-small-objects pattern reduces per-key overhead for high-cardinality small datasets by 5–10x

Next: P-5 — Atomic Counters, Rate Limiters, and Sliding Windows — building lock-free counters, fixed and sliding window rate limiters, and the token bucket algorithm using Redis's atomic operations.

