The robj structure, SDS strings, jemalloc size classes, memory fragmentation ratio, active defragmentation, MEMORY USAGE per key, and practical strategies for cutting Redis memory usage by 50%.
F-9 — Memory Layout and Object Encoding Internals
Who this module is for: You know Redis is fast because it uses RAM, but you have never understood how it uses RAM — what a key actually costs, why two seemingly similar datasets can have wildly different memory footprints, or why your 100MB dataset grows to 300MB in Redis. This module covers the memory model, jemalloc, memory fragmentation, the object encoding hierarchy, and practical tools for diagnosing and reducing Redis memory usage.
How Redis Stores Data in Memory
When you run SET user:1001 "Jatin", Redis does not simply store the bytes "Jatin" in memory. It creates a chain of C structures:
- The hash table entry: a dictEntry in the global keyspace hash table, holding pointers to the key, the value object, and the next entry in the same bucket
- The key string (SDS): a Simple Dynamic String made of a small header (1 byte for keys under 32 bytes, 3 bytes for keys up to 255 bytes) + the key bytes + null terminator. The keyspace stores the key name directly as an SDS; it is not wrapped in an robj
- The value object (robj): a Redis Object structure containing type, encoding, reference count, LRU clock, and a pointer to the actual data
- The value data: the actual value bytes (or a pointer to a more complex structure like a listpack or skiplist)
For a key user:1001 (9 bytes) with value "Jatin" (5 bytes), the actual memory consumption is roughly:
| Structure | Approximate Size |
|---|---|
| Hash table entry (key, value, and next pointers) | 24 bytes |
| Hash table bucket array (1–2 slots per key) | 8–16 bytes |
| Key SDS header | 1 byte |
| Key bytes ("user:1001\0") | 10 bytes |
| Value robj | 16 bytes |
| Value SDS (3-byte header + "Jatin\0"), embedded with the robj | 9 bytes |
| jemalloc overhead (rounding to bin size) | 0–24 bytes |
Total: roughly 70–100 bytes for a 14-byte key-value pair; the exact figure varies by Redis version. The ratio of overhead to payload is high for small values. This is why key naming matters: short keys reduce overhead, and large values (like JSON blobs) amortize the per-key cost better.
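To make the amortization argument concrete, here is a minimal sketch that compares bookkeeping against payload. The fixed 90-byte overhead is an assumption picked from the rough range above, and `overhead_fraction` is a hypothetical helper, not part of Redis.

```python
def overhead_fraction(payload_bytes: int, overhead_bytes: int = 90) -> float:
    """Share of a key's total footprint that is bookkeeping, not payload.

    overhead_bytes is an assumed per-key cost, not a measured value.
    """
    return overhead_bytes / (payload_bytes + overhead_bytes)


# A 14-byte pair (9-byte key + 5-byte value) versus a 2 KB JSON blob.
for payload in (14, 2048):
    print(f"{payload} payload bytes: {overhead_fraction(payload):.0%} overhead")
```

Under this assumption the small pair is mostly overhead, while the 2 KB value pays only a few percent.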
Simple Dynamic Strings (SDS)
Redis stores all strings (keys and String-type values) as SDS — its own string implementation, not C's null-terminated strings.
SDS header (as of Redis 3.2+, the header size depends on string length):
// For strings < 32 bytes (the length is packed into 5 bits of the flags byte):
struct sdshdr5 { ... } // 1 byte header

// For strings ≤ 255 bytes:
struct sdshdr8 {
    uint8_t len;          // current length
    uint8_t alloc;        // allocated capacity (excluding header + null terminator)
    unsigned char flags;  // header type
    char buf[];           // the actual bytes
};

// For strings > 255 bytes: sdshdr16, sdshdr32, sdshdr64
SDS allows O(1) length lookup (no scanning for null terminator), binary safety (can store null bytes in the middle), and pre-allocated capacity for append operations (reducing reallocation frequency).
The key insight: every SDS carries at least 1 byte of header plus a null terminator beyond the data itself, and jemalloc rounds each allocation up to the next size class. A 9-byte key ("user:1001") needs 9 + 1 header byte + 1 null terminator = 11 bytes, which lands in a 16-byte allocation.
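The header selection can be sketched as a small lookup. This assumes packed headers of 1, 3, 5, 9, and 17 bytes for sdshdr5 through sdshdr64, and that sdshdr5 covers lengths up to 31 because its length lives in 5 bits; both function names are hypothetical.

```python
def sds_header_size(length: int) -> int:
    """Header bytes for an SDS holding `length` bytes (assumed packed layout)."""
    if length < 1 << 5:
        return 1   # sdshdr5: flags byte only
    if length < 1 << 8:
        return 3   # sdshdr8: len + alloc + flags
    if length < 1 << 16:
        return 5   # sdshdr16
    if length < 1 << 32:
        return 9   # sdshdr32
    return 17      # sdshdr64


def sds_request_bytes(length: int) -> int:
    """Bytes requested from the allocator: header + data + null terminator."""
    return sds_header_size(length) + length + 1


print(sds_request_bytes(len("user:1001")))
```

The result is the size requested from the allocator; the allocator then rounds it up to one of its size classes.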
The Redis Object (robj)
Every Redis value is wrapped in an robj (Redis Object):
typedef struct redisObject {
    unsigned type:4;      // STRING, LIST, HASH, SET, ZSET, etc.
    unsigned encoding:4;  // the internal encoding (INT, EMBSTR, RAW, LISTPACK, etc.)
    unsigned lru:24;      // LRU clock for eviction
    int refcount;         // reference count
    void *ptr;            // pointer to the data, or the data itself for small ints
} robj;
This structure is 16 bytes on a 64-bit system.
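One way to sanity-check the 16-byte figure without compiling C is to mirror the struct with Python's `ctypes`. This is only an illustration: the `RedisObject` class below is a hypothetical mirror of the fields listed above, and its layout follows the local platform's C compiler conventions.

```python
import ctypes


class RedisObject(ctypes.Structure):
    """Mirror of the robj fields; the three bitfields share one 32-bit unit."""
    _fields_ = [
        ("type", ctypes.c_uint, 4),
        ("encoding", ctypes.c_uint, 4),
        ("lru", ctypes.c_uint, 24),
        ("refcount", ctypes.c_int),
        ("ptr", ctypes.c_void_p),
    ]


# 4 bytes of bitfields + 4 bytes of refcount + one pointer.
print(ctypes.sizeof(RedisObject))
```

On a typical 64-bit build the pointer is 8 bytes, so the total should come out at 16.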
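Shared integers: Redis pre-allocates objects for integers 0–9999. SET counter 42 stores the integer 42 as a pointer to a pre-allocated shared object, so no allocation is needed. In Redis 4.0 and later, OBJECT REFCOUNT counter reports 2147483647 (INT_MAX) for such a value, a sentinel meaning the object is shared and never freed; older versions reported the actual number of references, which is greater than 1 when several keys point to the same shared object.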
jemalloc and Memory Fragmentation
Redis uses jemalloc as its memory allocator. jemalloc manages memory in size classes (bins): 8, 16, 32, 64, 128, 192, 256 bytes, etc. When Redis requests 10 bytes, jemalloc gives it a 16-byte slot. The 6 unused bytes are "internal fragmentation."
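The rounding rule can be sketched as a lookup over a sorted list of size classes. The list below contains only the classes named above and is an assumption for illustration; a real jemalloc build has more classes, and `size_class` is a hypothetical helper.

```python
from bisect import bisect_left

# Illustrative subset of size classes, taken from the text above.
SIZE_CLASSES = (8, 16, 32, 64, 128, 192, 256)


def size_class(request: int, classes=SIZE_CLASSES) -> int:
    """Smallest size class that can hold `request` bytes."""
    index = bisect_left(classes, request)
    if index == len(classes):
        raise ValueError("request is larger than the classes modeled here")
    return classes[index]


def internal_fragmentation(request: int) -> int:
    """Bytes wasted inside the slot handed back by the allocator."""
    return size_class(request) - request


print(size_class(10), internal_fragmentation(10))
```

A 10-byte request maps to the 16-byte class and wastes 6 bytes, matching the example in the text.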
Memory fragmentation ratio:
INFO memory
→ mem_fragmentation_ratio: 1.52
mem_fragmentation_ratio = used_memory_rss / used_memory
- used_memory: what Redis believes it is using (its allocations)
- used_memory_rss: what the OS reports Redis is using (RSS = Resident Set Size)
| Ratio | Interpretation |
|---|---|
| ~1.0 | Healthy — very little fragmentation |
| 1.1–1.5 | Acceptable — some fragmentation, normal for dynamic workloads |
| > 1.5 | High fragmentation — wasted memory, consider defragmentation |
| < 1.0 | Redis has swapped some data to disk (very bad — indicates memory pressure) |
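The table above can be turned into a small classifier for monitoring scripts. The exact cut-off between "healthy" and "acceptable" (1.1 here) is an assumption, since the table leaves a gap between ~1.0 and 1.1, and `fragmentation_label` is a hypothetical name.

```python
def fragmentation_label(used_memory_rss: int, used_memory: int) -> str:
    """Map the RSS / used_memory ratio onto the bands from the table."""
    ratio = used_memory_rss / used_memory
    if ratio < 1.0:
        return "possible swapping"
    if ratio < 1.1:
        return "healthy"
    if ratio <= 1.5:
        return "acceptable"
    return "high fragmentation"


print(fragmentation_label(2_080_000_000, 1_520_000_000))
```

A single reading is only a hint; the ratio is more meaningful when tracked over time.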
Causes of Fragmentation
- Key churn — many keys created and deleted over time leaves holes in the allocator's free lists
- Resizing data structures — Hash or List encoding upgrades (listpack → hashtable) free the old structure and allocate a new one; the old slot may not be immediately reusable
- Long-running instances — fragmentation accumulates over time
Active Defragmentation
Redis 4.0+ includes an active defragmentation feature (activedefrag):
CONFIG SET activedefrag yes
CONFIG SET active-defrag-ignore-bytes 100mb → don't defrag if fragmentation is < 100MB
CONFIG SET active-defrag-threshold-lower 10 → start defrag when fragmentation > 10%
CONFIG SET active-defrag-threshold-upper 100 → max defrag effort at 100% fragmentation
Active defragmentation works incrementally in short time slices alongside normal command processing, moving live objects to compacted allocations. It adds some CPU overhead but can recover significant amounts of memory. Enable it if your mem_fragmentation_ratio consistently exceeds 1.5.
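How the thresholds interact can be sketched as a simple interpolation. This is a simplified model, not Redis's actual code: it assumes effort scales linearly between the lower and upper thresholds, and the `cycle_min` and `cycle_max` defaults (1 and 25, standing in for the active-defrag-cycle-min and active-defrag-cycle-max options) are assumptions.

```python
def defrag_effort(frag_pct: float, frag_bytes: int,
                  ignore_bytes: int = 100 * 1024 * 1024,
                  lower: float = 10, upper: float = 100,
                  cycle_min: float = 1, cycle_max: float = 25) -> float:
    """Approximate CPU effort (percent) a defrag cycle would use.

    Returns 0 when either trigger condition is not met.
    """
    if frag_bytes < ignore_bytes or frag_pct < lower:
        return 0.0
    scaled = cycle_min + (frag_pct - lower) * (cycle_max - cycle_min) / (upper - lower)
    return max(cycle_min, min(cycle_max, scaled))


print(defrag_effort(frag_pct=55, frag_bytes=200 * 1024 * 1024))
```

In this model both conditions must hold before any work happens, so a small instance with a high ratio but little wasted memory in absolute terms is left alone.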
Object Encoding and Memory Cost
The OBJECT ENCODING command tells you which internal representation a key is using. Different encodings have dramatically different memory footprints.
String Encodings
127.0.0.1:6379> SET counter 42
127.0.0.1:6379> OBJECT ENCODING counter
"int" ← stored as a 64-bit integer in the robj's ptr field — no heap allocation
127.0.0.1:6379> SET greeting "Hello"
127.0.0.1:6379> OBJECT ENCODING greeting
"embstr" ← string ≤ 44 bytes: robj + SDS in a single allocation (one malloc call)
127.0.0.1:6379> SET essay "... (45+ bytes) ..."
127.0.0.1:6379> OBJECT ENCODING essay
"raw" ← string > 44 bytes: robj + SDS in two separate allocations
embstr is the most memory-efficient encoding for non-integer strings: the robj and the SDS are allocated together in a single allocation of at most 64 bytes (16-byte robj + 3-byte SDS header + up to 44 bytes of data + null terminator). This avoids a separate heap allocation and improves cache locality. Strings ≤ 44 bytes use embstr; longer strings use raw.
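The three-way rule can be approximated in a few lines. This sketch assumes a value becomes int only when it is the canonical decimal form of a signed 64-bit integer (no leading zeros, spaces, or plus sign). The function names are hypothetical, and a live server can differ, for example after APPEND.

```python
INT64_MIN, INT64_MAX = -2**63, 2**63 - 1


def looks_like_int64(value: bytes) -> bool:
    """True if value is the canonical text of a signed 64-bit integer."""
    try:
        text = value.decode("ascii")
        number = int(text)
    except ValueError:  # also covers UnicodeDecodeError
        return False
    return str(number) == text and INT64_MIN <= number <= INT64_MAX


def predict_string_encoding(value: bytes) -> str:
    """Guess the encoding OBJECT ENCODING would report after a plain SET."""
    if looks_like_int64(value):
        return "int"
    return "embstr" if len(value) <= 44 else "raw"


for sample in (b"42", b"Hello", b"x" * 45):
    print(len(sample), predict_string_encoding(sample))
```

Treat the prediction as a planning aid and confirm with OBJECT ENCODING on a real instance.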
Hash Encodings
127.0.0.1:6379> HSET user:1001 name "Jatin"
127.0.0.1:6379> OBJECT ENCODING user:1001
"listpack" ← ≤ 128 fields, each ≤ 64 bytes
# After crossing threshold:
127.0.0.1:6379> OBJECT ENCODING user:1001
"hashtable" ← pointer-based hash table with linked list buckets
Memory cost:
- listpack: ~11 bytes per field-value pair + small header overhead (~7 bytes)
- hashtable: ~64 bytes per entry (two pointers for the hash table + two pointers for the linked list + the dictEntry struct + the key and value SDS strings)
A 10-field Hash in listpack encoding uses ~120 bytes. In hashtable encoding, the same 10 fields use ~700+ bytes. This is why keeping Hashes small (below the listpack threshold) matters enormously for memory efficiency.
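The threshold check and the ballpark costs can be combined into a rough planning helper. The per-entry figures are the approximations quoted above, the thresholds are the defaults of 128 entries and 64 bytes, and the hashtable estimate deliberately excludes the table's own bucket array; both function names are hypothetical.

```python
def hash_encoding(field_count: int, longest_item: int,
                  max_entries: int = 128, max_value: int = 64) -> str:
    """Encoding a Hash would use given its size and its longest field or value."""
    if field_count <= max_entries and longest_item <= max_value:
        return "listpack"
    return "hashtable"


def rough_hash_bytes(field_count: int, encoding: str) -> int:
    """Ballpark footprint using the per-entry figures from the text."""
    if encoding == "listpack":
        return 7 + 11 * field_count
    return 64 * field_count  # entries only; bucket array not included


for fields, longest in ((10, 5), (129, 5), (10, 65)):
    enc = hash_encoding(fields, longest)
    print(fields, longest, enc, rough_hash_bytes(fields, enc))
```

Note that a single oversized value is enough to push an otherwise small Hash into hashtable encoding.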
List Encodings
"listpack" → compact byte array (small lists)
"quicklist" → doubly-linked list of listpack nodes (large lists)
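A quicklist node is a listpack whose capacity is bounded by list-max-listpack-size: a positive value caps the number of entries per node, while a negative value caps the node's byte size (the default, -2, means about 8 KB per node). With an entry-count limit, the number of nodes is roughly total_elements / node_size. Each quicklist node has a header (~32 bytes) plus the listpack data.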
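The node-count arithmetic looks like this in code. It assumes a fixed number of entries per node and the ~32-byte node header quoted above; both helpers are hypothetical.

```python
def quicklist_nodes(total_elements: int, entries_per_node: int) -> int:
    """Number of nodes needed, rounding up so a partial node still counts."""
    return -(-total_elements // entries_per_node)


def quicklist_header_bytes(total_elements: int, entries_per_node: int,
                           node_header_bytes: int = 32) -> int:
    """Header overhead across all nodes, excluding the listpack data itself."""
    return quicklist_nodes(total_elements, entries_per_node) * node_header_bytes


print(quicklist_nodes(1000, 128), quicklist_header_bytes(1000, 128))
```

Under these assumptions a 1,000-element list costs only a few hundred bytes in node headers, which is why the quicklist stays compact even for large lists.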
Sorted Set Encodings
"listpack" → compact, linear scan O(N) but zero pointer overhead for small sets
"skiplist" → O(log N) with a separate hashtable; ~200 bytes per element overhead
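A Sorted Set with 1,000 members in listpack encoding (possible only if zset-max-listpack-entries is raised above its default of 128): ~55KB.
The same 1,000 members in skiplist encoding: ~250KB.
With default settings, the encoding upgrade at the 129th member costs roughly 5x more memory.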
Practical Memory Tools
DEBUG OBJECT
DEBUG OBJECT key
Returns: value at address (hex), serialized length in bytes (what RDB persistence would store), encoding, number of elements (for collections), and LRU time.
127.0.0.1:6379> DEBUG OBJECT user:1001
Value at:0x7f8b1c0040b0 refcount:1 encoding:listpack serializedlength:52 lru:8821634 lru_seconds_idle:3 type:hash
serializedlength is the compressed size — useful for estimating RDB file size.
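For scripted audits, the DEBUG OBJECT reply can be split into fields with a regular expression. This sketch assumes the `name:value` layout shown in the sample above; `parse_debug_object` is a hypothetical helper.

```python
import re

NUMERIC_FIELDS = ("refcount", "serializedlength", "lru", "lru_seconds_idle")


def parse_debug_object(reply: str) -> dict:
    """Turn 'Value at:0x... refcount:1 ...' into a dict of fields."""
    fields = dict(re.findall(r"(\w+):(\S+)", reply))
    for name in NUMERIC_FIELDS:
        if name in fields:
            fields[name] = int(fields[name])
    return fields


reply = ("Value at:0x7f8b1c0040b0 refcount:1 encoding:listpack "
         "serializedlength:52 lru:8821634 lru_seconds_idle:3 type:hash")
print(parse_debug_object(reply))
```

The address ends up under the key `at`, because the reply writes it as `Value at:<address>`.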
MEMORY USAGE
MEMORY USAGE key [SAMPLES count]
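Returns the number of bytes Redis uses for the key and its value, including the robj, SDS, and all internal structures. The figure is computed directly for Strings and estimated by sampling for collections, so it is exact for a collection only when every element is sampled. It is more accurate than estimating from encoding.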
127.0.0.1:6379> MEMORY USAGE user:1001
(integer) 104
For collections (Hashes, Lists, Sets, Sorted Sets), SAMPLES controls how many elements are sampled to estimate total memory (since scanning all elements would be slow for large collections). Default is 5; use 0 for an exact calculation on small keys.
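The effect of SAMPLES can be illustrated with a toy estimator. It shows the general idea of extrapolating from a sample's average, not Redis's exact implementation; `estimate_collection_bytes` and the element sizes are made up for the example.

```python
def estimate_collection_bytes(element_sizes: list, samples: int = 5) -> int:
    """Extrapolate total size from the first `samples` elements.

    samples == 0 means "measure every element".
    """
    count = len(element_sizes)
    if count == 0:
        return 0
    if samples == 0 or samples >= count:
        return sum(element_sizes)
    sampled = element_sizes[:samples]
    return round(sum(sampled) / len(sampled) * count)


sizes = [10] * 5 + [100] * 5  # small elements first, large ones later
print(estimate_collection_bytes(sizes, samples=5))
print(estimate_collection_bytes(sizes, samples=0))
```

When element sizes are skewed, a small sample can be far off, which is the reason to raise SAMPLES or use 0 on keys where accuracy matters.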
MEMORY DOCTOR
MEMORY DOCTOR
Returns a human-readable diagnosis of Redis's memory usage — whether there is high fragmentation, unusual RSS, or configuration issues.
INFO memory
Key fields for memory monitoring:
INFO memory
used_memory: 1520000000 → bytes Redis has allocated (what malloc reports)
used_memory_human: 1.42G
used_memory_rss: 2080000000 → bytes reported by OS (includes fragmentation)
used_memory_rss_human: 1.94G
mem_fragmentation_ratio: 1.37
used_memory_peak: 1600000000 → peak usage since server start
used_memory_peak_human: 1.49G
maxmemory: 2147483648
maxmemory_human: 2.00G
mem_allocator: jemalloc-5.3.0
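The fields above can be reduced to two headline numbers in a monitoring script. This sketch assumes the raw `name:value` lines that INFO returns over the wire; `parse_info` and `memory_summary` are hypothetical helpers.

```python
def parse_info(text: str) -> dict:
    """Parse raw INFO output into a dict, skipping blank and '# Section' lines."""
    info = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or ":" not in line:
            continue
        name, _, value = line.partition(":")
        info[name] = value
    return info


def memory_summary(info: dict) -> dict:
    used = int(info["used_memory"])
    rss = int(info["used_memory_rss"])
    limit = int(info.get("maxmemory", 0))  # 0 means no limit configured
    return {
        "fragmentation_ratio": round(rss / used, 2),
        "maxmemory_used": round(used / limit, 2) if limit else None,
    }


SAMPLE = """# Memory
used_memory:1520000000
used_memory_rss:2080000000
maxmemory:2147483648
"""
print(memory_summary(parse_info(SAMPLE)))
```

With the sample values the instance sits at about 71% of maxmemory, which leaves limited headroom before eviction starts.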
Memory Optimization Strategies
1. Use the Right Encoding Thresholds
The default encoding thresholds are conservative. If your Hashes consistently stay under 256 fields with values under 128 bytes, raising the listpack thresholds to match keeps more data in the compact encoding, at the cost of slower O(N) field lookups inside larger listpacks:
hash-max-listpack-entries 256 # was 128
hash-max-listpack-value 128 # was 64
Do the same for Sets and Sorted Sets. Measure memory before and after with MEMORY USAGE.
2. Short Key Names
user:1001:profile (17 bytes) costs more RAM than u:1001:p (8 bytes). At 10 million keys, 9 bytes saved per key adds up to about 90MB before allocator rounding. Balance readability vs memory. At small scale (< 1M keys), prioritize readability.
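The savings from a shorter naming scheme are easy to estimate before committing to a rename. This is a back-of-the-envelope sketch that counts raw key bytes only and ignores allocator size classes; `rename_savings_bytes` is a hypothetical helper.

```python
def rename_savings_bytes(key_count: int, old_key: str, new_key: str) -> int:
    """Raw bytes saved across key_count keys shaped like old_key and new_key."""
    saved_per_key = len(old_key.encode()) - len(new_key.encode())
    return key_count * saved_per_key


saved = rename_savings_bytes(10_000_000, "user:1001:profile", "u:1001:p")
print(f"{saved / 1_000_000:.0f} MB")
```

Because allocations are rounded to size classes, the real saving depends on whether the shorter key crosses a class boundary, so measure with MEMORY USAGE on a sample of keys.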
3. Use Hashes for Small Objects Instead of Top-Level Keys
Instead of:
SET user:1001:name "Jatin"
SET user:1001:email "j@example.com"
SET user:1001:role "engineer"
Use:
HSET user:1001 name "Jatin" email "j@example.com" role "engineer"
Three top-level keys each pay the full robj + SDS overhead. One Hash with three fields in listpack encoding uses significantly less memory because the listpack is a contiguous byte array with minimal per-field overhead.
This is the "use Hashes to store small objects" pattern — one of the most impactful Redis memory optimizations.
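A migration script for this pattern mostly comes down to regrouping key names. The sketch below assumes every flat key has the form `prefix:field`, with the last colon separating the field name; `group_into_hashes` and `to_hset_commands` are hypothetical helpers that only build argument lists and do not talk to a server.

```python
from collections import defaultdict


def group_into_hashes(flat: dict) -> dict:
    """Regroup {'user:1001:name': 'Jatin'} into {'user:1001': {'name': 'Jatin'}}."""
    hashes = defaultdict(dict)
    for key, value in flat.items():
        hash_key, _, field = key.rpartition(":")
        hashes[hash_key][field] = value
    return dict(hashes)


def to_hset_commands(hashes: dict) -> list:
    """One HSET argument list per hash: ['HSET', key, field1, value1, ...]."""
    commands = []
    for hash_key, fields in hashes.items():
        args = ["HSET", hash_key]
        for field, value in fields.items():
            args.extend((field, value))
        commands.append(args)
    return commands


flat = {
    "user:1001:name": "Jatin",
    "user:1001:email": "j@example.com",
    "user:1001:role": "engineer",
}
print(to_hset_commands(group_into_hashes(flat)))
```

One trade-off to keep in mind: fields in a Hash share the key's TTL, so values that need independent expiry are poor candidates for grouping.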
4. Set TTLs on Everything That Should Not Be Permanent
Orphaned keys without TTLs accumulate indefinitely. An audit with SCAN + TTL on a production instance often reveals thousands of keys that should have expired long ago. Set sensible TTLs on all cache and session keys.
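Once the TTLs have been collected (for example by pairing SCAN with TTL), the audit itself is a simple filter. This sketch relies on the TTL command returning -1 for a key with no expiry and -2 for a key that no longer exists; `keys_without_expiry` and the sample data are hypothetical.

```python
def keys_without_expiry(ttl_by_key: dict) -> list:
    """Keys whose TTL reply was -1, meaning they will never expire."""
    return sorted(key for key, ttl in ttl_by_key.items() if ttl == -1)


observed = {
    "session:abc": 1800,   # expires in 30 minutes
    "cache:page:42": -1,   # no TTL set
    "session:old": -1,     # no TTL set
    "cache:gone": -2,      # deleted between SCAN and TTL
}
print(keys_without_expiry(observed))
```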
5. Monitor Encoding Upgrades
When you add the 129th field to a Hash, it converts from listpack to hashtable and memory usage spikes 5x for that key. If you are storing per-user data in Hashes and some users have hundreds of fields, those keys are using hashtable encoding and consuming much more memory than typical users. Identify outliers with:
# List the 20 largest user:* keys with their encoding and size in bytes
redis-cli --scan --pattern "user:*" | while read -r key; do
  encoding=$(redis-cli OBJECT ENCODING "$key")
  size=$(redis-cli MEMORY USAGE "$key")
  echo "$key $encoding $size"
done | sort -k3 -rn | head -20
Summary
- Every Redis key costs ~80–100 bytes of overhead regardless of value size — key naming matters for small values
- Redis stores strings as SDS (Simple Dynamic Strings): length-prefixed, binary-safe, with jemalloc-managed allocation
- Every value is wrapped in an robj (16 bytes): type, encoding, LRU clock, refcount, data pointer
- jemalloc rounds allocations to bin sizes, so small values pay internal fragmentation overhead
- mem_fragmentation_ratio > 1.5 → enable activedefrag; ratio < 1.0 → data is swapping (critical)
- MEMORY USAGE key gives exact per-key memory cost
- OBJECT ENCODING key reveals the internal representation; compact encodings (listpack, intset, embstr, int) use dramatically less memory than their large-structure counterparts
- Use Hashes to group related small values: listpack Hash encoding is far more memory-efficient than multiple top-level String keys
- Set TTLs on all cache and session keys to prevent orphan key accumulation
Next: F-10 — Transactions: MULTI, EXEC, and Optimistic Locking with WATCH — how to execute a sequence of commands atomically and how to implement optimistic concurrency control for read-modify-write patterns.