Module F-10 · 22 min read

The robj structure, SDS strings, jemalloc size classes, memory fragmentation ratio, active defragmentation, MEMORY USAGE per key, and practical strategies for cutting Redis memory usage by 50%.

F-10: Memory Layout and Object Encoding Internals

Q: A team is storing millions of user sessions. Initially, they use three separate keys per user: `session:1001:id`, `session:1001:token`, and `session:1001:expires`. A consultant advises them to switch to a single Redis Hash per user: `HSET session:1001 id ... token ... expires ...`. Why does this change drastically reduce memory consumption?

Separate top-level keys each require their own `robj` wrapper, Hash Table entry pointers, and SDS headers, incurring roughly 80–100 bytes of overhead per key before any data is stored. Grouping them into a single Hash under the `listpack` encoding stores the fields sequentially in a single contiguous block of memory, eliminating the per-field structural overhead. — In Redis, every top-level key carries significant structural overhead—a global hash table entry, a Redis Object (`robj`), and a Simple Dynamic String (SDS) wrapper. For small values, this metadata often consumes more memory than the actual payload. By consolidating related fields into a single Hash, the Hash itself is stored as a `listpack` (a highly compact, contiguous byte array) as long as it remains below the size thresholds. This allows you to pay the `robj` overhead only once per user, rather than three times, yielding massive memory savings at scale.

Q: An engineer runs `INFO memory` on a production Redis instance and notices that `used_memory_human` is 2.0G, but `used_memory_rss_human` is 3.5G, resulting in a `mem_fragmentation_ratio` of 1.75. What does this indicate, and what is the appropriate mitigation?

`jemalloc` has allocated memory in predefined bin sizes, and frequent key churn/resizing has left "holes" in the allocator's memory space that the OS cannot reclaim. The engineer should consider enabling active defragmentation (`activedefrag yes`). — The fragmentation ratio is the RSS (what the OS reports) divided by the used memory (what Redis has explicitly allocated). A ratio of 1.75 indicates severe fragmentation—Redis is holding 3.5GB of RAM from the OS to store only 2.0GB of actual data. This typically happens due to heavy key churn or structural resizing leaving un-compacted gaps in `jemalloc`'s memory bins. Enabling Redis 4.0+'s `activedefrag` allows Redis to continuously compact these allocations in the background and return memory to the OS.

Q: What happens to the memory usage of a Redis Hash if it grows from 100 fields to 1,000 fields, assuming the default Redis configuration?

Once the Hash exceeds the `hash-max-listpack-entries` threshold (512 in the stock configuration), its internal encoding upgrades from the compact `listpack` to a full `hashtable`. The memory usage will spike dramatically (often 4-5x per entry) due to the addition of pointers and dictEntry structures. Redis optimizes small collections by storing them as highly compact `listpack` structures. However, a `listpack` requires O(N) linear scanning. To maintain performance as the collection grows, Redis imposes thresholds (by default, 512 entries and 64 bytes per field or value for Hashes; Sorted Sets use 128 entries). Once the Hash exceeds either limit, Redis converts it to a standard pointer-based `hashtable` and does not convert it back if the Hash later shrinks. While this preserves O(1) lookup times, the memory overhead per item increases massively due to the structural requirements of hash buckets, linked list pointers, and separate SDS string allocations.

Who this module is for: You know Redis is fast because it uses RAM, but you have never understood how it uses RAM — what a key actually costs, why two seemingly similar datasets can have wildly different memory footprints, or why your 100MB dataset grows to 300MB in Redis. This module covers the memory model, jemalloc, memory fragmentation, the object encoding hierarchy, and practical tools for diagnosing and reducing Redis memory usage.

How Redis Stores Data in Memory

When you run SET user:1001 "Jatin", Redis does not simply store the bytes "Jatin" in memory. It creates a chain of C structures:

The hash table entry (dictEntry): an entry in the global keyspace hash table holding three pointers, one to the key, one to the value object, and one to the next entry in the same bucket
The key string (SDS): in Redis versions through 7.x the key is stored directly as a Simple Dynamic String with no robj wrapper: a small header (1 byte for keys shorter than 32 bytes) + the key bytes + null terminator
The value object (robj): a Redis Object structure containing type, encoding, reference count, LRU clock, and a pointer to the actual data
The value data: the actual value bytes (or a pointer to a more complex structure like a listpack or skiplist)

For a key user:1001 (9 bytes) with value "Jatin" (5 bytes), the actual memory consumption is roughly:

Structure	Approximate Size
Hash table bucket slot (one pointer)	8 bytes
dictEntry (three pointers = 24 bytes, rounded up by jemalloc)	32 bytes
Key SDS (1-byte header + "user:1001\0" = 11 bytes, rounded up)	16 bytes
Value robj	16 bytes
Value SDS (3-byte header + "Jatin\0", embedded next to the robj)	9 bytes
jemalloc rounding for the value allocation (25 → 32 bytes)	7 bytes

Total: roughly 80–100 bytes for a 14-byte key-value pair (about 88 bytes in this breakdown, before counting spare capacity in the hash table itself). The ratio of overhead to payload is high for small values. This is why key naming matters: short keys reduce overhead, and large values (like JSON blobs) amortize the per-key cost better.

Simple Dynamic Strings (SDS)

Redis stores all strings (keys and String-type values) as SDS — its own string implementation, not C's null-terminated strings.

SDS header (as of Redis 3.2+, the header size depends on string length):

// For strings < 32 bytes (the length fits in 5 bits of the flags byte):
struct sdshdr5 { ... }   // 1 byte header

// For strings ≤ 255 bytes:
struct sdshdr8 {
    uint8_t len;         // current length
    uint8_t alloc;       // allocated capacity (excluding header + null terminator)
    unsigned char flags; // header type
    char buf[];          // the actual bytes
};

// For strings > 255 bytes: sdshdr16, sdshdr32, sdshdr64

SDS allows O(1) length lookup (no scanning for null terminator), binary safety (can store null bytes in the middle), and pre-allocated capacity for append operations (reducing reallocation frequency).

The key insight: every SDS carries at least 2 bytes beyond the data itself (a header of 1 byte or more, plus the null terminator), and jemalloc rounds each allocation up to its next size class. A 9-byte key ("user:1001") needs 9 + 1 header byte + 1 terminator byte = 11 bytes and therefore lands in a 16-byte allocation.
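As a rough sketch of how the header choice feeds into allocation size, the Python helper below mirrors the length thresholds of the SDS header types. The function names are illustrative only, and the header sizes assume the packed struct layouts used since Redis 3.2 (sdshdr5 holds just the flags byte; the larger types hold `len`, `alloc`, and `flags`).

```python
# (exclusive length limit, header size in bytes) for each SDS header type.
# Assumption: packed structs, so sdshdr8 = 1 + 1 + 1 bytes, sdshdr16 = 2 + 2 + 1, etc.
SDS_HEADERS = [
    (1 << 5, 1),    # sdshdr5: lengths 0..31
    (1 << 8, 3),    # sdshdr8: lengths up to 255
    (1 << 16, 5),   # sdshdr16
    (1 << 32, 9),   # sdshdr32
]
SDSHDR64_SIZE = 17  # sdshdr64: 8 + 8 + 1


def sds_header_size(length: int) -> int:
    """Header bytes needed for a string of `length` bytes."""
    for limit, header in SDS_HEADERS:
        if length < limit:
            return header
    return SDSHDR64_SIZE


def sds_total_size(length: int) -> int:
    """Header + payload + null terminator, before allocator rounding."""
    return sds_header_size(length) + length + 1


if __name__ == "__main__":
    for text in (b"user:1001", b"x" * 100, b"x" * 1000):
        n = len(text)
        print(n, sds_header_size(n), sds_total_size(n))
```

The total returned here is the request handed to the allocator; the allocator then rounds it up to one of its size classes.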

The Redis Object (robj)

Every Redis value is wrapped in an robj (Redis Object):

typedef struct redisObject {
    unsigned type:4;       // STRING, LIST, HASH, SET, ZSET, etc.
    unsigned encoding:4;   // the internal encoding (INT, EMBSTR, RAW, LISTPACK, etc.)
    unsigned lru:24;       // LRU clock for eviction
    int refcount;          // reference count
    void *ptr;             // pointer to the data, or the data itself for small ints
} robj;

This structure is 16 bytes on a 64-bit system.

Shared integers: Redis pre-allocates objects for integers 0–9999. SET counter 42 stores the integer 42 as a pointer to a pre-allocated shared object, so no allocation is needed. This is why OBJECT REFCOUNT counter returns a very large number (2147483647, i.e. INT_MAX, in Redis 4.0+) rather than 1 for small integers: every key holding that value points to the same shared object.

jemalloc and Memory Fragmentation

Redis uses jemalloc as its memory allocator. jemalloc manages memory in size classes (bins): 8, 16, 32, 48, 64, 80, 96, 112, 128, 160, 192, 224, 256 bytes, etc. When Redis requests 10 bytes, jemalloc gives it a 16-byte slot. The 6 unused bytes are "internal fragmentation."
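To make the rounding concrete, here is a small Python sketch of how a request size maps to a size class. It assumes the 64-bit jemalloc layout with a 16-byte quantum (8, then multiples of 16 up to 128, then four classes per doubling). `jemalloc_size_class` is an illustrative name, not a Redis or jemalloc API, and real builds may use slightly different classes.

```python
def jemalloc_size_class(request: int) -> int:
    """Round a request (in bytes) up to an approximate jemalloc size class."""
    if request <= 0:
        raise ValueError("request must be positive")
    if request <= 8:
        return 8
    if request <= 128:
        # Classes are spaced 16 bytes apart: 16, 32, 48, ... 128.
        return (request + 15) // 16 * 16
    # Above 128 bytes there are four classes per power-of-two range,
    # e.g. 160, 192, 224, 256 and then 320, 384, 448, 512.
    k = (request - 1).bit_length() - 1  # request lies in (2**k, 2**(k+1)]
    spacing = 1 << (k - 2)
    return (request + spacing - 1) // spacing * spacing


def internal_fragmentation(request: int) -> int:
    """Bytes allocated but unused for a single request."""
    return jemalloc_size_class(request) - request


if __name__ == "__main__":
    for size in (10, 24, 25, 100, 129, 300):
        print(size, jemalloc_size_class(size), internal_fragmentation(size))
```

Feeding the per-structure sizes of a key (hash table entry, key string, value object) through a helper like this is a quick way to see where the rounding overhead comes from.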

Memory fragmentation ratio:

INFO memory
→ mem_fragmentation_ratio: 1.52

mem_fragmentation_ratio = used_memory_rss / used_memory

used_memory — what Redis believes it is using (its allocations)
used_memory_rss — what the OS reports Redis is using (RSS = Resident Set Size)

Ratio	Interpretation
~1.0	Healthy — very little fragmentation
1.1–1.5	Acceptable — some fragmentation, normal for dynamic workloads
> 1.5	High fragmentation — wasted memory, consider defragmentation
< 1.0	Redis has swapped some data to disk (very bad — indicates memory pressure)
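The ratio and the interpretation table above can be turned into a small check. This is a minimal sketch: the function name and the verdict labels are made up for illustration, and the cut-offs simply follow the table (anything up to 1.1 is treated as healthy).

```python
def fragmentation_report(used_memory: int, used_memory_rss: int) -> dict:
    """Classify mem_fragmentation_ratio = used_memory_rss / used_memory."""
    if used_memory <= 0:
        raise ValueError("used_memory must be positive")
    ratio = used_memory_rss / used_memory
    if ratio < 1.0:
        verdict = "swapping"      # RSS below allocated memory: pages are on disk
    elif ratio <= 1.1:
        verdict = "healthy"
    elif ratio <= 1.5:
        verdict = "acceptable"
    else:
        verdict = "high"          # candidate for active defragmentation
    return {
        "ratio": round(ratio, 2),
        "verdict": verdict,
        # Memory held from the OS beyond what Redis has allocated.
        "overhead_bytes": max(used_memory_rss - used_memory, 0),
    }


if __name__ == "__main__":
    GIB = 1024 ** 3
    print(fragmentation_report(2 * GIB, int(3.5 * GIB)))
    print(fragmentation_report(1_520_000_000, 2_080_000_000))
```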

Causes of Fragmentation

Key churn — many keys created and deleted over time leaves holes in the allocator's free lists
Resizing data structures — Hash or List encoding upgrades (listpack → hashtable) free the old structure and allocate a new one; the old slot may not be immediately reusable
Long-running instances — fragmentation accumulates over time

Active Defragmentation

Redis 4.0+ includes an active defragmentation feature (activedefrag):

CONFIG SET activedefrag yes
CONFIG SET active-defrag-ignore-bytes 100mb    → don't defrag if fragmentation is < 100MB
CONFIG SET active-defrag-threshold-lower 10    → start defrag when fragmentation > 10%
CONFIG SET active-defrag-threshold-upper 100   → max defrag effort at 100% fragmentation

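Active defragmentation runs incrementally inside the main Redis thread, in small time-boxed steps, moving live objects into compacted allocations so that the freed pages can be returned to the OS. It requires a build that uses the bundled jemalloc. It adds some CPU overhead but can recover significant amounts of memory. Enable it if your mem_fragmentation_ratio consistently exceeds 1.5.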
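How the three thresholds interact can be sketched as follows. This is a simplified model, not Redis's actual scheduler: it assumes defragmentation starts only when both the byte floor and the lower percentage are exceeded, and that effort scales linearly up to the upper percentage. The function name and the 0.0–1.0 `effort` scale are hypothetical.

```python
MB = 1024 * 1024


def defrag_plan(frag_bytes: int, frag_pct: float,
                ignore_bytes: int = 100 * MB,
                threshold_lower: float = 10,
                threshold_upper: float = 100) -> dict:
    """Decide whether defrag would run and how hard (0.0 to 1.0)."""
    if frag_bytes < ignore_bytes or frag_pct < threshold_lower:
        return {"active": False, "effort": 0.0}
    span = max(threshold_upper - threshold_lower, 1)
    # Linear interpolation between the lower and upper thresholds, clamped.
    effort = min(max((frag_pct - threshold_lower) / span, 0.0), 1.0)
    return {"active": True, "effort": effort}


if __name__ == "__main__":
    # 50 MB wasted at 40% fragmentation: below the byte floor, so no defrag.
    print(defrag_plan(50 * MB, 40))
    # 1.5 GB wasted at 75% fragmentation (a ratio of 1.75).
    print(defrag_plan(1536 * MB, 75))
```

Under this model a small instance with a high ratio but little absolute waste is left alone, which is the purpose of the byte floor.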

Object Encoding and Memory Cost

The OBJECT ENCODING command tells you which internal representation a key is using. Different encodings have dramatically different memory footprints.

String Encodings

127.0.0.1:6379> SET counter 42
127.0.0.1:6379> OBJECT ENCODING counter
"int"       ← stored as a 64-bit integer in the robj's ptr field, with no separate string allocation

127.0.0.1:6379> SET greeting "Hello"
127.0.0.1:6379> OBJECT ENCODING greeting
"embstr"    ← string ≤ 44 bytes: robj + SDS in a single allocation (one malloc call)

127.0.0.1:6379> SET essay "... (45+ bytes) ..."
127.0.0.1:6379> OBJECT ENCODING essay
"raw"       ← string > 44 bytes: robj + SDS in two separate allocations

embstr is the most memory-efficient encoding for non-numeric strings: the robj and the SDS are allocated together in a single allocation of at most 64 bytes (16-byte robj + 3-byte SDS header + up to 44 bytes of data + null terminator). This avoids a separate heap allocation and improves cache locality. Strings ≤ 44 bytes use embstr; longer strings use raw.
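The selection rules can be approximated in a few lines of Python. This is a sketch under stated assumptions: it treats a value as `int` only when it is a canonical decimal integer (no leading zeros, plus sign, or spaces) of at most 20 characters that fits in a signed 64-bit range, and it ignores later conversions such as an `APPEND` turning a value into `raw`. `predict_string_encoding` is an illustrative name.

```python
import re

EMBSTR_LIMIT = 44                      # bytes, as described above
INT64_MIN, INT64_MAX = -(2 ** 63), 2 ** 63 - 1
_CANONICAL_INT = re.compile(rb"0|-?[1-9][0-9]*")


def predict_string_encoding(value: bytes) -> str:
    """Guess what OBJECT ENCODING would report for a freshly SET value."""
    if len(value) <= 20 and _CANONICAL_INT.fullmatch(value):
        if INT64_MIN <= int(value.decode("ascii")) <= INT64_MAX:
            return "int"
    return "embstr" if len(value) <= EMBSTR_LIMIT else "raw"


if __name__ == "__main__":
    for sample in (b"42", b"Hello", b"007", b"x" * 44, b"x" * 45):
        print(sample[:10], len(sample), predict_string_encoding(sample))
```

Comparing the prediction with the real `OBJECT ENCODING` output on a test instance is a quick way to confirm the thresholds for the Redis version in use.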

Hash Encodings

127.0.0.1:6379> HSET user:1001 name "Jatin"
127.0.0.1:6379> OBJECT ENCODING user:1001
"listpack"   ← ≤ 512 fields (hash-max-listpack-entries), each field and value ≤ 64 bytes (hash-max-listpack-value)

# After crossing threshold:
127.0.0.1:6379> OBJECT ENCODING user:1001
"hashtable"  ← pointer-based hash table with linked list buckets

Memory cost:

listpack: ~11 bytes per field-value pair + small header overhead (~7 bytes)
hashtable: ~64 bytes per entry or more (a dictEntry holding key, value, and next pointers, a slot in the bucket array, plus separately allocated SDS strings for the field and the value)

A 10-field Hash in listpack encoding uses ~120 bytes. In hashtable encoding, the same 10 fields use ~700+ bytes. This is why keeping Hashes small (below the listpack threshold) matters enormously for memory efficiency.
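A minimal sketch of the threshold check follows. The two limits are passed in explicitly so the result tracks whatever `CONFIG GET hash-max-listpack-entries` and `CONFIG GET hash-max-listpack-value` report on your instance. The function name is hypothetical, and the model ignores the fact that a Hash never converts back once it has become a hashtable.

```python
from typing import Mapping


def predict_hash_encoding(fields: Mapping[bytes, bytes],
                          max_entries: int, max_value: int) -> str:
    """Return "listpack" or "hashtable" for a Hash built from `fields`."""
    if len(fields) > max_entries:
        return "hashtable"
    for field, value in fields.items():
        # A single oversized field name or value forces the conversion.
        if len(field) > max_value or len(value) > max_value:
            return "hashtable"
    return "listpack"


if __name__ == "__main__":
    profile = {b"name": b"Jatin", b"role": b"engineer"}
    print(predict_hash_encoding(profile, max_entries=128, max_value=64))
    profile[b"bio"] = b"x" * 200   # one long value is enough to convert
    print(predict_hash_encoding(profile, max_entries=128, max_value=64))
```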

List Encodings

"listpack"   → compact byte array (small lists)
"quicklist"  → doubly-linked list of listpack nodes (large lists)

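Each quicklist node holds one listpack, sized by list-max-listpack-size: a positive value caps the number of entries per node, while a negative value caps the node's size in bytes (the default, -2, means 8 KB). With a positive setting, the number of nodes is roughly total_elements / node_size. Each quicklist node has a header (~32 bytes) plus the listpack data.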

Sorted Set Encodings

"listpack"   → compact, linear scan O(N) but zero pointer overhead for small sets
"skiplist"   → O(log N) with a separate hashtable; ~200 bytes per element overhead

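A Sorted Set with 1,000 members in listpack encoding (which requires raising zset-max-listpack-entries above its default of 128): ~55KB.
The same 1,000 members in skiplist encoding: ~250KB.
With default settings, the encoding upgrade at the 129th member costs roughly 5x more memory.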

Practical Memory Tools

DEBUG OBJECT

DEBUG OBJECT key

Returns: value address (hex), reference count, encoding, serialized length in bytes (what RDB persistence would store), and LRU/idle time.

127.0.0.1:6379> DEBUG OBJECT user:1001
Value at:0x7f8b1c0040b0 refcount:1 encoding:listpack serializedlength:52 lru:8821634 lru_seconds_idle:3

serializedlength is the size of the value as it would be written to an RDB file, which makes it useful for estimating RDB file size rather than in-memory cost.

MEMORY USAGE

MEMORY USAGE key [SAMPLES count]

Returns the number of bytes Redis uses for the key and its value, including the robj, SDS, and all internal structures. For large collections the figure is an estimate based on sampled elements. It is more accurate than estimating from the encoding alone.

127.0.0.1:6379> MEMORY USAGE user:1001
(integer) 104

For collections (Hashes, Lists, Sets, Sorted Sets), SAMPLES controls how many elements are sampled to estimate total memory (since scanning all elements would be slow for large collections). Default is 5; use 0 for an exact calculation on small keys.

MEMORY DOCTOR

MEMORY DOCTOR

Returns a human-readable diagnosis of Redis's memory usage — whether there is high fragmentation, unusual RSS, or configuration issues.

INFO memory

Key fields for memory monitoring:

INFO memory

used_memory: 1520000000        → bytes Redis has allocated (what malloc reports)
used_memory_human: 1.42G
used_memory_rss: 2080000000    → bytes reported by OS (includes fragmentation)
used_memory_rss_human: 1.94G
mem_fragmentation_ratio: 1.37
used_memory_peak: 1600000000   → peak usage since server start
used_memory_peak_human: 1.49G
maxmemory: 2147483648
maxmemory_human: 2.00G
mem_allocator: jemalloc-5.3.0
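For monitoring scripts, these fields are easy to extract. The sketch below assumes the raw `INFO memory` reply, which consists of `field:value` lines and `#` section headers without the arrow annotations shown above. The helper names and the headroom calculation are illustrative.

```python
from typing import Optional

SAMPLE = """# Memory
used_memory:1520000000
used_memory_rss:2080000000
mem_fragmentation_ratio:1.37
maxmemory:2147483648
mem_allocator:jemalloc-5.3.0
"""


def parse_info(raw: str) -> dict:
    """Turn an INFO reply into a dict of string values."""
    info = {}
    for line in raw.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or ":" not in line:
            continue  # skip blanks and section headers
        key, _, value = line.partition(":")
        info[key] = value
    return info


def memory_headroom(info: dict) -> Optional[float]:
    """Fraction of maxmemory still free, or None when no limit is set."""
    maxmemory = int(info["maxmemory"])
    if maxmemory == 0:
        return None
    return 1 - int(info["used_memory"]) / maxmemory


if __name__ == "__main__":
    parsed = parse_info(SAMPLE)
    print(parsed["mem_allocator"], memory_headroom(parsed))
```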

Memory Optimization Strategies

1. Use the Right Encoding Thresholds

The default encoding thresholds are conservative. If your Hashes stay small but their values regularly run slightly over 64 bytes (say, up to 128 bytes), or they hold somewhat more fields than the default limit, raising the listpack thresholds keeps more data in the compact encoding:

hash-max-listpack-entries 1024  # default 512
hash-max-listpack-value 128     # default 64

Do the same for Sets and Sorted Sets. Measure memory before and after with MEMORY USAGE.

2. Short Key Names

user:1001:profile (17 bytes) costs more RAM than u:1001:p (8 bytes). The shorter name saves 9 bytes per key, and often more once allocator rounding is counted. At 10 million keys, that is roughly 90MB or more. Balance readability vs memory. At small scale (< 1M keys), prioritize readability.
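The arithmetic is simple enough to script when evaluating a renaming scheme. The sketch below counts raw key bytes only. It deliberately ignores allocator rounding, so the real saving may be somewhat larger, and the function name is illustrative.

```python
def key_name_savings(long_key: str, short_key: str, key_count: int) -> int:
    """Raw bytes saved across `key_count` keys by shortening the key name."""
    saved_per_key = len(long_key.encode("utf-8")) - len(short_key.encode("utf-8"))
    return saved_per_key * key_count


if __name__ == "__main__":
    saved = key_name_savings("user:1001:profile", "u:1001:p", 10_000_000)
    print(f"{saved / 1_000_000:.0f} MB saved")
```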

3. Use Hashes for Small Objects Instead of Top-Level Keys

Instead of:

SET user:1001:name "Jatin"
SET user:1001:email "j@example.com"
SET user:1001:role "engineer"

Use:

HSET user:1001 name "Jatin" email "j@example.com" role "engineer"

Three top-level keys each pay the full robj + SDS overhead. One Hash with three fields in listpack encoding uses significantly less memory because the listpack is a contiguous byte array with minimal per-field overhead.

This is the "use Hashes to store small objects" pattern — one of the most impactful Redis memory optimizations.
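A back-of-the-envelope comparison, using this module's own approximate figures (about 90 bytes of overhead per top-level key, and a listpack costing about 7 bytes of header plus 11 bytes per field-value pair), might look like the sketch below. The constants are estimates, not measurements, and the function names are made up; `MEMORY USAGE` on real keys remains the authoritative check.

```python
PER_KEY_OVERHEAD = 90    # midpoint of the 80–100 byte estimate above
LISTPACK_HEADER = 7      # approximate listpack header cost
LISTPACK_PER_PAIR = 11   # approximate cost of one small field-value pair


def top_level_keys_cost(n_fields: int) -> int:
    """One String key per field: every field pays the full per-key overhead."""
    return n_fields * PER_KEY_OVERHEAD


def single_hash_cost(n_fields: int) -> int:
    """One Hash key: the per-key overhead is paid once, plus the listpack."""
    return PER_KEY_OVERHEAD + LISTPACK_HEADER + n_fields * LISTPACK_PER_PAIR


if __name__ == "__main__":
    for n in (3, 10):
        separate, grouped = top_level_keys_cost(n), single_hash_cost(n)
        print(n, separate, grouped, f"{1 - grouped / separate:.0%} saved")
```

The saving grows with the number of fields per object, as long as the Hash stays under the listpack thresholds.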

4. Set TTLs on Everything That Should Not Be Permanent

Orphaned keys without TTLs accumulate indefinitely. An audit with SCAN + TTL on a production instance often reveals thousands of keys that should have expired long ago. Set sensible TTLs on all cache and session keys.
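Such an audit can be sketched in Python. The function below assumes a client object exposing redis-py style `scan_iter(match=..., count=...)` and `ttl(name)` methods, where `ttl` returns -1 for a key without an expiry. `StubClient` is a hypothetical in-memory stand-in so the example runs without a server.

```python
from fnmatch import fnmatchcase
from typing import Dict, Iterator


def keys_without_ttl(client, pattern: str = "*", batch: int = 1000) -> Iterator:
    """Yield keys matching `pattern` that have no expiry set."""
    # SCAN iterates incrementally, so it does not block the server like KEYS.
    for key in client.scan_iter(match=pattern, count=batch):
        if client.ttl(key) == -1:   # -1: no TTL; -2: key no longer exists
            yield key


class StubClient:
    """In-memory stand-in for a Redis client (illustration only)."""

    def __init__(self, ttls: Dict[str, int]):
        self._ttls = ttls

    def scan_iter(self, match: str = "*", count: int = 10):
        return (k for k in self._ttls if fnmatchcase(k, match))

    def ttl(self, name: str) -> int:
        return self._ttls.get(name, -2)


if __name__ == "__main__":
    demo = StubClient({"session:1": 1800, "session:2": -1, "cache:x": -1})
    print(list(keys_without_ttl(demo, "session:*")))
```

Against a real instance, the same function would be called with a connected client in place of the stub; on large keyspaces it is worth running it against a replica.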

5. Monitor Encoding Upgrades

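When a Hash crosses hash-max-listpack-entries (512 fields with the stock configuration) or stores a field or value longer than hash-max-listpack-value, it converts from listpack to hashtable and memory usage for that key can jump several-fold. If you are storing per-user data in Hashes and some users have far more fields than others, those keys are using hashtable encoding and consuming much more memory than typical users. Identify outliers with:

# List the 20 largest keys matching the pattern, with their encodings.
redis-cli --scan --pattern "user:*" | while IFS= read -r key; do
  encoding=$(redis-cli OBJECT ENCODING "$key")
  size=$(redis-cli MEMORY USAGE "$key")
  # A key may expire between SCAN and MEMORY USAGE; treat a missing size as 0.
  echo "$key $encoding ${size:-0}"
done | sort -k3 -rn | head -20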

Summary

Every Redis key costs ~80–100 bytes of overhead regardless of value size — key naming matters for small values
Redis stores strings as SDS (Simple Dynamic Strings): length-prefixed, binary-safe, with jemalloc-managed allocation
Every value is wrapped in an robj (16 bytes): type, encoding, LRU clock, refcount, data pointer
jemalloc rounds allocations to bin sizes — small values pay internal fragmentation overhead
mem_fragmentation_ratio > 1.5 → enable activedefrag; ratio < 1.0 → data is swapping (critical)
MEMORY USAGE key reports the per-key memory cost in bytes (sampled for large collections unless SAMPLES 0 is given)
OBJECT ENCODING key reveals the internal representation; compact encodings (listpack, intset, embstr, int) use dramatically less memory than their large-structure counterparts
Use Hashes to group related small values — listpack Hash encoding is far more memory-efficient than multiple top-level String keys
Set TTLs on all cache and session keys to prevent orphan key accumulation

Next: Module F-11, Transactions: MULTI, EXEC, and Optimistic Locking with WATCH. It covers how to execute a sequence of commands atomically and how to implement optimistic concurrency control for read-modify-write patterns.
