Module P-1 · 20 min read

How BGSAVE uses fork() and copy-on-write, the RDB file format, snapshot scheduling with save directives, RDB compression, and the trade-off: data loss up to the last snapshot interval.

P-1 — RDB Snapshots: Point-in-Time Persistence

Who this module is for: You know Redis loses data on restart by default and that "persistence" is something you can configure, but you have never understood how it actually works, what the performance cost is, or when RDB is the right choice vs AOF. This module covers the RDB snapshot mechanism end-to-end, including the fork-and-copy-on-write internals that most documentation glosses over.


The Default: RDB Snapshots Only

When you start Redis with default settings and no redis.conf, AOF is off but RDB snapshots are on. Recent versions ship with these default save directives:

save 3600 1    → snapshot after 3600 seconds if at least 1 change was made
save 300 100   → snapshot after 300 seconds if at least 100 changes were made
save 60 10000  → snapshot after 60 seconds if at least 10000 changes were made

For a pure cache (Redis as a read-through cache in front of PostgreSQL), you should disable this:

save ""   → disable all automatic snapshots

For any use case where data must survive a restart (session store, rate limiter state, queues), you must configure persistence explicitly — either RDB, AOF, or both.


What RDB Is

RDB (Redis Database) is a point-in-time snapshot of the entire Redis dataset, written to a single compact binary file on disk. By default, this file is named dump.rdb.

When Redis restarts, it reads dump.rdb and restores the entire dataset into memory. Startup time is proportional to the size of the dataset.

The guarantee: All data that existed at the moment the snapshot was taken survives a restart. Data written after the snapshot and before the crash is lost.

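The trade-off: Everything written since the last successful snapshot is lost in a crash. With save 60 10000, that is about 60 seconds of writes under heavy load. The rule fires only once 10,000 changes have accumulated, though, so at lower write rates the window is set by the slower save rules and can be much longer.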
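The size of that window depends on the write rate as well as the configured interval. The sketch below works through the arithmetic under two simplifying assumptions: writes arrive at a constant rate, and the time the snapshot itself takes is ignored. The function name `max_loss_window` and the tuple format are illustrative, not part of Redis.

```python
def max_loss_window(save_points, writes_per_second):
    """Approximate seconds between automatic snapshots at a steady write rate.

    save_points: list of (seconds, changes) pairs, as in `save <seconds> <changes>`.
    A rule fires once BOTH its time and its change threshold are met, and the
    first rule to fire wins, so the window is the smallest per-rule wait.
    """
    if writes_per_second <= 0:
        raise ValueError("writes_per_second must be positive")
    return float(min(
        max(seconds, changes / writes_per_second)
        for seconds, changes in save_points
    ))


defaults = [(3600, 1), (300, 100), (60, 10000)]
print(max_loss_window(defaults, 1000))  # 60.0  (save 60 10000 fires)
print(max_loss_window(defaults, 100))   # 100.0 (10,000 changes take 100 s)
print(max_loss_window(defaults, 1))     # 300.0 (save 300 100 fires first)
```

At 1 write per second the exposure is five minutes, not one, even though a 60-second rule is configured.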


How BGSAVE Works: Fork and Copy-on-Write

This is the part that most tutorials skip, and it is the most important thing to understand about RDB.

Redis cannot stop serving requests while it writes the snapshot — that could take minutes for a large dataset. Instead, it uses the Unix fork() system call to create a child process that handles the snapshot while the parent continues serving clients.

Redis parent process (serving requests)
        │
        ├── BGSAVE triggered
        │
        └── fork()
              │
              ├── Parent process ──────────────────→ continues serving clients
              │   (still has all data in memory)      (writes go to parent's memory)
              │
              └── Child process ───────────────────→ writes entire dataset to disk
                  (sees a snapshot of parent's       (reads from shared pages)
                   memory at fork time)

Copy-on-Write (CoW)

After fork(), both the parent and child share the same physical memory pages. The OS uses copy-on-write semantics: when the parent modifies a page (because a client writes to Redis), the OS creates a private copy of that page for the parent. The child still sees the original page.

This means:

  • The child writes the snapshot based on the data as it existed at fork time — a consistent point-in-time view
  • The parent continues serving writes normally
  • Modified pages are duplicated in RAM — one version for the parent (new value) and one for the child (old value, being written to disk)
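The point-in-time view can be observed directly with a small fork() experiment. This is only a sketch of the operating-system behaviour, not Redis code: it assumes a Unix-like system (os.fork is unavailable on Windows), and the function name `fork_snapshot_demo` is made up for illustration.

```python
import os


def fork_snapshot_demo():
    """Return (parent_value, child_value) seen after the parent writes post-fork."""
    data = {"counter": 1}                  # state that exists at fork time
    go_r, go_w = os.pipe()                 # parent -> child: "I have written"
    res_r, res_w = os.pipe()               # child -> parent: value the child sees

    pid = os.fork()
    if pid == 0:
        # Child: plays the role of the snapshot process.
        os.close(go_w)
        os.close(res_r)
        os.read(go_r, 1)                   # wait until the parent has modified its copy
        os.write(res_w, str(data["counter"]).encode())
        os._exit(0)                        # leave immediately, skipping parent cleanup

    # Parent: keeps "serving writes" after the fork.
    os.close(go_r)
    os.close(res_w)
    data["counter"] = 2                    # the write lands in the parent's own page copy
    os.write(go_w, b"x")
    child_value = int(os.read(res_r, 16).decode())
    os.waitpid(pid, 0)
    os.close(go_w)
    os.close(res_r)
    return data["counter"], child_value


if __name__ == "__main__":
    print(fork_snapshot_demo())            # (2, 1)
```

The parent ends up with the new value while the child still reports the value from fork time, which is the property BGSAVE relies on.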

The memory spike: During a snapshot, Redis can temporarily use up to 2× its normal working set of RAM. If your dataset is 4GB and writes are heavy during the snapshot, Redis might use 6-7GB before the snapshot finishes and the child exits.

This is why you must size your Redis server's RAM to accommodate BGSAVE overhead. A rule of thumb: provision 1.5× your expected dataset size for a write-heavy Redis.
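The sizing rule can be turned into a rough estimate. The sketch below assumes peak usage is the dataset plus one extra copy of every page the parent rewrites while the child is running. Here `dirty_fraction` is a hypothetical input you would estimate from your own workload, and the model ignores allocator overhead and buffers.

```python
GIB = 1024 ** 3


def peak_bgsave_memory(dataset_bytes, dirty_fraction):
    """Estimate peak RAM during BGSAVE.

    dirty_fraction: share of the dataset's pages modified by the parent while
    the snapshot is in progress (0.0 = read-only workload, 1.0 = every page).
    """
    if not 0.0 <= dirty_fraction <= 1.0:
        raise ValueError("dirty_fraction must be between 0 and 1")
    return int(dataset_bytes * (1 + dirty_fraction))


for fraction in (0.0, 0.5, 0.75, 1.0):
    peak = peak_bgsave_memory(4 * GIB, fraction)
    print(f"{fraction:.0%} of pages rewritten -> {peak / GIB:.1f} GiB peak")
```

For a 4GB dataset, rewriting 50% to 75% of pages during the snapshot gives the 6-7GB range quoted above, and the 1.5× rule of thumb corresponds to a dirty fraction of 0.5.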

Fork Latency

fork() itself is a fast system call — it does not copy memory, only the page table. But on large datasets, the page table itself can be large. Redis may pause briefly (milliseconds) while the OS copies the parent's page table for the child.

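With a 10GB dataset and 4KB pages (the Linux default on x86-64), the page table has ~2.6 million entries. With 2MB huge pages, it has only ~5,000. More entries mean more to copy and a longer fork() pause. Huge pages shorten the fork but make copy-on-write far more expensive, because each write after the fork copies 2MB instead of 4KB, which is why the Redis documentation recommends disabling transparent huge pages.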
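The entry counts follow from dividing the dataset size by the page size. A minimal sketch of that arithmetic, assuming 8-byte page-table entries (the x86-64 size) and counting only the lowest level of the page table; the helper names are illustrative:

```python
KIB, MIB, GIB = 1024, 1024 ** 2, 1024 ** 3
PTE_BYTES = 8  # assumption: x86-64 page-table entries are 8 bytes each


def page_table_entries(dataset_bytes, page_size):
    """Number of pages (and so leaf page-table entries) needed to map the dataset."""
    return -(-dataset_bytes // page_size)  # ceiling division


def leaf_page_table_bytes(dataset_bytes, page_size):
    """Approximate memory fork() must copy for the leaf page-table level."""
    return page_table_entries(dataset_bytes, page_size) * PTE_BYTES


for label, page_size in (("4 KiB", 4 * KIB), ("2 MiB", 2 * MIB)):
    entries = page_table_entries(10 * GIB, page_size)
    table_kib = leaf_page_table_bytes(10 * GIB, page_size) / KIB
    print(f"{label} pages: {entries:,} entries, ~{table_kib:,.0f} KiB of page tables")
```

For a 10 GiB dataset this gives 2,621,440 entries (about 20 MiB of page tables to copy) with 4 KiB pages, and 5,120 entries (40 KiB) with 2 MiB pages.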

To check fork latency:

INFO stats
→ latest_fork_usec: 512   ← last fork took 512 microseconds

Values above 10,000 microseconds (10ms) indicate a very large dataset or a system with memory pressure.
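This check is easy to automate. The sketch below extracts latest_fork_usec from the raw text returned by INFO stats and compares it with the 10ms threshold above. It works on a plain string so that it does not depend on any particular client library; the function names and the sample text are illustrative.

```python
import re

FORK_WARN_USEC = 10_000  # 10 ms, the threshold quoted above


def latest_fork_usec(info_text):
    """Return latest_fork_usec from raw INFO output, or None if it is absent."""
    match = re.search(r"^latest_fork_usec:(\d+)", info_text, re.MULTILINE)
    return int(match.group(1)) if match else None


def fork_is_slow(info_text, threshold_usec=FORK_WARN_USEC):
    """True when the last fork took longer than the threshold."""
    usec = latest_fork_usec(info_text)
    return usec is not None and usec > threshold_usec


# Made-up sample in the key:value format that INFO returns.
sample = "# Stats\r\ntotal_connections_received:10\r\nlatest_fork_usec:512\r\n"
print(latest_fork_usec(sample), fork_is_slow(sample))  # 512 False
```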


Triggering Snapshots

Automatic (via save directives)

# In redis.conf
save 3600 1
save 300 100
save 60 10000

Redis checks these conditions periodically from its internal timer (about ten times per second at the default hz setting). If any threshold is met, BGSAVE is triggered automatically. Conditions are OR'd: meeting any one triggers a snapshot.
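The OR logic can be modelled in a few lines. This is a simplified model of the decision, not the Redis source: the exact comparison operators and the extra conditions Redis applies (for example, not starting a save while another is running) are left out, and `SavePoint` and `should_snapshot` are illustrative names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SavePoint:
    seconds: int   # minimum time since the last successful save
    changes: int   # minimum number of changes since the last save


def should_snapshot(save_points, seconds_since_save, changes_since_save):
    """True if any save point has both of its thresholds met (rules are OR'd)."""
    return any(
        seconds_since_save >= sp.seconds and changes_since_save >= sp.changes
        for sp in save_points
    )


DEFAULTS = [SavePoint(3600, 1), SavePoint(300, 100), SavePoint(60, 10000)]

print(should_snapshot(DEFAULTS, 61, 10_000))  # True: save 60 10000 is met
print(should_snapshot(DEFAULTS, 61, 9_999))   # False: no rule has both conditions met
print(should_snapshot([], 99_999, 99_999))    # False: an empty list models save ""
```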

Manual

BGSAVE              → async: triggers background save, returns immediately
SAVE                → sync: blocks Redis until snapshot is complete (avoid in production)
BGSAVE SCHEDULE     → if an AOF rewrite is running, schedule the save for when it finishes instead of returning an error

SAVE is a blocking command — no client can be served while it runs. Only use it when you need a guaranteed consistent snapshot before a maintenance operation and can tolerate the downtime.

On Shutdown

On a graceful shutdown (SIGTERM or the SHUTDOWN command), Redis performs a blocking SAVE, not a BGSAVE, provided at least one save directive is configured. Clients are no longer served while it runs, so no writes are lost between the snapshot and exit. If Redis is killed (SIGKILL) or crashes, no shutdown snapshot is taken.


RDB File Format and Compression

The dump.rdb file is a compact binary format that stores:

  1. A header: the magic string REDIS followed by the RDB format version (the Redis server version is stored separately as metadata)
  2. Database select markers
  3. Each key-value pair with:
    • Key TTL (if set)
    • Value type byte
    • Encoded value (type-specific encoding)
  4. A checksum (CRC64)
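The header is simple enough to inspect by hand. The sketch below relies on one detail of the format: the file begins with the five ASCII bytes REDIS followed by a four-digit RDB format version. It reads only those nine bytes and does not parse the rest of the file; the function name `read_rdb_version` is illustrative.

```python
def read_rdb_version(header: bytes) -> int:
    """Return the RDB format version from the first 9 bytes of an RDB file."""
    if len(header) < 9:
        raise ValueError("need at least 9 bytes")
    if header[:5] != b"REDIS":
        raise ValueError("not an RDB file: bad magic string")
    version = header[5:9]
    if not version.isdigit():
        raise ValueError("RDB version field is not numeric")
    return int(version)


# Typical use against a real file (the path is only an example):
#   with open("/var/lib/redis/dump.rdb", "rb") as f:
#       print(read_rdb_version(f.read(9)))

print(read_rdb_version(b"REDIS0011"))  # 11
```

This is a quick sanity check only. For full validation, including the checksum, use redis-check-rdb.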

RDB uses LZF compression for large String values by default (rdbcompression yes). Compression reduces file size but adds CPU cost during save and load.

# In redis.conf
rdbcompression yes        → compress string objects (default)
rdbchecksum yes           → validate checksum on load (default)
dbfilename dump.rdb       → snapshot filename
dir /var/lib/redis        → directory for snapshot and AOF files

Checking RDB File Validity

redis-check-rdb /var/lib/redis/dump.rdb

This verifies the RDB file's checksum and structural integrity — useful before restoring a backup.


Configuring RDB in Practice

Pure Cache (disable RDB)

save ""

Infrequent Snapshot (low write rate, tolerate data loss)

save 3600 1    → once an hour if anything changed

Frequent Snapshot (moderate write rate)

save 900 1     → every 15 minutes if at least 1 change
save 300 10    → every 5 minutes if at least 10 changes

Stop-on-Failure Behavior

stop-writes-on-bgsave-error yes   → default: refuse writes if last BGSAVE failed

This is a safety mechanism: if Redis cannot write the snapshot (disk full, permission error), it rejects write commands with a MISCONF error so that the failure gets noticed instead of the dataset drifting further from the last successful snapshot. Writes are accepted again once a background save succeeds. In production, monitor rdb_last_bgsave_status: without alerting, the first sign of trouble will be applications failing on every write.


Monitoring RDB

INFO persistence

Key fields:

rdb_changes_since_last_save: 1423     → writes since last snapshot
rdb_bgsave_in_progress: 0            → 1 if currently saving
rdb_last_save_time: 1717000000       → Unix timestamp of last successful save
rdb_last_bgsave_status: ok           → "ok" or "err"
rdb_last_bgsave_time_sec: 3          → seconds for last snapshot
rdb_current_bgsave_time_sec: -1      → seconds elapsed for in-progress save (-1 if none)
rdb_saves: 42                         → total successful saves since start
rdb_last_cow_size: 8388608           → copy-on-write memory used during last save (bytes)

rdb_last_cow_size tells you how much extra memory was needed for copy-on-write during the snapshot. If this number is approaching your free RAM, you have a memory sizing problem.
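These fields lend themselves to a small automated check. The sketch below parses raw INFO persistence text and returns a list of warnings. The thresholds (snapshot age limit, copy-on-write size above half of free RAM) are hypothetical defaults to tune for your environment, and `parse_info` and `persistence_warnings` are illustrative names.

```python
def parse_info(text):
    """Parse raw INFO output (key:value lines, # section headers) into a dict."""
    fields = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition(":")
        fields[key.strip()] = value.strip()
    return fields


def persistence_warnings(info, now, free_ram_bytes, max_age_sec=3600, cow_ratio=0.5):
    """Return human-readable warnings derived from INFO persistence fields."""
    warnings = []
    if info.get("rdb_last_bgsave_status") != "ok":
        warnings.append("last BGSAVE failed")
    if now - int(info.get("rdb_last_save_time", 0)) > max_age_sec:
        warnings.append("last snapshot is too old")
    if int(info.get("rdb_last_cow_size", 0)) > cow_ratio * free_ram_bytes:
        warnings.append("copy-on-write usage is close to free RAM")
    return warnings


# Sample built from the field values shown above.
sample = (
    "# Persistence\r\n"
    "rdb_last_save_time:1717000000\r\n"
    "rdb_last_bgsave_status:ok\r\n"
    "rdb_last_cow_size:8388608\r\n"
)
info = parse_info(sample)
print(persistence_warnings(info, now=1717000100, free_ram_bytes=1024 ** 3))  # []
```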


RDB vs AOF at a Glance

Concern                          | RDB                                | AOF
Data loss on crash               | Up to snapshot interval            | Typically ≤ 1 second (with everysec)
Recovery speed                   | Fast (load binary)                 | Slow (replay all commands)
File size                        | Compact                            | Grows until rewrite
Disk I/O during normal operation | Only at snapshot intervals         | Continuous (every command)
Performance impact               | Short fork pause, CoW memory spike | Minimal (append-only write)
Best for                         | Backups, tolerable data loss       | Near-durability requirement

The full comparison and decision framework are in P-3. P-2 covers AOF in depth.


Summary

  • RDB creates a point-in-time snapshot of the entire dataset using fork() + copy-on-write
  • The child process writes the snapshot while the parent continues serving clients; the only pause is the fork() call itself
  • During BGSAVE, Redis can use up to 2× its dataset size in RAM due to copy-on-write page duplication
  • fork() itself may pause Redis for milliseconds on large datasets — monitor latest_fork_usec
  • Configure save directives to control automatic snapshot frequency; BGSAVE for manual snapshots
  • Maximum data loss = time since last successful snapshot
  • Monitor with INFO persistence: watch rdb_last_bgsave_status and rdb_last_cow_size
  • stop-writes-on-bgsave-error yes (default) rejects write commands with an error while the last BGSAVE has failed; monitor for this in production

Next: P-2 — AOF: Append-Only File Mechanics and fsync Strategies — how Redis logs every write command to disk, the three fsync modes and their durability-vs-latency trade-offs, and how AOF rewrite prevents unbounded log growth.

© 2026 Jatin Jain Saraf (JJS). All rights reserved.