Module P-2·20 min read

How AOF logs every write command, the three appendfsync strategies (always/everysec/no) and their durability-vs-latency trade-offs, AOF rewrite to prevent unbounded log growth, and the hybrid RDB+AOF preamble format.

P-2 — AOF: Append-Only File Mechanics and fsync Strategies

Q: A financial application requires absolute data durability; no confirmed write can ever be lost, even if the server loses power a millisecond after acknowledging the write. The engineering team configures Redis with AOF enabled and `appendfsync always`. What is the primary operational consequence of this configuration?

Write performance will be severely bottlenecked by the physical latency of the storage medium. Every single write command must endure a blocking disk `fsync()` before returning to the client, typically limiting throughput to a few hundred operations per second on standard hardware. — While `appendfsync always` provides the highest level of durability, it effectively turns Redis from a high-speed in-memory datastore into a slow, disk-bound database. Because the event loop must wait for the OS to confirm that the data is physically committed to storage on every single command, the theoretical limit of Redis drops from ~100k ops/sec to roughly the IOPS limit of the disk (often 100-500 for standard disks). This is rarely the right choice for a system designed for speed.

Q: During an AOF rewrite (`BGREWRITEAOF`), how does Redis ensure that new write commands received from clients while the background rewrite is happening are not lost when the new, smaller AOF file replaces the old one?

Redis uses a "double buffer" approach. While the child process creates the new AOF file based on the memory snapshot, the parent process appends incoming writes to both the old AOF file and a special in-memory rewrite buffer. Once the child finishes, the parent flushes this rewrite buffer to the end of the new AOF file before atomically swapping them. The rewrite runs in the background, so clients keep being served apart from the brief pause of the `fork()` call itself and the final merge. Because the child process only has a static point-in-time snapshot of the data (via `fork()`), it cannot see writes that occur after the fork. To capture these, the parent process records new writes in an AOF rewrite buffer in RAM. When the child finishes its static dump, the parent synchronizes the new file by appending this buffer, so no acknowledged write is missing from the new file. This describes Redis before 7.0; from 7.0 on, multi-part AOF replaces the in-memory rewrite buffer with a new incremental AOF file that the parent writes to during the rewrite.

Q: A Redis instance is configured with `appendfsync everysec`. The server experiences a catastrophic OS-level kernel panic and crashes instantly. Based on this configuration, what is the maximum expected data loss?

Up to approximately 1 second of write operations. Commands executed in the last second may have been buffered in the OS page cache but not yet `fsync()`'d to durable storage by the background thread. — With `everysec`, Redis writes commands to the OS file buffer immediately but delegates the actual disk flush (`fsync`) to a background thread that runs once per second. If the Redis process itself crashes (but the OS stays up), data is safe in the OS buffer. However, if the OS crashes or loses power, any data in the buffer that hasn't hit the 1-second `fsync` window will be lost. This provides an excellent balance: near-absolute durability with almost no performance penalty.

Who this module is for: RDB's data loss window (up to the last snapshot interval) is too large for your use case. You need Redis to survive a crash with minimal data loss. AOF (Append-Only File) is the answer — but its three fsync strategies have very different durability and performance characteristics that you must understand before enabling it in production.

What AOF Is

AOF (Append-Only File) records every write command that Redis executes. When Redis restarts, it replays the AOF log to reconstruct the dataset. The result is the same in-memory state as before the crash, minus the commands that were not yet written to disk.

Unlike RDB (which snapshots the entire dataset periodically), AOF logs commands continuously. The trade-off:

More durable: data loss is bounded by the appendfsync policy (how often the log is fsync()'d to disk), not by a snapshot interval
Larger file: grows with every write until rewritten
Slower startup: must replay all commands vs loading a binary snapshot

Enabling AOF

text

# In redis.conf
appendonly yes
appendfilename "appendonly.aof"
# Redis 7.0+: AOF files are stored in this subdirectory
appenddirname "appendonlydir"
# The critical setting (see below)
appendfsync everysec

At runtime:

CONFIG SET appendonly yes

Redis 7.0 introduced multi-part AOF: the AOF directory contains a base file (in RDB or AOF format), one or more incremental AOF files, and a manifest file that tracks them. This makes AOF rewrite safer and more efficient. On Redis < 7.0, a single appendonly.aof file is used.

The Three fsync Strategies

This is the most important decision in AOF configuration. The appendfsync setting controls how often Redis calls fsync() — the system call that tells the OS to flush its write buffer to durable storage.

Without fsync(), data written to disk may still be in the OS's page cache (RAM) and can be lost if the machine loses power before the OS writes it to storage. fsync() forces a flush.
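The difference between write() and fsync() can be seen from any language that exposes the system calls. The following is a minimal Python sketch, not Redis code; the file name demo.aof and the helper durable_append are made up for illustration.

```python
import os
import tempfile


def durable_append(fd: int, payload: bytes) -> None:
    """Append payload, then block until the kernel reports it is on storage."""
    os.write(fd, payload)  # bytes reach the OS page cache (RAM) only
    os.fsync(fd)           # ask the kernel to push them to the storage device


# Hypothetical log file in a temporary directory, for illustration only.
log_path = os.path.join(tempfile.mkdtemp(), "demo.aof")
fd = os.open(log_path, os.O_WRONLY | os.O_CREAT | os.O_APPEND, 0o644)
try:
    durable_append(fd, b"SET mykey value\n")
finally:
    os.close(fd)

with open(log_path, "rb") as f:
    contents = f.read()
```

Dropping the os.fsync() call would leave the program's visible behavior unchanged; the only difference is what survives a power loss, which is exactly the trade-off appendfsync controls.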

appendfsync always

appendfsync always

Redis calls fsync() before acknowledging any write. Every SET, HSET, LPUSH (every write command) is flushed to disk before OK is returned to the client. Commands that arrive in the same event-loop iteration can share a single fsync().

Durability: Maximum. If Redis crashes after returning OK, the command is on disk.
Performance: Bottlenecked by fsync latency, roughly 100–200 writes/second on spinning disks. SSDs are considerably faster but still far below in-memory throughput, so this setting is impractical for write-heavy workloads.
Use case: Financial transactions where every write must survive. Rarely appropriate for typical Redis use cases.

appendfsync everysec (default, recommended)

appendfsync everysec

Redis calls fsync() once per second in a background thread. Write commands are handed to the OS buffer immediately (fast), and the once-per-second fsync() forces that buffer to disk.

Durability: You can lose up to about 1 second of writes. In practice, the loss window is usually < 1 second because the background fsync runs independently of write traffic; if the disk is too slow to finish an fsync in time, the window can stretch somewhat beyond that.
Performance: Excellent — writes are buffered and only 1 fsync per second. Handles hundreds of thousands of writes/second on modern hardware.
Use case: Most production Redis deployments. The right default when you need durability but cannot sacrifice write throughput.

appendfsync no

appendfsync no

Redis never calls fsync() — it lets the OS decide when to flush write buffers to disk. On Linux, the OS typically flushes every 30 seconds, but this is not guaranteed.

Durability: Worst — you can lose up to 30+ seconds of writes on a system crash.
Performance: Highest — no fsync overhead at all.
Use case: When Redis is purely a cache (you can regenerate all data) and you want AOF for replay capability but are not concerned about data loss.
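To make the three policies concrete, here is a toy Python model that counts how many fsync() calls each policy issues for the same stream of writes. It is a sketch under simplifying assumptions, not Redis source code: ToyAofWriter and count_fsyncs are hypothetical names, the everysec branch checks a clock inline instead of using a background thread, and a fake clock stands in for real time.

```python
import os
import tempfile
import time


class ToyAofWriter:
    """Toy model of the three appendfsync policies (not Redis source code)."""

    def __init__(self, path: str, policy: str, clock=time.monotonic):
        if policy not in ("always", "everysec", "no"):
            raise ValueError(f"unknown policy: {policy}")
        self.policy = policy
        self.clock = clock
        self.fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_APPEND, 0o644)
        self.last_fsync = clock()
        self.fsync_calls = 0

    def append(self, command: bytes) -> None:
        os.write(self.fd, command)  # every policy hands the bytes to the OS
        if self.policy == "always":
            self._fsync()           # flush before acknowledging the write
        elif self.policy == "everysec" and self.clock() - self.last_fsync >= 1.0:
            self._fsync()           # at most one flush per second
        # policy "no": flushing is left entirely to the kernel

    def _fsync(self) -> None:
        os.fsync(self.fd)
        self.fsync_calls += 1
        self.last_fsync = self.clock()

    def close(self) -> None:
        os.close(self.fd)


def count_fsyncs(policy: str, writes: int, seconds_between_writes: float) -> int:
    """Run `writes` appends against a fake clock and report the fsync count."""
    now = [0.0]
    path = os.path.join(tempfile.mkdtemp(), f"{policy}.aof")
    writer = ToyAofWriter(path, policy, clock=lambda: now[0])
    for _ in range(writes):
        now[0] += seconds_between_writes
        writer.append(b"INCR counter\n")
    writer.close()
    return writer.fsync_calls


results = {p: count_fsyncs(p, 10, 0.25) for p in ("always", "everysec", "no")}
```

With ten writes spaced 0.25 simulated seconds apart, the always policy flushes on every write, everysec flushes only when a full second has elapsed, and no never flushes at all.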

What Happens on Write

When a client sends SET mykey "value":

1. Redis executes the command in memory
2. Redis appends the command in RESP format to the AOF write buffer (in-process memory)
3. Before the event loop waits for new events, the buffer is passed to the OS with write() and, depending on appendfsync, fsync()'d
4. Redis returns OK to the client

The AOF write buffer is in Redis's process memory. If the process crashes before the buffer is flushed to the OS, the command is lost regardless of appendfsync setting. The hand-off to the OS happens via write(), normally on each event-loop iteration under all three settings; appendfsync only decides when fsync() forces the OS to flush to storage.
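The RESP encoding mentioned in the steps above is simple enough to reproduce by hand. The sketch below encodes a command as a RESP array of bulk strings, which is the shape of an AOF entry; encode_resp is a hypothetical helper written for this example, not a function from Redis or a client library.

```python
def encode_resp(*args: str) -> bytes:
    """Encode a command as a RESP array of bulk strings."""
    parts = [b"*%d\r\n" % len(args)]  # array header: number of arguments
    for arg in args:
        data = arg.encode("utf-8")
        # each argument: "$<byte length>\r\n<bytes>\r\n"
        parts.append(b"$%d\r\n%s\r\n" % (len(data), data))
    return b"".join(parts)


entry = encode_resp("SET", "mykey", "value")
```

Because each argument carries an explicit byte length, a reader can tell a complete entry from a truncated one, which is what makes tail-truncation recovery possible.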

AOF Rewrite

Every write command is appended to the AOF file. Over time, especially if you SET and then DEL many keys, the AOF file becomes large and contains redundant commands. Replaying it takes longer than necessary.

AOF rewrite compresses the log: Redis writes a new AOF file that produces the same in-memory state with the minimum number of commands. For example:

text

# Original AOF:
SET counter 0
INCR counter   → 1
INCR counter   → 2
INCR counter   → 3
DEL old_key
SET counter 10 (overwriting)

# Rewritten AOF:
SET counter 10

Only the current state matters for recovery — the history of how we got there is irrelevant.
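The same idea can be sketched in a few lines of Python: replay the log into a dictionary, then emit one SET per surviving key. This toy handles only SET, INCR, and DEL on string values, and the function names replay and rewrite are made up for the example; real Redis rewrites every data type.

```python
def replay(commands):
    """Apply a toy subset of commands (SET, INCR, DEL) to an empty dict."""
    state = {}
    for name, *args in commands:
        if name == "SET":
            state[args[0]] = args[1]
        elif name == "INCR":
            state[args[0]] = str(int(state.get(args[0], "0")) + 1)
        elif name == "DEL":
            state.pop(args[0], None)
        else:
            raise ValueError(f"unsupported command: {name}")
    return state


def rewrite(commands):
    """Return the shortest SET-only log that rebuilds the same final state."""
    state = replay(commands)
    return [("SET", key, value) for key, value in sorted(state.items())]


original = [
    ("SET", "counter", "0"),
    ("INCR", "counter"),
    ("INCR", "counter"),
    ("INCR", "counter"),
    ("DEL", "old_key"),
    ("SET", "counter", "10"),
]
compacted = rewrite(original)
```

Replaying compacted yields the same dictionary as replaying original, which is the only property recovery depends on.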

How Rewrite Works (fork + double buffer)

AOF rewrite, like RDB, uses fork():

1. BGREWRITEAOF is called (manually or automatically)
2. Redis forks a child process
3. The child walks the in-memory dataset and writes the equivalent commands to a new AOF file
4. While the child writes, the parent continues serving clients; new write commands are appended to both the existing AOF file AND a rewrite buffer in memory
5. When the child finishes, the parent appends the rewrite buffer to the new AOF file (catching up the commands issued during the rewrite)
6. The new AOF file atomically replaces the old one

The double-buffer approach ensures no writes are lost during the rewrite. The rewrite buffer captures all new commands issued while the child was writing. This is the mechanism on Redis < 7.0; with multi-part AOF in Redis 7.0+, the parent instead writes new commands to a fresh incremental AOF file, and the manifest is switched to the new base and incremental files once the child finishes.
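The pre-7.0 rewrite flow described above can be simulated without any processes or files. In this sketch, a dictionary copy stands in for fork(), Python lists stand in for the AOF files, and list reassignment stands in for the atomic rename; ToyServer and its methods are hypothetical names, and only SET and DEL are modeled.

```python
def apply(state, command):
    """Apply one SET or DEL command to a dict."""
    name, *args = command
    if name == "SET":
        state[args[0]] = args[1]
    elif name == "DEL":
        state.pop(args[0], None)
    else:
        raise ValueError(f"unsupported command: {name}")


def replay(commands):
    state = {}
    for command in commands:
        apply(state, command)
    return state


class ToyServer:
    def __init__(self):
        self.state = {}
        self.aof = []               # stands in for the existing AOF file
        self.snapshot = None        # the child's frozen view of the data
        self.rewrite_buffer = None  # None means no rewrite in progress

    def execute(self, command):
        apply(self.state, command)
        self.aof.append(command)    # the old AOF keeps growing
        if self.rewrite_buffer is not None:
            self.rewrite_buffer.append(command)  # also remember it for the new AOF

    def start_rewrite(self):
        self.snapshot = dict(self.state)  # stand-in for fork()
        self.rewrite_buffer = []

    def finish_rewrite(self):
        new_aof = [("SET", k, v) for k, v in sorted(self.snapshot.items())]
        new_aof.extend(self.rewrite_buffer)  # catch up on writes made meanwhile
        self.aof = new_aof                   # stand-in for the atomic rename
        self.snapshot = None
        self.rewrite_buffer = None


server = ToyServer()
server.execute(("SET", "a", "1"))
server.execute(("SET", "a", "2"))
server.execute(("SET", "b", "1"))
server.start_rewrite()
server.execute(("SET", "c", "3"))  # arrives while the "child" is writing
server.execute(("DEL", "b"))
server.finish_rewrite()
```

The writes issued between start_rewrite() and finish_rewrite() are invisible to the snapshot, yet they end up at the tail of the new log, so replaying it reproduces the live state.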

Automatic Rewrite Configuration

text

# Trigger a rewrite when the AOF is 100% bigger than after the last rewrite
auto-aof-rewrite-percentage 100
# Only trigger if the AOF is at least 64MB
auto-aof-rewrite-min-size 64mb

With defaults: after the first rewrite, Redis triggers another rewrite when the AOF has grown to 2× the post-rewrite size and is at least 64MB. This prevents constant rewrites on small datasets.
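The trigger condition can be written out as a small predicate. This is a sketch of the rule as described here, not a copy of the Redis implementation; should_auto_rewrite is a hypothetical name, and treating a percentage of 0 as "disabled" follows the redis.conf documentation.

```python
MB = 1024 * 1024


def should_auto_rewrite(current_size: int, base_size: int,
                        percentage: int = 100, min_size: int = 64 * MB) -> bool:
    """Decide whether an automatic rewrite is due under the two settings."""
    if percentage == 0:           # a percentage of 0 disables auto-rewrite
        return False
    if current_size <= min_size:  # too small to be worth rewriting
        return False
    base = base_size if base_size > 0 else 1
    growth = current_size * 100 // base - 100  # growth since last rewrite, in %
    return growth >= percentage


doubled = should_auto_rewrite(128 * MB, 64 * MB)    # grew by 100%
partial = should_auto_rewrite(100 * MB, 64 * MB)    # grew by about 56%
small = should_auto_rewrite(10 * MB, 2 * MB)        # grew 400% but under 64MB
```

The third case shows why the minimum size exists: a tiny AOF can multiply in size many times over without ever being worth a fork.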

Manual Rewrite

BGREWRITEAOF   → trigger an async rewrite

Monitoring

text

INFO persistence

aof_enabled: 1
aof_rewrite_in_progress: 0
aof_rewrite_scheduled: 0
aof_last_rewrite_time_sec: 8         → seconds for last rewrite
aof_current_rewrite_time_sec: -1     → -1 if no rewrite in progress
aof_last_bgrewrite_status: ok
aof_last_write_status: ok
aof_last_cow_size: 4194304           → CoW memory used during last rewrite
aof_current_size: 134217728          → current AOF file size in bytes
aof_base_size: 67108864              → size after last rewrite
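These fields are easy to check from a script. The sketch below parses INFO-style text into a dictionary and derives two signals: whether the last write and rewrite succeeded, and how far the AOF has grown since the last rewrite. The sample text is hard-coded (in a real setup it would come from a client's INFO call), and parse_info and aof_growth_percent are hypothetical helpers.

```python
def parse_info(text: str) -> dict:
    """Parse 'key:value' lines, skipping blanks and '# Section' headers."""
    fields = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or ":" not in line:
            continue
        key, _, value = line.partition(":")
        fields[key.strip()] = value.strip()
    return fields


def aof_growth_percent(fields: dict) -> int:
    """Growth of the AOF since the last rewrite, as a percentage."""
    current = int(fields["aof_current_size"])
    base = int(fields["aof_base_size"])
    return current * 100 // max(base, 1) - 100


# Hard-coded sample using the values shown above.
sample = """\
# Persistence
aof_enabled:1
aof_rewrite_in_progress:0
aof_last_bgrewrite_status:ok
aof_last_write_status:ok
aof_current_size:134217728
aof_base_size:67108864
"""

info = parse_info(sample)
healthy = (info["aof_last_write_status"] == "ok"
           and info["aof_last_bgrewrite_status"] == "ok")
growth = aof_growth_percent(info)
```

A growth figure at or above auto-aof-rewrite-percentage with no rewrite in progress is worth alerting on, since it suggests rewrites are failing or being skipped.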

AOF Corruption and Recovery

If Redis crashes mid-write, the last command in the AOF may be truncated or corrupted. Redis handles this:

Truncation (most common): The last command is incomplete. On startup, Redis detects the short read, logs a warning, truncates the file at the last complete command, and continues loading (provided aof-load-truncated is yes).

text

# In redis.conf
# (default in Redis 7) Store an RDB snapshot + incremental AOF
aof-use-rdb-preamble yes
# (default) Truncate and continue on a corrupt tail
aof-load-truncated yes

Mid-stream corruption (rare): If corruption is in the middle of the file, Redis refuses to load. Use redis-check-aof to fix:

bash

redis-check-aof --fix appendonly.aof

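This scans for the corruption point and truncates everything after it. You lose commands after the corrupt point but preserve everything before, so back up the file before running it. On Redis 7.0+, where the AOF lives in appendonlydir, the tool is typically pointed at the manifest file in that directory instead of a single appendonly.aof.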
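The idea behind truncating at the last complete command can be illustrated with a small RESP scanner. This is a toy sketch for a plain command-format AOF with a cut-off tail, not the algorithm used by Redis or redis-check-aof; it does not handle an RDB preamble or mid-file corruption, and the function names are made up.

```python
def _read_line(data: bytes, pos: int):
    """Return (line, next_pos) for a CRLF-terminated line, or None if cut off."""
    idx = data.find(b"\r\n", pos)
    if idx == -1:
        return None
    return data[pos:idx], idx + 2


def _parse_command(data: bytes, pos: int):
    """Return the offset just past one complete command, or None."""
    header = _read_line(data, pos)
    if header is None or not header[0].startswith(b"*"):
        return None
    count, pos = int(header[0][1:]), header[1]
    for _ in range(count):
        size_line = _read_line(data, pos)
        if size_line is None or not size_line[0].startswith(b"$"):
            return None
        size, pos = int(size_line[0][1:]), size_line[1]
        if pos + size + 2 > len(data):  # payload or its trailing CRLF is missing
            return None
        pos += size + 2
    return pos


def last_complete_offset(data: bytes) -> int:
    """Byte offset just past the last fully written RESP command."""
    pos = 0
    while pos < len(data):
        end = _parse_command(data, pos)
        if end is None:
            break
        pos = end
    return pos


complete = b"*3\r\n$3\r\nSET\r\n$1\r\na\r\n$1\r\n1\r\n"
damaged = complete * 2 + b"*3\r\n$3\r\nSET\r\n$1\r\n"  # third command cut short
keep = last_complete_offset(damaged)
repaired = damaged[:keep]
```

Everything up to keep is a valid log; the bytes after it belong to a command that was never fully written and can be discarded safely.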

The RDB+AOF Hybrid Format

Redis 4.0+ supports a hybrid persistence format:

aof-use-rdb-preamble yes   → default in Redis 7

When AOF rewrite runs with this setting, the new AOF file starts with an RDB snapshot (compact binary) followed by incremental AOF commands since the snapshot. This gives:

Fast startup (load the RDB section quickly)
AOF-level durability (commands logged since the snapshot are replayed on top of the RDB section)
Smaller replay log (RDB captures the bulk of data)

This is the recommended configuration for most production deployments.
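One way to see which format a single-file AOF uses is to look at its first bytes: RDB data begins with the ASCII magic string REDIS, while a command-format AOF begins with a RESP array marker. The sketch below checks for that prefix. The files it creates are fake placeholders rather than valid Redis files, and starts_with_rdb_preamble is a hypothetical helper; on Redis 7.0+ the base and incremental parts are separate files, so the check would apply to the base file.

```python
import os
import tempfile

RDB_MAGIC = b"REDIS"  # RDB payloads start with this ASCII signature


def starts_with_rdb_preamble(path: str) -> bool:
    """Report whether the file begins with the RDB magic string."""
    with open(path, "rb") as f:
        return f.read(len(RDB_MAGIC)) == RDB_MAGIC


workdir = tempfile.mkdtemp()

# Placeholder bytes only: the magic plus a version field, then a fake body.
hybrid_path = os.path.join(workdir, "hybrid.aof")
with open(hybrid_path, "wb") as f:
    f.write(b"REDIS0009" + b"<fake rdb body>" + b"*2\r\n$3\r\nDEL\r\n$1\r\nk\r\n")

# A command-format AOF starts directly with a RESP array.
plain_path = os.path.join(workdir, "plain.aof")
with open(plain_path, "wb") as f:
    f.write(b"*2\r\n$6\r\nSELECT\r\n$1\r\n0\r\n")

hybrid_detected = starts_with_rdb_preamble(hybrid_path)
plain_detected = starts_with_rdb_preamble(plain_path)
```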

fsync and the OS Write Buffer

Understanding the OS layer clarifies the durability guarantees:

text

Redis process
    │
    ├── write() to file descriptor ──→ OS page cache (RAM)
    │                                  (not on storage yet)
    │
    └── fsync() ────────────────────→ OS flushes page cache to storage controller
                                        storage controller writes to physical media
                                        (SSD/NVMe: fast; HDD: slow; RAID with battery: fast)

appendfsync everysec means: data is in the OS page cache right after Redis calls write(), and on storage within ~1 second. A process crash (SIGSEGV, OOM kill) loses only what was still in Redis's own AOF buffer, because the OS page cache survives the process. A machine power loss or kernel panic loses whatever was in the OS page cache but not yet fsync()'d (up to about 1 second).

A machine power loss with appendfsync always loses nothing that was acknowledged, provided the storage device honors fsync(): every command is on physical storage before OK is returned. But on a spinning disk, each fsync() takes ~5–10ms, limiting you to roughly 100–200 fsync() calls, and therefore write batches, per second.
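The 100–200 figure follows directly from the latency: if each fsync() must finish before the next one starts, the ceiling is one second divided by the fsync time. A short worked calculation, where the 0.5 ms value is a hypothetical faster device included only for comparison:

```python
def max_serial_fsyncs_per_second(fsync_latency_ms: float) -> float:
    """Upper bound on back-to-back fsync() calls for a given latency."""
    return 1000.0 / fsync_latency_ms


hdd_slow = max_serial_fsyncs_per_second(10.0)  # 10 ms per fsync
hdd_fast = max_serial_fsyncs_per_second(5.0)   # 5 ms per fsync
faster = max_serial_fsyncs_per_second(0.5)     # hypothetical 0.5 ms device

for label, value in (("10 ms", hdd_slow), ("5 ms", hdd_fast), ("0.5 ms", faster)):
    print(f"{label} per fsync -> at most {value:.0f} fsyncs/second")
```

This bounds fsync() calls rather than commands: when several commands are flushed together, the number of acknowledged writes per second can be higher than the number of fsyncs.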

Choosing Between RDB, AOF, and Both

Requirement	Recommendation
Pure cache (can regenerate all data)	RDB disabled, AOF disabled
Tolerate data loss up to minutes	RDB only (infrequent snapshots)
Tolerate data loss up to ~1 second	AOF with `everysec`
Zero data loss	AOF with `always` (significant performance cost)
Balance durability and performance	RDB + AOF hybrid

The full decision framework is in P-3.

Summary

AOF records every write command and replays them on restart — more durable than RDB
appendfsync always — fsync after every command; maximum durability, minimum throughput
appendfsync everysec — fsync once per second; ≤ 1 second data loss; recommended for most use cases
appendfsync no — never fsync; OS decides (up to 30+ seconds); only for pure caches
AOF rewrite compresses the log using fork() + double buffer — the child writes, the parent captures new writes in a buffer, both are merged on completion
Auto-rewrite triggers when AOF grows to 2× post-rewrite size and exceeds auto-aof-rewrite-min-size
The RDB+AOF hybrid (aof-use-rdb-preamble yes) gives fast startup + near-durability — the recommended default
Monitor with INFO persistence: check aof_last_write_status, aof_current_size, aof_last_cow_size

Next: P-3 — Persistence Decision Framework: RDB vs AOF vs Both vs None — the decision matrix for choosing the right persistence configuration based on your data loss tolerance, write throughput, and recovery time requirements.

