How AOF logs every write command, the three appendfsync strategies (always/everysec/no) and their durability-vs-latency trade-offs, AOF rewrite to prevent unbounded log growth, and the hybrid RDB+AOF preamble format.
P-2 — AOF: Append-Only File Mechanics and fsync Strategies
Who this module is for: RDB's data loss window (up to the last snapshot interval) is too large for your use case. You need Redis to survive a crash with minimal data loss. AOF (Append-Only File) is the answer — but its three fsync strategies have very different durability and performance characteristics that you must understand before enabling it in production.
What AOF Is
AOF (Append-Only File) records every write command that Redis executes. When Redis restarts, it replays the AOF log to reconstruct the dataset. The result is the same in-memory state as before the crash, minus the commands that were not yet written to disk.
Unlike RDB (which snapshots the entire dataset periodically), AOF logs commands continuously. The trade-off:
- More durable: data loss is bounded by the fsync policy (how often the log is forced to disk), not by a snapshot interval
- Larger file — grows with every write until rewritten
- Slower startup — must replay all commands vs loading a binary snapshot
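To make the log-then-replay idea concrete, here is a minimal sketch of an append-only log in Python. It is a toy model, not Redis's implementation: the three-command vocabulary, the space-separated line format, and the names apply, log_and_apply, and replay are all assumptions made for illustration.

```python
import os
import tempfile

def apply(state, command):
    """Apply one write command (a list of strings) to an in-memory dict."""
    name, key, *args = command
    if name == "SET":
        state[key] = args[0]
    elif name == "INCR":
        state[key] = str(int(state.get(key, "0")) + 1)
    elif name == "DEL":
        state.pop(key, None)
    else:
        raise ValueError(f"unsupported command: {name}")

def log_and_apply(state, log_path, command):
    """Execute the command, then append it to the log file."""
    apply(state, command)
    # Toy format: one space-separated command per line (values must not contain spaces)
    with open(log_path, "a", encoding="utf-8") as log:
        log.write(" ".join(command) + "\n")

def replay(log_path):
    """Rebuild the dataset by re-running every logged command in order."""
    state = {}
    with open(log_path, encoding="utf-8") as log:
        for line in log:
            apply(state, line.split())
    return state

log_path = os.path.join(tempfile.mkdtemp(), "toy.aof")
live = {}
for cmd in (["SET", "counter", "0"], ["INCR", "counter"],
            ["SET", "user", "ada"], ["DEL", "user"]):
    log_and_apply(live, log_path, cmd)

recovered = replay(log_path)  # simulates a restart: state comes only from the log
print(recovered == live)
```

The log holds four commands even though the final dataset has a single key, which is the file-growth cost listed above.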
Enabling AOF
# In redis.conf
appendonly yes
appendfilename "appendonly.aof"
appenddirname "appendonlydir" → Redis 7.0+: AOF stored in a subdirectory
appendfsync everysec → the critical setting (see below)
At runtime:
CONFIG SET appendonly yes
Redis 7.0 introduced multi-part AOF: the AOF directory contains a base file (an RDB snapshot or an existing AOF) plus incremental AOF files. This makes AOF rewrite safer and more efficient. On Redis < 7.0, a single appendonly.aof file is used.
The Three fsync Strategies
This is the most important decision in AOF configuration. The appendfsync setting controls how often Redis calls fsync() — the system call that tells the OS to flush its write buffer to durable storage.
Without fsync(), data written to disk may still be in the OS's page cache (RAM) and can be lost if the machine loses power before the OS writes it to storage. fsync() forces a flush.
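The difference between write() and fsync() can be seen from any language that exposes both system calls. The sketch below uses Python's os module against a temporary file; the file name and payload are made up for illustration, and it only shows the call sequence, not Redis's own code.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "demo.aof")
fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_APPEND, 0o644)
try:
    payload = b"SET mykey value\n"
    # write(): the bytes are handed to the kernel and normally sit in the page cache
    written = os.write(fd, payload)
    # fsync(): ask the kernel to push this file's cached data to the storage device
    os.fsync(fd)
finally:
    os.close(fd)

print(written)
```

Between the two calls the data is readable by other processes but would not survive a power loss; only after fsync() returns has the kernel reported it as written to the device.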
appendfsync always
appendfsync always
Redis calls fsync() after every write command. Every SET, HSET, LPUSH — every command — is flushed to disk before returning OK to the client.
Durability: Maximum. If Redis crashes after returning OK, the command is on disk.
Performance: Bounded by disk fsync latency. On a spinning disk, where each fsync() takes roughly 5–10 ms, that works out to about 100–200 fsyncs/second. Unusable for write-heavy workloads on spinning disks; borderline on SSDs.
Use case: Financial transactions where every write must survive. Rarely appropriate for most Redis use cases.
appendfsync everysec (default recommended)
appendfsync everysec
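Redis calls fsync() once per second in a background thread. Write commands are handed to the OS with write() right away (fast, because the data only reaches the page cache), and the background fsync() forces the accumulated data to disk once per second.
Durability: You can lose about 1 second of writes. In practice, the loss window is usually < 1 second because the background fsync runs independently of write traffic. If the disk is slow enough that a background fsync() is still running when the next one is due, Redis can postpone the write, so the worst case can stretch to roughly 2 seconds.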
Performance: Excellent — writes are buffered and only 1 fsync per second. Handles hundreds of thousands of writes/second on modern hardware.
Use case: Most production Redis deployments. The right default when you need durability but cannot sacrifice write throughput.
appendfsync no
appendfsync no
Redis never calls fsync() — it lets the OS decide when to flush write buffers to disk. On Linux, the OS typically flushes every 30 seconds, but this is not guaranteed.
Durability: Worst — you can lose up to 30+ seconds of writes on a system crash.
Performance: Highest — no fsync overhead at all.
Use case: When Redis is purely a cache (you can regenerate all data) and you want AOF for replay capability but are not concerned about data loss.
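As a rough way to compare the three settings, the following sketch counts how many fsync() calls each policy would issue for a given sequence of write timestamps. It is a deliberately simplified model (the function name count_fsyncs and the one-fsync-per-active-second rule are assumptions), and it ignores how Redis actually schedules and batches the calls.

```python
def count_fsyncs(policy, write_times):
    """Return how many fsync calls a policy issues for writes at the given times (seconds)."""
    if policy == "always":
        # Model: one fsync per write command
        return len(write_times)
    if policy == "everysec":
        # Model: at most one fsync for each one-second window that saw a write
        return len({int(t) for t in write_times})
    if policy == "no":
        # Redis never calls fsync; flushing is left to the OS
        return 0
    raise ValueError(f"unknown policy: {policy}")

writes = [0.1, 0.2, 0.9, 1.5, 3.0]
for policy in ("always", "everysec", "no"):
    print(policy, count_fsyncs(policy, writes))
```

With five writes spread over three distinct seconds, the model gives 5, 3, and 0 calls. At realistic write rates the gap between always and everysec grows to several orders of magnitude, which is where the throughput difference comes from.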
What Happens on Write
When a client sends SET mykey "value":
- Redis executes the command in memory
- Redis appends the command in RESP format to the AOF write buffer (in-process memory)
- At the next appendfsync opportunity, the buffer is flushed to the OS and optionally fsync()'d
- Redis returns OK to the client
The AOF write buffer is in Redis's process memory. If the process crashes before the buffer is flushed to the OS, the command is lost regardless of the appendfsync setting. The flush to the OS happens via write(), normally once per event-loop iteration and under all three settings; appendfsync only controls when fsync() forces the OS to push that data to storage.
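To show what "the command in RESP format" looks like on disk, here is a small sketch that encodes a command as a RESP array of bulk strings. The helper name encode_resp is hypothetical; the byte layout follows the RESP rules for arrays and bulk strings.

```python
def encode_resp(*args):
    """Encode a command as a RESP array of bulk strings."""
    parts = [b"*%d\r\n" % len(args)]            # array header: number of arguments
    for arg in args:
        data = arg if isinstance(arg, bytes) else str(arg).encode("utf-8")
        # each argument: $<byte length>\r\n<bytes>\r\n
        parts.append(b"$%d\r\n%s\r\n" % (len(data), data))
    return b"".join(parts)

entry = encode_resp("SET", "mykey", "value")
print(entry)  # b'*3\r\n$3\r\nSET\r\n$5\r\nmykey\r\n$5\r\nvalue\r\n'
```

Because every argument carries its byte length, a loader can tell when the final entry in the file is incomplete.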
AOF Rewrite
Every write command is appended to the AOF file. Over time, especially if you SET and then DEL many keys, the AOF file becomes large and contains redundant commands. Replaying it takes longer than necessary.
AOF rewrite compresses the log: Redis writes a new AOF file that produces the same in-memory state with the minimum number of commands. For example:
# Original AOF:
SET counter 0
INCR counter → 1
INCR counter → 2
INCR counter → 3
DEL old_key
SET counter 10 (overwriting)
# Rewritten AOF:
SET counter 10
Only the current state matters for recovery — the history of how we got there is irrelevant.
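The compaction in the example above can be sketched in a few lines: replay the history into a dictionary, then emit one SET per surviving key. This is an illustrative model with a made-up function name (rewrite) and a three-command vocabulary; Redis's real rewrite walks its own data structures and handles every data type.

```python
def rewrite(commands):
    """Replay a command history, then emit the minimal commands that rebuild the final state."""
    state = {}
    for name, key, *args in commands:
        if name == "SET":
            state[key] = args[0]
        elif name == "INCR":
            state[key] = str(int(state.get(key, "0")) + 1)
        elif name == "DEL":
            state.pop(key, None)
        else:
            raise ValueError(f"unsupported command: {name}")
    # One SET per live key; deleted keys and intermediate values disappear
    return [("SET", key, value) for key, value in state.items()]

original = [
    ("SET", "counter", "0"),
    ("INCR", "counter"),
    ("INCR", "counter"),
    ("INCR", "counter"),
    ("DEL", "old_key"),
    ("SET", "counter", "10"),
]
compact = rewrite(original)
print(compact)  # [('SET', 'counter', '10')]
```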
How Rewrite Works (fork + double buffer)
AOF rewrite, like RDB, uses fork():
- BGREWRITEAOF is called (manually or automatically)
- Redis forks a child process
- The child walks the in-memory dataset and writes the equivalent commands to a new AOF file
- While the child writes, the parent continues serving clients — new write commands are appended to both the existing AOF file AND a rewrite buffer in memory
- When the child finishes, the parent appends the rewrite buffer to the new AOF file (catching up the commands issued during the rewrite)
- The new AOF file atomically replaces the old one
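The double-buffer approach ensures no writes are lost during the rewrite. The rewrite buffer captures all new commands issued while the child was writing. On Redis 7.0+, multi-part AOF changes the mechanics: instead of keeping an in-memory rewrite buffer, the parent writes new commands to a fresh incremental file, and that file is paired with the child's new base file when the rewrite completes.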
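The rewrite-buffer steps above can be simulated without processes or files. In this sketch a dictionary copy stands in for the forked child's frozen view of memory, and plain lists stand in for the AOF files and the rewrite buffer; all names are illustrative assumptions.

```python
def apply(state, command):
    """Apply a SET or DEL command tuple to a dict."""
    name, key, *args = command
    if name == "SET":
        state[key] = args[0]
    elif name == "DEL":
        state.pop(key, None)
    else:
        raise ValueError(f"unsupported command: {name}")

def replay(commands):
    """Rebuild a dataset from a list of commands."""
    state = {}
    for command in commands:
        apply(state, command)
    return state

live, old_aof = {}, []
for cmd in [("SET", "a", "1"), ("SET", "b", "2"), ("DEL", "a")]:
    apply(live, cmd)
    old_aof.append(cmd)

# "fork": the child gets a point-in-time view; the parent starts a rewrite buffer
child_view = dict(live)
rewrite_buffer = []

# The parent keeps serving writes, logging each to the old AOF AND the buffer
for cmd in [("SET", "c", "3"), ("SET", "b", "20")]:
    apply(live, cmd)
    old_aof.append(cmd)
    rewrite_buffer.append(cmd)

# The child dumps its view; the parent then appends the buffered commands
new_aof = [("SET", k, v) for k, v in sorted(child_view.items())] + rewrite_buffer

print(replay(new_aof) == live, len(old_aof), len(new_aof))
```

Replaying new_aof yields the same dataset as the live state even though two writes arrived after the snapshot was taken, and the new log is shorter than the old one.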
Automatic Rewrite Configuration
auto-aof-rewrite-percentage 100 → trigger rewrite when AOF is 100% bigger than after last rewrite
auto-aof-rewrite-min-size 64mb → only trigger if AOF is at least 64MB
With defaults: after the first rewrite, Redis triggers another rewrite when the AOF has grown to 2× the post-rewrite size and is at least 64MB. This prevents constant rewrites on small datasets.
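The trigger rule described above amounts to a short piece of arithmetic. The sketch below is one reading of that rule, not a copy of Redis's source; the function name should_rewrite and the boundary handling (at least min_size, growth at or above the percentage) are assumptions taken from the wording here.

```python
MB = 1024 * 1024

def should_rewrite(current_size, base_size, percentage=100, min_size=64 * MB):
    """Decide whether an automatic rewrite would be triggered under the rule above."""
    if percentage == 0:
        return False               # a percentage of zero disables automatic rewrite
    if current_size < min_size:
        return False               # too small to be worth rewriting
    base = base_size or 1          # avoid dividing by zero for an empty base
    growth = current_size * 100 // base - 100   # growth since last rewrite, in percent
    return growth >= percentage

print(should_rewrite(128 * MB, 64 * MB))   # doubled and above 64 MB
print(should_rewrite(100 * MB, 64 * MB))   # only ~56% growth
print(should_rewrite(10 * MB, 2 * MB))     # 400% growth but below the minimum size
```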
Manual Rewrite
BGREWRITEAOF → trigger an async rewrite
Monitoring
INFO persistence
aof_enabled: 1
aof_rewrite_in_progress: 0
aof_rewrite_scheduled: 0
aof_last_rewrite_time_sec: 8 → seconds for last rewrite
aof_current_rewrite_time_sec: -1 → -1 if no rewrite in progress
aof_last_bgrewrite_status: ok
aof_last_write_status: ok
aof_last_cow_size: 4194304 → CoW memory used during last rewrite
aof_current_size: 134217728 → current AOF file size in bytes
aof_base_size: 67108864 → size after last rewrite
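For alerting, these fields are easy to pull out of the INFO text. The sketch below parses a hard-coded sample rather than talking to a server, so it runs anywhere; in practice the text would come from your Redis client. The helper name parse_info and the derived growth_pct and healthy values are illustrative assumptions.

```python
def parse_info(text):
    """Parse 'key:value' lines from INFO output into a dict, skipping section headers."""
    fields = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or ":" not in line:
            continue
        key, _, value = line.partition(":")
        fields[key.strip()] = value.strip()
    return fields

# Sample text in the shape of INFO persistence output (values taken from the listing above)
sample = """# Persistence
aof_enabled:1
aof_rewrite_in_progress:0
aof_last_bgrewrite_status:ok
aof_last_write_status:ok
aof_current_size:134217728
aof_base_size:67108864
"""

info = parse_info(sample)
growth_pct = int(info["aof_current_size"]) * 100 // int(info["aof_base_size"]) - 100
healthy = (info["aof_last_write_status"] == "ok"
           and info["aof_last_bgrewrite_status"] == "ok")
print(growth_pct, healthy)
```

A growth figure that keeps climbing well past auto-aof-rewrite-percentage, or a status other than ok, are the two signals most worth alerting on.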
AOF Corruption and Recovery
If Redis crashes mid-write, the last command in the AOF may be truncated or corrupted. Redis handles this:
Truncation (most common): The last command is incomplete. On startup, Redis detects that the file ends in the middle of a command. With aof-load-truncated yes, it logs a warning, discards the incomplete tail, and loads everything up to the last complete command.
# In redis.conf
aof-use-rdb-preamble yes → (default in Redis 7) store RDB snapshot + incremental AOF
aof-load-truncated yes → (default) truncate and continue on corrupt tail
Mid-stream corruption (rare): If corruption is in the middle of the file, Redis refuses to load. Use redis-check-aof to fix:
redis-check-aof --fix appendonly.aof
This scans for the corruption point and truncates everything after it. You lose commands after the corrupt point but preserve everything before.
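The idea of cutting the file at the last complete command can be illustrated on raw RESP bytes. This is a toy sketch, not how redis-check-aof is implemented: it assumes a plain-RESP file with no RDB preamble, and the function names are hypothetical.

```python
def _read_line(data, pos):
    """Return (line, next_pos) for the CRLF-terminated line starting at pos."""
    end = data.find(b"\r\n", pos)
    if end == -1:
        raise ValueError("incomplete line")
    return data[pos:end], end + 2

def _parse_command(data, pos):
    """Parse one RESP array of bulk strings; return the offset just past it."""
    header, pos = _read_line(data, pos)
    if not header.startswith(b"*"):
        raise ValueError("expected array header")
    for _ in range(int(header[1:])):
        length_line, pos = _read_line(data, pos)
        if not length_line.startswith(b"$"):
            raise ValueError("expected bulk string header")
        end = pos + int(length_line[1:])
        if data[end:end + 2] != b"\r\n":
            raise ValueError("truncated or corrupt bulk string")
        pos = end + 2
    return pos

def last_complete_offset(data):
    """Return the byte offset just past the last command that parses cleanly."""
    pos = good = 0
    while pos < len(data):
        try:
            pos = _parse_command(data, pos)
        except ValueError:
            break
        good = pos
    return good

complete = b"*3\r\n$3\r\nSET\r\n$1\r\na\r\n$1\r\n1\r\n"
damaged = complete + b"*3\r\n$3\r\nSET\r\n$1\r\nb\r\n$5\r\nva"   # crash mid-write
offset = last_complete_offset(damaged)
repaired = damaged[:offset]
print(offset == len(complete))
```

Everything before the offset is kept and everything after it is dropped, which matches the trade-off described above: commands past the damage point are lost, earlier ones are preserved.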
The RDB+AOF Hybrid Format
Redis 4.0+ supports a hybrid persistence format:
aof-use-rdb-preamble yes → default in Redis 7
When AOF rewrite runs with this setting, the new AOF file starts with an RDB snapshot (compact binary) followed by incremental AOF commands since the snapshot. This gives:
- Fast startup (load the RDB section quickly)
- AOF-level durability (writes made since the snapshot are still logged, and only that section needs replaying)
- Smaller replay log (RDB captures the bulk of data)
This is the recommended configuration for most production deployments.
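One way to tell which format a given file uses is to look at its first bytes: RDB data begins with the ASCII magic string REDIS followed by a version number, while a plain AOF begins with a RESP array header (*). The sketch below classifies a byte prefix on that basis; the function name and return labels are made up, and the sample byte strings are hand-written for illustration rather than taken from real files.

```python
def describe_aof_head(head):
    """Classify the start of an AOF file from its first bytes."""
    if head.startswith(b"REDIS"):
        return "rdb-preamble"      # hybrid format: RDB snapshot first
    if head.startswith(b"*"):
        return "resp-commands"     # plain AOF: RESP-encoded commands from the start
    return "unknown"

# Illustrative prefixes only
hybrid_head = b"REDIS0009" + b"\xfa"
plain_head = b"*2\r\n$6\r\nSELECT\r\n$1\r\n0\r\n"

print(describe_aof_head(hybrid_head), describe_aof_head(plain_head))
```

On a real file, reading the first few bytes in binary mode is enough input. On Redis 7.0+ the snapshot lives in the separate base file inside the AOF directory, so that is the file to inspect.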
fsync and the OS Write Buffer
Understanding the OS layer clarifies the durability guarantees:
Redis process
│
├── write() to file descriptor ──→ OS page cache (RAM)
│                                  (not on storage yet)
│
└── fsync() ────────────────────→ OS flushes page cache to storage controller
                                  storage controller writes to physical media
                                  (SSD/NVMe: fast; HDD: slow; RAID with battery: fast)
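appendfsync everysec means: data is in the OS page cache right after Redis calls write(), and on storage within ~1 second. A process crash (SIGSEGV, OOM kill) loses only what was still in Redis's in-process AOF buffer, because data already handed to the kernel is still flushed by the OS. A machine power loss or kernel crash loses whatever is in the OS page cache but not yet fsync()'d (up to about 1 second).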
A machine power loss with appendfsync always loses nothing — every command is on physical storage before OK is returned. But on a spinning disk, each fsync() takes ~5–10ms, limiting you to 100–200 write ops/second.
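The 100–200 figure follows directly from the fsync latency: if every fsync() has to finish before the next one starts, the ceiling is one second divided by the latency. A quick check, using the latencies quoted above (the function name is illustrative):

```python
def max_fsyncs_per_second(fsync_latency_ms):
    """Upper bound on strictly sequential fsync calls per second for a given latency."""
    return 1000.0 / fsync_latency_ms

for latency_ms in (5, 10):
    print(latency_ms, "ms ->", max_fsyncs_per_second(latency_ms), "fsyncs/second")
```

This is a ceiling on fsync calls, not necessarily on commands: whenever several commands are covered by a single fsync(), the command rate can exceed it.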
Choosing Between RDB, AOF, and Both
| Requirement | Recommendation |
|---|---|
| Pure cache (can regenerate all data) | RDB disabled, AOF disabled |
| Tolerate data loss up to minutes | RDB only (infrequent snapshots) |
| Tolerate data loss up to ~1 second | AOF with everysec |
| Zero data loss | AOF with always (significant performance cost) |
| Balance durability and performance | RDB + AOF hybrid |
The full decision framework is in P-3.
Summary
- AOF records every write command and replays them on restart, which makes it more durable than RDB
- appendfsync always: fsync after every command; maximum durability, minimum throughput
- appendfsync everysec: fsync once per second; ≤ 1 second data loss; recommended for most use cases
- appendfsync no: never fsync; OS decides (up to 30+ seconds); only for pure caches
- AOF rewrite compresses the log using fork() + double buffer: the child writes, the parent captures new writes in a buffer, both are merged on completion
- Auto-rewrite triggers when AOF grows to 2× post-rewrite size and exceeds auto-aof-rewrite-min-size
- The RDB+AOF hybrid (aof-use-rdb-preamble yes) gives fast startup + near-durability; it is the recommended default
- Monitor with INFO persistence: check aof_last_write_status, aof_current_size, aof_last_cow_size
Next: P-3 — Persistence Decision Framework: RDB vs AOF vs Both vs None — the decision matrix for choosing the right persistence configuration based on your data loss tolerance, write throughput, and recovery time requirements.