Module A-8 · 22 min read

Sentinel as an independent high-availability process, subjective down vs objective down, the failover election sequence, min-replicas-to-write for split-brain prevention, and what Sentinel cannot protect against.

A-8 — Redis Sentinel: Quorum, Failover, and Split-Brain Prevention

Who this module is for: You want Redis to automatically recover from a primary failure — promoting a replica and reconfiguring clients — without manual intervention. Redis Sentinel is the high-availability solution for single-node (non-Cluster) Redis deployments. This module covers the Sentinel model, the failover sequence, split-brain prevention, and Sentinel's limitations.


What Sentinel Is

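Redis Sentinel runs as a separate process from the Redis servers it monitors. It ships with Redis and is started with redis-sentinel or redis-server --sentinel. Sentinel monitors a Redis primary and its replicas. When the primary fails, Sentinel: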

  1. Detects the failure (agrees with other Sentinels that the primary is down)
  2. Elects a leader Sentinel to run the failover
  3. Selects the best replica to promote
  4. Promotes the chosen replica to primary
  5. Configures other replicas to replicate from the new primary
  6. Notifies clients of the new primary address

Sentinel is not a proxy — it does not route traffic. It is a monitoring and orchestration layer that clients query to discover the current primary address.


Deployment Topology

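A minimum robust Sentinel setup uses 3 Sentinel processes on separate machines. An odd number is recommended rather than strictly required: a fourth Sentinel tolerates no more failures than three do, because failover authorization needs a majority of all Sentinels.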

Machine 1: Redis Primary + Sentinel 1
Machine 2: Redis Replica + Sentinel 2
Machine 3: Redis Replica + Sentinel 3

Clients connect to any Sentinel to discover the current primary's IP and port. They then connect directly to the primary for all operations.
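
As a rough sketch of the discovery step, the helper below turns the two-element reply of SENTINEL get-master-addr-by-name into a host/port pair. The function name and the PrimaryAddress type are illustrative assumptions, not part of any client library.

```typescript
// Hypothetical helper: not part of ioredis or any other client library.
interface PrimaryAddress {
  host: string;
  port: number;
}

// SENTINEL get-master-addr-by-name replies with [ip, port] as strings,
// or with a null reply when the Sentinel does not know the given name.
function parsePrimaryAddress(reply: string[] | null): PrimaryAddress | null {
  if (reply === null || reply.length !== 2) return null;
  const host = reply[0];
  const port = Number.parseInt(reply[1], 10);
  if (!Number.isInteger(port) || port <= 0 || port > 65535) return null;
  return { host, port };
}

const discovered = parsePrimaryAddress(["10.0.1.50", "6379"]);
console.log(discovered?.host, discovered?.port); // 10.0.1.50 6379
```

A Sentinel-aware client library performs this step internally; the sketch only makes the shape of the reply explicit.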


Sentinel Configuration

# sentinel.conf (same format for all Sentinel instances)
sentinel monitor mymaster 10.0.1.50 6379 2
# Name:     mymaster
# Primary:  10.0.1.50:6379
# Quorum:   2 (how many Sentinels must agree the primary is down before failover)

sentinel auth-pass mymaster your-primary-password
sentinel down-after-milliseconds mymaster 5000
# Mark primary as "subjectively down" if unreachable for 5 seconds

sentinel failover-timeout mymaster 60000
# Maximum time to complete a failover (60 seconds)

sentinel parallel-syncs mymaster 1
# How many replicas are reconfigured to sync from the new primary at the same time after a failover
# 1 = replicas resync one at a time (slower to converge, but a smaller load spike and fewer replicas unavailable at once)

The Failure Detection Sequence

Step 1: Subjective Down (SDOWN)

A single Sentinel marks the primary as "subjectively down" if it cannot reach the primary within down-after-milliseconds. This is one Sentinel's opinion — a network blip between just that Sentinel and the primary would cause an SDOWN that does not represent a real failure.

Sentinel 1: PING to primary... timeout (5 seconds)
Sentinel 1: Primary is SDOWN (subjectively down — my opinion only)
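
The SDOWN rule can be sketched as a single comparison. This is an illustration of the idea only, with hypothetical names; Sentinel's real implementation lives in the Redis C source and tracks more state than this.

```typescript
// Illustrative only: one Sentinel's local view of one monitored instance.
// lastValidReplyAtMs is when this Sentinel last received a valid PING reply.
function isSubjectivelyDown(
  lastValidReplyAtMs: number,
  nowMs: number,
  downAfterMs: number,
): boolean {
  return nowMs - lastValidReplyAtMs > downAfterMs;
}

// With down-after-milliseconds 5000:
console.log(isSubjectivelyDown(0, 4000, 5000)); // false
console.log(isSubjectivelyDown(0, 6000, 5000)); // true
```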

Step 2: Objective Down (ODOWN)

A Sentinel queries other Sentinels: "Do you also think the primary is down?" If at least quorum Sentinels agree, the primary is declared "objectively down" (ODOWN) — a real failure.

Sentinel 1 → Sentinel 2: "Is mymaster down?" → Yes (SDOWN)
Sentinel 1 → Sentinel 3: "Is mymaster down?" → Yes (SDOWN)
Sentinel 1: Quorum reached (3 Sentinels agree, 2 required) → primary is ODOWN

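With quorum 2: at least 2 of 3 Sentinels must agree. This prevents a single Sentinel's network issue from triggering an unnecessary failover. Quorum only governs failure detection; actually starting the failover also requires authorization from a majority of all Sentinels.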
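
A minimal sketch of the ODOWN decision, assuming each Sentinel's opinion (including the asking Sentinel's own) is available as a boolean. The function name is hypothetical.

```typescript
// Illustrative only: counts how many Sentinels currently report SDOWN
// and compares that count with the configured quorum.
function isObjectivelyDown(sdownOpinions: boolean[], quorum: number): boolean {
  const agreeing = sdownOpinions.filter((down) => down).length;
  return agreeing >= quorum;
}

// Quorum 2 with 3 Sentinels:
console.log(isObjectivelyDown([true, true, false], 2)); // true
console.log(isObjectivelyDown([true, false, false], 2)); // false
```

The second call is the "network blip" case: one Sentinel alone cannot declare the primary objectively down.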

Step 3: Leader Election

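One Sentinel must be elected to lead the failover. Sentinel uses a Raft-like election: a Sentinel that sees ODOWN requests votes from the others for a new epoch, and each Sentinel grants at most one vote per epoch. The first to collect votes from a majority of all Sentinels becomes the failover leader.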
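
The vote threshold can be written out as arithmetic. The sketch below assumes the rule that a leader needs votes from a majority of all known Sentinels and never fewer than the configured quorum; the function names are hypothetical.

```typescript
// Votes a Sentinel must collect to lead a failover: a majority of all known
// Sentinels, and never fewer than the configured quorum.
function votesNeededToLead(totalSentinels: number, quorum: number): number {
  const majority = Math.floor(totalSentinels / 2) + 1;
  return Math.max(majority, quorum);
}

// A failover can only be authorized if enough Sentinels are reachable to vote.
function canAuthorizeFailover(
  totalSentinels: number,
  reachableSentinels: number,
  quorum: number,
): boolean {
  return reachableSentinels >= votesNeededToLead(totalSentinels, quorum);
}

console.log(votesNeededToLead(3, 2)); // 2
console.log(votesNeededToLead(5, 2)); // 3
console.log(canAuthorizeFailover(3, 1, 2)); // false
```

The last line shows why a Sentinel stranded alone on the minority side of a partition cannot promote a replica on its own.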

Step 4: Replica Selection

The leader Sentinel chooses which replica to promote. Selection criteria (in order of preference):

  1. Replica with the lowest replica-priority (called slave-priority in older Redis versions), set in the replica's redis.conf
  2. Replica with the largest replication offset (the most up-to-date data)
  3. Replica with the lexicographically smallest run ID, as a tiebreaker

# Replica redis.conf: a lower replica-priority is preferred for promotion
# replica-priority 100 (default)
# replica-priority 0 means "never promote this replica" (e.g., replica used for backups)
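
The ordering above can be sketched as a comparator. This is a simplified illustration with hypothetical type and function names; it leaves out the health checks Sentinel applies before ranking (for example, skipping replicas that are down or have been disconnected from the primary for too long).

```typescript
interface ReplicaInfo {
  runId: string;
  priority: number;   // replica-priority; 0 means "never promote"
  replOffset: number; // replication offset processed so far
}

function selectReplica(replicas: ReplicaInfo[]): ReplicaInfo | null {
  // Replicas with priority 0 are never eligible for promotion.
  const candidates = replicas.filter((r) => r.priority > 0);
  if (candidates.length === 0) return null;

  const sorted = [...candidates].sort((a, b) => {
    if (a.priority !== b.priority) return a.priority - b.priority;         // lower priority value first
    if (a.replOffset !== b.replOffset) return b.replOffset - a.replOffset; // larger offset first
    return a.runId < b.runId ? -1 : a.runId > b.runId ? 1 : 0;             // smaller run ID first
  });
  return sorted[0];
}

const chosen = selectReplica([
  { runId: "b1", priority: 100, replOffset: 500 },
  { runId: "c2", priority: 100, replOffset: 900 },
  { runId: "a0", priority: 0, replOffset: 1000 }, // backup replica, excluded
]);
console.log(chosen?.runId); // c2
```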

Step 5: Failover Execution

1. Leader Sentinel sends REPLICAOF NO ONE to the chosen replica → it becomes primary
2. Leader Sentinel configures remaining replicas: REPLICAOF {new-primary-ip} {port}
3. Leader Sentinel updates its own configuration with the new primary address
4. Other Sentinels update their configuration
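5. Sentinel publishes a +switch-master event on its Pub/Sub interface (the __sentinel__:hello channel is used for Sentinel-to-Sentinel discovery and configuration propagation, not for client notification)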

Step 6: Client Notification

Clients that use a Sentinel-aware client library (ioredis with sentinels config, Jedis with Sentinel support, etc.) subscribe to Sentinel's +switch-master Pub/Sub channel or periodically query Sentinel for the primary address. When +switch-master fires, the client reconnects to the new primary address.
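
The payload of a +switch-master event has the form "<master-name> <old-ip> <old-port> <new-ip> <new-port>". The sketch below parses that payload into a typed object; the SwitchMasterEvent type and the function name are illustrative assumptions, and a Sentinel-aware client library normally does this for you.

```typescript
interface SwitchMasterEvent {
  name: string;
  oldHost: string;
  oldPort: number;
  newHost: string;
  newPort: number;
}

// Hypothetical helper: parses the space-separated +switch-master payload.
function parseSwitchMaster(payload: string): SwitchMasterEvent | null {
  const parts = payload.trim().split(/\s+/);
  if (parts.length !== 5) return null;
  const oldPort = Number.parseInt(parts[2], 10);
  const newPort = Number.parseInt(parts[4], 10);
  if (!Number.isInteger(oldPort) || !Number.isInteger(newPort)) return null;
  return { name: parts[0], oldHost: parts[1], oldPort, newHost: parts[3], newPort };
}

const event = parseSwitchMaster("mymaster 10.0.1.50 6379 10.0.1.51 6379");
console.log(event?.newHost, event?.newPort); // 10.0.1.51 6379
```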


Failover Duration

A typical Sentinel failover takes:

  • down-after-milliseconds to detect SDOWN (5 seconds in the configuration above; the Redis default is 30 seconds)
  • ~1 second for ODOWN consensus
  • ~1 second for leader election
  • ~2–5 seconds for replica promotion and reconfiguration

Total: ~10 seconds of write downtime with the settings above.

To reduce failover time: lower down-after-milliseconds (at the risk of false positives from brief network blips).
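
The estimate is simple addition, sketched below. The phase durations are this module's rough figures, not measurements, and the type and function names are hypothetical.

```typescript
interface FailoverPhasesMs {
  detectSdown: number;    // down-after-milliseconds
  odownConsensus: number; // rough figure from this module
  leaderElection: number; // rough figure from this module
  promotion: number;      // rough figure from this module (2000 to 5000)
}

function estimateFailoverMs(p: FailoverPhasesMs): number {
  return p.detectSdown + p.odownConsensus + p.leaderElection + p.promotion;
}

const base = { odownConsensus: 1000, leaderElection: 1000, promotion: 3000 };
console.log(estimateFailoverMs({ ...base, detectSdown: 5000 })); // 10000
console.log(estimateFailoverMs({ ...base, detectSdown: 2000 })); // 7000
```

The detection phase dominates, which is why down-after-milliseconds is the main tuning knob.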


Split-Brain Prevention with min-replicas-to-write

Consider a network partition that isolates the primary from the Sentinels and replicas but not from some clients:

Before partition:
  [Primary] ← clients → [App servers]
      ↕ replication
  [Replica 1][Replica 2]
  [Sentinel 1][Sentinel 2][Sentinel 3]

During partition:
  [Primary] ← clients → [App servers]  ← isolated from Sentinels and replicas

  [Replica 1][Replica 2]
  [Sentinel 1][Sentinel 2][Sentinel 3]

Sentinels cannot reach the old primary → ODOWN → failover → Replica 1 promoted to new primary.

Meanwhile, the old primary is still accepting writes from the clients that can still reach it. When the partition heals, Sentinel reconfigures the old primary as a replica of the new primary. It resynchronizes from the new primary and discards every write it accepted during the partition.

Prevention: min-replicas-to-write and min-replicas-max-lag on the primary:

# Primary redis.conf:
min-replicas-to-write 1
min-replicas-max-lag 10

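During the partition, the old primary cannot reach any replica. Once no replica has acknowledged within the last 10 seconds, it rejects writes with a NOREPLICAS error, so clients see failures instead of silently losing data. Writes accepted during those first 10 seconds can still be lost.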
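
The write-acceptance rule can be sketched as follows, assuming lag is measured in seconds since each replica's last acknowledgement. The function name is hypothetical and this is a model of the behaviour, not Redis source code.

```typescript
// Models the primary's decision: accept a write only if at least
// minReplicasToWrite replicas have a lag of at most minReplicasMaxLagSec.
function acceptsWrites(
  replicaLagsSec: number[],
  minReplicasToWrite: number,
  minReplicasMaxLagSec: number,
): boolean {
  // Setting either option to 0 disables the check.
  if (minReplicasToWrite === 0 || minReplicasMaxLagSec === 0) return true;
  const goodReplicas = replicaLagsSec.filter((lag) => lag <= minReplicasMaxLagSec).length;
  return goodReplicas >= minReplicasToWrite;
}

// min-replicas-to-write 1, min-replicas-max-lag 10
console.log(acceptsWrites([1, 2], 1, 10));   // true  (healthy replication)
console.log(acceptsWrites([15, 30], 1, 10)); // false (partitioned for over 10 seconds)
```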


ioredis Sentinel Client

typescript
import Redis from 'ioredis';

const redis = new Redis({
  sentinels: [
    { host: '10.0.1.50', port: 26379 },
    { host: '10.0.1.51', port: 26379 },
    { host: '10.0.1.52', port: 26379 },
  ],
  name: 'mymaster', // must match sentinel.conf "monitor" name
  password: 'primary-password',
  sentinelPassword: 'sentinel-password', // if sentinels require AUTH
  role: 'master', // 'master' or 'slave' (for read-from-replica)
});
// ioredis automatically queries Sentinels to find the current primary
// and reconnects on failover events (+switch-master)

For read replicas:

typescript
const readRedis = new Redis({
  sentinels: [/* ... */],
  name: 'mymaster',
  role: 'slave', // connects to a random replica
});

Sentinel CLI Commands

bash
# Connect to a Sentinel
redis-cli -h 10.0.1.50 -p 26379

# Query current master
SENTINEL get-master-addr-by-name mymaster
1) "10.0.1.50"
2) "6379"

# List all monitored masters
SENTINEL masters

# List replicas for a master
SENTINEL replicas mymaster

# List other Sentinels
SENTINEL sentinels mymaster

# Check Sentinel status
SENTINEL ckquorum mymaster
→ OK 3 usable Sentinels. Quorum and failover authorization can be reached

# Trigger a manual failover (for testing)
SENTINEL failover mymaster

Sentinel Limitations

What Sentinel is not:

  • It is not a proxy — clients connect directly to the primary, not through Sentinel
  • It does not provide horizontal scaling — all writes go to one primary
  • It does not protect against data loss during the replication lag window
  • It cannot provide fencing tokens for distributed locking

Sentinel does not prevent data loss: If the primary fails before replication completes, writes in the lag window are lost when a replica is promoted. min-replicas-to-write reduces (but does not eliminate) this window.

Sentinel requires client support: Clients must be Sentinel-aware (know to query Sentinel for the primary address) or connect through a proxy layer (for example Envoy or Twemproxy) that is itself kept pointed at the current primary. A client hardcoded to the primary's IP will not automatically reconnect after failover.


Sentinel vs Cluster

| Concern            | Sentinel                 | Cluster                         |
|--------------------|--------------------------|---------------------------------|
| Use case           | Single-node Redis HA     | Horizontal scaling across nodes |
| Data distribution  | All data on one primary  | Sharded across 16,384 slots     |
| Write throughput   | Limited to one node      | Scales with node count          |
| Failover           | Automatic (via Sentinel) | Automatic (built-in)            |
| Complexity         | Moderate                 | Higher                          |
| Client requirement | Sentinel-aware client    | Cluster-aware client            |
| Multi-key ops      | Unrestricted             | Keys must be on same slot       |
Use Sentinel when your dataset fits on a single Redis node and you need automatic failover without the operational complexity of Cluster. Use Cluster when you need horizontal scaling beyond what a single node can provide.


Summary

  • Sentinel is a separate process that monitors Redis primary + replicas and orchestrates automatic failover
  • SDOWN = one Sentinel's opinion; ODOWN = quorum agreement → triggers failover
  • Failover sequence: detect ODOWN → elect leader Sentinel → select best replica → promote → reconfigure others → notify clients
  • Configure quorum 2 with 3 Sentinels — majority agreement prevents false failovers from single-node network issues
  • min-replicas-to-write 1 + min-replicas-max-lag 10 on the primary limits split-brain data loss during a network partition to roughly the max-lag window
  • ioredis Sentinel client automatically discovers and reconnects to the new primary after failover
  • Sentinel does not eliminate data loss — writes in the replication lag window are lost on primary failure
  • Use Sentinel for single-node HA; use Cluster for horizontal scaling

Next: A-9 — Redis Cluster: Hash Slots and Data Distribution — the 16,384 hash slot model, key routing, MOVED vs ASK redirections, and the constraints multi-key commands impose in a Cluster.

© 2026 Jatin Jain Saraf (JJS). All rights reserved.