Module P-11 · 20 min read

The INFO command section by section (server, clients, memory, stats, replication, keyspace), SLOWLOG for identifying slow commands, LATENCY HISTORY, MONITOR for live command tracing, and the 10 metrics every Redis dashboard must have.

P-11 — Monitoring and Observability

Who this module is for: You have a Redis instance in production but no visibility into what it is doing — what commands are slow, whether memory is healthy, how close you are to hitting limits. This module covers the full observability surface: INFO sections, SLOWLOG, LATENCY, MONITOR, and the 10 metrics every Redis dashboard must include.


The INFO Command

INFO is the primary observability tool. It returns a structured plaintext report across multiple sections. You can request all sections or a specific one:

INFO           → default sections only; commandstats and latencystats need INFO all, INFO everything or an explicit section name
INFO server    → server metadata
INFO clients   → connected client counts
INFO memory    → memory usage and fragmentation
INFO stats     → cumulative counters: commands, connections, keyspace hits/misses, evictions
INFO replication → primary/replica state
INFO cpu       → CPU time consumed
INFO keyspace  → per-database key counts and TTL stats
INFO persistence → RDB/AOF state
INFO commandstats → per-command call counts and execution time (total and average µs)
INFO latencystats → latency percentiles per command (Redis 7.0+)
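Most monitoring code starts by turning the INFO text into a data structure. The sketch below shows one way to do that in Python. It assumes you already have the raw reply as a string (for example, the output of redis-cli INFO all), and the name parse_info is illustrative, not part of any client library. Real INFO output uses field:value lines, "# Section" headers and CRLF line endings. Values such as slave0, db0 and the cmdstat_* entries are comma-separated key=value lists.

```python
def parse_info(text: str) -> dict[str, dict[str, object]]:
    """Parse raw INFO output into {section: {field: value}}.

    Values containing key=value pairs (slave0, db0, cmdstat_*) become nested
    dicts; every other value is kept as a plain string.
    """
    sections: dict[str, dict[str, object]] = {}
    current: dict[str, object] = sections.setdefault("default", {})
    for raw in text.splitlines():  # splitlines() also handles INFO's \r\n endings
        line = raw.strip()
        if not line:
            continue
        if line.startswith("#"):  # "# Server", "# Stats", ... open a new section
            current = sections.setdefault(line.lstrip("# ").lower(), {})
            continue
        field, _, value = line.partition(":")
        value = value.strip()
        if "=" in value:  # e.g. "keys=142883,expires=141204,avg_ttl=3591847"
            current[field.strip()] = dict(
                pair.split("=", 1) for pair in value.split(",") if "=" in pair
            )
        else:
            current[field.strip()] = value
    return sections
```

In a script, you might feed it `subprocess.run(["redis-cli", "INFO", "all"], capture_output=True, text=True).stdout`. Values stay strings, so convert them with int() or float() where you need numbers.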

INFO server

redis_version: 7.2.4
os: Linux 5.15.0-92 x86_64
arch_bits: 64
tcp_port: 6379
uptime_in_seconds: 864000    → 10 days of uptime
hz: 10                        → background task runs per second (drives active expiry, client timeouts and other timers)
configured_hz: 10
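
uptime_in_seconds matters for fragmentation analysis. Fragmentation tends to build up over time, so a long uptime combined with heavy key churn is a reason to consider active defragmentation (CONFIG SET activedefrag yes). Fields such as aof_rewrites and rdb_changes_since_last_save belong to INFO persistence, not INFO server.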

INFO clients

connected_clients: 48
blocked_clients: 2
tracking_clients: 0
clients_in_timeout_table: 0
maxclients: 10000
client_recent_max_input_buffer: 20480
client_recent_max_output_buffer: 0

Watch connected_clients as it approaches maxclients. Also watch client_recent_max_output_buffer. A large value means at least one client is reading replies more slowly than Redis produces them, so data accumulates in its output buffer.

INFO stats — The Most Important Section

total_commands_processed: 948273841
total_connections_received: 1284723
rejected_connections: 0           → > 0 means you hit maxclients
expired_keys: 4829341             → total keys expired since start
evicted_keys: 0                   → should be 0; > 0 means memory pressure
keyspace_hits: 921847392          → successful key lookups
keyspace_misses: 26426449         → lookups of keys that did not exist
pubsub_channels: 3
pubsub_patterns: 1
instantaneous_ops_per_sec: 42841  → current throughput
instantaneous_input_kbps: 6284
instantaneous_output_kbps: 12847
total_net_input_bytes: 48293847192
total_net_output_bytes: 98473829384

Cache hit rate = keyspace_hits / (keyspace_hits + keyspace_misses)

For the example above: 921847392 / (921847392 + 26426449) = 97.2% — healthy.
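Because keyspace_hits and keyspace_misses are cumulative since startup (or since the last CONFIG RESETSTAT), the lifetime ratio reacts slowly to a sudden drop. The following minimal Python sketch computes both the lifetime ratio and the ratio over the interval between two samples. It assumes the dictionaries hold raw INFO stats fields as strings; the function names are illustrative.

```python
from typing import Optional


def hit_rate(hits: int, misses: int) -> Optional[float]:
    """hits / (hits + misses); None when there were no key lookups at all."""
    total = hits + misses
    return hits / total if total else None


def interval_hit_rate(prev: dict, curr: dict) -> Optional[float]:
    """Hit rate between two INFO stats samples, computed from counter deltas.

    Returns None if a counter went backwards (restart or CONFIG RESETSTAT).
    """
    hits = int(curr["keyspace_hits"]) - int(prev["keyspace_hits"])
    misses = int(curr["keyspace_misses"]) - int(prev["keyspace_misses"])
    if hits < 0 or misses < 0:
        return None
    return hit_rate(hits, misses)


# Lifetime hit rate for the INFO stats sample above
print(f"{hit_rate(921847392, 26426449):.1%}")  # 97.2%
```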

Below 90%: investigate why. Causes: TTLs too short, maxmemory too small, cache warming not working, wrong key patterns.

evicted_keys > 0: Your cache is under memory pressure. Redis is actively deleting data to make room. Increase maxmemory or reduce your dataset.

rejected_connections > 0: You have hit maxclients. Increase the limit or fix connection leaks.
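evicted_keys and rejected_connections are also cumulative, so alerting works on their change between two samples rather than on the raw value. The following Python sketch assumes two INFO stats snapshots (raw string fields) taken a known number of seconds apart. The function names and the "any nonzero rate" rule are illustrative choices.

```python
from typing import Optional


def counter_rate(prev: dict, curr: dict, field: str, seconds: float) -> Optional[float]:
    """Per-second rate of a cumulative INFO counter between two samples.

    Returns None when the counter went backwards (restart or CONFIG RESETSTAT).
    """
    delta = int(curr[field]) - int(prev[field])
    return delta / seconds if delta >= 0 else None


def pressure_alerts(prev: dict, curr: dict, seconds: float) -> list:
    """Alert on any eviction or rejected connection during the interval."""
    alerts = []
    for field in ("evicted_keys", "rejected_connections"):
        rate = counter_rate(prev, curr, field, seconds)
        if rate:  # skips both 0.0 (healthy) and None (counter reset)
            alerts.append(f"{field}: {rate:.2f}/s")
    return alerts
```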

INFO replication

role: master
connected_slaves: 2
slave0: ip=10.0.1.50,port=6379,state=online,offset=84729384,lag=0
slave1: ip=10.0.1.51,port=6379,state=online,offset=84729382,lag=1
master_replid: a3f9c2d7e8b1...
master_repl_offset: 84729384
repl_backlog_active: 1
repl_backlog_size: 1048576    → 1MB replication backlog
repl_backlog_first_byte_offset: 83680808
repl_backlog_histlen: 1048576

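lag is the number of seconds since the primary last received an acknowledgement (REPLCONF ACK) from that replica. Replicas acknowledge about once per second, so 0 or 1 is normal; a value that keeps climbing means the replica is falling behind or the link is unhealthy. To see how far behind a replica is in data, subtract its offset from master_repl_offset. In the example, slave1 is 2 bytes behind.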

repl_backlog_size — if a replica disconnects and reconnects with an offset that is no longer in the backlog, it requires a full resync (expensive). Increase repl-backlog-size if replicas frequently reconnect: CONFIG SET repl-backlog-size 64mb.
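The following Python sketch turns INFO replication fields into per-replica byte lag and a rough check for whether a reconnect could still be served from the backlog. It is simplified: it assumes the replication ID has not changed (no failover) and only compares offsets. A reconnecting replica asks to continue from its offset + 1, and a partial resync requires that byte to still be in the backlog. The function name is illustrative.

```python
def replica_report(repl: dict) -> list:
    """Per-replica data lag from raw INFO replication fields.

    Assumes string values as printed by INFO, e.g.
    repl["slave0"] == "ip=10.0.1.50,port=6379,state=online,offset=84729384,lag=0".
    """
    master_offset = int(repl["master_repl_offset"])
    backlog_active = repl.get("repl_backlog_active") == "1"
    first_byte = int(repl.get("repl_backlog_first_byte_offset", 0))
    report = []
    for field, value in repl.items():
        if not (field.startswith("slave") and field[5:].isdigit()):
            continue  # only slave0, slave1, ... describe connected replicas
        replica = dict(pair.split("=", 1) for pair in value.split(","))
        offset = int(replica["offset"])
        report.append({
            "replica": f"{replica['ip']}:{replica['port']}",
            "bytes_behind": master_offset - offset,
            "ack_lag_s": int(replica["lag"]),
            # Simplified check: the first byte the replica is missing must
            # still be held in the backlog for a partial resync.
            "partial_resync_possible": backlog_active and offset + 1 >= first_byte,
        })
    return report
```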

INFO keyspace

db0:keys=142883,expires=141204,avg_ttl=3591847

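expires vs keys ratio: if expires is much smaller than keys, most of your keys have no TTL. For a cache this is a problem. Those keys never expire on their own, so memory fills up, and under a volatile-* eviction policy they cannot be evicted either.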

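avg_ttl: the estimated average remaining TTL, in milliseconds, of keys that have one. Redis derives it from keys sampled during active expiry. The example's 3,591,847 ms is just under an hour. If the value is very short (under 60,000 ms, i.e. 60 seconds), keys are expiring rapidly and you may have high expiry overhead.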
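The following Python sketch turns a keyspace line into the two numbers discussed above: the share of keys that carry a TTL, and avg_ttl converted to minutes. The function name is illustrative, and the input is assumed to be one raw line of INFO keyspace output.

```python
def keyspace_summary(line: str) -> dict:
    """Summarise an INFO keyspace line like 'db0:keys=...,expires=...,avg_ttl=...'."""
    db, _, fields = line.partition(":")
    values = {k: int(v) for k, v in (pair.split("=", 1) for pair in fields.split(","))}
    keys = values["keys"]
    return {
        "db": db,
        # Share of keys that carry a TTL; for a cache this should be close to 1.0
        "ttl_coverage": values["expires"] / keys if keys else 0.0,
        # avg_ttl is reported in milliseconds
        "avg_ttl_minutes": values["avg_ttl"] / 60_000,
    }
```

For the sample line, about 98.8% of keys have a TTL and the average remaining TTL is roughly 59.9 minutes.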

INFO commandstats

cmdstat_get:calls=18492834,usec=92464170,usec_per_call=5.00
cmdstat_set:calls=4293847,usec=17175388,usec_per_call=4.00
cmdstat_hgetall:calls=293847,usec=29384700,usec_per_call=100.00
cmdstat_zadd:calls=1293847,usec=5175388,usec_per_call=4.00

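usec_per_call is the average execution time per call in microseconds, measured inside Redis (network round-trips and time spent waiting to be processed are not included). High values for specific commands reveal which commands are expensive. In the example, HGETALL averages 100µs against 5µs for GET, which suggests these HGETALL calls hit large Hashes.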
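The following Python sketch ranks commands from INFO commandstats lines, either by cost per call or by total time spent. It assumes the raw lines as input. The function name and the "by" switch are illustrative, and any extra numeric fields that newer Redis versions append (such as rejected_calls) are parsed and ignored.

```python
def rank_commands(lines: list, by: str = "usec_per_call") -> list:
    """Parse INFO commandstats lines and sort commands by a metric, highest first.

    `by` is "usec_per_call" (cost of one call) or "usec" (total time spent).
    """
    rows = []
    for line in lines:
        name, _, fields = line.partition(":")
        stats = {k: float(v) for k, v in (pair.split("=", 1) for pair in fields.split(","))}
        rows.append((name.removeprefix("cmdstat_"), stats[by]))
    return sorted(rows, key=lambda row: row[1], reverse=True)
```

Sorting by total usec changes the ranking. GET accounts for about 92.5 s of execution time against 29.4 s for HGETALL, because it is called far more often.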

INFO latencystats (Redis 7.0+)

latency_percentiles_usec_get: p50=3,p99=12,p99.9=45
latency_percentiles_usec_hgetall: p50=8,p99=148,p99.9=2140

Per-command latency percentiles, in microseconds. A p99.9 of 2,140µs (about 2ms) for HGETALL means some HGETALL calls are far more expensive than the median. These are most likely calls on Hashes with many fields, since HGETALL is O(N) in the number of fields.
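The following Python sketch flags commands whose tail latency crosses a limit. The 1,000µs default is an assumed example threshold, not a Redis recommendation, and the function name is illustrative. The input is assumed to be raw INFO latencystats lines.

```python
def slow_tails(lines: list, limit_usec: float = 1000.0) -> dict:
    """Commands whose p99.9 latency (INFO latencystats) exceeds limit_usec."""
    flagged = {}
    for line in lines:
        name, _, fields = line.partition(":")
        pct = {k: float(v) for k, v in (pair.split("=", 1) for pair in fields.split(","))}
        if pct.get("p99.9", 0.0) > limit_usec:
            flagged[name.removeprefix("latency_percentiles_usec_")] = pct["p99.9"]
    return flagged
```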


SLOWLOG

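SLOWLOG records commands whose execution time exceeds a configurable threshold. Only the time Redis spends executing the command counts. Network I/O, such as reading the request and sending the reply, is excluded.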

CONFIG SET slowlog-log-slower-than 10000   → log commands slower than 10ms (10,000µs)
CONFIG SET slowlog-max-len 128             → keep last 128 slow commands
SLOWLOG GET 10         → show last 10 slow commands
SLOWLOG LEN            → count of entries in the log
SLOWLOG RESET          → clear the log
127.0.0.1:6379> SLOWLOG GET 3
1) 1) (integer) 42          → log entry ID
   2) (integer) 1717000000  → Unix timestamp
   3) (integer) 14823       → execution time in microseconds (14.8ms)
   4) 1) "KEYS"             → the command
      2) "*"
   5) "10.0.1.100:52394"    → client address
   6) "myapp"               → client name (set with CLIENT SETNAME)

2) 1) (integer) 41
   2) (integer) 1717000000
   3) (integer) 12100
   4) 1) "HGETALL"
      2) "user:99999"       → this specific key is slow
   5) "10.0.1.100:52395"
   6) "myapp"
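The following Python sketch works with SLOWLOG GET replies. It assumes the reply has already been decoded into nested Python lists of strings and integers in the order shown above: ID, timestamp, duration in µs, arguments, client address and client name. The function names are illustrative. Entry IDs keep increasing after the log fills and old entries are dropped, so "entries above the last ID I saw" is a reliable way to detect new slow commands. IDs start again from 0 when the server restarts.

```python
from collections import Counter


def new_entries(reply: list, last_seen_id: int) -> list:
    """SLOWLOG entries with an ID above last_seen_id (IDs only ever increase)."""
    return [entry for entry in reply if entry[0] > last_seen_id]


def time_by_command(reply: list) -> Counter:
    """Total logged execution time in microseconds, per command name."""
    totals: Counter = Counter()
    for entry in reply:
        command = str(entry[3][0]).upper()  # first argument is the command name
        totals[command] += entry[2]         # third field is the duration in µs
    return totals
```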

Common slow command findings:

  • KEYS *: scans all keys and blocks Redis while it runs. Replace with SCAN.
  • HGETALL large_hash: returns every field of a Hash with thousands of fields. Fetch only the fields you need with HMGET, or iterate with HSCAN.
  • SMEMBERS large_set: returns all Set members at once. Use SSCAN.
  • SORT: sorts a List, Set or Sorted Set; O(N+M log M). Computationally expensive.
  • LRANGE key 0 -1: returns the entire List. Page through long lists with bounded start/stop ranges.

Set slowlog-log-slower-than to 1000 (1ms) during development and testing to catch every slow command. In production, use 10,000–20,000µs to avoid log noise.


LATENCY Monitoring

Redis has a built-in latency monitor that tracks latency per internal event type rather than per individual command. Tracked events include fork, AOF writes and fsyncs, and expire and eviction cycles, plus two aggregate command events.

CONFIG SET latency-monitor-threshold 100   → track events with latency > 100ms (the default, 0, disables monitoring)
LATENCY LATEST                             → most recent latency sample per event
LATENCY HISTORY event-name                 → historical latency for an event
LATENCY RESET [event-name]                 → clear latency history
127.0.0.1:6379> LATENCY LATEST
1) 1) "aof-fstat"
   2) (integer) 1717000000   → timestamp
   3) (integer) 120          → latency in ms
   4) (integer) 350          → max latency seen

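Event names to watch:

  • fork: fork latency for BGSAVE/BGREWRITEAOF (high = large dataset or memory pressure)
  • aof-write, aof-fsync-always: AOF write and fsync latency (high = disk I/O bottleneck)
  • aof-fstat: fstat(2) calls on the AOF file, which also stall when the disk is saturated
  • expire-cycle, eviction-cycle: time spent in active expiry and in eviction
  • command, fast-command: aggregate execution latency of regular commands and of O(1)/O(log N) commands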
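LATENCY LATEST returns one entry per event. Each entry contains the event name, the timestamp of the latest spike, the latest latency and the all-time maximum, with both latencies in milliseconds. The following Python sketch assumes the reply has been decoded into Python lists and ranks events whose worst case exceeds a budget. The 100ms default budget and the function name are illustrative. Events below latency-monitor-threshold are never recorded, so a budget lower than that threshold has no effect.

```python
def latency_offenders(reply: list, budget_ms: int = 100) -> list:
    """(event, latest_ms, max_ms) for events whose all-time max exceeds budget_ms.

    Each LATENCY LATEST entry is [event, unix_time, latest_ms, max_ms].
    """
    offenders = [(e[0], e[2], e[3]) for e in reply if e[3] > budget_ms]
    return sorted(offenders, key=lambda row: row[2], reverse=True)  # worst first
```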

MONITOR: Live Command Stream

MONITOR

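MONITOR streams every command executed by every client in real time. It is invaluable for debugging unexpected behaviour ("what is sending KEYS * in production?"), but it is expensive. According to the Redis documentation, a single MONITOR client can reduce throughput by more than 50%, and each additional MONITOR client reduces it further. Never leave MONITOR running in production.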

127.0.0.1:6379> MONITOR
OK
1717000000.123456 [0 10.0.1.100:52394] "GET" "user:1001"
1717000000.124123 [0 10.0.1.101:52395] "HSET" "session:abc123" "lastSeen" "1717000000"
1717000000.124200 [0 10.0.1.100:52394] "SET" "cache:product:999" "..." "EX" "300"

Format: {unix_timestamp} [{db} {client_ip:port}] {command} {args...}

Use it briefly to identify which clients are issuing which commands, then disconnect immediately.
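A practical pattern is to capture only a few seconds of output to a file (for example, timeout 5 redis-cli MONITOR > monitor.log) and analyse it offline. The following Python sketch counts commands per client address in such a capture. It assumes the line format shown above and treats each double-quoted token as one argument. The regular expressions and the function name are illustrative.

```python
import re
from collections import Counter

# Example line: 1717000000.123456 [0 10.0.1.100:52394] "GET" "user:1001"
MONITOR_LINE = re.compile(r'^(\d+\.\d+) \[(\d+) (\S+)\] (.*)$')
QUOTED_ARG = re.compile(r'"((?:[^"\\]|\\.)*)"')  # one double-quoted argument, escapes allowed


def commands_per_client(lines) -> Counter:
    """Count (client address, command name) pairs in captured MONITOR output."""
    counts: Counter = Counter()
    for line in lines:
        match = MONITOR_LINE.match(line.strip())
        if not match:
            continue  # e.g. the initial "OK" reply
        args = QUOTED_ARG.findall(match.group(4))
        if args:
            counts[(match.group(3), args[0].upper())] += 1
    return counts
```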


CLIENT LIST and CLIENT INFO

CLIENT LIST   → one line per connected client
CLIENT INFO   → info for the current client
id=42 addr=10.0.1.100:52394 laddr=10.0.0.10:6379 fd=23 name=myapp age=1234
cmd=get flags=N db=0 sub=0 psub=0 multi=-1 watch=0
qbuf=0 qbuf-free=32768 argv-mem=10 multi-mem=0
tot-mem=20512 rbs=16384 rbp=0 obl=0 oll=0 omem=0
events=r resp=2 user=default lib-name=ioredis lib-ver=5.3.3

Key fields:

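  • cmd: last command issued by this client
  • age: seconds since the connection was established
  • idle: seconds since the client last issued a command
  • sub: number of channels subscribed
  • omem: output buffer memory (large = slow client)
  • flags: b = blocked in a blocking command (e.g. BLPOP), P = Pub/Sub subscriber, S = replica, O = MONITOR client, N = no special flags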

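Identify stuck clients: run CLIENT LIST and look for entries whose flags include b (blocked, e.g. cmd=blpop) and whose idle value is high. A high age alone is not a warning sign, since pooled connections normally stay open for hours.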
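The following Python sketch applies that filter to CLIENT LIST output and also flags large output buffers. The 300-second idle limit and the 1 MiB omem limit are assumed example thresholds, and the function names are illustrative. Each CLIENT LIST line is a set of space-separated field=value pairs.

```python
def parse_client_list(text: str) -> list:
    """One dict per CLIENT LIST line (space-separated field=value pairs)."""
    return [
        dict(field.split("=", 1) for field in line.split() if "=" in field)
        for line in text.splitlines()
        if line.strip()
    ]


def suspicious_clients(clients: list, idle_limit_s: int = 300, omem_limit: int = 1 << 20) -> list:
    """IDs of clients blocked for a long time or holding a large output buffer."""
    flagged = []
    for c in clients:
        long_blocked = "b" in c.get("flags", "") and int(c.get("idle", "0")) > idle_limit_s
        big_output = int(c.get("omem", "0")) > omem_limit
        if long_blocked or big_output:
            flagged.append(c["id"])
    return flagged
```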


The 10 Metrics Every Redis Dashboard Must Include

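| # | Metric | Source | Alert threshold |
|---|--------|--------|-----------------|
| 1 | Cache hit rate | keyspace_hits / (keyspace_hits + keyspace_misses) | < 90% |
| 2 | Evicted keys/sec | evicted_keys delta | > 0 |
| 3 | Memory fragmentation ratio | mem_fragmentation_ratio | > 1.5 or < 1.0 |
| 4 | Memory used / maxmemory | used_memory / maxmemory | > 80% (skip if maxmemory is 0, i.e. no limit) |
| 5 | Connected clients | connected_clients | > 80% of maxclients |
| 6 | Ops per second | instantaneous_ops_per_sec | Baseline ± 3σ |
| 7 | Replication lag | lag field of each slaveN line (INFO replication) | > 5 seconds |
| 8 | Slow commands | SLOWLOG LEN delta | Any increase |
| 9 | Last BGSAVE status | rdb_last_bgsave_status | err |
| 10 | Rejected connections | rejected_connections delta | > 0 |

SLOWLOG LEN stops growing once the log holds slowlog-max-len entries. For row 8, it is therefore more reliable to track the ID of the newest entry (SLOWLOG GET 1), which keeps increasing.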

Export these metrics from INFO every 15–60 seconds to your monitoring system (Prometheus via redis_exporter, Datadog, CloudWatch, etc.).
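The point-in-time rows of the table (3, 4, 5 and 9) can be checked from a single snapshot. The following Python sketch assumes a flat dict of raw INFO fields merged from the memory, clients and persistence sections. It uses the table's thresholds, and the function name is illustrative. Rate-based rows need two snapshots and are not covered here.

```python
def dashboard_alerts(m: dict) -> list:
    """Check rows 3, 4, 5 and 9 of the dashboard table against one INFO snapshot."""
    alerts = []
    maxmemory = int(m["maxmemory"])
    if maxmemory and int(m["used_memory"]) / maxmemory > 0.80:  # maxmemory 0 = no limit
        alerts.append("used_memory above 80% of maxmemory")
    frag = float(m["mem_fragmentation_ratio"])
    if frag > 1.5 or frag < 1.0:
        alerts.append(f"mem_fragmentation_ratio out of range: {frag}")
    if int(m["connected_clients"]) > 0.80 * int(m["maxclients"]):
        alerts.append("connected_clients above 80% of maxclients")
    if m["rdb_last_bgsave_status"] != "ok":
        alerts.append("last BGSAVE failed")
    return alerts
```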


redis-cli Monitoring Shortcuts

```bash
# Live stats (refreshes every second)
redis-cli --stat

# Live latency monitoring
redis-cli --latency
redis-cli --latency-history -i 5   # start a new sample window every 5 seconds

# Find the keys using the most memory (MEMORY USAGE on sampled keys)
redis-cli --memkeys

# Count keys matching a pattern
redis-cli --scan --pattern "session:*" | wc -l

# Big keys scan (largest key per type)
redis-cli --bigkeys
```

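redis-cli --bigkeys walks the entire keyspace with SCAN and reports the largest key of each type. Size is measured by element count (length for strings) rather than bytes; --memkeys ranks keys by memory instead. It is generally safe to run on production because SCAN is cursor-based rather than blocking like KEYS *, but it still sends commands for every key. Adding -i 0.1 makes redis-cli sleep 0.1 seconds every 100 SCAN calls to reduce the load.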


Summary

  • INFO is the starting point — use INFO stats for throughput and hit rate, INFO memory for memory health, INFO replication for lag, INFO keyspace for key distribution
  • Cache hit rate (keyspace_hits / total) should be > 90% — below this, investigate TTLs, eviction, and cache warming
  • evicted_keys > 0 means memory pressure — increase maxmemory or reduce dataset
  • SLOWLOG GET reveals expensive commands — the most common findings: KEYS *, HGETALL on large hashes, SORT
  • LATENCY LATEST / LATENCY HISTORY track internal event latency (fork, AOF writes and fsyncs, expire and eviction cycles)
  • MONITOR streams live commands — invaluable for debugging, catastrophic if left running in production
  • CLIENT LIST identifies slow or stuck clients by output buffer size (omem), the b (blocked) flag and idle time
  • Export INFO metrics every 15–60 seconds to your monitoring system; build dashboards around the 10 core metrics

Next: P-12 — Security: ACLs, TLS, and Network Hardening — per-user command restrictions, TLS for in-transit encryption, bind address configuration, and the most common Redis security misconfigurations.

© 2026 Jatin Jain Saraf (JJS). All rights reserved.