The INFO command section by section (server, clients, memory, stats, replication, keyspace), SLOWLOG for identifying slow commands, LATENCY HISTORY, MONITOR for live command tracing, and the 10 metrics every Redis dashboard must have.
P-11 — Monitoring and Observability
Who this module is for: You have a Redis instance in production but no visibility into what it is doing — what commands are slow, whether memory is healthy, how close you are to hitting limits. This module covers the full observability surface: INFO sections, SLOWLOG, LATENCY, MONITOR, and the 10 metrics every Redis dashboard must include.
The INFO Command
INFO is the primary observability tool. It returns a structured plaintext report across multiple sections. You can request all sections or a specific one:
INFO → default sections (omits commandstats and latencystats)
INFO all → all built-in sections
INFO server → server metadata
INFO clients → connected client counts
INFO memory → memory usage and fragmentation
INFO stats → command stats, hit/miss rates
INFO replication → primary/replica state
INFO cpu → CPU time consumed
INFO keyspace → per-database key counts and TTL stats
INFO persistence → RDB/AOF state
INFO commandstats → per-command call counts and latency
INFO latencystats → latency percentiles per command (Redis 7.0+)
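The raw INFO reply is plain text: section headers start with #, and each field is a name:value line. As a minimal sketch of how a script might turn that into something queryable, assuming the text has already been fetched from the server (the parse_info helper and the sample string are illustrative, not part of Redis or any client library):

```python
def parse_info(text: str) -> dict[str, dict[str, str]]:
    """Parse raw INFO output into {section: {field: value}}."""
    sections: dict[str, dict[str, str]] = {}
    current = "unknown"  # used only if fields appear before any header
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith("#"):
            # "# Server" -> "server"
            current = line.lstrip("#").strip().lower()
            sections.setdefault(current, {})
            continue
        key, sep, value = line.partition(":")
        if sep:
            sections.setdefault(current, {})[key] = value
    return sections


# Shortened, hand-written sample in the same shape as a real reply.
sample = (
    "# Server\r\nredis_version:7.2.4\r\ntcp_port:6379\r\n\r\n"
    "# Clients\r\nconnected_clients:48\r\nmaxclients:10000\r\n"
)
info = parse_info(sample)
print(info["server"]["redis_version"])      # 7.2.4
print(info["clients"]["connected_clients"])  # 48
```

Values stay as strings here; convert the ones you chart (counters, byte sizes) to numbers at the point of use.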
INFO server
redis_version: 7.2.4
os: Linux 5.15.0-92 x86_64
arch_bits: 64
tcp_port: 6379
uptime_in_seconds: 864000 → 10 days of uptime
hz: 10 → event loop frequency (affects expiry and other timers)
configured_hz: 10
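aof_rewrites: 14 → persistence field; reported under INFO persistence
rdb_changes_since_last_save: 1423 → persistence field; reported under INFO persistence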
uptime_in_seconds matters for fragmentation analysis — fragmentation grows over time and a very long uptime with high key churn warrants active defragmentation.
INFO clients
connected_clients: 48
blocked_clients: 2
tracking_clients: 0
clients_in_timeout_table: 0
maxclients: 10000
client_recent_max_input_buffer: 20480
client_recent_max_output_buffer: 0
Alert when connected_clients approaches maxclients. Also watch client_recent_max_output_buffer: a large value means at least one client is reading replies more slowly than Redis produces them.
INFO stats — The Most Important Section
total_commands_processed: 948273841
total_connections_received: 1284723
rejected_connections: 0 → > 0 means you hit maxclients
expired_keys: 4829341 → total keys expired since start
evicted_keys: 0 → should be 0; > 0 means memory pressure
keyspace_hits: 921847392 → commands that found their key
keyspace_misses: 26426449 → commands that returned nil
pubsub_channels: 3
pubsub_patterns: 1
instantaneous_ops_per_sec: 42841 → current throughput
instantaneous_input_kbps: 6284
instantaneous_output_kbps: 12847
total_net_input_bytes: 48293847192
total_net_output_bytes: 98473829384
Cache hit rate = keyspace_hits / (keyspace_hits + keyspace_misses)
For the example above: 921847392 / (921847392 + 26426449) = 97.2% — healthy.
Below 90%: investigate why. Causes: TTLs too short, maxmemory too small, cache warming not working, wrong key patterns.
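Because keyspace_hits and keyspace_misses are cumulative counters, a lifetime ratio can hide a recent drop. The sketch below shows both the lifetime calculation and a per-interval one computed from two snapshots. The function names are hypothetical and the second snapshot is invented for illustration.

```python
from typing import Optional


def hit_rate(hits: int, misses: int) -> Optional[float]:
    """Return hits / (hits + misses), or None when there were no lookups."""
    total = hits + misses
    return hits / total if total else None


def window_hit_rate(prev: dict[str, int], curr: dict[str, int]) -> Optional[float]:
    """Hit rate over the interval between two INFO stats snapshots."""
    return hit_rate(
        curr["keyspace_hits"] - prev["keyspace_hits"],
        curr["keyspace_misses"] - prev["keyspace_misses"],
    )


lifetime = hit_rate(921847392, 26426449)
print(f"lifetime: {lifetime:.1%}")  # lifetime: 97.2%

prev = {"keyspace_hits": 921847392, "keyspace_misses": 26426449}
# Hypothetical later sample: 800 more hits, 200 more misses.
curr = {"keyspace_hits": 921848192, "keyspace_misses": 26426649}
recent = window_hit_rate(prev, curr)
print(f"last interval: {recent:.1%}")  # last interval: 80.0%
```

In this made-up case the lifetime figure still looks healthy while the most recent interval is well below the 90% line, which is why dashboards usually chart the interval value.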
evicted_keys > 0: Your cache is under memory pressure. Redis is actively deleting data to make room. Increase maxmemory or reduce your dataset.
rejected_connections > 0: You have hit maxclients. Increase the limit or fix connection leaks.
INFO replication
role: master
connected_slaves: 2
slave0: ip=10.0.1.50,port=6379,state=online,offset=84729384,lag=0
slave1: ip=10.0.1.51,port=6379,state=online,offset=84729382,lag=1
master_replid: a3f9c2d7e8b1...
master_repl_offset: 84729384
repl_backlog_active: 1
repl_backlog_size: 1048576 → 1MB replication backlog
repl_backlog_first_byte_offset: 83680809
repl_backlog_histlen: 1048576
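lag = seconds since the primary last received an acknowledgement (REPLCONF ACK) from that replica. Replicas acknowledge about once per second, so 0 or 1 is normal; a value that keeps growing means the replica is stalled or falling behind.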
repl_backlog_size — if a replica disconnects and reconnects with an offset that is no longer in the backlog, it requires a full resync (expensive). Increase repl-backlog-size if replicas frequently reconnect: CONFIG SET repl-backlog-size 64mb.
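To make the backlog reasoning concrete, the sketch below parses a slaveN value and applies a simplified rule: a partial resync is only possible while the bytes the replica is missing are still held in the backlog. The helper names are hypothetical, and the real check inside Redis also compares replication IDs, which is ignored here.

```python
def parse_replica(value: str) -> dict[str, str]:
    """Split 'ip=...,port=...,offset=...' into a dict of strings."""
    return dict(pair.split("=", 1) for pair in value.split(","))


def bytes_behind(master_offset: int, replica: dict[str, str]) -> int:
    """How many bytes of the replication stream the replica has not applied."""
    return master_offset - int(replica["offset"])


def can_partial_resync(master_offset: int, replica_offset: int, histlen: int) -> bool:
    """Simplified rule: the missing bytes must still be inside the backlog."""
    missing = master_offset - replica_offset
    return 0 <= missing <= histlen


master_repl_offset = 84729384
repl_backlog_histlen = 1048576
slave1 = parse_replica("ip=10.0.1.51,port=6379,state=online,offset=84729382,lag=1")

print(bytes_behind(master_repl_offset, slave1))  # 2
print(can_partial_resync(master_repl_offset, 84729382, repl_backlog_histlen))  # True
# A replica that stopped at offset 83000000 is ~1.7MB behind a 1MB backlog.
print(can_partial_resync(master_repl_offset, 83000000, repl_backlog_histlen))  # False
```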
INFO keyspace
db0:keys=142883,expires=141204,avg_ttl=3591847
expires vs keys ratio: if expires is much smaller than keys, most of your keys have no TTL. For a cache, this is a problem: nothing expires on its own, so memory keeps growing until maxmemory eviction takes over.
avg_ttl — average remaining TTL in milliseconds. If this is very short (< 60,000 = 60 seconds), keys are expiring rapidly and you may have high expiry overhead.
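Both checks are simple arithmetic on the db0 line. A minimal sketch, using the example values above and a hypothetical parse_keyspace helper:

```python
def parse_keyspace(value: str) -> dict[str, int]:
    """Turn 'keys=...,expires=...,avg_ttl=...' into integer fields."""
    return {k: int(v) for k, v in (p.split("=", 1) for p in value.split(","))}


db0 = parse_keyspace("keys=142883,expires=141204,avg_ttl=3591847")

# Share of keys that carry a TTL.
ttl_coverage = db0["expires"] / db0["keys"]
# avg_ttl is reported in milliseconds.
avg_ttl_minutes = db0["avg_ttl"] / 60_000

print(f"{ttl_coverage:.1%} of keys have a TTL")    # 98.8% of keys have a TTL
print(f"average remaining TTL: {avg_ttl_minutes:.1f} min")  # average remaining TTL: 59.9 min
```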
INFO commandstats
cmdstat_get:calls=18492834,usec=92464170,usec_per_call=5.00
cmdstat_set:calls=4293847,usec=17175388,usec_per_call=4.00
cmdstat_hgetall:calls=293847,usec=29384700,usec_per_call=100.00
cmdstat_zadd:calls=1293847,usec=5175388,usec_per_call=4.00
usec_per_call — microseconds per command call. High values for specific commands reveal which commands are slow. In the example, HGETALL at 100µs vs GET at 5µs — these HGETALL calls are expensive (likely large Hashes).
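Ranking commands by usec_per_call finds the expensive calls, while ranking by total usec shows where the server actually spends its time. The two can disagree, as this sketch over the example lines shows (parse_cmdstat is a hypothetical helper; Python 3.9+ for str.removeprefix):

```python
def parse_cmdstat(line: str) -> tuple[str, dict[str, float]]:
    """Parse 'cmdstat_get:calls=..,usec=..,usec_per_call=..' into (name, fields)."""
    name, _, fields = line.partition(":")
    stats = {k: float(v) for k, v in (p.split("=", 1) for p in fields.split(","))}
    return name.removeprefix("cmdstat_"), stats


lines = [
    "cmdstat_get:calls=18492834,usec=92464170,usec_per_call=5.00",
    "cmdstat_set:calls=4293847,usec=17175388,usec_per_call=4.00",
    "cmdstat_hgetall:calls=293847,usec=29384700,usec_per_call=100.00",
    "cmdstat_zadd:calls=1293847,usec=5175388,usec_per_call=4.00",
]
stats = dict(parse_cmdstat(line) for line in lines)

slowest_per_call = max(stats, key=lambda cmd: stats[cmd]["usec_per_call"])
most_total_time = max(stats, key=lambda cmd: stats[cmd]["usec"])

print(slowest_per_call)  # hgetall
print(most_total_time)   # get
```

HGETALL is the slowest per call, but GET still accounts for roughly three times more total execution time because it is called far more often.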
INFO latencystats (Redis 7.0+)
latency_percentiles_usec_get: p50=3,p99=12,p99.9=45
latency_percentiles_usec_hgetall: p50=8,p99=148,p99.9=2140
Per-command latency percentiles. p99.9 for HGETALL at 2,140µs (2ms) is a signal that some HGETALL calls are very expensive — likely on large Hashes that crossed the listpack→hashtable threshold.
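One way to spot this pattern automatically is to compare the tail to the median. The sketch below computes a p99.9 / p50 ratio per command; the ratio and the helper name are illustrative choices, not a Redis metric.

```python
def parse_percentiles(line: str) -> tuple[str, dict[str, float]]:
    """Parse 'latency_percentiles_usec_get:p50=..,p99=..,p99.9=..'."""
    name, _, fields = line.partition(":")
    pcts = {
        k: float(v)
        for k, v in (p.split("=", 1) for p in fields.strip().split(","))
    }
    return name.removeprefix("latency_percentiles_usec_"), pcts


lines = [
    "latency_percentiles_usec_get:p50=3,p99=12,p99.9=45",
    "latency_percentiles_usec_hgetall:p50=8,p99=148,p99.9=2140",
]
for line in lines:
    cmd, pcts = parse_percentiles(line)
    tail_ratio = pcts["p99.9"] / pcts["p50"]
    print(cmd, tail_ratio)
# get 15.0
# hgetall 267.5
```

A tail that is hundreds of times the median suggests a small number of pathological keys rather than a uniformly slow command.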
SLOWLOG
SLOWLOG records commands that exceed a configurable latency threshold.
CONFIG SET slowlog-log-slower-than 10000 → log commands slower than 10ms (10,000µs)
CONFIG SET slowlog-max-len 128 → keep last 128 slow commands
SLOWLOG GET 10 → show last 10 slow commands
SLOWLOG LEN → count of entries in the log
SLOWLOG RESET → clear the log
127.0.0.1:6379> SLOWLOG GET 3
1) 1) (integer) 42 → log entry ID
   2) (integer) 1717000000 → Unix timestamp
   3) (integer) 14823 → execution time in microseconds (14.8ms)
   4) 1) "KEYS" → the command
      2) "*"
   5) "10.0.1.100:52394" → client address
   6) "myapp" → client name (set with CLIENT SETNAME)
2) 1) (integer) 41
   2) (integer) 1717000000
   3) (integer) 12100
   4) 1) "HGETALL"
      2) "user:99999" → this specific key is slow
   5) "10.0.1.100:52395"
   6) "myapp"
Common slow command findings:
- KEYS *: scans all keys, blocks Redis. Replace with SCAN.
- HGETALL large_hash: Hash in hashtable encoding with thousands of fields.
- SMEMBERS large_set: returns all Set members at once. Use SSCAN.
- SORT: sorts a List or Set; O(N+M log M). Computationally expensive.
- LRANGE key 0 -1: returns entire List. Cache long lists with pagination.
Set slowlog-log-slower-than 1000 (1ms) in development and testing to surface slow commands early. In production, use 10,000–20,000µs to avoid log noise.
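Once entries are collected, grouping them by command name shows which command type dominates the log. The sketch below assumes each entry has already been decoded into the six-element shape shown in the SLOWLOG GET output (id, timestamp, microseconds, arguments, client address, client name). Client libraries may return a different structure, so treat the layout as an assumption.

```python
from collections import Counter

# Hand-copied from the example output; not fetched from a live server.
entries = [
    [42, 1717000000, 14823, ["KEYS", "*"], "10.0.1.100:52394", "myapp"],
    [41, 1717000000, 12100, ["HGETALL", "user:99999"], "10.0.1.100:52395", "myapp"],
]


def summarize(slowlog_entries: list) -> list[tuple[str, int]]:
    """Total logged microseconds per command name, largest first."""
    totals: Counter = Counter()
    for _entry_id, _timestamp, usec, args, _addr, _name in slowlog_entries:
        totals[args[0].upper()] += usec
    return totals.most_common()


print(summarize(entries))  # [('KEYS', 14823), ('HGETALL', 12100)]
```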
LATENCY Monitoring
Redis has a built-in latency monitoring system that tracks event-level latency — not per-command, but per internal event type (fork, AOF flush, RDB save, etc.).
CONFIG SET latency-monitor-threshold 100 → track events with latency > 100ms
LATENCY LATEST → most recent latency sample per event
LATENCY HISTORY event-name → historical latency for an event
LATENCY RESET [event-name] → clear latency history
127.0.0.1:6379> LATENCY LATEST
1) 1) "aof-stat"
   2) (integer) 1717000000 → timestamp
   3) (integer) 120 → latency in ms
   4) (integer) 350 → max latency seen
Event names to watch:
- fork: BGSAVE/BGREWRITEAOF fork latency (high = large dataset or memory pressure)
- aof-stat: AOF write latency (high = disk I/O bottleneck)
- rdb-*: RDB save events
- command: command execution latency (aggregate)
MONITOR: Live Command Stream
MONITOR
MONITOR streams every command executed by every client in real time. It is invaluable for debugging unexpected behaviour ("what is sending KEYS * in production?"), but it is costly: a single MONITOR client can reduce throughput by more than 50%. Never leave MONITOR running in production.
127.0.0.1:6379> MONITOR
OK
1717000000.123456 [0 10.0.1.100:52394] "GET" "user:1001"
1717000000.124123 [0 10.0.1.101:52395] "HSET" "session:abc123" "lastSeen" "1717000000"
1717000000.124200 [0 10.0.1.100:52394] "SET" "cache:product:999" "..." "EX" "300"
Format: {unix_timestamp} [{db} {client_ip:port}] {command} {args...}
Use it briefly to identify which clients are issuing which commands, then disconnect immediately.
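If you capture a short MONITOR session to a file, the line format above is easy to split into fields for counting commands per client. A minimal sketch, assuming simple double-quoted arguments (binary payloads with escape sequences may need more careful handling); parse_monitor_line is a hypothetical helper:

```python
import re
import shlex

# {unix_timestamp} [{db} {client_ip:port}] {command} {args...}
MONITOR_LINE = re.compile(
    r"^(?P<ts>\d+\.\d+) \[(?P<db>\d+) (?P<client>\S+)\] (?P<rest>.+)$"
)


def parse_monitor_line(line: str) -> dict:
    """Split one MONITOR output line into timestamp, db, client, command, args."""
    match = MONITOR_LINE.match(line.strip())
    if match is None:
        raise ValueError(f"not a MONITOR line: {line!r}")
    # shlex handles the double-quoted tokens.
    command, *args = shlex.split(match["rest"])
    return {
        "timestamp": float(match["ts"]),
        "db": int(match["db"]),
        "client": match["client"],
        "command": command.upper(),
        "args": args,
    }


parsed = parse_monitor_line(
    '1717000000.124123 [0 10.0.1.101:52395] "HSET" "session:abc123" "lastSeen" "1717000000"'
)
print(parsed["client"], parsed["command"], parsed["args"])
# 10.0.1.101:52395 HSET ['session:abc123', 'lastSeen', '1717000000']
```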
CLIENT LIST and CLIENT INFO
CLIENT LIST → one line per connected client
CLIENT INFO → info for the current client
id=42 addr=10.0.1.100:52394 laddr=10.0.0.10:6379 fd=23 name=myapp age=1234
cmd=get flags=N db=0 sub=0 psub=0 multi=-1 watch=0
qbuf=0 qbuf-free=32768 argv-mem=10 multi-mem=0
tot-mem=20512 rbs=16384 rbp=0 obl=0 oll=0 omem=0
events=r resp=2 uid=0 user=default lib-name=ioredis lib-ver=5.3.3
Key fields:
- cmd: last command issued by this client
- age: seconds since the connection was established
- sub: number of channels subscribed
- omem: output buffer memory (large = slow client)
- flags: b = blocked in a blocking command (BLPOP), P = Pub/Sub subscriber, S = replica connection
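Identify stuck clients: run CLIENT LIST and filter for cmd=blpop with high age values. Queue workers that block on BLPOP by design will match too, so treat a match as a lead to investigate, not proof.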
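Each CLIENT LIST line is a space-separated run of key=value pairs, so filtering can be scripted. The sketch below flags clients with a large output buffer or the blocked flag; the 1MB omem_limit and the sample lines are assumptions chosen for illustration.

```python
def parse_client_line(line: str) -> dict[str, str]:
    """Turn one CLIENT LIST line into a dict; empty values (name=) are kept."""
    return dict(field.split("=", 1) for field in line.split() if "=" in field)


def suspicious(client: dict[str, str], omem_limit: int = 1_000_000) -> list[str]:
    """Return the reasons a client deserves a closer look (empty list if none)."""
    reasons = []
    if int(client.get("omem", "0")) > omem_limit:
        reasons.append("large output buffer")
    if "b" in client.get("flags", ""):
        reasons.append("blocked")
    return reasons


# Shortened, made-up lines in CLIENT LIST format.
lines = [
    "id=42 addr=10.0.1.100:52394 name=myapp age=1234 flags=N omem=0 cmd=get",
    "id=43 addr=10.0.1.100:52399 name=worker age=86400 flags=b omem=0 cmd=blpop",
    "id=44 addr=10.0.1.102:40112 name= age=310 flags=N omem=5242880 cmd=subscribe",
]
for line in lines:
    client = parse_client_line(line)
    print(client["id"], suspicious(client))
# 42 []
# 43 ['blocked']
# 44 ['large output buffer']
```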
The 10 Metrics Every Redis Dashboard Must Include
| # | Metric | Source | Alert Threshold |
|---|---|---|---|
| 1 | Cache hit rate | keyspace_hits / (hits + misses) | < 90% |
| 2 | Evicted keys/sec | evicted_keys delta | > 0 |
| 3 | Memory fragmentation ratio | mem_fragmentation_ratio | > 1.5 or < 1.0 |
| 4 | Memory used / maxmemory | used_memory / maxmemory | > 80% |
| 5 | Connected clients | connected_clients | > 80% of maxclients |
| 6 | Ops per second | instantaneous_ops_per_sec | Baseline ± 3σ |
| 7 | Replication lag | slave.lag (INFO replication) | > 5 seconds |
| 8 | Slow commands | SLOWLOG LEN delta (stops growing at slowlog-max-len) | Any increase |
| 9 | Last BGSAVE status | rdb_last_bgsave_status | err |
| 10 | Rejected connections | rejected_connections delta | > 0 |
Export these metrics from INFO every 15–60 seconds to your monitoring system (Prometheus via redis_exporter, Datadog, CloudWatch, etc.).
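Several rows in the table are deltas between two samples rather than raw values. The sketch below evaluates a few of the thresholds from two snapshots that have already been parsed into numbers. The evaluate function and both snapshots are hypothetical; a real exporter or alerting system would normally do this for you.

```python
def evaluate(prev: dict[str, float], curr: dict[str, float]) -> list[str]:
    """Apply a subset of the dashboard thresholds to two consecutive snapshots."""
    alerts = []
    # Counters are cumulative, so compare against the previous sample.
    if curr["evicted_keys"] > prev["evicted_keys"]:
        alerts.append("evictions")
    if curr["rejected_connections"] > prev["rejected_connections"]:
        alerts.append("rejected connections")
    # maxmemory = 0 means no limit, so the ratio is undefined.
    if curr["maxmemory"] > 0 and curr["used_memory"] / curr["maxmemory"] > 0.80:
        alerts.append("memory above 80% of maxmemory")
    if curr["connected_clients"] / curr["maxclients"] > 0.80:
        alerts.append("clients above 80% of maxclients")
    ratio = curr["mem_fragmentation_ratio"]
    if ratio > 1.5 or ratio < 1.0:
        alerts.append("fragmentation ratio out of range")
    return alerts


# Invented values for illustration only.
prev = {"evicted_keys": 0, "rejected_connections": 0}
curr = {
    "evicted_keys": 12,
    "rejected_connections": 0,
    "used_memory": 3650722201,
    "maxmemory": 4294967296,  # 4GB limit, about 85% used
    "connected_clients": 48,
    "maxclients": 10000,
    "mem_fragmentation_ratio": 1.12,
}
print(evaluate(prev, curr))  # ['evictions', 'memory above 80% of maxmemory']
```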
redis-cli Monitoring Shortcuts
# Live stats (refreshes every second)
redis-cli --stat

# Live latency monitoring
redis-cli --latency
redis-cli --latency-history -i 5   # one summary line every 5 seconds

# Sample keys and report the largest by memory usage, per type
redis-cli --memkeys

# Count keys matching a pattern
redis-cli --scan --pattern "session:*" | wc -l

# Big keys scan (largest key per type, by element count or string length)
redis-cli --bigkeys
redis-cli --bigkeys scans the entire keyspace using SCAN and samples key sizes — it reports the largest key per type. Safe to run on production (uses cursor-based scan, not blocking KEYS *).
Summary
- INFO is the starting point: use INFO stats for throughput and hit rate, INFO memory for memory health, INFO replication for lag, INFO keyspace for key distribution
- Cache hit rate (keyspace_hits / total) should be > 90%; below this, investigate TTLs, eviction, and cache warming
- evicted_keys > 0 means memory pressure: increase maxmemory or reduce dataset
- SLOWLOG GET reveals expensive commands; the most common findings: KEYS *, HGETALL on large hashes, SORT
- LATENCY LATEST / LATENCY HISTORY tracks internal event latency (fork, AOF flush, RDB save)
- MONITOR streams live commands: invaluable for debugging, catastrophic if left running in production
- CLIENT LIST identifies slow/stuck clients by output buffer size and command age
- Export INFO metrics every 15–60 seconds to your monitoring system; build dashboards around the 10 core metrics
Next: P-12 — Security: ACLs, TLS, and Network Hardening — per-user command restrictions, TLS for in-transit encryption, bind address configuration, and the most common Redis security misconfigurations.