ELU as a first-class metric, clinic.js toolchain, V8 CPU profiles, core dump analysis, and distributed tracing across Kafka-connected services.
Module 12 — Production Observability, Performance Profiling & Flame Graphs
What this module covers: A blockchain indexer at 50,000 events/second produces hundreds of thousands of function calls per second. When performance degrades, the cause is buried in microseconds — one function accounting for 40% of CPU, one allocation pattern triggering GC every 200ms, one async chain adding 15ms of hidden latency per request. Structured logs, Prometheus metrics, and distributed traces tell you that something is wrong. V8 CPU profiles, flame graphs, and the clinic.js toolchain tell you exactly where and why. This module covers the complete production diagnostic stack for Node.js systems under high-throughput stress.
The Three Pillars: Logs, Metrics, Traces
Every production Node.js service needs all three. They answer different questions:
- Logs: what happened? (event-level detail)
- Metrics: how is the system behaving over time? (aggregate measurements)
- Traces: how does a single request flow through the system? (distributed causality)
Pino: The Fastest Structured Logger
javascriptimport pino from 'pino'; const logger = pino({ level: process.env.LOG_LEVEL ?? 'info', // Serialize in JSON for log aggregators (Datadog, Splunk, CloudWatch) // pino is 5x faster than Winston, 8x faster than console.log transport: process.env.NODE_ENV === 'development' ? { target: 'pino-pretty' } // human-readable in dev : undefined, // raw JSON in production }); // Structured logging — queryable fields, not string concatenation logger.info({ event: 'transaction_ingested', transactionHash: tx.hash.toString('hex'), blockHeight: tx.blockHeight, sender: tx.sender, amount: tx.amount.toString(), processingMs: Date.now() - startTime, }, 'Transaction ingested successfully'); // Child loggers for request-scoped context export function createRequestLogger(requestId: string) { return logger.child({ requestId }); // all logs include requestId }
Why Pino over Winston at high throughput: Pino defers JSON serialization to a separate worker thread. At 50K events/second where each event emits 2–3 log entries: 100K log statements/sec. Winston's synchronous JSON serialization adds ~2ms per 100K logs. Pino's worker thread adds ~0.1ms. At scale, logging becomes a measurable CPU cost.
Prometheus Metrics with prom-client
javascriptimport { Counter, Histogram, Gauge, register } from 'prom-client'; import { collectDefaultMetrics } from 'prom-client'; // Collect Node.js runtime metrics (GC, heap, event loop) collectDefaultMetrics({ prefix: 'nodejs_' }); // Custom application metrics const transactionCounter = new Counter({ name: 'transactions_ingested_total', help: 'Total transactions ingested', labelNames: ['status', 'network'], }); const processingDuration = new Histogram({ name: 'transaction_processing_duration_seconds', help: 'Time to process a single transaction', buckets: [0.001, 0.005, 0.01, 0.025, 0.05, 0.1, 0.25, 0.5, 1], labelNames: ['network'], }); const dbPoolUtilization = new Gauge({ name: 'db_pool_utilization_ratio', help: 'Database connection pool utilization (0-1)', }); // Instrument the hot path async function processTransaction(tx) { const end = processingDuration.startTimer({ network: tx.network }); try { await writeToDatabase(tx); transactionCounter.inc({ status: 'success', network: tx.network }); } catch (err) { transactionCounter.inc({ status: 'error', network: tx.network }); throw err; } finally { end(); // records duration in histogram } } // Expose metrics endpoint for Prometheus scraping fastify.get('/metrics', async (req, reply) => { reply.header('Content-Type', register.contentType); return register.metrics(); });
Key metrics for a blockchain indexer:
| Metric | Type | Alert Condition |
|---|---|---|
nodejs_event_loop_utilization | Gauge | > 0.85 |
nodejs_gc_duration_seconds (P99) | Histogram | > 50ms |
transactions_ingested_total rate | Counter | Drop > 20% |
transaction_processing_duration_seconds P99 | Histogram | > 100ms |
db_pool_utilization_ratio | Gauge | > 0.9 |
nodejs_heap_used_bytes | Gauge | > 80% of max |
OpenTelemetry: Distributed Traces
javascriptimport { NodeSDK } from '@opentelemetry/sdk-node'; import { OTLPTraceExporter } from '@opentelemetry/exporter-trace-otlp-grpc'; import { getNodeAutoInstrumentations } from '@opentelemetry/auto-instrumentations-node'; const sdk = new NodeSDK({ traceExporter: new OTLPTraceExporter({ url: 'http://jaeger:4317', }), instrumentations: [getNodeAutoInstrumentations()], // auto-instruments http, pg, kafka }); sdk.start(); // Manual span for custom instrumentation import { trace } from '@opentelemetry/api'; const tracer = trace.getTracer('blockchain-indexer'); async function processBlock(block) { const span = tracer.startSpan('process_block', { attributes: { 'block.height': block.height, 'block.txCount': block.transactions.length } }); try { for (const tx of block.transactions) { const txSpan = tracer.startSpan('process_transaction', { parent: span }); await processTransaction(tx); txSpan.end(); } span.setStatus({ code: SpanStatusCode.OK }); } catch (err) { span.recordException(err); span.setStatus({ code: SpanStatusCode.ERROR }); throw err; } finally { span.end(); } }
With distributed tracing, a single block processing request shows its full latency breakdown across: HTTP receive → parse → signature verification (worker thread) → database write → Kafka publish. You can see exactly which step is slow without guessing.
Event Loop Utilization: The First Metric to Check
ELU was introduced in Module 2. In production, it is the first metric to check during a latency incident.
javascriptimport { eventLoopUtilization } from 'node:perf_hooks'; import { Gauge } from 'prom-client'; const eluGauge = new Gauge({ name: 'nodejs_event_loop_utilization', help: 'Event loop utilization ratio (0-1)', }); let previousELU = eventLoopUtilization(); setInterval(() => { const elu = eventLoopUtilization(previousELU); eluGauge.set(elu.utilization); previousELU = eventLoopUtilization(); if (elu.utilization > 0.90) { logger.warn({ elu: elu.utilization }, 'Event loop saturated — investigate CPU hotspot'); } }, 1_000);
Generating V8 CPU Profiles
A CPU profile samples the call stack every N microseconds. The result: a list of functions and how many samples they appear in. Functions that appear in many samples are consuming CPU time.
Method 1: --cpu-prof Flag
bash# Generate a CPU profile for 30 seconds of load node --cpu-prof --cpu-prof-interval=100 your-indexer.js & PID=$! # Apply load autocannon -c 100 -d 30 http://localhost:3000/api/v2/payments # Send SIGINT to stop profiling kill -SIGINT $PID # Profile file: isolate-XXXXX-XXXXX-v8.cpuprofile # Open in Chrome DevTools: chrome://inspect → Open dedicated DevTools
Method 2: V8 Profiler API
javascriptimport v8Profiler from 'v8-profiler-next'; // Profile a specific workload async function profileCriticalPath() { v8Profiler.startProfiling('ingestion-hot-path', true); // Run the workload you want to profile for (let i = 0; i < 100_000; i++) { await processTransaction(generateTestTransaction()); } const profile = v8Profiler.stopProfiling('ingestion-hot-path'); profile.export((err, result) => { require('fs').writeFileSync('profile.cpuprofile', result); profile.delete(); }); }
Reading Flame Graphs
A flame graph visualizes CPU time as a stack of colored bars:
- X axis: total CPU time (wider = more CPU)
- Y axis: call stack depth (higher = deeper in the call chain)
- Color: random, for visual distinction
- Width of a bar: proportion of CPU time spent in that function and its callees
The anatomy of a flame graph:
│ ←────────────────────────────────────────────────────────────→ 100% CPU time
│ ┌───────────────────────────────────────────────────────────┐
│ │ processBlock (top of stack) │
│ ├────────────────────────────────┬──────────────────────────┤
│ │ parseTransactions (60%) │ validateSignatures (40%) │
│ ├───────────────┬────────────────┤ ┌────────────────────┐ │
│ │ JSON.parse │ normalizeField │ │ crypto.verify │ │
│ │ (35%) │ (25%) │ │ (40%) │ │
│ └───────────────┴────────────────┘ └────────────────────┘ │
Reading rules:
- Wide plateaus at the top are hot spots — functions that consume a large fraction of CPU
- Narrow spikes are expected call depths — not performance problems
- Look for width at the TOP of the stack — if
JSON.parseis 35% wide at the top level, it's consuming 35% of total CPU
Actionable patterns:
PROBLEM: Deep stack with wide base in JSON.parse
└── processBlock
└── parsePayload
└── JSON.parse ←── 40% of CPU
FIX: Move to worker thread or use streaming JSON parser
PROBLEM: GC taking significant width
└── (GC) Scavenge ←── 15% of CPU
FIX: Reduce allocation rate, increase --max-semi-space-size
PROBLEM: Crypto operation dominating
└── processTransaction
└── verifySignature
└── crypto.createVerify ←── 35% of CPU
FIX: Already in thread pool? If not, move to worker_threads
The clinic.js Diagnostic Toolchain
clinic.js is the most comprehensive Node.js diagnostic suite. Three tools, each answering a different question.
clinic doctor: What Is Wrong?
bashnpm install -g clinic # Run your server under clinic doctor clinic doctor -- node your-indexer.js & SERVER_PID=$! # Apply representative load autocannon -c 100 -d 30 http://localhost:3000/api/v2/payments # Stop and analyze kill $SERVER_PID # Opens interactive HTML report in browser
clinic doctor analyzes four signals:
- Event loop delay — is the event loop lagging? (indicates blocking code)
- CPU usage — is the process CPU-bound?
- Memory — is memory growing continuously? (indicates leak)
- Handles/requests — are there open handles preventing process exit? (indicates resource leak)
What the report tells you:
Issue detected: Event loop delay is high (avg 45ms)
Possible cause: Heavy synchronous computation
Recommendation: Use worker_threads for CPU-intensive operations
Or consider async alternatives
Issue detected: Heap usage growing without GC collection
Possible cause: Memory leak (event listeners, closures retaining references)
Recommendation: Take heap snapshot and analyze retained object graph
clinic flame: Where Is the CPU Going?
bashclinic flame -- node your-indexer.js & # ... apply load and stop ... # Opens interactive flamegraph
clinic flame generates a proper flamegraph from --perf profiling data, with:
- Merged V8 and native frames (you see both JS and C++ in one view)
- Click-to-zoom for deep inspection
- Filtering by function name
Interpreting clinic flame output for a transaction parser:
Before optimization (clinic flame shows):
processBlock (100%)
└── parseTransaction (72%)
└── JSON.parse (72%) ←── 72% of CPU in JSON.parse!
└── writeToDatabase (28%)
After moving JSON.parse to worker_threads:
processBlock (100%)
└── parseTransaction (12%) ←── down from 72%!
└── Buffer.toString (12%) ←── just converting to string for worker
└── writeToDatabase (48%)
└── worker coordination (40%)
clinic bubbleprof: Where Is the Async Time Going?
bashclinic bubbleprof -- node your-indexer.js &
bubbleprof visualizes async operations as bubbles — the size of each bubble represents how long async operations took. It shows where time is spent waiting rather than executing.
Useful for diagnosing:
- Database queries that are slower than expected
- HTTP client calls with unexpected latency
- Async chains that add unnecessary await depth
Heap Snapshots: Finding Memory Leaks
When clinic doctor shows continuously growing memory, take a heap snapshot:
javascript// Trigger heap snapshot on demand (safe in production for 1–2 seconds) import v8 from 'v8'; import fs from 'fs'; process.on('SIGUSR2', () => { const timestamp = new Date().toISOString().replace(/:/g, '-'); const filename = `heap-${timestamp}.heapsnapshot`; const stream = v8.writeHeapSnapshot(filename); logger.info({ filename }, 'Heap snapshot written'); }); // Trigger: kill -SIGUSR2 $(pgrep -f your-indexer)
bash# Take two snapshots 5 minutes apart kill -SIGUSR2 $PID # snapshot 1 sleep 300 kill -SIGUSR2 $PID # snapshot 2 # Open Chrome DevTools → Memory tab → Load both snapshots # Use "Comparison" view to see what objects grew between snapshots
Common memory leak patterns in Node.js:
javascript// LEAK 1: EventEmitter listener accumulation class TransactionProcessor extends EventEmitter { processBlock(block) { block.transactions.forEach(tx => { this.on('validate', () => tx.validate()); // ← listener added per transaction // Never removed! Listener count grows with each transaction }); } } // FIX: use this.once() or explicitly removeListener() // LEAK 2: Closure retaining large objects function createHandler(largeBuffer) { return function handler(req) { // largeBuffer is in scope → never GC'd while handler exists return process(req, largeBuffer); }; } // FIX: pass largeBuffer as parameter, don't close over it // LEAK 3: Map/Set growing without bound const requestCache = new Map(); app.get('/api/payment/:id', (req, res) => { requestCache.set(req.params.id, req); // ← never evicted! // ... }); // FIX: use LRU cache with bounded size, or TTL
auto_explain for Node.js: async_hooks + Performance Timing
The equivalent of PostgreSQL's auto_explain for Node.js: automatically log slow async operations above a threshold.
javascriptimport { createHook, executionAsyncId } from 'node:async_hooks'; import { performance } from 'node:perf_hooks'; const asyncStartTimes = new Map(); const SLOW_THRESHOLD_MS = 100; const hook = createHook({ init(asyncId, type, triggerAsyncId) { asyncStartTimes.set(asyncId, { type, start: performance.now() }); }, destroy(asyncId) { const entry = asyncStartTimes.get(asyncId); if (!entry) return; const duration = performance.now() - entry.start; asyncStartTimes.delete(asyncId); if (duration > SLOW_THRESHOLD_MS) { logger.warn({ asyncType: entry.type, durationMs: duration.toFixed(2) }, 'Slow async operation detected'); } } }); hook.enable(); // Produces logs like: // {"asyncType":"TCPWRAP","durationMs":"234.12"} — slow TCP socket // {"asyncType":"FSREQCALLBACK","durationMs":"145.88"} — slow file operation // {"asyncType":"Timeout","durationMs":"312.44"} — setTimeout fired late (event loop lag)
The Production Incident: Undetected Memory Leak via Event Listener
Context: A blockchain indexer WebSocket subscription service. Engineers received an alert: process memory at 94% of limit after 6 hours of operation. Restart fixed it temporarily.
Diagnosis with heap snapshot comparison:
Snapshot 1 (at startup): 180MB heap Snapshot 2 (6 hours later): 1.8GB heap
Chrome DevTools comparison view showed:
Object type Count (snapshot 1) Count (snapshot 2) Delta
EventEmitter 42 183,492 +183,450
Function 8,240 1,240,830 +1,232,590
183,000 EventEmitter instances — one per WebSocket subscriber (12,000 subscribers × ~15 listeners each = 180,000 accumulated listener objects that were never cleaned up).
The broken code:
javascript// WebSocket subscription handler wsServer.on('connection', (ws) => { // Register listener on shared blockEventBus for this subscriber blockEventBus.on('new-block', (block) => { ws.send(JSON.stringify(block)); // ← closure captures 'ws' }); ws.on('close', () => { // ← listener NEVER REMOVED from blockEventBus // ws is closed but the listener still exists // blockEventBus still holds reference to the listener // which holds reference to 'ws' // Memory leak: closed WebSocket + listener permanently retained }); });
The fix:
javascriptwsServer.on('connection', (ws) => { const handler = (block) => { if (ws.readyState === WebSocket.OPEN) { ws.send(JSON.stringify(block)); } }; blockEventBus.on('new-block', handler); ws.on('close', () => { blockEventBus.off('new-block', handler); // ← explicitly remove listener // handler (and its closure over 'ws') can now be GC'd }); });
After the fix: memory stabilized at 220MB regardless of subscriber count or runtime duration.
Summary
| Concept | Key Takeaway |
|---|---|
| Pino | 5x faster than Winston. Structured JSON logging. Use child loggers for request context. |
| Prometheus + prom-client | Counter/Histogram/Gauge. Export /metrics. Alert on ELU, GC duration, pool utilization. |
| OpenTelemetry | Auto-instruments HTTP, pg, Kafka. Use startSpan for custom instrumentation. |
| ELU gauge | First metric to check during latency incident. > 0.85 = investigate CPU hotspot. |
--cpu-prof | Generate V8 CPU profile. Open in Chrome DevTools. Identify hot functions. |
| Flame graph | X axis = CPU time. Wide plateau at top = hot spot. Narrow spike = expected depth. |
clinic doctor | Diagnoses event loop delay, memory leaks, CPU saturation. First tool in the diagnostic workflow. |
clinic flame | CPU flamegraph with V8 + native frames. Identifies exact function consuming CPU. |
clinic bubbleprof | Async operation visualization. Shows where time is waiting vs executing. |
| Heap snapshots | Two snapshots + comparison view = pinpoint leaked object type. kill -SIGUSR2 to trigger. |
| EventEmitter leak | Always off() listeners when the subscriber closes. Most common Node.js memory leak. |
You can now diagnose performance problems with precision. Module 13 covers the connection layer — the pool configurations and process management patterns that determine how many requests your service can actually handle concurrently.
Next: Module 13 — Advanced Connection Pooling & Process Management →