Module A-13·25 min read

ELU as a first-class metric, clinic.js toolchain, V8 CPU profiles, core dump analysis, and distributed tracing across Kafka-connected services.

Module 12 — Production Observability, Performance Profiling & Flame Graphs

What this module covers: A blockchain indexer at 50,000 events/second produces hundreds of thousands of function calls per second. When performance degrades, the cause is buried in microseconds — one function accounting for 40% of CPU, one allocation pattern triggering GC every 200ms, one async chain adding 15ms of hidden latency per request. Structured logs, Prometheus metrics, and distributed traces tell you that something is wrong. V8 CPU profiles, flame graphs, and the clinic.js toolchain tell you exactly where and why. This module covers the complete production diagnostic stack for Node.js systems under high-throughput stress.


The Three Pillars: Logs, Metrics, Traces

Every production Node.js service needs all three. They answer different questions:

  • Logs: what happened? (event-level detail)
  • Metrics: how is the system behaving over time? (aggregate measurements)
  • Traces: how does a single request flow through the system? (distributed causality)

Pino: The Fastest Structured Logger

javascript
import pino from 'pino'; const logger = pino({ level: process.env.LOG_LEVEL ?? 'info', // Serialize in JSON for log aggregators (Datadog, Splunk, CloudWatch) // pino is 5x faster than Winston, 8x faster than console.log transport: process.env.NODE_ENV === 'development' ? { target: 'pino-pretty' } // human-readable in dev : undefined, // raw JSON in production }); // Structured logging — queryable fields, not string concatenation logger.info({ event: 'transaction_ingested', transactionHash: tx.hash.toString('hex'), blockHeight: tx.blockHeight, sender: tx.sender, amount: tx.amount.toString(), processingMs: Date.now() - startTime, }, 'Transaction ingested successfully'); // Child loggers for request-scoped context export function createRequestLogger(requestId: string) { return logger.child({ requestId }); // all logs include requestId }

Why Pino over Winston at high throughput: Pino defers JSON serialization to a separate worker thread. At 50K events/second where each event emits 2–3 log entries: 100K log statements/sec. Winston's synchronous JSON serialization adds ~2ms per 100K logs. Pino's worker thread adds ~0.1ms. At scale, logging becomes a measurable CPU cost.

Prometheus Metrics with prom-client

javascript
import { Counter, Histogram, Gauge, register } from 'prom-client'; import { collectDefaultMetrics } from 'prom-client'; // Collect Node.js runtime metrics (GC, heap, event loop) collectDefaultMetrics({ prefix: 'nodejs_' }); // Custom application metrics const transactionCounter = new Counter({ name: 'transactions_ingested_total', help: 'Total transactions ingested', labelNames: ['status', 'network'], }); const processingDuration = new Histogram({ name: 'transaction_processing_duration_seconds', help: 'Time to process a single transaction', buckets: [0.001, 0.005, 0.01, 0.025, 0.05, 0.1, 0.25, 0.5, 1], labelNames: ['network'], }); const dbPoolUtilization = new Gauge({ name: 'db_pool_utilization_ratio', help: 'Database connection pool utilization (0-1)', }); // Instrument the hot path async function processTransaction(tx) { const end = processingDuration.startTimer({ network: tx.network }); try { await writeToDatabase(tx); transactionCounter.inc({ status: 'success', network: tx.network }); } catch (err) { transactionCounter.inc({ status: 'error', network: tx.network }); throw err; } finally { end(); // records duration in histogram } } // Expose metrics endpoint for Prometheus scraping fastify.get('/metrics', async (req, reply) => { reply.header('Content-Type', register.contentType); return register.metrics(); });

Key metrics for a blockchain indexer:

MetricTypeAlert Condition
nodejs_event_loop_utilizationGauge> 0.85
nodejs_gc_duration_seconds (P99)Histogram> 50ms
transactions_ingested_total rateCounterDrop > 20%
transaction_processing_duration_seconds P99Histogram> 100ms
db_pool_utilization_ratioGauge> 0.9
nodejs_heap_used_bytesGauge> 80% of max

OpenTelemetry: Distributed Traces

javascript
import { NodeSDK } from '@opentelemetry/sdk-node'; import { OTLPTraceExporter } from '@opentelemetry/exporter-trace-otlp-grpc'; import { getNodeAutoInstrumentations } from '@opentelemetry/auto-instrumentations-node'; const sdk = new NodeSDK({ traceExporter: new OTLPTraceExporter({ url: 'http://jaeger:4317', }), instrumentations: [getNodeAutoInstrumentations()], // auto-instruments http, pg, kafka }); sdk.start(); // Manual span for custom instrumentation import { trace } from '@opentelemetry/api'; const tracer = trace.getTracer('blockchain-indexer'); async function processBlock(block) { const span = tracer.startSpan('process_block', { attributes: { 'block.height': block.height, 'block.txCount': block.transactions.length } }); try { for (const tx of block.transactions) { const txSpan = tracer.startSpan('process_transaction', { parent: span }); await processTransaction(tx); txSpan.end(); } span.setStatus({ code: SpanStatusCode.OK }); } catch (err) { span.recordException(err); span.setStatus({ code: SpanStatusCode.ERROR }); throw err; } finally { span.end(); } }

With distributed tracing, a single block processing request shows its full latency breakdown across: HTTP receive → parse → signature verification (worker thread) → database write → Kafka publish. You can see exactly which step is slow without guessing.


Event Loop Utilization: The First Metric to Check

ELU was introduced in Module 2. In production, it is the first metric to check during a latency incident.

javascript
import { eventLoopUtilization } from 'node:perf_hooks'; import { Gauge } from 'prom-client'; const eluGauge = new Gauge({ name: 'nodejs_event_loop_utilization', help: 'Event loop utilization ratio (0-1)', }); let previousELU = eventLoopUtilization(); setInterval(() => { const elu = eventLoopUtilization(previousELU); eluGauge.set(elu.utilization); previousELU = eventLoopUtilization(); if (elu.utilization > 0.90) { logger.warn({ elu: elu.utilization }, 'Event loop saturated — investigate CPU hotspot'); } }, 1_000);

Generating V8 CPU Profiles

A CPU profile samples the call stack every N microseconds. The result: a list of functions and how many samples they appear in. Functions that appear in many samples are consuming CPU time.

Method 1: --cpu-prof Flag

bash
# Generate a CPU profile for 30 seconds of load node --cpu-prof --cpu-prof-interval=100 your-indexer.js & PID=$! # Apply load autocannon -c 100 -d 30 http://localhost:3000/api/v2/payments # Send SIGINT to stop profiling kill -SIGINT $PID # Profile file: isolate-XXXXX-XXXXX-v8.cpuprofile # Open in Chrome DevTools: chrome://inspect → Open dedicated DevTools

Method 2: V8 Profiler API

javascript
import v8Profiler from 'v8-profiler-next'; // Profile a specific workload async function profileCriticalPath() { v8Profiler.startProfiling('ingestion-hot-path', true); // Run the workload you want to profile for (let i = 0; i < 100_000; i++) { await processTransaction(generateTestTransaction()); } const profile = v8Profiler.stopProfiling('ingestion-hot-path'); profile.export((err, result) => { require('fs').writeFileSync('profile.cpuprofile', result); profile.delete(); }); }

Reading Flame Graphs

A flame graph visualizes CPU time as a stack of colored bars:

  • X axis: total CPU time (wider = more CPU)
  • Y axis: call stack depth (higher = deeper in the call chain)
  • Color: random, for visual distinction
  • Width of a bar: proportion of CPU time spent in that function and its callees
The anatomy of a flame graph:

│  ←────────────────────────────────────────────────────────────→ 100% CPU time
│  ┌───────────────────────────────────────────────────────────┐
│  │                     processBlock (top of stack)            │
│  ├────────────────────────────────┬──────────────────────────┤
│  │    parseTransactions (60%)     │  validateSignatures (40%) │
│  ├───────────────┬────────────────┤    ┌────────────────────┐ │
│  │ JSON.parse    │ normalizeField │    │  crypto.verify     │ │
│  │ (35%)         │ (25%)          │    │  (40%)             │ │
│  └───────────────┴────────────────┘    └────────────────────┘ │

Reading rules:

  1. Wide plateaus at the top are hot spots — functions that consume a large fraction of CPU
  2. Narrow spikes are expected call depths — not performance problems
  3. Look for width at the TOP of the stack — if JSON.parse is 35% wide at the top level, it's consuming 35% of total CPU

Actionable patterns:

PROBLEM: Deep stack with wide base in JSON.parse
  └── processBlock
      └── parsePayload
          └── JSON.parse  ←── 40% of CPU
  FIX: Move to worker thread or use streaming JSON parser

PROBLEM: GC taking significant width
  └── (GC) Scavenge  ←── 15% of CPU
  FIX: Reduce allocation rate, increase --max-semi-space-size

PROBLEM: Crypto operation dominating
  └── processTransaction
      └── verifySignature
          └── crypto.createVerify  ←── 35% of CPU
  FIX: Already in thread pool? If not, move to worker_threads

The clinic.js Diagnostic Toolchain

clinic.js is the most comprehensive Node.js diagnostic suite. Three tools, each answering a different question.

clinic doctor: What Is Wrong?

bash
npm install -g clinic # Run your server under clinic doctor clinic doctor -- node your-indexer.js & SERVER_PID=$! # Apply representative load autocannon -c 100 -d 30 http://localhost:3000/api/v2/payments # Stop and analyze kill $SERVER_PID # Opens interactive HTML report in browser

clinic doctor analyzes four signals:

  1. Event loop delay — is the event loop lagging? (indicates blocking code)
  2. CPU usage — is the process CPU-bound?
  3. Memory — is memory growing continuously? (indicates leak)
  4. Handles/requests — are there open handles preventing process exit? (indicates resource leak)

What the report tells you:

Issue detected: Event loop delay is high (avg 45ms)
Possible cause: Heavy synchronous computation
Recommendation: Use worker_threads for CPU-intensive operations
                Or consider async alternatives

Issue detected: Heap usage growing without GC collection
Possible cause: Memory leak (event listeners, closures retaining references)
Recommendation: Take heap snapshot and analyze retained object graph

clinic flame: Where Is the CPU Going?

bash
clinic flame -- node your-indexer.js & # ... apply load and stop ... # Opens interactive flamegraph

clinic flame generates a proper flamegraph from --perf profiling data, with:

  • Merged V8 and native frames (you see both JS and C++ in one view)
  • Click-to-zoom for deep inspection
  • Filtering by function name

Interpreting clinic flame output for a transaction parser:

Before optimization (clinic flame shows):
  processBlock (100%)
  └── parseTransaction (72%)
      └── JSON.parse (72%)  ←── 72% of CPU in JSON.parse!
  └── writeToDatabase (28%)

After moving JSON.parse to worker_threads:
  processBlock (100%)
  └── parseTransaction (12%)  ←── down from 72%!
      └── Buffer.toString (12%)  ←── just converting to string for worker
  └── writeToDatabase (48%)
  └── worker coordination (40%)

clinic bubbleprof: Where Is the Async Time Going?

bash
clinic bubbleprof -- node your-indexer.js &

bubbleprof visualizes async operations as bubbles — the size of each bubble represents how long async operations took. It shows where time is spent waiting rather than executing.

Useful for diagnosing:

  • Database queries that are slower than expected
  • HTTP client calls with unexpected latency
  • Async chains that add unnecessary await depth

Heap Snapshots: Finding Memory Leaks

When clinic doctor shows continuously growing memory, take a heap snapshot:

javascript
// Trigger heap snapshot on demand (safe in production for 1–2 seconds) import v8 from 'v8'; import fs from 'fs'; process.on('SIGUSR2', () => { const timestamp = new Date().toISOString().replace(/:/g, '-'); const filename = `heap-${timestamp}.heapsnapshot`; const stream = v8.writeHeapSnapshot(filename); logger.info({ filename }, 'Heap snapshot written'); }); // Trigger: kill -SIGUSR2 $(pgrep -f your-indexer)
bash
# Take two snapshots 5 minutes apart kill -SIGUSR2 $PID # snapshot 1 sleep 300 kill -SIGUSR2 $PID # snapshot 2 # Open Chrome DevTools → Memory tab → Load both snapshots # Use "Comparison" view to see what objects grew between snapshots

Common memory leak patterns in Node.js:

javascript
// LEAK 1: EventEmitter listener accumulation class TransactionProcessor extends EventEmitter { processBlock(block) { block.transactions.forEach(tx => { this.on('validate', () => tx.validate()); // ← listener added per transaction // Never removed! Listener count grows with each transaction }); } } // FIX: use this.once() or explicitly removeListener() // LEAK 2: Closure retaining large objects function createHandler(largeBuffer) { return function handler(req) { // largeBuffer is in scope → never GC'd while handler exists return process(req, largeBuffer); }; } // FIX: pass largeBuffer as parameter, don't close over it // LEAK 3: Map/Set growing without bound const requestCache = new Map(); app.get('/api/payment/:id', (req, res) => { requestCache.set(req.params.id, req); // ← never evicted! // ... }); // FIX: use LRU cache with bounded size, or TTL

auto_explain for Node.js: async_hooks + Performance Timing

The equivalent of PostgreSQL's auto_explain for Node.js: automatically log slow async operations above a threshold.

javascript
import { createHook, executionAsyncId } from 'node:async_hooks'; import { performance } from 'node:perf_hooks'; const asyncStartTimes = new Map(); const SLOW_THRESHOLD_MS = 100; const hook = createHook({ init(asyncId, type, triggerAsyncId) { asyncStartTimes.set(asyncId, { type, start: performance.now() }); }, destroy(asyncId) { const entry = asyncStartTimes.get(asyncId); if (!entry) return; const duration = performance.now() - entry.start; asyncStartTimes.delete(asyncId); if (duration > SLOW_THRESHOLD_MS) { logger.warn({ asyncType: entry.type, durationMs: duration.toFixed(2) }, 'Slow async operation detected'); } } }); hook.enable(); // Produces logs like: // {"asyncType":"TCPWRAP","durationMs":"234.12"} — slow TCP socket // {"asyncType":"FSREQCALLBACK","durationMs":"145.88"} — slow file operation // {"asyncType":"Timeout","durationMs":"312.44"} — setTimeout fired late (event loop lag)

The Production Incident: Undetected Memory Leak via Event Listener

Context: A blockchain indexer WebSocket subscription service. Engineers received an alert: process memory at 94% of limit after 6 hours of operation. Restart fixed it temporarily.

Diagnosis with heap snapshot comparison:

Snapshot 1 (at startup): 180MB heap Snapshot 2 (6 hours later): 1.8GB heap

Chrome DevTools comparison view showed:

Object type          Count (snapshot 1)  Count (snapshot 2)  Delta
EventEmitter          42                  183,492             +183,450
Function              8,240               1,240,830           +1,232,590

183,000 EventEmitter instances — one per WebSocket subscriber (12,000 subscribers × ~15 listeners each = 180,000 accumulated listener objects that were never cleaned up).

The broken code:

javascript
// WebSocket subscription handler wsServer.on('connection', (ws) => { // Register listener on shared blockEventBus for this subscriber blockEventBus.on('new-block', (block) => { ws.send(JSON.stringify(block)); // ← closure captures 'ws' }); ws.on('close', () => { // ← listener NEVER REMOVED from blockEventBus // ws is closed but the listener still exists // blockEventBus still holds reference to the listener // which holds reference to 'ws' // Memory leak: closed WebSocket + listener permanently retained }); });

The fix:

javascript
wsServer.on('connection', (ws) => { const handler = (block) => { if (ws.readyState === WebSocket.OPEN) { ws.send(JSON.stringify(block)); } }; blockEventBus.on('new-block', handler); ws.on('close', () => { blockEventBus.off('new-block', handler); // ← explicitly remove listener // handler (and its closure over 'ws') can now be GC'd }); });

After the fix: memory stabilized at 220MB regardless of subscriber count or runtime duration.


Summary

ConceptKey Takeaway
Pino5x faster than Winston. Structured JSON logging. Use child loggers for request context.
Prometheus + prom-clientCounter/Histogram/Gauge. Export /metrics. Alert on ELU, GC duration, pool utilization.
OpenTelemetryAuto-instruments HTTP, pg, Kafka. Use startSpan for custom instrumentation.
ELU gaugeFirst metric to check during latency incident. > 0.85 = investigate CPU hotspot.
--cpu-profGenerate V8 CPU profile. Open in Chrome DevTools. Identify hot functions.
Flame graphX axis = CPU time. Wide plateau at top = hot spot. Narrow spike = expected depth.
clinic doctorDiagnoses event loop delay, memory leaks, CPU saturation. First tool in the diagnostic workflow.
clinic flameCPU flamegraph with V8 + native frames. Identifies exact function consuming CPU.
clinic bubbleprofAsync operation visualization. Shows where time is waiting vs executing.
Heap snapshotsTwo snapshots + comparison view = pinpoint leaked object type. kill -SIGUSR2 to trigger.
EventEmitter leakAlways off() listeners when the subscriber closes. Most common Node.js memory leak.

You can now diagnose performance problems with precision. Module 13 covers the connection layer — the pool configurations and process management patterns that determine how many requests your service can actually handle concurrently.

Next: Module 13 — Advanced Connection Pooling & Process Management →

© 2026 Jatin Jain Saraf (JJS). All rights reserved.