Module A-13·25 min read

ELU as a first-class metric, clinic.js toolchain, V8 CPU profiles, core dump analysis, and distributed tracing across Kafka-connected services.

Module 12 — Production Observability, Performance Profiling & Flame Graphs

Q: When analyzing a flame graph generated from a Node.js CPU profile, what does a "wide plateau" at the top of a stack represent?

A function that is consuming a significant portion of the total CPU time. — In a flame graph, the width of a frame corresponds to the percentage of total CPU time spent in that function and its descendants. A wide block sitting directly at the top of the stack is actively executing and dominating CPU usage.

Q: Which diagnostic tool from the `clinic.js` toolchain is best suited for visualizing where an application is spending time *waiting* for asynchronous operations (like database queries or HTTP calls) rather than executing CPU instructions?

clinic bubbleprof — `clinic bubbleprof` diagrams asynchronous flow and latency. It allows you to see the "waiting time" caused by asynchronous boundaries, which is crucial for identifying slow database queries, long API calls, or deep async chains.

Q: What is identified as one of the most common causes of memory leaks in Node.js applications?

Failing to remove event listeners (e.g., calling `off()` or `removeListener()`) when a subscription or connection is closed, keeping closures and objects in memory. — Because Node.js makes heavy use of EventEmitters, developers frequently add listeners but forget to clean them up. The active listener object and any variables trapped in its closure prevent garbage collection, leading to persistent memory growth over time.

What this module covers: A blockchain indexer at 50,000 events/second produces hundreds of thousands of function calls per second. When performance degrades, the cause is buried in microseconds — one function accounting for 40% of CPU, one allocation pattern triggering GC every 200ms, one async chain adding 15ms of hidden latency per request. Structured logs, Prometheus metrics, and distributed traces tell you that something is wrong. V8 CPU profiles, flame graphs, and the clinic.js toolchain tell you exactly where and why. This module covers the complete production diagnostic stack for Node.js systems under high-throughput stress.

The Three Pillars: Logs, Metrics, Traces

Every production Node.js service needs all three. They answer different questions:

Logs: what happened? (event-level detail)
Metrics: how is the system behaving over time? (aggregate measurements)
Traces: how does a single request flow through the system? (distributed causality)

Pino: The Fastest Structured Logger

javascript

import pino from 'pino';

const logger = pino({
  level: process.env.LOG_LEVEL ?? 'info',
  // Serialize in JSON for log aggregators (Datadog, Splunk, CloudWatch)
  // pino is 5x faster than Winston, 8x faster than console.log
  transport: process.env.NODE_ENV === 'development'
    ? { target: 'pino-pretty' }  // human-readable in dev
    : undefined,                  // raw JSON in production
});

// Structured logging — queryable fields, not string concatenation
logger.info({
  event: 'transaction_ingested',
  transactionHash: tx.hash.toString('hex'),
  blockHeight: tx.blockHeight,
  sender: tx.sender,
  amount: tx.amount.toString(),
  processingMs: Date.now() - startTime,
}, 'Transaction ingested successfully');

// Child loggers for request-scoped context
export function createRequestLogger(requestId: string) {
  return logger.child({ requestId });  // all logs include requestId
}

Why Pino over Winston at high throughput: Pino defers JSON serialization to a separate worker thread. At 50K events/second where each event emits 2–3 log entries: 100K log statements/sec. Winston's synchronous JSON serialization adds ~2ms per 100K logs. Pino's worker thread adds ~0.1ms. At scale, logging becomes a measurable CPU cost.

Prometheus Metrics with `prom-client`

javascript

import { Counter, Histogram, Gauge, register } from 'prom-client';
import { collectDefaultMetrics } from 'prom-client';

// Collect Node.js runtime metrics (GC, heap, event loop)
collectDefaultMetrics({ prefix: 'nodejs_' });

// Custom application metrics
const transactionCounter = new Counter({
  name: 'transactions_ingested_total',
  help: 'Total transactions ingested',
  labelNames: ['status', 'network'],
});

const processingDuration = new Histogram({
  name: 'transaction_processing_duration_seconds',
  help: 'Time to process a single transaction',
  buckets: [0.001, 0.005, 0.01, 0.025, 0.05, 0.1, 0.25, 0.5, 1],
  labelNames: ['network'],
});

const dbPoolUtilization = new Gauge({
  name: 'db_pool_utilization_ratio',
  help: 'Database connection pool utilization (0-1)',
});

// Instrument the hot path
async function processTransaction(tx) {
  const end = processingDuration.startTimer({ network: tx.network });
  try {
    await writeToDatabase(tx);
    transactionCounter.inc({ status: 'success', network: tx.network });
  } catch (err) {
    transactionCounter.inc({ status: 'error', network: tx.network });
    throw err;
  } finally {
    end();  // records duration in histogram
  }
}

// Expose metrics endpoint for Prometheus scraping
fastify.get('/metrics', async (req, reply) => {
  reply.header('Content-Type', register.contentType);
  return register.metrics();
});

Key metrics for a blockchain indexer:

Metric	Type	Alert Condition
`nodejs_event_loop_utilization`	Gauge	> 0.85
`nodejs_gc_duration_seconds` (P99)	Histogram	> 50ms
`transactions_ingested_total` rate	Counter	Drop > 20%
`transaction_processing_duration_seconds` P99	Histogram	> 100ms
`db_pool_utilization_ratio`	Gauge	> 0.9
`nodejs_heap_used_bytes`	Gauge	> 80% of max

OpenTelemetry: Distributed Traces

javascript

import { NodeSDK } from '@opentelemetry/sdk-node';
import { OTLPTraceExporter } from '@opentelemetry/exporter-trace-otlp-grpc';
import { getNodeAutoInstrumentations } from '@opentelemetry/auto-instrumentations-node';

const sdk = new NodeSDK({
  traceExporter: new OTLPTraceExporter({
    url: 'http://jaeger:4317',
  }),
  instrumentations: [getNodeAutoInstrumentations()],  // auto-instruments http, pg, kafka
});

sdk.start();

// Manual span for custom instrumentation
import { trace } from '@opentelemetry/api';

const tracer = trace.getTracer('blockchain-indexer');

async function processBlock(block) {
  const span = tracer.startSpan('process_block', {
    attributes: { 'block.height': block.height, 'block.txCount': block.transactions.length }
  });

  try {
    for (const tx of block.transactions) {
      const txSpan = tracer.startSpan('process_transaction', { parent: span });
      await processTransaction(tx);
      txSpan.end();
    }
    span.setStatus({ code: SpanStatusCode.OK });
  } catch (err) {
    span.recordException(err);
    span.setStatus({ code: SpanStatusCode.ERROR });
    throw err;
  } finally {
    span.end();
  }
}

With distributed tracing, a single block processing request shows its full latency breakdown across: HTTP receive → parse → signature verification (worker thread) → database write → Kafka publish. You can see exactly which step is slow without guessing.

Event Loop Utilization: The First Metric to Check

ELU was introduced in Module 2. In production, it is the first metric to check during a latency incident.

javascript

import { eventLoopUtilization } from 'node:perf_hooks';
import { Gauge } from 'prom-client';

const eluGauge = new Gauge({
  name: 'nodejs_event_loop_utilization',
  help: 'Event loop utilization ratio (0-1)',
});

let previousELU = eventLoopUtilization();

setInterval(() => {
  const elu = eventLoopUtilization(previousELU);
  eluGauge.set(elu.utilization);
  previousELU = eventLoopUtilization();

  if (elu.utilization > 0.90) {
    logger.warn({ elu: elu.utilization }, 'Event loop saturated — investigate CPU hotspot');
  }
}, 1_000);

Generating V8 CPU Profiles

A CPU profile samples the call stack every N microseconds. The result: a list of functions and how many samples they appear in. Functions that appear in many samples are consuming CPU time.

Method 1: `--cpu-prof` Flag

bash

# Generate a CPU profile for 30 seconds of load
node --cpu-prof --cpu-prof-interval=100 your-indexer.js &
PID=$!

# Apply load
autocannon -c 100 -d 30 http://localhost:3000/api/v2/payments

# Send SIGINT to stop profiling
kill -SIGINT $PID

# Profile file: isolate-XXXXX-XXXXX-v8.cpuprofile
# Open in Chrome DevTools: chrome://inspect → Open dedicated DevTools

Method 2: V8 Profiler API

javascript

import v8Profiler from 'v8-profiler-next';

// Profile a specific workload
async function profileCriticalPath() {
  v8Profiler.startProfiling('ingestion-hot-path', true);

  // Run the workload you want to profile
  for (let i = 0; i < 100_000; i++) {
    await processTransaction(generateTestTransaction());
  }

  const profile = v8Profiler.stopProfiling('ingestion-hot-path');

  profile.export((err, result) => {
    require('fs').writeFileSync('profile.cpuprofile', result);
    profile.delete();
  });
}

Reading Flame Graphs

A flame graph visualizes CPU time as a stack of colored bars:

X axis: total CPU time (wider = more CPU)
Y axis: call stack depth (higher = deeper in the call chain)
Color: random, for visual distinction
Width of a bar: proportion of CPU time spent in that function and its callees

text

The anatomy of a flame graph:

│  ←────────────────────────────────────────────────────────────→ 100% CPU time
│  ┌───────────────────────────────────────────────────────────┐
│  │                     processBlock (top of stack)            │
│  ├────────────────────────────────┬──────────────────────────┤
│  │    parseTransactions (60%)     │  validateSignatures (40%) │
│  ├───────────────┬────────────────┤    ┌────────────────────┐ │
│  │ JSON.parse    │ normalizeField │    │  crypto.verify     │ │
│  │ (35%)         │ (25%)          │    │  (40%)             │ │
│  └───────────────┴────────────────┘    └────────────────────┘ │

Reading rules:

Wide plateaus at the top are hot spots — functions that consume a large fraction of CPU
Narrow spikes are expected call depths — not performance problems
Look for width at the TOP of the stack — if JSON.parse is 35% wide at the top level, it's consuming 35% of total CPU

Actionable patterns:

text

PROBLEM: Deep stack with wide base in JSON.parse
  └── processBlock
      └── parsePayload
          └── JSON.parse  ←── 40% of CPU
  FIX: Move to worker thread or use streaming JSON parser

PROBLEM: GC taking significant width
  └── (GC) Scavenge  ←── 15% of CPU
  FIX: Reduce allocation rate, increase --max-semi-space-size

PROBLEM: Crypto operation dominating
  └── processTransaction
      └── verifySignature
          └── crypto.createVerify  ←── 35% of CPU
  FIX: Already in thread pool? If not, move to worker_threads

The `clinic.js` Diagnostic Toolchain

clinic.js is the most comprehensive Node.js diagnostic suite. Three tools, each answering a different question.

`clinic doctor`: What Is Wrong?

bash

npm install -g clinic

# Run your server under clinic doctor
clinic doctor -- node your-indexer.js &
SERVER_PID=$!

# Apply representative load
autocannon -c 100 -d 30 http://localhost:3000/api/v2/payments

# Stop and analyze
kill $SERVER_PID
# Opens interactive HTML report in browser

clinic doctor analyzes four signals:

Event loop delay — is the event loop lagging? (indicates blocking code)
CPU usage — is the process CPU-bound?
Memory — is memory growing continuously? (indicates leak)
Handles/requests — are there open handles preventing process exit? (indicates resource leak)

What the report tells you:

text

Issue detected: Event loop delay is high (avg 45ms)
Possible cause: Heavy synchronous computation
Recommendation: Use worker_threads for CPU-intensive operations
                Or consider async alternatives

Issue detected: Heap usage growing without GC collection
Possible cause: Memory leak (event listeners, closures retaining references)
Recommendation: Take heap snapshot and analyze retained object graph

`clinic flame`: Where Is the CPU Going?

bash

clinic flame -- node your-indexer.js &
# ... apply load and stop ...
# Opens interactive flamegraph

clinic flame generates a proper flamegraph from --perf profiling data, with:

Merged V8 and native frames (you see both JS and C++ in one view)
Click-to-zoom for deep inspection
Filtering by function name

Interpreting clinic flame output for a transaction parser:

text

Before optimization (clinic flame shows):
  processBlock (100%)
  └── parseTransaction (72%)
      └── JSON.parse (72%)  ←── 72% of CPU in JSON.parse!
  └── writeToDatabase (28%)

After moving JSON.parse to worker_threads:
  processBlock (100%)
  └── parseTransaction (12%)  ←── down from 72%!
      └── Buffer.toString (12%)  ←── just converting to string for worker
  └── writeToDatabase (48%)
  └── worker coordination (40%)

`clinic bubbleprof`: Where Is the Async Time Going?

bash

clinic bubbleprof -- node your-indexer.js &

bubbleprof visualizes async operations as bubbles — the size of each bubble represents how long async operations took. It shows where time is spent waiting rather than executing.

Useful for diagnosing:

Database queries that are slower than expected
HTTP client calls with unexpected latency
Async chains that add unnecessary await depth

Heap Snapshots: Finding Memory Leaks

When clinic doctor shows continuously growing memory, take a heap snapshot:

javascript

// Trigger heap snapshot on demand (safe in production for 1–2 seconds)
import v8 from 'v8';
import fs from 'fs';

process.on('SIGUSR2', () => {
  const timestamp = new Date().toISOString().replace(/:/g, '-');
  const filename = `heap-${timestamp}.heapsnapshot`;
  const stream = v8.writeHeapSnapshot(filename);
  logger.info({ filename }, 'Heap snapshot written');
});

// Trigger: kill -SIGUSR2 $(pgrep -f your-indexer)

bash

# Take two snapshots 5 minutes apart
kill -SIGUSR2 $PID  # snapshot 1
sleep 300
kill -SIGUSR2 $PID  # snapshot 2

# Open Chrome DevTools → Memory tab → Load both snapshots
# Use "Comparison" view to see what objects grew between snapshots

Common memory leak patterns in Node.js:

javascript

// LEAK 1: EventEmitter listener accumulation
class TransactionProcessor extends EventEmitter {
  processBlock(block) {
    block.transactions.forEach(tx => {
      this.on('validate', () => tx.validate());  // ← listener added per transaction
      // Never removed! Listener count grows with each transaction
    });
  }
}
// FIX: use this.once() or explicitly removeListener()

// LEAK 2: Closure retaining large objects
function createHandler(largeBuffer) {
  return function handler(req) {
    // largeBuffer is in scope → never GC'd while handler exists
    return process(req, largeBuffer);
  };
}
// FIX: pass largeBuffer as parameter, don't close over it

// LEAK 3: Map/Set growing without bound
const requestCache = new Map();
app.get('/api/payment/:id', (req, res) => {
  requestCache.set(req.params.id, req);  // ← never evicted!
  // ...
});
// FIX: use LRU cache with bounded size, or TTL

`auto_explain` for Node.js: `async_hooks` + Performance Timing

The equivalent of PostgreSQL's auto_explain for Node.js: automatically log slow async operations above a threshold.

javascript

import { createHook, executionAsyncId } from 'node:async_hooks';
import { performance } from 'node:perf_hooks';

const asyncStartTimes = new Map();
const SLOW_THRESHOLD_MS = 100;

const hook = createHook({
  init(asyncId, type, triggerAsyncId) {
    asyncStartTimes.set(asyncId, { type, start: performance.now() });
  },
  destroy(asyncId) {
    const entry = asyncStartTimes.get(asyncId);
    if (!entry) return;
    const duration = performance.now() - entry.start;
    asyncStartTimes.delete(asyncId);

    if (duration > SLOW_THRESHOLD_MS) {
      logger.warn({ asyncType: entry.type, durationMs: duration.toFixed(2) },
        'Slow async operation detected');
    }
  }
});

hook.enable();
// Produces logs like:
// {"asyncType":"TCPWRAP","durationMs":"234.12"} — slow TCP socket
// {"asyncType":"FSREQCALLBACK","durationMs":"145.88"} — slow file operation
// {"asyncType":"Timeout","durationMs":"312.44"} — setTimeout fired late (event loop lag)

The Production Incident: Undetected Memory Leak via Event Listener

Context: A blockchain indexer WebSocket subscription service. Engineers received an alert: process memory at 94% of limit after 6 hours of operation. Restart fixed it temporarily.

Diagnosis with heap snapshot comparison:

Snapshot 1 (at startup): 180MB heap Snapshot 2 (6 hours later): 1.8GB heap

Chrome DevTools comparison view showed:

text

Object type          Count (snapshot 1)  Count (snapshot 2)  Delta
EventEmitter          42                  183,492             +183,450
Function              8,240               1,240,830           +1,232,590

183,000 EventEmitter instances — one per WebSocket subscriber (12,000 subscribers × ~15 listeners each = 180,000 accumulated listener objects that were never cleaned up).

The broken code:

javascript

// WebSocket subscription handler
wsServer.on('connection', (ws) => {
  // Register listener on shared blockEventBus for this subscriber
  blockEventBus.on('new-block', (block) => {
    ws.send(JSON.stringify(block));   // ← closure captures 'ws'
  });

  ws.on('close', () => {
    // ← listener NEVER REMOVED from blockEventBus
    // ws is closed but the listener still exists
    // blockEventBus still holds reference to the listener
    // which holds reference to 'ws'
    // Memory leak: closed WebSocket + listener permanently retained
  });
});

The fix:

javascript

wsServer.on('connection', (ws) => {
  const handler = (block) => {
    if (ws.readyState === WebSocket.OPEN) {
      ws.send(JSON.stringify(block));
    }
  };

  blockEventBus.on('new-block', handler);

  ws.on('close', () => {
    blockEventBus.off('new-block', handler);  // ← explicitly remove listener
    // handler (and its closure over 'ws') can now be GC'd
  });
});

After the fix: memory stabilized at 220MB regardless of subscriber count or runtime duration.

Summary

Concept	Key Takeaway
Pino	5x faster than Winston. Structured JSON logging. Use child loggers for request context.
Prometheus + prom-client	Counter/Histogram/Gauge. Export `/metrics`. Alert on ELU, GC duration, pool utilization.
OpenTelemetry	Auto-instruments HTTP, pg, Kafka. Use `startSpan` for custom instrumentation.
ELU gauge	First metric to check during latency incident. > 0.85 = investigate CPU hotspot.
`--cpu-prof`	Generate V8 CPU profile. Open in Chrome DevTools. Identify hot functions.
Flame graph	X axis = CPU time. Wide plateau at top = hot spot. Narrow spike = expected depth.
`clinic doctor`	Diagnoses event loop delay, memory leaks, CPU saturation. First tool in the diagnostic workflow.
`clinic flame`	CPU flamegraph with V8 + native frames. Identifies exact function consuming CPU.
`clinic bubbleprof`	Async operation visualization. Shows where time is waiting vs executing.
Heap snapshots	Two snapshots + comparison view = pinpoint leaked object type. `kill -SIGUSR2` to trigger.
EventEmitter leak	Always `off()` listeners when the subscriber closes. Most common Node.js memory leak.

You can now diagnose performance problems with precision. Module 13 covers the connection layer — the pool configurations and process management patterns that determine how many requests your service can actually handle concurrently.

Next: Module 13 — Advanced Connection Pooling & Process Management →

Knowledge Check

When analyzing a flame graph generated from a Node.js CPU profile, what does a "wide plateau" at the top of a stack represent?

Which diagnostic tool from the clinic.js toolchain is best suited for visualizing where an application is spending time *waiting* for asynchronous operations (like database queries or HTTP calls) rather than executing CPU instructions?

What is identified as one of the most common causes of memory leaks in Node.js applications?

Test your knowledge with more question sets

PreviousModule A-12: Enterprise Architecture for State-Heavy Systems: DDD & Clean Architecture Next Module A-14: Advanced Connection Pooling & Process Management

Discussion

Join the discussion

Loading comments...

Module 12 — Production Observability, Performance Profiling & Flame Graphs

The Three Pillars: Logs, Metrics, Traces

Pino: The Fastest Structured Logger

Prometheus Metrics with prom-client

OpenTelemetry: Distributed Traces

Event Loop Utilization: The First Metric to Check

Generating V8 CPU Profiles

Method 1: --cpu-prof Flag

Method 2: V8 Profiler API

Reading Flame Graphs

The clinic.js Diagnostic Toolchain

clinic doctor: What Is Wrong?

clinic flame: Where Is the CPU Going?

clinic bubbleprof: Where Is the Async Time Going?

Heap Snapshots: Finding Memory Leaks

auto_explain for Node.js: async_hooks + Performance Timing

The Production Incident: Undetected Memory Leak via Event Listener

Summary

Test your knowledge with more question sets

Discussion

Prometheus Metrics with `prom-client`

Method 1: `--cpu-prof` Flag

The `clinic.js` Diagnostic Toolchain

`clinic doctor`: What Is Wrong?

`clinic flame`: Where Is the CPU Going?

`clinic bubbleprof`: Where Is the Async Time Going?

`auto_explain` for Node.js: `async_hooks` + Performance Timing