Structured JSON diagnostic reports at the exact moment of failure — integrating into SRE pipelines without the overhead of full core dump analysis.
Module 20 — Automated Post-Mortem Diagnostics: process.report
What this module covers: When a Node.js process dies from an uncaught exception or a fatal runtime error such as V8 running out of heap, you typically have one chance to capture diagnostic information before the process terminates. process.report generates a structured JSON diagnostic report containing V8 heap statistics, libuv handles, JavaScript and native C++ call stacks, environment variables, and system information, captured at the moment of failure. A kill delivered from outside the process with SIGKILL, such as a kernel or cgroup OOM kill, cannot be intercepted and produces no report. This module covers configuring diagnostic reports for production, integrating report generation into SRE alert pipelines, and reading the output to diagnose crashes that are otherwise invisible.
What a Diagnostic Report Contains
A process.report output is a JSON file. An abridged example showing the main sections, with most fields omitted:
```json
{
  "header": {
    "reportVersion": 3,
    "event": "Allocation failed - JavaScript heap out of memory",
    "trigger": "OOMError",
    "filename": "report.20260517.143022.json",
    "dumpEventTime": "2026-05-17T14:30:22Z",
    "processId": 12847,
    "cwd": "/app",
    "commandLine": ["node", "--max-old-space-size=2048", "dist/app.js"],
    "nodejsVersion": "v22.11.0",
    "release": { "name": "node", "lts": "Jod" }
  },
  "javascriptStack": {
    "message": "JavaScript heap out of memory",
    "stack": ["FATAL ERROR: Reached heap limit Allocation failed..."]
  },
  "nativeStack": [
    { "pc": "0x10f3c", "symbol": "node::MakeCallback" },
    { "pc": "0x23a1", "symbol": "uv__io_poll" }
  ],
  "javascriptHeap": {
    "totalMemory": 2147483648,
    "usedMemory": 2145001472,
    "memoryLimit": 2197815296,
    "externalMemory": 8192000,
    "heapSpaces": {
      "new_space": { "memorySize": 33554432, "committedMemory": 33521664, "used": 33501200 },
      "old_space": { "memorySize": 2097152000, "committedMemory": 2097152000, "used": 2095874000 }
    }
  },
  "uvthreadResourceUsage": { "userCpuSeconds": 847.23, "kernelCpuSeconds": 12.44 },
  "libuv": [
    {
      "type": "tcp",
      "is_active": true,
      "is_referenced": true,
      "localEndpoint": { "host": "0.0.0.0", "port": 3000 },
      "remoteEndpoint": null,
      "fd": 10
    },
    {
      "type": "tcp",
      "is_active": true,
      "remoteEndpoint": { "host": "db.internal", "port": 5432 },
      "fd": 22,
      "writeQueueSize": 0
    }
  ],
  "workers": [],
  "environmentVariables": {
    "DATABASE_URL": "REDACTED",
    "NODE_ENV": "production",
    "UV_THREADPOOL_SIZE": "16"
  },
  "resourceUsage": { "rss": 2478080000 }
}
```
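In one file, you have the error, the V8 heap statistics at failure time, every open libuv handle (TCP connections, pipes, timers), environment variables, and CPU usage, without needing to attach a debugger or reproduce the crash. Environment variables are written verbatim; the REDACTED value above was scrubbed for this example, so treat report files as sensitive data.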
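The same structure is also available in-process through process.report.getReport(), which returns the report as an object instead of writing a file. A minimal sketch that pulls one or two fields from each section; the `summary` shape is an illustration chosen for this example, not part of the API:

```javascript
// getReport() builds the report in memory; nothing is written to disk.
const report = process.report.getReport();

// `summary` is a hypothetical shape chosen for this example.
const summary = {
  event: report.header.event,
  trigger: report.header.trigger,
  nodejsVersion: report.header.nodejsVersion,
  heapUsedBytes: report.javascriptHeap.usedMemory,
  heapLimitBytes: report.javascriptHeap.memoryLimit,
  // Distinct libuv handle types currently open (tcp, timer, pipe, ...)
  handleTypes: [...new Set(report.libuv.map((h) => h.type))],
};

console.log(summary);
```

This is handy for exploring the format locally before wiring up automatic triggers.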
Configuration: Triggering Reports Automatically
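```javascript
// Configure automatic report generation at application startup.
// The same options exist as CLI flags (--report-on-fatalerror,
// --report-uncaught-exception, --report-on-signal, --report-dir).

// Report on fatal runtime errors, such as V8 running out of heap
process.report.reportOnFatalError = true;

// Report when the process exits because of an uncaught exception
process.report.reportOnUncaughtException = true;

// Report on SIGUSR2 signal (manual trigger without process restart)
process.report.reportOnSignal = true;
process.report.signal = 'SIGUSR2'; // default

// Where to write reports (the directory should already exist and be writable)
process.report.directory = '/app/diagnostic-reports';

// Leave process.report.filename unset to get a unique default name built from
// the date, time, PID, thread ID, and a sequence number, for example
// report.20260517.143022.12847.0.001.json. A value assigned to filename is
// used literally (no placeholder substitution), so every report would reuse
// the same file name.
```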
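```bash
# Trigger a report manually without restarting.
# Match on the script path only: the full command line also contains flags
# such as --max-old-space-size, so "node dist/app.js" would not match.
kill -SIGUSR2 $(pgrep -f "dist/app.js")
# Generates e.g. /app/diagnostic-reports/report.20260517.143022.12847.0.001.json
# The process keeps running; the event loop pauses only while the report is written
```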
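The settings above can be wrapped in a small startup helper that applies them and reads them back, so the effective configuration can be logged at boot. This is a sketch under assumptions: `enableDiagnosticReports` and its option names are hypothetical, and it assumes a POSIX platform where SIGUSR2 is available. It also turns on process.report.compact, which writes single-line JSON that log shippers tend to handle more easily.

```javascript
// Hypothetical helper: applies report settings and returns what took effect.
function enableDiagnosticReports({ directory, signal = 'SIGUSR2' } = {}) {
  const r = process.report;
  r.reportOnFatalError = true;
  r.reportOnUncaughtException = true;
  r.signal = signal;
  r.reportOnSignal = true;
  r.compact = true; // single-line JSON instead of pretty-printed
  if (directory) r.directory = directory;

  // Read the values back so the caller can log the effective configuration.
  return {
    reportOnFatalError: r.reportOnFatalError,
    reportOnUncaughtException: r.reportOnUncaughtException,
    reportOnSignal: r.reportOnSignal,
    signal: r.signal,
    compact: r.compact,
    directory: r.directory,
  };
}

const effective = enableDiagnosticReports({ directory: process.cwd() });
console.log(effective);
```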
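Kubernetes Integration: Report Before Pod Termination
Kubernetes does not send SIGTERM before an OOM kill: a container that exceeds its memory limit is killed by the kernel with SIGKILL, which cannot be handled. SIGTERM arrives when a pod is deleted, evicted, or replaced during a rollout. Writing a report in the SIGTERM handler captures the process state at that point: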
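```javascript
import path from 'node:path';

process.on('SIGTERM', async () => {
  try {
    // Generate diagnostic report BEFORE shutdown.
    // writeReport() returns the report's file name, so resolve it against
    // the configured report directory (or the cwd when none is set).
    const reportFilename = process.report.writeReport();
    const reportPath = path.resolve(
      process.report.directory || process.cwd(),
      reportFilename,
    );
    logger.info({ reportPath }, 'Diagnostic report written before shutdown');

    // Copy report to persistent storage (pod storage is ephemeral)
    await uploadToS3(reportPath, `reports/${process.pid}-${Date.now()}.json`);
  } catch (err) {
    // A failed report or upload must not block shutdown
    logger.error({ err }, 'Diagnostic report failed');
  }

  // Then graceful shutdown
  await gracefulShutdown('SIGTERM');
});
```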
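Kubernetes follows SIGTERM with SIGKILL once the pod's termination grace period expires (30 seconds by default), so the upload should not be allowed to consume the whole window. One way to bound it is a timeout wrapper; `withTimeout` below is a hypothetical helper written for this example, and the 10-second budget in the usage comment is an arbitrary choice:

```javascript
// Hypothetical helper: rejects if `promise` has not settled within `ms`.
function withTimeout(promise, ms, label = 'operation') {
  let timer;
  const timeout = new Promise((_, reject) => {
    timer = setTimeout(
      () => reject(new Error(`${label} timed out after ${ms} ms`)),
      ms,
    );
  });
  // Whichever settles first wins; the timer is always cleared afterwards
  // so it cannot keep the process alive.
  return Promise.race([promise, timeout]).finally(() => clearTimeout(timer));
}

// Usage inside the SIGTERM handler (uploadToS3 and logger as used above):
//   await withTimeout(uploadToS3(reportPath, key), 10_000, 'report upload')
//     .catch((err) => logger.warn({ err }, 'Report upload skipped'));
```

The report file itself is written synchronously, so only the network upload needs this guard.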
Reading a Diagnostic Report: The Key Sections
Diagnosing OOM from a Report
```javascript
// From the report's javascriptHeap section:
{
  "heapSpaces": {
    "old_space": {
      "memorySize": 2147483648, // 2GB allocated
      "used": 2145001472        // 2GB used = 99.9% full → OOM
    },
    "new_space": {
      "used": 33501200          // new space mostly full too
    }
  }
}
// Diagnosis: old space completely full. Likely cause: memory leak.
// Next step: take two heap snapshots and compare them (Module 12 runbook)
```
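The same reading can be automated. The sketch below assumes a parsed report object and uses the field names found in real reports: per-space usage is in `used`, and the V8 heap limit is in `javascriptHeap.memoryLimit`. The function name `heapPressure` and the sample values are illustrative only.

```javascript
// Summarises the javascriptHeap section of a parsed report.
function heapPressure(report) {
  const heap = report.javascriptHeap;
  const spaces = Object.entries(heap.heapSpaces)
    .map(([name, s]) => ({
      name,
      usedBytes: s.used,
      usedPct: s.memorySize > 0 ? Math.round((s.used / s.memorySize) * 100) : 0,
    }))
    // Largest consumer first, so the dominant space is easy to spot
    .sort((a, b) => b.usedBytes - a.usedBytes);

  return {
    usedOfLimitPct: Math.round((heap.usedMemory / heap.memoryLimit) * 100),
    spaces,
  };
}

// Illustrative values, not taken from a real crash
const sampleReport = {
  javascriptHeap: {
    usedMemory: 2145001472,
    memoryLimit: 2197815296,
    heapSpaces: {
      new_space: { memorySize: 33554432, used: 33501200 },
      old_space: { memorySize: 2097152000, used: 2095874000 },
    },
  },
};

const pressure = heapPressure(sampleReport);
console.log(pressure.usedOfLimitPct, pressure.spaces[0].name);
```

When old_space leads the list and sits near its allocated size, a leak of long-lived objects is the more likely explanation than a burst of short-lived allocations.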
Diagnosing Resource Leaks from libuv Section
```javascript
// From the report's libuv section:
[
  { "type": "tcp", "remoteEndpoint": { "host": "db.internal", "ip4": "10.0.0.45", "port": 5432 }, "fd": 22 },
  { "type": "tcp", "remoteEndpoint": { "host": "db.internal", "ip4": "10.0.0.45", "port": 5432 }, "fd": 23 },
  // ... 487 more entries for the same host:port combination ...
]
// 489 TCP connections to db.internal:5432
// Database pool configured for max: 50
// → Connection leak: connections opened but never returned to pool
```
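Counting entries by hand does not scale, so the grouping can be scripted. A sketch assuming a parsed report whose TCP handles carry a `remoteEndpoint` object (listening sockets have none); `connectionsByRemote` and the `poolMax` parameter are hypothetical names for this example:

```javascript
// Groups TCP handles by remote host:port and flags groups above the pool size.
function connectionsByRemote(report, poolMax) {
  const counts = new Map();
  for (const handle of report.libuv) {
    // Skip non-TCP handles and listening sockets (no remote endpoint)
    if (handle.type !== 'tcp' || !handle.remoteEndpoint) continue;
    const ep = handle.remoteEndpoint;
    const key = `${ep.host ?? ep.ip4 ?? ep.ip6}:${ep.port}`;
    counts.set(key, (counts.get(key) ?? 0) + 1);
  }
  return [...counts]
    .map(([remote, count]) => ({ remote, count, overPool: count > poolMax }))
    .sort((a, b) => b.count - a.count);
}

// Illustrative report fragment
const leakSample = {
  libuv: [
    { type: 'loop' },
    { type: 'tcp', remoteEndpoint: null }, // listening socket
    { type: 'tcp', remoteEndpoint: { host: 'db.internal', port: 5432 } },
    { type: 'tcp', remoteEndpoint: { host: 'db.internal', port: 5432 } },
    { type: 'tcp', remoteEndpoint: { host: 'db.internal', port: 5432 } },
    { type: 'tcp', remoteEndpoint: { ip4: '10.0.0.9', port: 6379 } },
  ],
};

const byRemote = connectionsByRemote(leakSample, 2);
console.log(byRemote);
```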
Diagnosing Event Loop Stall
```javascript
// From the report's header + uvthreadResourceUsage:
{
  "header": {
    "event": "SIGUSR2",
    "trigger": "Signal"
  },
  "uvthreadResourceUsage": {
    "userCpuSeconds": 847,
    "kernelCpuSeconds": 0.1 // very low kernel time
  }
}
// High user CPU, very low kernel CPU on the event loop thread:
// Node.js is burning CPU in JavaScript (not I/O)
// Combined with ELU > 0.95 (measured separately, e.g. via perf_hooks):
// pure JavaScript computation blocking the loop
// Next step: clinic flame to identify the function
```
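The CPU counters in a report are cumulative, so a single report from a long-running process says little about what is happening right now. Two signal-triggered reports taken a few seconds apart give a rate instead. A sketch assuming both reports are already parsed; `cpuBetween` is a hypothetical name and the sample numbers are illustrative:

```javascript
// Computes event-loop-thread CPU usage between two reports of the same process.
function cpuBetween(before, after) {
  const wallSeconds =
    (Date.parse(after.header.dumpEventTime) -
      Date.parse(before.header.dumpEventTime)) / 1000;
  const user =
    after.uvthreadResourceUsage.userCpuSeconds -
    before.uvthreadResourceUsage.userCpuSeconds;
  const kernel =
    after.uvthreadResourceUsage.kernelCpuSeconds -
    before.uvthreadResourceUsage.kernelCpuSeconds;
  const busy = user + kernel;

  return {
    wallSeconds,
    // Share of wall-clock time the loop thread spent on CPU
    loopThreadCpuPct: wallSeconds > 0 ? Math.round((busy / wallSeconds) * 100) : 0,
    // Share of that CPU time spent in user space (JavaScript, not syscalls)
    userSharePct: busy > 0 ? Math.round((user / busy) * 100) : 0,
  };
}

// Illustrative reports taken 10 seconds apart
const first = {
  header: { dumpEventTime: '2026-05-17T14:30:00Z' },
  uvthreadResourceUsage: { userCpuSeconds: 800, kernelCpuSeconds: 10 },
};
const second = {
  header: { dumpEventTime: '2026-05-17T14:30:10Z' },
  uvthreadResourceUsage: { userCpuSeconds: 809.7, kernelCpuSeconds: 10.1 },
};

const cpu = cpuBetween(first, second);
console.log(cpu);
```

A loop thread that is close to 100% busy with almost all of that time in user space matches the "JavaScript blocking the loop" pattern described above.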
SRE Pipeline Integration
Automatic Report Upload on Crash
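```javascript
// Configure crash reporting pipeline
import { S3Client, PutObjectCommand } from '@aws-sdk/client-s3';
import { createReadStream } from 'node:fs';
import path from 'node:path';

const s3 = new S3Client({ region: 'ap-south-1' });

async function uploadDiagnosticReport(filename) {
  // writeReport() returns the report's file name, so resolve it against
  // the configured report directory (or the cwd when none is set).
  const reportPath = path.resolve(process.report.directory || process.cwd(), filename);
  const key = `diagnostic-reports/${process.env.SERVICE_NAME}/${Date.now()}-${path.basename(filename)}`;

  await s3.send(new PutObjectCommand({
    Bucket: 'ops-diagnostic-reports',
    Key: key,
    Body: createReadStream(reportPath),
    ContentType: 'application/json',
    Metadata: {
      'service': process.env.SERVICE_NAME,
      'pod': process.env.HOSTNAME,
      'node-version': process.version,
    },
  }));

  logger.info({ s3Key: key }, 'Diagnostic report uploaded to S3');
  return key;
}

// Hook into process events
process.on('uncaughtException', async (err) => {
  logger.error({ err }, 'Uncaught exception: generating diagnostic report');
  try {
    // Passing the error includes its JavaScript stack in the report
    const filename = process.report.writeReport(err);
    await uploadDiagnosticReport(filename);
  } catch (uploadErr) {
    logger.error({ err: uploadErr }, 'Diagnostic report upload failed');
  } finally {
    // Registering this handler replaces the default crash-and-exit behaviour,
    // so exit explicitly once the report has been handled.
    process.exit(1);
  }
});
```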
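Because reports include environment variables verbatim, it can be worth scrubbing them before the file leaves the pod. A minimal sketch, assuming the report has been read and parsed first; `redactEnvironment` and its allowlist are hypothetical, and other fields such as `header.commandLine` may also carry secrets that this example does not touch:

```javascript
// Returns a copy of the report with every non-allowlisted env var masked.
function redactEnvironment(report, allowlist = ['NODE_ENV']) {
  const allowed = new Set(allowlist);
  const environmentVariables = Object.fromEntries(
    Object.entries(report.environmentVariables ?? {}).map(([name, value]) => [
      name,
      allowed.has(name) ? value : 'REDACTED',
    ]),
  );
  // Shallow copy: the input object is left unmodified
  return { ...report, environmentVariables };
}

// Illustrative input
const rawReport = {
  header: { event: 'exception' },
  environmentVariables: {
    DATABASE_URL: 'postgres://user:secret@db.internal/app',
    NODE_ENV: 'production',
  },
};

const safeReport = redactEnvironment(rawReport);
console.log(JSON.stringify(safeReport.environmentVariables));
```

The scrubbed object can then be serialised with JSON.stringify and used as the upload body in place of the raw file stream.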
Alert on Crash with Report Link
```javascript
// Send a PagerDuty/Slack alert with the report location
async function alertWithReport(err, reportS3Key) {
  const res = await fetch(process.env.PAGERDUTY_WEBHOOK, {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({
      routing_key: process.env.PAGERDUTY_KEY,
      event_action: 'trigger',
      payload: {
        summary: `Node.js crash: ${err.message}`,
        severity: 'critical',
        source: process.env.HOSTNAME,
        custom_details: {
          diagnostic_report: `s3://ops-diagnostic-reports/${reportS3Key}`,
          node_version: process.version,
          service: process.env.SERVICE_NAME,
          heap_used_mb: Math.round(process.memoryUsage().heapUsed / 1024 / 1024),
        },
      },
    }),
  });

  // fetch() only rejects on network errors, so check the HTTP status too
  if (!res.ok) {
    logger.error({ status: res.status }, 'Alert delivery failed');
  }
}
```
Programmatic Report Inspection
```javascript
// Read and parse a report for automated analysis
import { readFileSync } from 'node:fs';

function analyzeReport(reportPath) {
  const report = JSON.parse(readFileSync(reportPath, 'utf8'));
  const heap = report.javascriptHeap;

  const analysis = {
    crashReason: report.header.event,
    // Compare against the V8 heap limit: totalMemory is only the heap
    // allocated so far, so used/total is high even in a healthy process.
    heapUsedPct: Math.round((heap.usedMemory / heap.memoryLimit) * 100),
    openTcpConnections: report.libuv.filter((h) => h.type === 'tcp').length,
    openFileHandles: report.libuv.filter(
      (h) => h.type === 'fs_event' || h.type === 'pipe',
    ).length,
    cpuTimeSeconds: report.uvthreadResourceUsage.userCpuSeconds,
    likelyCauses: [],
  };

  // Automated triage: collect every match instead of overwriting earlier ones
  if (analysis.heapUsedPct > 95) analysis.likelyCauses.push('OOM / memory leak');
  if (analysis.openTcpConnections > 200) analysis.likelyCauses.push('Connection leak');
  if (analysis.cpuTimeSeconds > 3600) {
    analysis.likelyCauses.push('Long-running process with CPU issue');
  }

  return analysis;
}

// Use in CI/CD to detect memory leaks during load tests
const analysis = analyzeReport('./diagnostic-reports/latest.json');
if (analysis.heapUsedPct > 80) {
  throw new Error(`Memory leak detected: heap at ${analysis.heapUsedPct}%`);
}
```
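A single report only shows how full the heap was at one instant. For a load test, a series of signal-triggered reports gives a trend, which is usually a stronger leak signal than one threshold. The sketch below assumes the reports are already parsed and ordered by time; `detectHeapGrowth` and the 50 MB default are assumptions for this example, and garbage collection timing can make individual samples noisy.

```javascript
// Flags a series of reports whose used heap rises at every sample.
function detectHeapGrowth(reports, minGrowthBytes = 50 * 1024 * 1024) {
  const used = reports.map((r) => r.javascriptHeap.usedMemory);
  const alwaysRising = used.every((v, i) => i === 0 || v > used[i - 1]);
  const growthBytes = used.length > 1 ? used[used.length - 1] - used[0] : 0;

  return {
    samples: used.length,
    growthBytes,
    suspicious: alwaysRising && growthBytes >= minGrowthBytes,
  };
}

// Illustrative series: heap climbs by roughly 40 MB per sample
const mb = 1024 * 1024;
const series = [100, 140, 180, 220].map((n) => ({
  javascriptHeap: { usedMemory: n * mb },
}));

const growth = detectHeapGrowth(series);
console.log(growth);
```

In a CI job, a `suspicious` result could fail the build in the same way as the threshold check above.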
Course Complete
This module closes the course. The full arc from Module 0 to Module 20:
Foundation (0–5): Non-blocking architecture, V8 internals, event loop mechanics, kernel I/O, backpressure, off-heap buffers.
Scaling (6–9): Cluster/workers/IPC, routing engines, Modulith architecture, microservice extraction.
Distributed Systems (10–11): gRPC, Kafka, event sourcing, DDD, CQRS, Clean Architecture.
Operations (12–13): Observability, flame graphs, connection pooling, PM2, graceful shutdown.
Advanced (14–20): Edge isolates, resiliency runbooks, Permission Model, SEA/snapshots, Rust N-API, Undici/Web Crypto, automated diagnostics.
The system you can now build: a high-throughput Node.js blockchain indexer or payment gateway that uses kernel-level I/O efficiently, scales across all available CPU cores with minimal IPC overhead, maintains clean domain boundaries with CQRS and DDD, defends against supply chain attacks and ReDoS, and generates structured diagnostics at the exact moment of any failure.
Summary
| Concept | Key Takeaway |
|---|---|
| process.report | Structured JSON at the moment of failure. V8 heap statistics, libuv handles, call stacks, env vars. |
| reportOnFatalError | Captures a report on fatal runtime errors such as V8 out-of-memory. No code change needed after configuration. |
| reportOnSignal | SIGUSR2 triggers a report in a running process. The process keeps serving after a brief pause while the report is written. |
| writeReport() | Programmatic report generation in signal handlers and error handlers. |
| Heap section | usedMemory / memoryLimit > 95% = heap nearly exhausted. Old space full = likely memory leak. |
| libuv section | 400 TCP connections to one host = connection leak. Unexpected handles = resource leak. |
| CPU section | High user CPU, low kernel CPU = JavaScript blocking the event loop. |
| S3 upload before exit | Persist reports from ephemeral Kubernetes pods before they disappear. |
| Automated analysis | Parse reports in CI/CD load tests to detect memory leaks before production. |