Module A-10·23 min read

When to decouple — the outbox pattern, datastore split strategies, and transitioning a write-heavy module out of the monolith without data loss.

Module 9 — Pragmatic Microservice Deconstruction: Splitting Ingestion from Analytics

What this module covers: The Modulith from Module 8 is the right starting architecture. But eventually a specific module genuinely needs independent scaling, geographic distribution, or a different runtime. Extracting a module incorrectly causes data loss, split-brain state, and deployment coupling that defeats the purpose of the split. This module covers how to identify the correct seam to split, the outbox pattern for atomically publishing events during extraction, datastore split strategies, and the operational reality of running two services where there was one.


When to Split: The Three Legitimate Signals

The decision to split a module out of the Modulith should be data-driven. There are exactly three legitimate reasons:

Signal 1: Independent scaling requirement

A module needs 10× the compute of others. The analytics module needs 32 CPU cores for aggregation while ingestion needs 4. Scaling the whole Modulith to 32 cores wastes 28 cores of ingestion capacity.

javascript
// Measure per-module CPU cost const { cpuUsage } = process; const before = cpuUsage(); await analyticsModule.processTransaction(tx); const usage = cpuUsage(before); console.log(`Analytics CPU: ${usage.user}μs user, ${usage.system}μs system`);

If analytics consistently uses 15× more CPU than ingestion per transaction, that's a scaling signal.

Signal 2: Different availability requirements

Ingestion must be 99.99% available (data loss on downtime). Analytics can tolerate 99.9% (reports are slightly stale during an outage). Running them together means analytics bugs can take down ingestion — an unacceptable risk profile.

Signal 3: Genuine team autonomy need

A separate team owns analytics and deploys 10 times per day. Coupling their deployment to ingestion (which deploys once a week) creates a deployment bottleneck. Conway's Law applies.

Do NOT split for:

  • "Microservices are the modern way" (cargo cult)
  • The module is large (size is not a reason for distribution)
  • You want to use a different language (use a native addon instead)
  • "It might need to scale later" (YAGNI — split when the signal is real)

Identifying the Split Seam

The correct seam for splitting is where write contention and read contention diverge.

For a blockchain indexer:

Write path:    incoming block → parse → validate → INSERT into transactions table
Read path:     API queries → SELECT aggregations, lookups, analytics

Both paths hit the same PostgreSQL primary.
At 50K writes/sec + 10K reads/sec, the primary's write throughput is the constraint.
Reads on the primary compete with writes for buffer pool, WAL, and I/O.

The seam: separate the write service from the read/analytics service. The write service owns the primary. The analytics service reads from a read replica or a separate projection store.

Before split:
  [Modulith] → writes transactions → PostgreSQL primary
  [Modulith] → reads analytics ← PostgreSQL primary (competing I/O)

After split:
  [Ingestion Service] → writes transactions → PostgreSQL primary
  [Ingestion Service] → publishes events → Kafka topic
  [Analytics Service] → consumes Kafka → builds projections → PostgreSQL analytics (separate)
  [Analytics Service] → serves read API

Now writes and reads never compete. The ingestion service can sustain 50K writes/sec at full I/O. Analytics has its own PostgreSQL instance sized for reads.


The Outbox Pattern: Atomic Write + Event Publish

The most dangerous moment in a service split is when a write to the database and a publish to Kafka must be treated as a unit. If you write to the database and then publish to Kafka, a crash between the two leaves the database updated but Kafka unnotified — downstream services miss the event.

javascript
// BROKEN: non-atomic write + publish async function ingestTransaction(tx) { await db.write(tx); // step 1: DB write succeeds // CRASH HERE → Kafka never gets the event → analytics never updates await kafka.publish('transactions', tx); // step 2: may never execute }

The outbox pattern solves this by writing the event to an outbox table in the same database transaction as the business data. Atomicity is guaranteed by the database. A separate process reads the outbox and publishes to Kafka.

javascript
// Step 1: Write transaction + outbox event in ONE database transaction async function ingestTransaction(tx) { const client = await pool.connect(); try { await client.query('BEGIN'); // Write business data await client.query( 'INSERT INTO transactions (hash, block_height, sender, amount) VALUES ($1, $2, $3, $4)', [tx.hash, tx.blockHeight, tx.sender, tx.amount] ); // Write outbox event — same transaction, atomically await client.query( 'INSERT INTO outbox (event_type, payload, created_at) VALUES ($1, $2, NOW())', ['TRANSACTION_INGESTED', JSON.stringify(tx)] ); await client.query('COMMIT'); // Either BOTH succeed or BOTH fail. Never one without the other. } catch (err) { await client.query('ROLLBACK'); throw err; } finally { client.release(); } }
javascript
// Step 2: Outbox publisher — reads unprocessed events, publishes to Kafka async function outboxPublisher() { while (true) { const { rows } = await pool.query( 'SELECT id, event_type, payload FROM outbox WHERE published_at IS NULL ORDER BY id LIMIT 100 FOR UPDATE SKIP LOCKED' ); if (rows.length === 0) { await sleep(100); // no events, wait briefly continue; } // Publish to Kafka await kafka.sendBatch(rows.map(row => ({ topic: 'transactions', messages: [{ key: row.id.toString(), value: row.payload }] }))); // Mark as published await pool.query( 'UPDATE outbox SET published_at = NOW() WHERE id = ANY($1)', [rows.map(r => r.id)] ); } }

Why FOR UPDATE SKIP LOCKED: Multiple outbox publisher instances can run safely. Each grabs a batch of rows that no other publisher is currently processing. No duplicate publishes.

The at-least-once guarantee: if the publisher crashes after publishing to Kafka but before marking the rows, it will re-publish on restart. The Kafka consumer must be idempotent — processing the same event twice must be safe.


Kafka Consumer: Idempotent Processing

The analytics service consumes from the Kafka topic. Because the outbox guarantees at-least-once delivery, the consumer must handle duplicates:

javascript
import { Kafka } from 'kafkajs'; const kafka = new Kafka({ brokers: ['kafka:9092'] }); const consumer = kafka.consumer({ groupId: 'analytics-service' }); await consumer.connect(); await consumer.subscribe({ topic: 'transactions', fromBeginning: false }); await consumer.run({ autoCommit: false, // manual offset management for exactly-once processing eachMessage: async ({ topic, partition, message, heartbeat }) => { const tx = JSON.parse(message.value.toString()); // Idempotent processing: INSERT ... ON CONFLICT DO NOTHING // If we've seen this transaction ID before, skip silently const result = await analyticsPool.query( `INSERT INTO transaction_projections (tx_hash, block_height, amount, processed_at) VALUES ($1, $2, $3, NOW()) ON CONFLICT (tx_hash) DO NOTHING`, [tx.hash, tx.blockHeight, tx.amount] ); if (result.rowCount === 0) { console.log(`Duplicate event skipped: ${tx.hash}`); } // Commit offset only after successful processing await consumer.commitOffsets([{ topic, partition, offset: (parseInt(message.offset) + 1).toString() }]); // Heartbeat for long-running processing await heartbeat(); } });

Datastore Split Strategies

When splitting, you have several options for how to divide the data:

Strategy 1: Primary + Read Replica (simplest)

Ingestion Service → PostgreSQL primary (writes)
Analytics Service → PostgreSQL read replica (reads)

The read replica receives WAL from the primary and stays < 1 second behind. Analytics queries run against it without competing with writes. No separate schema required.

Limitation: analytics read volume still competes with replication I/O on the replica. Works for moderate analytics load.

Strategy 2: CQRS with separate projection store

Ingestion Service → PostgreSQL primary (normalized, OLTP)
Analytics Service → analytics PostgreSQL (denormalized projections, OLAP)
                  → Updated via Kafka consumer

The analytics database is optimized for read patterns: denormalized tables, partial indexes, materialized views. The ingestion database is optimized for write throughput: normalized, minimal indexes.

javascript
// Analytics projection table — denormalized for fast reads // Updated by Kafka consumer, not by ingestion service await analyticsPool.query(` CREATE TABLE IF NOT EXISTS daily_volume ( date DATE, total_amount NUMERIC(38, 8), tx_count BIGINT, unique_senders BIGINT, PRIMARY KEY (date) ) `); // Consumer updates projection await analyticsPool.query(` INSERT INTO daily_volume (date, total_amount, tx_count, unique_senders) VALUES (DATE($1), $2, 1, 1) ON CONFLICT (date) DO UPDATE SET total_amount = daily_volume.total_amount + $2, tx_count = daily_volume.tx_count + 1 `, [tx.timestamp, tx.amount]);

Strategy 3: Dedicated analytics store (ClickHouse/TimescaleDB)

For high-cardinality time-series analytics (transaction volume by hour by sender by network), a columnar store like ClickHouse provides 10–100× better query performance than PostgreSQL.

javascript
// Kafka consumer writing to ClickHouse via HTTP interface await fetch('http://clickhouse:8123/', { method: 'POST', body: `INSERT INTO transactions FORMAT JSONEachRow\n${JSON.stringify(tx)}\n`, });

This is the correct deployment for a production blockchain explorer where analytics queries aggregate billions of rows.


Service Contract: The Anti-Corruption Layer

When splitting, the analytics service must not depend on the ingestion service's internal data model. If ingestion changes its schema, analytics should not break.

javascript
// modules/ingestion/events.ts — the public contract (never changes) export interface TransactionIngestedEvent { eventType: 'TRANSACTION_INGESTED'; eventVersion: 1; transactionHash: string; blockHeight: number; senderAddress: string; amount: string; // string to avoid BigInt serialization issues timestamp: number; } // Anti-corruption layer in analytics: translate from ingestion's event format // to analytics' domain model function translateEvent(event: TransactionIngestedEvent): AnalyticsTransaction { return { hash: event.transactionHash, // analytics uses different field names height: event.blockHeight, sender: event.senderAddress, value: BigInt(event.amount), // analytics uses BigInt internally at: new Date(event.timestamp * 1000), }; }

Version the events (eventVersion: 1). When the ingestion event format changes, bump to eventVersion: 2 and have analytics handle both versions during the transition.


Operational Reality: Two Services

After the split, you have doubled the operational surface:

ConcernBefore (Modulith)After (Split)
Deployments1 pipeline2 pipelines
Health checks1 endpoint2 endpoints
Logs1 log stream2 log streams
Distributed tracingOptionalRequired
Schema migrations1 database2 databases
Kafka consumer lagN/AMust monitor
Outbox queue depthN/AMust monitor
Data consistencyGuaranteed (same TX)Eventual (Kafka lag)

Monitoring additions for the split:

javascript
// Monitor Kafka consumer lag (analytics falling behind ingestion) const { OFFSET_OUT_OF_RANGE } = ErrorCodes; setInterval(async () => { const offsets = await admin.fetchOffsets({ groupId: 'analytics-service', topics: ['transactions'] }); const endOffsets = await admin.fetchTopicOffsets('transactions'); for (const partition of offsets.offsets) { const end = endOffsets.find(e => e.partition === partition.partition); const lag = parseInt(end.offset) - parseInt(partition.offset); consumerLagGauge.labels(partition.partition.toString()).set(lag); if (lag > 100_000) { logger.warn(`Analytics consumer lag: ${lag} messages on partition ${partition.partition}`); } } }, 10_000);
javascript
// Monitor outbox queue depth (publish delay) setInterval(async () => { const { rows } = await pool.query( 'SELECT COUNT(*) FROM outbox WHERE published_at IS NULL' ); outboxDepthGauge.set(parseInt(rows[0].count)); }, 5_000);

Production Incident: Split-Brain During Extraction Migration

Context: A blockchain indexer mid-migration. Ingestion writes to PostgreSQL. The outbox publisher was deployed but the analytics Kafka consumer had not yet been deployed.

What happened:

The outbox table accumulated 12 million rows over 2 days. When the analytics consumer was finally deployed, it began processing 12 million events from the beginning of time. The analytics database received 12 million inserts in 4 hours — a 10× spike over normal write rate. The analytics PostgreSQL ran out of IOPS and fell behind. Queries against the analytics API returned stale data for 18 hours while the consumer caught up.

The fix:

javascript
// Option 1: Seed the analytics database from the primary before enabling the consumer // Run a one-time historical backfill query await analyticsPool.query(` INSERT INTO daily_volume (date, total_amount, tx_count) SELECT DATE(timestamp) as date, SUM(amount) as total_amount, COUNT(*) as tx_count FROM transactions -- query the primary GROUP BY DATE(timestamp) ON CONFLICT (date) DO UPDATE SET total_amount = EXCLUDED.total_amount, tx_count = EXCLUDED.tx_count `); // Option 2: Start the Kafka consumer at the current offset (skip historical events) // Only do this if the analytics DB was pre-seeded with historical data await consumer.subscribe({ topic: 'transactions', fromBeginning: false }); // fromBeginning: false = start from latest offset, not from beginning

The lesson: migrations that involve a new consumer processing a backlog must account for the resource cost of processing historical events. Pre-seed the downstream datastore before enabling the consumer and starting from the current offset.


Summary

ConceptKey Takeaway
Split signalsIndependent scale requirement, different availability, team autonomy. Not "microservices are modern."
Correct seamWhere write and read contention diverge. Separate write-heavy from read-heavy on different datastores.
Outbox patternWrite event to outbox table in same DB transaction as business data. Atomicity guaranteed.
At-least-once deliveryPublisher may re-publish after crash. Consumer must be idempotent.
FOR UPDATE SKIP LOCKEDMultiple publishers consume outbox without duplicates.
Idempotent consumerINSERT ... ON CONFLICT DO NOTHING. Safe to process same event twice.
CQRS projectionsAnalytics DB is denormalized, read-optimized. Updated via Kafka, never by ingestion service.
Anti-corruption layerVersion your events. Translate between service domains explicitly.
Operational cost2 pipelines, 2 health checks, consumer lag monitoring, outbox depth monitoring.
Historical backfillPre-seed downstream DB before enabling consumer. Never process 12M historical events on a live system.

You can now build and split distributed services. Module 10 covers how those services talk to each other at scale — gRPC vs REST, Kafka consumer group mechanics, and event sourcing for payment ledgers that need replay capability.

Next: Module 10 — High-Performance IPC: gRPC, Kafka & Event Streams →

© 2026 Jatin Jain Saraf (JJS). All rights reserved.