Migration Architecture
Folder structure
```
migration/
├── run.ts           # Orchestrator: arg parsing, step loop, summary table
├── discover.ts      # Discovery scripts (read-only MongoDB inspection)
├── package.json
├── tsconfig.json
├── .env
├── config/
│   ├── migration.config.ts   # Master config: source DB, batchSize, roleMap, steps[]
│   └── maps/
│       ├── 01-offering.map.ts
│       ├── 02-tenant.map.ts
│       ├── 03-user.map.ts
│       ├── 04-member.map.ts
│       ├── 05-subscription.map.ts
│       └── 06-invoice.map.ts
└── lib/
    ├── types.ts     # MigrationConfig, CollectionMap, RelatedWriter, MigrationContext, StepResult
    ├── runner.ts    # runStep(): setup → iterate batches → write → relatedWriters
    ├── batch.ts     # iterateBatches(): cursor-based iteration, default batchSize 500
    ├── mongo.ts     # connectSourceMongo() / closeSourceMongo()
    └── logger.ts    # log.info / log.warn / log.error / log.success / progressCollectionMap pattern
```
Every map file in config/maps/ exports a single map object that conforms to the CollectionMap<TDoc, TRow> interface defined in lib/types.ts. The fields are:
| Field | Type | Required | Purpose |
|---|---|---|---|
| sourceCollection | string | Yes | MongoDB collection name to read from |
| description | string | Yes | Human-readable label shown in the progress output |
| batchSize | number | No | Per-step override; falls back to the global default of 500 |
| setup | async (db, mongoDb, ctx) => void | No | Runs once before iteration starts. Used to pre-load lookup tables (plan IDs, valid tenant IDs, etc.) into ctx so that mapDoc can reference them without issuing per-document queries. |
| mapDoc | (doc, ctx) => TRow \| null | Yes | Maps one raw MongoDB document to a single Prisma row object. Return null to skip the document. Null returns are counted as "skipped" in the step summary. |
| write | async (batch, db, ctx) => void | Yes | Receives the array of non-null rows produced by mapDoc for the current batch and persists them (typically db.someTable.createMany({ data: batch, skipDuplicates: true })). |
| relatedWriters | RelatedWriter[] | No | Additional writers that consume the same raw batch to populate other target tables. See below. |
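The table above can be condensed into a plausible sketch of the CollectionMap and RelatedWriter types. The exact declarations in lib/types.ts may differ: the generic defaults, the untyped db/mongoDb parameters, and the convention that a RelatedWriter returns its row count are assumptions here.

```typescript
// Plausible shape of the interfaces in lib/types.ts, reconstructed from
// the field table above; parameter types are assumptions.
type MigrationContext = Record<string, unknown>;

interface RelatedWriter<TDoc = unknown> {
  // Receives the raw batch (not mapped rows); assumed to return rows written.
  write: (rawBatch: TDoc[], db: unknown, ctx: MigrationContext) => Promise<number>;
}

interface CollectionMap<TDoc = unknown, TRow = unknown> {
  sourceCollection: string;   // MongoDB collection to read from
  description: string;        // label for the progress output
  batchSize?: number;         // per-step override (global default: 500)
  setup?: (db: unknown, mongoDb: unknown, ctx: MigrationContext) => Promise<void>;
  mapDoc: (doc: TDoc, ctx: MigrationContext) => TRow | null;  // null => skipped
  write: (batch: TRow[], db: unknown, ctx: MigrationContext) => Promise<void>;
  relatedWriters?: RelatedWriter<TDoc>[];
}

// Minimal example instance showing the contract:
const exampleMap: CollectionMap<{ name?: string }, { name: string }> = {
  sourceCollection: "examples",
  description: "Example step",
  mapDoc: (doc) => (doc.name ? { name: doc.name } : null), // skip unnamed docs
  write: async () => { /* e.g. db.example.createMany({ data: batch }) */ },
};
```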
RelatedWriter
A RelatedWriter is used when a single source document should produce rows in more than one target table. Unlike mapDoc + write, a relatedWriter receives the full array of raw MongoDB documents for the batch — not the already-mapped rows — so it can apply its own mapping logic independently.
Common uses:
- User documents produce both user rows (main pipeline) and account rows (relatedWriter).
- Offering documents expand into region rows and plan rows via separate relatedWriters.
- Tenant documents expand into competitor rows via a relatedWriter.
The StepResult returned by runStep includes a related field that aggregates the count of rows written by all relatedWriters for that step. This is separate from the processed count, which only reflects the main write call.
When mapDoc intentionally returns null for every document (as in the member step), the processed count will be 0 and the skipped count will equal the total document count. All meaningful output comes from the relatedWriter. This is by design and not an error.
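A member-style map might look like the following sketch, where the main pipeline intentionally produces nothing and all output flows through the relatedWriter. The document fields, target table, and the convention that the writer returns its row count are assumptions for illustration.

```typescript
type Ctx = Record<string, unknown>;

interface MemberDoc {          // illustrative source shape
  _id: string;
  userId: string;
  tenantId: string;
  role?: string;
}

const memberMap = {
  sourceCollection: "members",
  description: "Members -> membership rows (relatedWriter only)",
  // Main pipeline produces nothing: processed = 0, skipped = total.
  // This is by design, not an error.
  mapDoc: (_doc: MemberDoc, _ctx: Ctx): null => null,
  write: async (_batch: never[], _db: unknown, _ctx: Ctx) => {},
  relatedWriters: [
    {
      // Receives the raw, unfiltered batch and applies its own mapping.
      write: async (rawBatch: MemberDoc[], _db: unknown, _ctx: Ctx) => {
        const rows = rawBatch.map((d) => ({
          userId: d.userId,
          tenantId: d.tenantId,
          role: d.role ?? "member",
        }));
        // e.g. await db.membership.createMany({ data: rows, skipDuplicates: true });
        return rows.length; // aggregated into StepResult.related by the runner
      },
    },
  ],
};
```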
runner.ts flow
runStep(step, config, dryRun) in lib/runner.ts executes one migration step end-to-end:
- Setup: if the map defines a setup() function, call it once with the Prisma client, the MongoDB Db handle, and the shared MigrationContext. This is where lookup tables are populated.
- Count: issue a countDocuments() call on the source collection to establish the total for the progress display.
- Iterate batches: call iterateBatches() from lib/batch.ts, which opens a MongoDB cursor and yields arrays of documents up to batchSize (default 500, overridable per map).
- Map: for each document in the batch, call mapDoc(doc, ctx). Collect non-null results into a rows array; increment the skipped counter for each null.
- Write: if not in dry-run mode, call write(rows, db, ctx) with the non-null rows.
- RelatedWriters: for each entry in relatedWriters, call its write(rawBatch, db, ctx) with the original unfiltered batch. In dry-run mode these calls are also skipped.
- Return: return a StepResult containing { processed, skipped, related, durationMs, error? }. The orchestrator in run.ts accumulates these into the final summary table.
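The steps above can be condensed into a sketch. The real runStep() wires in the Prisma client, the Mongo cursor, the count/progress display, and error capture; here batches stands in for iterateBatches() and db is left untyped.

```typescript
type Ctx = Record<string, unknown>;

interface StepResult {
  processed: number;
  skipped: number;
  related: number;
  durationMs: number;
  error?: string;
}

async function runStepSketch<TDoc, TRow>(
  map: {
    setup?: (db: unknown, mongoDb: unknown, ctx: Ctx) => Promise<void>;
    mapDoc: (doc: TDoc, ctx: Ctx) => TRow | null;
    write: (rows: TRow[], db: unknown, ctx: Ctx) => Promise<void>;
    relatedWriters?: { write: (raw: TDoc[], db: unknown, ctx: Ctx) => Promise<number> }[];
  },
  batches: TDoc[][],   // stand-in for iterateBatches(); count step omitted
  db: unknown,
  dryRun: boolean,
): Promise<StepResult> {
  const start = Date.now();
  const ctx: Ctx = {};
  let processed = 0;
  let skipped = 0;
  let related = 0;

  if (map.setup) await map.setup(db, null /* mongoDb */, ctx);  // Setup

  for (const rawBatch of batches) {                             // Iterate batches
    const rows: TRow[] = [];
    for (const doc of rawBatch) {                               // Map
      const row = map.mapDoc(doc, ctx);
      if (row === null) skipped++;
      else rows.push(row);
    }
    if (!dryRun) {
      await map.write(rows, db, ctx);                           // Write
      processed += rows.length;
      for (const rw of map.relatedWriters ?? []) {              // RelatedWriters
        related += await rw.write(rawBatch, db, ctx);           // raw, unfiltered batch
      }
    }
  }
  return { processed, skipped, related, durationMs: Date.now() - start };  // Return
}
```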
Dynamic map loading
Map files are loaded at runtime by run.ts using import() with a file:// URL. This is required on Windows because import('/absolute/path') with a bare drive letter (E:\...) fails with a protocol error. The correct pattern:
```typescript
import { pathToFileURL } from "url";

const fileUrl = pathToFileURL(fullPath).href; // "E:\..." → "file:///E:/..."
const mapModule = await import(fileUrl);
const map: CollectionMap = mapModule.map;
```

Each map file must use a named export map (not a default export) for this pattern to work.
Region code strategy
MongoDB stores region identifiers as lowercase slugs (in, us, ae, dubai). The PostgreSQL seed database uses uppercase two-letter codes (IN, US, ME, DU). Rather than maintaining a translation table, the offering map upserts regions by their full name, then sets the code field to the MongoDB slug. After the migration runs, all region codes in PostgreSQL match the MongoDB slugs exactly, so the subscription plan-lookup key (regionCode:planSlug) works without any aliasing.
Seed region name to MongoDB slug mapping:
| Seed name | MongoDB slug | Notes |
|---|---|---|
| India | in | |
| USA | us | |
| Middle East | ae | |
| Dubai | dubai | |
| Global | all | Synthetic — created for all:trial legacy subscriptions only |
The “Global” region does not exist in the MongoDB Offering document. It is created by the offering map solely to hold the synthetic trial plan referenced by legacy subscriptions.
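The upsert-by-name step might look like the following sketch. It mirrors the shape of Prisma's upsert call, but the db parameter is stubbed with an inline type so the loop is self-contained; the function name is illustrative.

```typescript
// Region name -> MongoDB slug, from the mapping table above.
const regionSlugByName: Record<string, string> = {
  India: "in",
  USA: "us",
  "Middle East": "ae",
  Dubai: "dubai",
  Global: "all", // synthetic -- holds the legacy all:trial plan
};

// Minimal structural type matching Prisma's upsert call shape.
interface RegionDb {
  region: {
    upsert(args: {
      where: { name: string };
      update: { code: string };
      create: { name: string; code: string };
    }): Promise<unknown>;
  };
}

async function upsertRegions(db: RegionDb): Promise<void> {
  for (const [name, code] of Object.entries(regionSlugByName)) {
    // Match on the stable full name, then force `code` to the MongoDB slug
    // so the regionCode:planSlug lookup key needs no aliasing.
    await db.region.upsert({ where: { name }, update: { code }, create: { name, code } });
  }
}
```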