Database Documentation

The system uses two database engines — PostgreSQL for relational, transactional data and MongoDB for large blobs, documents, and high-volume append-only logs.

Doc	What it covers
Database Overview	Which data goes where and why — decision guide, split pattern, key relationships
PostgreSQL Schema	Full PostgreSQL schema: all tables, indexes, useful queries, security
MongoDB Collections	All MongoDB collections: schemas, indexes, TTLs, security
SQL Entity Relations	ER diagrams for all PostgreSQL tables and their relationships
NoSQL Entity Relations	MongoDB collections and their cross-references to PostgreSQL

Quick Decision Guide

PostgreSQL — if the data:

Has relationships to other records (foreign keys matter)
Needs counting, aggregation, or billing queries
Participates in ACID transactions (status state machines)
Is small-to-medium in size with a fixed schema

MongoDB — if the data:

Has variable or unpredictable structure (arbitrary JSON diffs, integration payloads)
Is an append-only log with TTL cleanup and no relational queries
Needs full-text search (skills content)
Is small-to-medium structured metadata accessed by a single FK from PostgreSQL

S3 — if the data:

Is a large binary or large text blob (PDF, multi-KB Markdown, scraped website text)
Has no query requirements — accessed only by a known key
Needs immutability guarantees (Object Lock / WORM for legal records)
Benefits from direct client access without API proxying (pre-signed URLs)

See the overview for the full decision table.

Key Architecture Patterns

Split Pattern (PostgreSQL + MongoDB + S3)

Large content is split across all three stores. PostgreSQL holds structured metadata; MongoDB holds variable-structure metadata or operational logs; S3 holds the actual content bytes.

Entity	PostgreSQL holds	MongoDB holds	S3 holds
Activity output	Status, `output_ref` pointer	Metadata + `s3Key` (`activity_outputs`)	Full content (Markdown/JSON)
Blog post	Metadata (status, channel, word count)	—	Full article Markdown (via `output_ref → activity_outputs.s3Key`)
Social post	Metadata (status, platform)	—	Post copy + hashtags (via `output_ref → activity_outputs.s3Key`)
Onboarding run	Status, `context_file_id`	S3 key pointers (`onboarding_research`)	Scraped text, research, context file
Strategy	Version, status, dates, approval	S3 key pointer (`strategies_content`)	Full strategy Markdown
Contract	Acceptance metadata, `pdf_s3_key`	—	PDF binary (Object Lock / WORM)
Tool call	Summary (name, method, status)	Full payloads (`tool_call_logs`)	—

Multi-Tenancy

All records are scoped to a tenant:

PostgreSQL: tenant_id UUID NOT NULL REFERENCES tenants(id) on every table
MongoDB: tenantId: string on every document

Queries must always include the tenant filter. Global records (e.g. global skill templates) use tenantId: 'global' in MongoDB and tenant_id IS NULL in PostgreSQL.

Identifiers

Identifier	Used for	Format
`id` (PostgreSQL)	Internal FK joins only	UUID
`ref_id` (PostgreSQL)	External APIs, URLs, webhooks	ULID (26 chars, sortable)
`_id` (MongoDB)	MongoDB internal primary key	ObjectId
`refId` (MongoDB)	External references	ULID

PostgreSQL ref_id values and MongoDB ObjectIds are stored as cross-reference pointers (e.g. output_ref, content_ref, pdf_ref).

Storage Engines

	PostgreSQL 16	MongoDB 7	S3 / MinIO
ORM/ODM/SDK	Prisma ORM	Mongoose	`@aws-sdk/client-s3`
Package	`providers/provider-db/`	`providers/provider-db/`	`providers/provider-storage/`
Connection	Via PgBouncer (transaction mode)	Direct `MONGO_URL`	`S3_ENDPOINT` + credentials
Multi-tenancy	`tenant_id` column + FK on every table	`tenantId` field on every document	`{tenantId}/` key prefix
Soft deletes	`WHERE deleted_on IS NULL`	`{ deletedOn: null }`	Object versioning + lifecycle rules
Local dev	Docker Compose `postgres:16`	Docker Compose `mongo:7`	Docker Compose `minio/minio`
Production	Managed cloud PostgreSQL	MongoDB Atlas or self-hosted	AWS S3 or DO Spaces
On-prem	Self-hosted PostgreSQL	Self-hosted MongoDB	Self-hosted MinIO