Database Overview

The system uses three storage layers. Each was chosen for a specific access pattern. This document explains the decision framework. For full schemas, see the dedicated docs.

Detail	Where
PostgreSQL schema — all tables	postgres.md
MongoDB collections — all collections	mongo.md
SQL ER diagrams	entity-relations-sql.md
NoSQL collection cross-references	entity-relations-nosql.md

Which Database for What

Rule of thumb:

Relational data with counts, joins, or billing → PostgreSQL
Variable-structure metadata, logs, append-only operational data → MongoDB
Large binary files or large text blobs (PDFs, full article content, scraped text) → S3

Data	Store	Reason
Tenants, users, campaigns, billing	PostgreSQL	Relational, transactional, foreign keys matter
Strategies, goals, deliverables	PostgreSQL	Needs ACID guarantees across status transitions
Activities, activity runs	PostgreSQL	State machine with strict transitions; must be counted and queried
LLM cost audit (`llm_calls`)	PostgreSQL	Append-only, needs aggregation queries for cost reporting and billing
Tool call summaries (`tool_calls`)	PostgreSQL	Need to be counted per activity, joined to runs
Channels, channel scores	PostgreSQL	Relational (FK from blog_posts, social_posts); score deltas need querying
Blog/social requests and entities	PostgreSQL	Status machine, FK relationships, approval flow
Backlinks	PostgreSQL	Needs counting, filtering by status, joining to activities
Leads, lead activities	PostgreSQL	CRM data — needs filtering, sorting, assignment
Paid campaigns, search terms, classifications	PostgreSQL	Needs HITL review workflow with status, push tracking
Sessions, agent configs	PostgreSQL	Relational config; session expiry queries
Contracts (acceptance record)	PostgreSQL	Needs foreign key to users, queried in auth middleware; PDF key stored here
Skills content	MongoDB	Frequently updated Markdown; needs full-text search for skill discovery
Agent output streaming buffer	MongoDB	Write-heavy small deltas, variable structure, TTL cleanup
Activity/event logs	MongoDB	High-volume append, small payloads, TTL cleanup
Audit logs	MongoDB	Compliance; `before`/`after` diffs are arbitrary small JSON
Tool call full payloads	MongoDB	JSONB varies per integration; typically small, 90d TTL
Deliverable content (blog posts, ad copy)	S3	Large text blobs (multi-KB Markdown); served via pre-signed URL
Contract PDFs	S3	Binary PDF; immutable legal record; Object Lock (WORM)
Onboarding research	S3	Large scraped text (50–200 KB per field)
Strategy documents	S3	Long-form Markdown narrative (2,000–10,000 words)

Split Pattern: PostgreSQL + MongoDB + S3

Large content is split across all three stores. PostgreSQL holds structured metadata; MongoDB holds variable-structure metadata and operational logs; S3 holds the actual content bytes.

Entity	PostgreSQL holds	MongoDB holds	S3 holds
Activity output	Status, `output_ref` (MongoDB ObjectId)	Metadata + `s3Key`	Full content (Markdown/JSON)
Blog post	Metadata (status, channel, word count)	—	Full article Markdown (via `output_ref` → `activity_outputs.s3Key`)
Social post	Metadata (status, platform)	—	Post copy + hashtags (via `output_ref` → `activity_outputs.s3Key`)
Onboarding run	Status, who reviewed, `context_file_id`	S3 key pointers (`onboarding_research`)	Scraped text, competitor research, context file
Strategy	Version, status, dates, approval	S3 key pointer (`strategies_content`)	Full strategy Markdown narrative
Contract	Acceptance metadata, `pdf_s3_key`	—	Generated PDF binary
Tool call	Summary (name, method, status, duration)	Full payloads (`tool_call_logs`)	—

Access pattern for large content:

Fetch MongoDB or PostgreSQL metadata record (fast, indexed)
Extract the s3Key
Call s3.getSignedUrl(s3Key, { expiresIn: 900 }) for a 15-minute pre-signed URL
Return URL to client; client fetches content directly from S3

Key Relationships (PostgreSQL)


tenants
  └── strategies (one active at a time)
        └── goals (multiple per strategy)
              └── deliverables (multiple per goal)
                    └── deliverable_periods (one per month)
                          └── activities (pipeline steps)
                                └── activity_runs (execution attempts)
                                      └── llm_calls (cost audit)
                                      └── tool_calls (integration audit)
                                └── [output_ref → MongoDB activity_outputs]

tenants
  └── channels (OAuth-connected platforms)
        └── blog_requests → blog_posts (CMS content)
        └── social_media_requests → social_posts (social content)
        └── channel_scores (monthly health scores)
        └── ad_campaigns → ad_keywords
                        → search_term_reports → search_term_classifications
                        → ad_optimization_recommendations

tenants
  └── leads → lead_activities
  └── backlinks
  └── agent_configs
  └── sessions
  └── recurring_task_templates (copied from global templates)
  └── activity_templates (copied from global templates)

Base Record Fields

All records — in both PostgreSQL and MongoDB — follow a standard set of base fields. See PostgreSQL Schema → Base Record Fields for the Prisma schema composition pattern.

Naming conventions (PostgreSQL):

All timestamp columns use _on suffix: created_on, updated_on, deleted_on
FK columns for user references use _by suffix: created_by, reviewed_by
External IDs use external_ prefix: external_campaign_id, external_keyword_id
ref_id is the ULID, used in APIs/URLs. id (UUID) is used only for internal FK joins.

Engines Summary

	PostgreSQL 16	MongoDB 7
ORM/ODM	Prisma ORM	Mongoose
Package	`providers/provider-db/`	`providers/provider-db/`
Connection	`POSTGRES_URL` env	`MONGO_URL` env
Multi-tenancy	`tenant_id` column on every table	`tenantId` field on every document
Soft deletes	`deleted_on IS NULL` filter	`{ deletedOn: null }` filter
Base fields	Composed via `baseFields`, `mutableFields`, `softDeleteFields`	`BaseDocument` interface
Local dev	Docker Compose `postgres:16`	Docker Compose `mongo:7`