Database Overview
The system uses three storage layers. Each was chosen for a specific access pattern. This document explains the decision framework. For full schemas, see the dedicated docs.
| Detail | Where |
|---|---|
| PostgreSQL schema — all tables | postgres.md |
| MongoDB collections — all collections | mongo.md |
| SQL ER diagrams | entity-relations-sql.md |
| NoSQL collection cross-references | entity-relations-nosql.md |
Which Database for What
Rule of thumb:
- Relational data with counts, joins, or billing → PostgreSQL
- Variable-structure metadata, logs, append-only operational data → MongoDB
- Large binary files or large text blobs (PDFs, full article content, scraped text) → S3
| Data | Store | Reason |
|---|---|---|
| Tenants, users, campaigns, billing | PostgreSQL | Relational, transactional, foreign keys matter |
| Strategies, goals, deliverables | PostgreSQL | Needs ACID guarantees across status transitions |
| Activities, activity runs | PostgreSQL | State machine with strict transitions; must be counted and queried |
LLM cost audit (llm_calls) | PostgreSQL | Append-only, needs aggregation queries for cost reporting and billing |
Tool call summaries (tool_calls) | PostgreSQL | Need to be counted per activity, joined to runs |
| Channels, channel scores | PostgreSQL | Relational (FK from blog_posts, social_posts); score deltas need querying |
| Blog/social requests and entities | PostgreSQL | Status machine, FK relationships, approval flow |
| Backlinks | PostgreSQL | Needs counting, filtering by status, joining to activities |
| Leads, lead activities | PostgreSQL | CRM data — needs filtering, sorting, assignment |
| Paid campaigns, search terms, classifications | PostgreSQL | Needs HITL review workflow with status, push tracking |
| Sessions, agent configs | PostgreSQL | Relational config; session expiry queries |
| Contracts (acceptance record) | PostgreSQL | Needs foreign key to users, queried in auth middleware; PDF key stored here |
| Skills content | MongoDB | Frequently updated Markdown; needs full-text search for skill discovery |
| Agent output streaming buffer | MongoDB | Write-heavy small deltas, variable structure, TTL cleanup |
| Activity/event logs | MongoDB | High-volume append, small payloads, TTL cleanup |
| Audit logs | MongoDB | Compliance; before/after diffs are arbitrary small JSON |
| Tool call full payloads | MongoDB | JSONB varies per integration; typically small, 90d TTL |
| Deliverable content (blog posts, ad copy) | S3 | Large text blobs (multi-KB Markdown); served via pre-signed URL |
| Contract PDFs | S3 | Binary PDF; immutable legal record; Object Lock (WORM) |
| Onboarding research | S3 | Large scraped text (50–200 KB per field) |
| Strategy documents | S3 | Long-form Markdown narrative (2,000–10,000 words) |
Split Pattern: PostgreSQL + MongoDB + S3
Large content is split across all three stores. PostgreSQL holds structured metadata; MongoDB holds variable-structure metadata and operational logs; S3 holds the actual content bytes.
| Entity | PostgreSQL holds | MongoDB holds | S3 holds |
|---|---|---|---|
| Activity output | Status, output_ref (MongoDB ObjectId) | Metadata + s3Key | Full content (Markdown/JSON) |
| Blog post | Metadata (status, channel, word count) | — | Full article Markdown (via output_ref → activity_outputs.s3Key) |
| Social post | Metadata (status, platform) | — | Post copy + hashtags (via output_ref → activity_outputs.s3Key) |
| Onboarding run | Status, who reviewed, context_file_id | S3 key pointers (onboarding_research) | Scraped text, competitor research, context file |
| Strategy | Version, status, dates, approval | S3 key pointer (strategies_content) | Full strategy Markdown narrative |
| Contract | Acceptance metadata, pdf_s3_key | — | Generated PDF binary |
| Tool call | Summary (name, method, status, duration) | Full payloads (tool_call_logs) | — |
Access pattern for large content:
- Fetch MongoDB or PostgreSQL metadata record (fast, indexed)
- Extract the
s3Key - Call
s3.getSignedUrl(s3Key, { expiresIn: 900 })for a 15-minute pre-signed URL - Return URL to client; client fetches content directly from S3
Key Relationships (PostgreSQL)
tenants
└── strategies (one active at a time)
└── goals (multiple per strategy)
└── deliverables (multiple per goal)
└── deliverable_periods (one per month)
└── activities (pipeline steps)
└── activity_runs (execution attempts)
└── llm_calls (cost audit)
└── tool_calls (integration audit)
└── [output_ref → MongoDB activity_outputs]
tenants
└── channels (OAuth-connected platforms)
└── blog_requests → blog_posts (CMS content)
└── social_media_requests → social_posts (social content)
└── channel_scores (monthly health scores)
└── ad_campaigns → ad_keywords
→ search_term_reports → search_term_classifications
→ ad_optimization_recommendations
tenants
└── leads → lead_activities
└── backlinks
└── agent_configs
└── sessions
└── recurring_task_templates (copied from global templates)
└── activity_templates (copied from global templates)Base Record Fields
All records — in both PostgreSQL and MongoDB — follow a standard set of base fields. See PostgreSQL Schema → Base Record Fields for the Prisma schema composition pattern.
Naming conventions (PostgreSQL):
- All timestamp columns use
_onsuffix:created_on,updated_on,deleted_on - FK columns for user references use
_bysuffix:created_by,reviewed_by - External IDs use
external_prefix:external_campaign_id,external_keyword_id ref_idis the ULID, used in APIs/URLs.id(UUID) is used only for internal FK joins.
Engines Summary
| PostgreSQL 16 | MongoDB 7 | |
|---|---|---|
| ORM/ODM | Prisma ORM | Mongoose |
| Package | providers/provider-db/ | providers/provider-db/ |
| Connection | POSTGRES_URL env | MONGO_URL env |
| Multi-tenancy | tenant_id column on every table | tenantId field on every document |
| Soft deletes | deleted_on IS NULL filter | { deletedOn: null } filter |
| Base fields | Composed via baseFields, mutableFields, softDeleteFields | BaseDocument interface |
| Local dev | Docker Compose postgres:16 | Docker Compose mongo:7 |