Database Schema Review — v2
Date: 2026-04-29
Reviewer: Claude (schema walk, 122 models, ~3,480 lines)
Schema: packages/db/prisma/schema.prisma
Overview
The schema is well-structured overall: strong tenant isolation (nearly every domain model cascades from Tenant), thorough inline lifecycle documentation, consistent audit trail models, and versioning for long-lived content. Financial amounts are correctly stored in smallest currency units. Indexes are generally well-placed on high-cardinality lookups.
The main concerns are correctness issues in a handful of models (missing FK relations, global unique constraints that should be per-tenant, a shared table holding personal data) and design debt accumulated from incremental growth (dual membership system, parallel contact models, JSON-array audit logs).
🔴 High — Correctness / Integrity Risks
1. User.tenantId vs TenantMember — dual ownership (line 139)
User carries a direct tenantId FK and there is a TenantMember model which is the documented source of truth for tenant membership. The two can drift out of sync. Any code path that reads User.tenantId for an authorization check silently bypasses the correct membership gate.
User.tenantId looks like a migration artefact from before TenantMember was introduced and was never removed.
Recommendation: Remove User.tenantId and the Tenant.users relation. Enforce all membership checks through TenantMember only.
2. Activity.sourceDeliverableId has no @relation — dangling reference (line 645)
sourceDeliverableId String? is documented as “FK → source Deliverable” but has no @relation directive. Compare with sourceActivityId immediately below it, which correctly declares @relation("RepurposedContent"). The asymmetry means:
- No DB-level referential integrity on
sourceDeliverableId - Deleting a source
Deliverableleaves a dangling ID with no cascade behaviour - Prisma won’t enforce or auto-cascade anything
Recommendation: Either add the missing @relation (with onDelete: SetNull), or explicitly document it as a soft reference like CreditLedger.agentRunId.
3. WebMedia.contentHash @unique is global — cross-tenant collision risk (line 3385)
The SHA-256 hash has a global @unique constraint with no tenantId scope. Two tenants uploading the same image (e.g. a stock photo used by both) would cause a unique constraint violation on the second insert.
Recommendation: Change to @@unique([tenantId, contentHash]) and drop the standalone @unique. Add @@index([tenantId, contentHash]) if needed for lookups.
4. CrawlCache stores personal contact data in a shared, tenant-agnostic table (lines 2296–2310)
CrawlCache is keyed by domain with no tenantId. It stores recipientEmail and recipientName scraped from third-party websites. Tenant A’s crawl populates contact info that is readable by Tenant B’s outreach writer. This is a potential GDPR/privacy violation — personal data collected in the context of one client’s campaign is exposed to all clients.
Recommendation: Either scope CrawlCache with a tenantId (breaking the shared-cache property), or strip personal data from the cache and only store structural metadata (isReachable, hasCaptcha, isBlocked, contactPageUrl). Contact details should be stored in a tenant-scoped table on first use.
5. HelpPageRating and HelpSearchEvent have no FK on tenantId (lines 3242–3265)
Both models store tenantId String with no @relation directive. Deleting a tenant leaves orphaned analytics rows forever. There is also no cascade delete.
Recommendation: Add tenant Tenant @relation(fields: [tenantId], references: [id], onDelete: Cascade) on both models.
🟡 Medium — Design Debt / Consistency
6. Redundant @@index duplicating @@unique on same columns
StrategyVersion and ClientContextVersion both declare:
@@unique([strategyId, version])
@@index([strategyId, version]) // ← redundantIn Postgres, @@unique already creates a B-tree index. The extra @@index is a duplicate that wastes write overhead and disk space on every insert and update.
Recommendation: Remove the @@index wherever @@unique covers identical columns.
7. Subscription.statusChangeLogs is an unbounded JSON array (line 1522)
statusChangeLogs Json @default("[]") is an append-only log stored as a JSON array in a single cell. Problems:
- The entire array must be read and re-serialized on every status change
- Individual entries cannot be indexed or queried
- The array grows indefinitely for long-running subscriptions
The platform already uses proper log tables consistently elsewhere (DeliverablePlanLog, ClientContextLog, StrategyLog).
Recommendation: Create a SubscriptionLog model mirroring the existing log table pattern.
8. Two parallel contact systems with no link between them
Lead has its own LeadContact[] (email / phone / WhatsApp entries). Contact is described as the “universal contact entity” for leads, customers, and outreach prospects. There is no FK between the two systems. A lead captured via a website form doesn’t create a Contact row, so the same person can exist in both tables with no join path.
Recommendation: Either replace LeadContact with a contactId FK on Lead pointing to Contact, or add an explicit sync step at lead creation time and document the relationship.
9. BacklinkHealth has no backlinkId FK — can’t join health to a specific backlink (lines 2712–2733)
BacklinkHealth tracks the live HTTP status of acquired backlinks, but its only FK is to Campaign. There is no direct join from a Backlink row to its health record. You can only retrieve health data for all backlinks in a campaign, not for a specific one.
Recommendation: Add backlinkId String? @unique to BacklinkHealth with backlink Backlink? @relation(fields: [backlinkId], references: [id], onDelete: SetNull).
10. Invoice.remindersSent counter can drift from InvoiceReminder table (lines 1592–1600)
The remindersSent Int @default(0) field on Invoice is a denormalised count. If a reminder insert into InvoiceReminder fails after the counter is already incremented (or vice versa), the two fall out of sync. Correcting this requires a manual reconciliation query.
Recommendation: Either derive the count from SELECT COUNT(*) FROM invoice_reminder WHERE invoiceId = ? at query time, or update the counter transactionally in the same Prisma $transaction that writes the InvoiceReminder row — and document that invariant.
11. Tenant.contextConfig Json? should be a typed model (line 113)
Strategy input config uses a proper typed model (TenantStrategyConfig) with individual boolean columns. Context generation config sits as an opaque contextConfig Json? blob on the Tenant row. Both control the same agent pipeline. The inconsistency means context config cannot be queried, indexed, or migrated field-by-field.
Recommendation: Extract contextConfig into a TenantContextConfig model, mirroring TenantStrategyConfig.
12. Keyword lacks a composite index and duplicate guard (lines 2762–2782)
Only a @@index([tenantId]) exists. The most common operation — looking up a keyword by text for a tenant (for duplicate detection, findFirst+create) — has no [tenantId, keyword] composite index to support it. There is also no @@unique([tenantId, keyword]), so duplicate keyword strings per tenant accumulate silently.
Recommendation: Add @@index([tenantId, keyword]). Evaluate whether a @@unique([tenantId, keyword]) constraint is appropriate given the findFirst+create pattern.
13. FAQ data split across two storage mechanisms (line 866)
Live blog FAQs are stored in the BlogPostFaq table. Versioned FAQ snapshots are stored in BlogPostVersion.faqs Json?. Restoring a blog post to a previous version requires deserialising the JSON snapshot and re-populating BlogPostFaq rows — this restore procedure is neither documented nor enforced anywhere in the schema.
Recommendation: Document the restore procedure explicitly in the schema comments, or unify FAQ storage (either always use the BlogPostFaq table with a version FK, or always use JSON and remove the separate table).
14. AgentRun.tenantName is a denormalized snapshot with no documentation (line 1263)
tenantName String @default("") stores a copy of the tenant’s name at run time. This will go stale if the tenant is renamed. The same pattern is used intentionally in TenantDeletion.tenantName, which documents it clearly. AgentRun has no such comment.
Recommendation: Add a comment: // snapshot of tenant name at run time; not kept in sync with Tenant.name.
15. CreditBalance has no createdAt (lines 3078–3099)
Every other significant model has a createdAt timestamp. The credit balance row is created once per tenant at provisioning time. Without a timestamp, it is impossible to audit when balances were initialised or detect rows that were created incorrectly.
Recommendation: Add createdAt DateTime @default(now()).
🟢 Low / Style
16. Status fields use String everywhere — no DB-level validation
Every status field is a String with valid values described in a comment. Prisma enums would give DB-level validation and reject invalid strings at insert time. This is a deliberate trade-off (avoiding migrations for new status values) but means a typo in application code produces invalid rows silently. For high-stakes fields like Subscription.status and Invoice.status, the cost of a migration is low relative to the risk.
17. RagCrawlJob vs WebCrawlJob — naming overlap for different concepts
RagCrawlJob (line 1797) crawls external websites into Qdrant RAG datasets. WebCrawlJob (line 3282) crawls a tenant’s own website channel via Playwright for the website crawler feature. Both are “crawl jobs” but operate in completely different parts of the system. A developer new to the codebase will routinely confuse them.
Recommendation: Rename RagCrawlJob → RagIngestJob or RagWebCrawlJob to signal its purpose clearly.
18. Campaign.activityId is a soft reference with no explanatory comment (line 2223)
activityId String? has no @relation directive and no comment explaining why (unlike CreditLedger.agentRunId which explicitly states “Soft reference — no FK constraint (runs may be deleted)”). A reader cannot tell whether the missing FK is intentional or an oversight.
Recommendation: Add a comment matching the CreditLedger pattern: // soft reference to Activity — no FK constraint; activity may be deleted independently.
Summary
| # | Severity | Model | Issue |
|---|---|---|---|
| 1 | 🔴 High | User | Dual membership via User.tenantId + TenantMember can drift |
| 2 | 🔴 High | Activity | sourceDeliverableId has no @relation — dangling reference |
| 3 | 🔴 High | WebMedia | contentHash @unique is global — cross-tenant collision risk |
| 4 | 🔴 High | CrawlCache | Personal contact data in shared tenant-agnostic table (GDPR) |
| 5 | 🔴 High | HelpPageRating, HelpSearchEvent | tenantId has no FK / no cascade delete |
| 6 | 🟡 Medium | StrategyVersion, ClientContextVersion | Redundant @@index duplicating @@unique on same columns |
| 7 | 🟡 Medium | Subscription | statusChangeLogs is an unbounded JSON array |
| 8 | 🟡 Medium | Lead / Contact | Two parallel contact systems with no link between them |
| 9 | 🟡 Medium | BacklinkHealth | No backlinkId FK — can’t join health to a specific backlink |
| 10 | 🟡 Medium | Invoice | remindersSent counter can drift from InvoiceReminder rows |
| 11 | 🟡 Medium | Tenant | contextConfig Json? should be a typed model like TenantStrategyConfig |
| 12 | 🟡 Medium | Keyword | Missing [tenantId, keyword] composite index; no duplicate guard |
| 13 | 🟡 Medium | BlogPostVersion | FAQ data split across Json? snapshot and BlogPostFaq table |
| 14 | 🟡 Medium | AgentRun | tenantName snapshot not documented as intentionally denormalized |
| 15 | 🟡 Medium | CreditBalance | Missing createdAt field |
| 16 | 🟢 Low | All status fields | String type offers no DB-level validation vs enums |
| 17 | 🟢 Low | RagCrawlJob | Name too similar to WebCrawlJob for a different concept |
| 18 | 🟢 Low | Campaign | activityId soft reference has no explanatory comment |
Quickest wins (small migrations, high signal)
- #5 — Add
@relation+ cascade delete toHelpPageRatingandHelpSearchEvent - #6 — Remove redundant indexes on
StrategyVersionandClientContextVersion - #3 — Scope
WebMedia.contentHashunique constraint to[tenantId, contentHash] - #15 — Add
createdAttoCreditBalance
Heaviest (architectural, require coordination)
- #1 — Remove
User.tenantId(requires auditing all auth code paths) - #4 — Redesign
CrawlCacheto remove personal data (requires outreach pipeline changes) - #8 — Unify
LeadContactandContact(requires CRM + newsletter pipeline changes)