Database Schema Review — v2

Date: 2026-04-29 Reviewer: Claude (schema walk, 122 models, ~3,480 lines) Schema: packages/db/prisma/schema.prisma

Overview

The schema is well-structured overall: strong tenant isolation (nearly every domain model cascades from Tenant), thorough inline lifecycle documentation, consistent audit trail models, and versioning for long-lived content. Financial amounts are correctly stored in smallest currency units. Indexes are generally well-placed on high-cardinality lookups.

The main concerns are correctness issues in a handful of models (missing FK relations, global unique constraints that should be per-tenant, a shared table holding personal data) and design debt accumulated from incremental growth (dual membership system, parallel contact models, JSON-array audit logs).

🔴 High — Correctness / Integrity Risks

1. `User.tenantId` vs `TenantMember` — dual ownership (line 139)

User carries a direct tenantId FK and there is a TenantMember model which is the documented source of truth for tenant membership. The two can drift out of sync. Any code path that reads User.tenantId for an authorization check silently bypasses the correct membership gate.

User.tenantId looks like a migration artefact from before TenantMember was introduced and was never removed.

Recommendation: Remove User.tenantId and the Tenant.users relation. Enforce all membership checks through TenantMember only.

2. `Activity.sourceDeliverableId` has no `@relation` — dangling reference (line 645)

sourceDeliverableId String? is documented as “FK → source Deliverable” but has no @relation directive. Compare with sourceActivityId immediately below it, which correctly declares @relation("RepurposedContent"). The asymmetry means:

No DB-level referential integrity on sourceDeliverableId
Deleting a source Deliverable leaves a dangling ID with no cascade behaviour
Prisma won’t enforce or auto-cascade anything

Recommendation: Either add the missing @relation (with onDelete: SetNull), or explicitly document it as a soft reference like CreditLedger.agentRunId.

3. `WebMedia.contentHash @unique` is global — cross-tenant collision risk (line 3385)

The SHA-256 hash has a global @unique constraint with no tenantId scope. Two tenants uploading the same image (e.g. a stock photo used by both) would cause a unique constraint violation on the second insert.

Recommendation: Change to @@unique([tenantId, contentHash]) and drop the standalone @unique. Add @@index([tenantId, contentHash]) if needed for lookups.

4. `CrawlCache` stores personal contact data in a shared, tenant-agnostic table (lines 2296–2310)

CrawlCache is keyed by domain with no tenantId. It stores recipientEmail and recipientName scraped from third-party websites. Tenant A’s crawl populates contact info that is readable by Tenant B’s outreach writer. This is a potential GDPR/privacy violation — personal data collected in the context of one client’s campaign is exposed to all clients.

Recommendation: Either scope CrawlCache with a tenantId (breaking the shared-cache property), or strip personal data from the cache and only store structural metadata (isReachable, hasCaptcha, isBlocked, contactPageUrl). Contact details should be stored in a tenant-scoped table on first use.

5. `HelpPageRating` and `HelpSearchEvent` have no FK on `tenantId` (lines 3242–3265)

Both models store tenantId String with no @relation directive. Deleting a tenant leaves orphaned analytics rows forever. There is also no cascade delete.

Recommendation: Add tenant Tenant @relation(fields: [tenantId], references: [id], onDelete: Cascade) on both models.

🟡 Medium — Design Debt / Consistency

6. Redundant `@@index` duplicating `@@unique` on same columns

StrategyVersion and ClientContextVersion both declare:


@@unique([strategyId, version])
@@index([strategyId, version])   // ← redundant

In Postgres, @@unique already creates a B-tree index. The extra @@index is a duplicate that wastes write overhead and disk space on every insert and update.

Recommendation: Remove the @@index wherever @@unique covers identical columns.

7. `Subscription.statusChangeLogs` is an unbounded JSON array (line 1522)

statusChangeLogs Json @default("[]") is an append-only log stored as a JSON array in a single cell. Problems:

The entire array must be read and re-serialized on every status change
Individual entries cannot be indexed or queried
The array grows indefinitely for long-running subscriptions

The platform already uses proper log tables consistently elsewhere (DeliverablePlanLog, ClientContextLog, StrategyLog).

Recommendation: Create a SubscriptionLog model mirroring the existing log table pattern.

8. Two parallel contact systems with no link between them

Lead has its own LeadContact[] (email / phone / WhatsApp entries). Contact is described as the “universal contact entity” for leads, customers, and outreach prospects. There is no FK between the two systems. A lead captured via a website form doesn’t create a Contact row, so the same person can exist in both tables with no join path.

Recommendation: Either replace LeadContact with a contactId FK on Lead pointing to Contact, or add an explicit sync step at lead creation time and document the relationship.

9. `BacklinkHealth` has no `backlinkId` FK — can’t join health to a specific backlink (lines 2712–2733)

BacklinkHealth tracks the live HTTP status of acquired backlinks, but its only FK is to Campaign. There is no direct join from a Backlink row to its health record. You can only retrieve health data for all backlinks in a campaign, not for a specific one.

Recommendation: Add backlinkId String? @unique to BacklinkHealth with backlink Backlink? @relation(fields: [backlinkId], references: [id], onDelete: SetNull).

10. `Invoice.remindersSent` counter can drift from `InvoiceReminder` table (lines 1592–1600)

The remindersSent Int @default(0) field on Invoice is a denormalised count. If a reminder insert into InvoiceReminder fails after the counter is already incremented (or vice versa), the two fall out of sync. Correcting this requires a manual reconciliation query.

Recommendation: Either derive the count from SELECT COUNT(*) FROM invoice_reminder WHERE invoiceId = ? at query time, or update the counter transactionally in the same Prisma $transaction that writes the InvoiceReminder row — and document that invariant.

11. `Tenant.contextConfig Json?` should be a typed model (line 113)

Strategy input config uses a proper typed model (TenantStrategyConfig) with individual boolean columns. Context generation config sits as an opaque contextConfig Json? blob on the Tenant row. Both control the same agent pipeline. The inconsistency means context config cannot be queried, indexed, or migrated field-by-field.

Recommendation: Extract contextConfig into a TenantContextConfig model, mirroring TenantStrategyConfig.

12. `Keyword` lacks a composite index and duplicate guard (lines 2762–2782)

Only a @@index([tenantId]) exists. The most common operation — looking up a keyword by text for a tenant (for duplicate detection, findFirst+create) — has no [tenantId, keyword] composite index to support it. There is also no @@unique([tenantId, keyword]), so duplicate keyword strings per tenant accumulate silently.

Recommendation: Add @@index([tenantId, keyword]). Evaluate whether a @@unique([tenantId, keyword]) constraint is appropriate given the findFirst+create pattern.

13. FAQ data split across two storage mechanisms (line 866)

Live blog FAQs are stored in the BlogPostFaq table. Versioned FAQ snapshots are stored in BlogPostVersion.faqs Json?. Restoring a blog post to a previous version requires deserialising the JSON snapshot and re-populating BlogPostFaq rows — this restore procedure is neither documented nor enforced anywhere in the schema.

Recommendation: Document the restore procedure explicitly in the schema comments, or unify FAQ storage (either always use the BlogPostFaq table with a version FK, or always use JSON and remove the separate table).

14. `AgentRun.tenantName` is a denormalized snapshot with no documentation (line 1263)

tenantName String @default("") stores a copy of the tenant’s name at run time. This will go stale if the tenant is renamed. The same pattern is used intentionally in TenantDeletion.tenantName, which documents it clearly. AgentRun has no such comment.

Recommendation: Add a comment: // snapshot of tenant name at run time; not kept in sync with Tenant.name.

15. `CreditBalance` has no `createdAt` (lines 3078–3099)

Every other significant model has a createdAt timestamp. The credit balance row is created once per tenant at provisioning time. Without a timestamp, it is impossible to audit when balances were initialised or detect rows that were created incorrectly.

Recommendation: Add createdAt DateTime @default(now()).

🟢 Low / Style

16. Status fields use `String` everywhere — no DB-level validation

Every status field is a String with valid values described in a comment. Prisma enums would give DB-level validation and reject invalid strings at insert time. This is a deliberate trade-off (avoiding migrations for new status values) but means a typo in application code produces invalid rows silently. For high-stakes fields like Subscription.status and Invoice.status, the cost of a migration is low relative to the risk.

17. `RagCrawlJob` vs `WebCrawlJob` — naming overlap for different concepts

RagCrawlJob (line 1797) crawls external websites into Qdrant RAG datasets. WebCrawlJob (line 3282) crawls a tenant’s own website channel via Playwright for the website crawler feature. Both are “crawl jobs” but operate in completely different parts of the system. A developer new to the codebase will routinely confuse them.

Recommendation: Rename RagCrawlJob → RagIngestJob or RagWebCrawlJob to signal its purpose clearly.

18. `Campaign.activityId` is a soft reference with no explanatory comment (line 2223)

activityId String? has no @relation directive and no comment explaining why (unlike CreditLedger.agentRunId which explicitly states “Soft reference — no FK constraint (runs may be deleted)”). A reader cannot tell whether the missing FK is intentional or an oversight.

Recommendation: Add a comment matching the CreditLedger pattern: // soft reference to Activity — no FK constraint; activity may be deleted independently.

Summary

#	Severity	Model	Issue
1	🔴 High	`User`	Dual membership via `User.tenantId` + `TenantMember` can drift
2	🔴 High	`Activity`	`sourceDeliverableId` has no `@relation` — dangling reference
3	🔴 High	`WebMedia`	`contentHash @unique` is global — cross-tenant collision risk
4	🔴 High	`CrawlCache`	Personal contact data in shared tenant-agnostic table (GDPR)
5	🔴 High	`HelpPageRating`, `HelpSearchEvent`	`tenantId` has no FK / no cascade delete
6	🟡 Medium	`StrategyVersion`, `ClientContextVersion`	Redundant `@@index` duplicating `@@unique` on same columns
7	🟡 Medium	`Subscription`	`statusChangeLogs` is an unbounded JSON array
8	🟡 Medium	`Lead` / `Contact`	Two parallel contact systems with no link between them
9	🟡 Medium	`BacklinkHealth`	No `backlinkId` FK — can’t join health to a specific backlink
10	🟡 Medium	`Invoice`	`remindersSent` counter can drift from `InvoiceReminder` rows
11	🟡 Medium	`Tenant`	`contextConfig Json?` should be a typed model like `TenantStrategyConfig`
12	🟡 Medium	`Keyword`	Missing `[tenantId, keyword]` composite index; no duplicate guard
13	🟡 Medium	`BlogPostVersion`	FAQ data split across `Json?` snapshot and `BlogPostFaq` table
14	🟡 Medium	`AgentRun`	`tenantName` snapshot not documented as intentionally denormalized
15	🟡 Medium	`CreditBalance`	Missing `createdAt` field
16	🟢 Low	All status fields	`String` type offers no DB-level validation vs enums
17	🟢 Low	`RagCrawlJob`	Name too similar to `WebCrawlJob` for a different concept
18	🟢 Low	`Campaign`	`activityId` soft reference has no explanatory comment

Quickest wins (small migrations, high signal)

#5 — Add @relation + cascade delete to HelpPageRating and HelpSearchEvent
#6 — Remove redundant indexes on StrategyVersion and ClientContextVersion
#3 — Scope WebMedia.contentHash unique constraint to [tenantId, contentHash]
#15 — Add createdAt to CreditBalance

Heaviest (architectural, require coordination)

#1 — Remove User.tenantId (requires auditing all auth code paths)
#4 — Redesign CrawlCache to remove personal data (requires outreach pipeline changes)
#8 — Unify LeadContact and Contact (requires CRM + newsletter pipeline changes)

Database Schema Review — v2

Overview

🔴 High — Correctness / Integrity Risks

1. User.tenantId vs TenantMember — dual ownership (line 139)

2. Activity.sourceDeliverableId has no @relation — dangling reference (line 645)

3. WebMedia.contentHash @unique is global — cross-tenant collision risk (line 3385)

4. CrawlCache stores personal contact data in a shared, tenant-agnostic table (lines 2296–2310)

5. HelpPageRating and HelpSearchEvent have no FK on tenantId (lines 3242–3265)

🟡 Medium — Design Debt / Consistency

6. Redundant @@index duplicating @@unique on same columns

7. Subscription.statusChangeLogs is an unbounded JSON array (line 1522)