Gap 11: No Multi-Reviewer Consensus for High-Stakes Documents
Problem
The approval model is single-reviewer, single-pass for all content types — including high-stakes documents like strategy files and client context that are long-lived and hard to correct later.
- DM reviews → approves or rejects (one person)
- Client reviews → approves or rejects (one person)
- No second opinion, no confidence weighting, no automated cross-check before human review
The article’s Reflexion and Tree of Thoughts frameworks both demonstrate that multiple evaluation passes significantly reduce the rate of low-quality output reaching production. ChemCrow’s evaluation methodology exposed a specific blind spot: LLM self-evaluation underestimates quality differences that human experts can see.
What this costs
For a blog post, a single reviewer missing a factual error or brand voice issue is an acceptable risk — it can be corrected quickly.
For a strategy document that governs 3 months of content planning, a missed error has a much longer blast radius. Similarly for a client context file that informs every downstream agent run.
What to Build
This improvement applies selectively — only to high-stakes documents (strategy, client context). Blog posts and social content are not worth the added complexity.
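The selective gate can be a trivial check at the worker level. A minimal sketch (function and type names are hypothetical, not existing code):

```typescript
// Hypothetical helper: decide whether a document type warrants the extra
// devil's-advocate pass. Only long-lived, high-stakes types qualify.
type DocumentType = "strategy" | "client_context" | "blog_post" | "social_post";

const HIGH_STAKES: ReadonlySet<DocumentType> = new Set([
  "strategy",
  "client_context",
]);

export function needsDevilsAdvocate(docType: DocumentType): boolean {
  return HIGH_STAKES.has(docType);
}
```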
1. Devil’s advocate critic for strategy/context
Before sending to DM review, run a second LLM pass as a “devil’s advocate” evaluator:
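The `DevilsAdvocateResult` type and `parseDevilsAdvocate` helper referenced in the evaluator are not defined in this gap. A minimal sketch of both, assuming the model is instructed to return its concerns as a JSON array (real code would need defensive handling of malformed output):

```typescript
// Hypothetical shapes for the devil's-advocate result.
export type Severity = "blocking" | "major" | "minor";

export interface Concern {
  severity: Severity;
  section: string;    // which part of the document this relates to
  concern: string;    // what the problem is
  suggestion: string; // a specific improvement
}

export interface DevilsAdvocateResult {
  concerns: Concern[];
  blockingCount: number;
  majorCount: number;
  minorCount: number;
}

// Assumes the model replied with a bare JSON array of concerns.
export function parseDevilsAdvocate(raw: string): DevilsAdvocateResult {
  const concerns = JSON.parse(raw) as Concern[];
  const count = (s: Severity) =>
    concerns.filter((c) => c.severity === s).length;
  return {
    concerns,
    blockingCount: count("blocking"),
    majorCount: count("major"),
    minorCount: count("minor"),
  };
}
```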
```ts
// packages/agents/src/lib/devil-advocate.ts
export async function runDevilsAdvocate(
  output: string,
  documentType: "strategy" | "client_context",
  tenantContext: { industry: string; goals: string[] }
): Promise<DevilsAdvocateResult> {
  const prompt = `
You are a sceptical strategist reviewing a ${documentType} document.
Your job is to identify weaknesses, gaps, and assumptions that may not hold.
Be constructive but thorough.

Evaluate this document on:
1. Internal consistency — do the goals and tactics align?
2. Feasibility — are the timelines and resource assumptions realistic?
3. Completeness — what important topics are missing or underdeveloped?
4. Market assumptions — what claims are made without evidence?

For each concern, provide:
- severity: "blocking" | "major" | "minor"
- section: which part of the document this relates to
- concern: what the problem is
- suggestion: a specific improvement

DOCUMENT:
${output}

TENANT CONTEXT:
Industry: ${tenantContext.industry}
Goals: ${tenantContext.goals.join(", ")}
`;

  const result = await claudeSonnet(prompt); // not haiku — needs reasoning
  return parseDevilsAdvocate(result);
}
```

2. Surface concerns inline in DM review
When a DM opens a strategy or context document for review, show the devil’s advocate concerns as an expandable panel alongside the document:
```
[Strategy Document]   [AI Review Concerns]
─────────────────────
⚠ MAJOR (2):
• Section 3: Timeline assumes blog posts can be published weekly,
  but no content brief pipeline is in place yet.
• Section 5: Competitor gap analysis is based on 3 competitors but
  misses the dominant player in the e-commerce segment.

ℹ MINOR (3):
• Section 1: "industry-leading" claim needs evidence...
```

The DM can dismiss each concern (marking it as “acknowledged” or “not applicable”) or act on it by sending the document back for revision with the specific concern attached.
3. DevilsAdvocateRun model
```prisma
model DevilsAdvocateRun {
  id            String   @id @default(cuid())
  agentRunId    String
  documentType  String   // "strategy" | "client_context"
  concerns      Json     // { severity, section, concern, suggestion }[]
  blockingCount Int
  majorCount    Int
  minorCount    Int
  costUsd       Float?
  durationMs    Int?
  createdAt     DateTime @default(now())

  agentRun AgentRun @relation(fields: [agentRunId], references: [id])

  @@map("devils_advocate_run")
}
```

4. Blocking gate on high severity count
If the devil’s advocate finds more than 2 blocking concerns, do not send to DM review. Instead, automatically trigger a regeneration with the blocking concerns injected as revision instructions:
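The gate relies on a `buildRevisionFromConcerns` helper that is not defined here. One plausible sketch, assuming the concern shape from the evaluator prompt (the prompt layout is an assumption):

```typescript
interface Concern {
  severity: "blocking" | "major" | "minor";
  section: string;
  concern: string;
  suggestion: string;
}

// Hypothetical: fold the blocking concerns back into the original prompt so
// the regeneration pass must address them explicitly.
export function buildRevisionFromConcerns(
  originalPrompt: string,
  previousOutput: string,
  concerns: Concern[]
): string {
  const blocking = concerns
    .filter((c) => c.severity === "blocking")
    .map((c, i) => `${i + 1}. [${c.section}] ${c.concern} Suggestion: ${c.suggestion}`)
    .join("\n");
  return [
    originalPrompt,
    "A previous draft was rejected by an automated review. Revise it to resolve every concern below.",
    "PREVIOUS DRAFT:",
    previousOutput,
    "BLOCKING CONCERNS:",
    blocking,
  ].join("\n\n");
}
```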
```ts
if (advocateResult.blockingCount > 2) {
  const revisionPrompt = buildRevisionFromConcerns(
    originalPrompt,
    output,
    advocateResult.concerns
  );
  const revisedOutput = await adapter.execute({ ...config, prompt: revisionPrompt });
  // Run devil's advocate again on the revised output (max 1 retry)
}
```

5. Consensus score for client approval
When a client approves a strategy document, also show the devil’s advocate score (if it was run) as context: “AI review: 0 blocking, 1 major, 3 minor concerns — all acknowledged by DM.” This gives the client a sense of how thoroughly the document was reviewed.
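The summary line shown to the client can be derived directly from the stored counts. A sketch of the formatter (name is hypothetical):

```typescript
// Hypothetical formatter for the client-facing review summary line,
// matching the example wording above.
export function formatAdvocateSummary(
  blocking: number,
  major: number,
  minor: number,
  allAcknowledged: boolean
): string {
  const base = `AI review: ${blocking} blocking, ${major} major, ${minor} minor concerns`;
  return allAcknowledged ? `${base} — all acknowledged by DM` : base;
}
```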
Files to Change
- New file: `packages/agents/src/lib/devil-advocate.ts`
- `packages/agents/src/workers/strategy.worker.ts` — run devil’s advocate before setting status to `dm_review`
- `packages/agents/src/workers/setup.worker.ts` — same for the context-file-writer step
- `packages/db/prisma/schema.prisma` — add `DevilsAdvocateRun` model
- DM portal strategy review page — surface concerns panel
- DM portal context review page — same
Related
- Gap 6: Critic agent quality gate (critic handles content quality; devil’s advocate handles strategic coherence — different concerns, different agents)
- Gap 5: Structured output contracts (devil’s advocate is more effective operating on structured output than raw Markdown)