Gap 11: No Multi-Reviewer Consensus for High-Stakes Documents
Problem
The approval model is single-reviewer, single-pass for all content types — including high-stakes documents like strategy files and client context that are long-lived and hard to correct later.
- DM reviews → approves or rejects (one person)
- Client reviews → approves or rejects (one person)
- No second opinion, no confidence weighting, no automated cross-check before human review
The article’s Reflexion and Tree of Thoughts frameworks both demonstrate that multiple evaluation passes significantly reduce the rate of low-quality output reaching production. ChemCrow’s evaluation methodology exposed a specific blind spot: LLM self-evaluation underestimates quality differences that human experts can see.
What this costs
For a blog post, a single reviewer missing a factual error or brand voice issue is an acceptable risk — it can be corrected quickly.
For a strategy document that governs 3 months of content planning, a missed error has a much longer blast radius. Similarly for a client context file that informs every downstream agent run.
What to Build
This improvement applies selectively — only to high-stakes documents (strategy, client context). Blog posts and social content are not worth the added complexity.
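The selective gate can be a trivial check at the worker level. A minimal sketch (function and type names are hypothetical, not existing code):

```typescript
// Hypothetical helper: decide whether a document type warrants the extra
// devil's-advocate pass. Only long-lived, high-stakes types qualify.
type DocumentType = "strategy" | "client_context" | "blog_post" | "social_post";

const HIGH_STAKES: ReadonlySet<DocumentType> = new Set([
  "strategy",
  "client_context",
]);

export function needsDevilsAdvocate(docType: DocumentType): boolean {
  return HIGH_STAKES.has(docType);
}
```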
1. Devil’s advocate critic for strategy/context
Before sending to DM review, run a second LLM pass as a “devil’s advocate” evaluator:
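The `DevilsAdvocateResult` type and `parseDevilsAdvocate` helper referenced in the evaluator are not defined in this gap. A minimal sketch of both, assuming the model is instructed to return its concerns as a JSON array (real code would need defensive handling of malformed output):

```typescript
// Hypothetical shapes for the devil's-advocate result.
export type Severity = "blocking" | "major" | "minor";

export interface Concern {
  severity: Severity;
  section: string;    // which part of the document this relates to
  concern: string;    // what the problem is
  suggestion: string; // a specific improvement
}

export interface DevilsAdvocateResult {
  concerns: Concern[];
  blockingCount: number;
  majorCount: number;
  minorCount: number;
}

// Assumes the model replied with a bare JSON array of concerns.
export function parseDevilsAdvocate(raw: string): DevilsAdvocateResult {
  const concerns = JSON.parse(raw) as Concern[];
  const count = (s: Severity) =>
    concerns.filter((c) => c.severity === s).length;
  return {
    concerns,
    blockingCount: count("blocking"),
    majorCount: count("major"),
    minorCount: count("minor"),
  };
}
```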
```ts
// packages/agents/src/lib/devil-advocate.ts
export async function runDevilsAdvocate(
  output: string,
  documentType: "strategy" | "client_context",
  tenantContext: { industry: string; goals: string[] }
): Promise<DevilsAdvocateResult> {
  const prompt = `
You are a sceptical strategist reviewing a ${documentType} document.
Your job is to identify weaknesses, gaps, and assumptions that may not hold.
Be constructive but thorough.

Evaluate this document on:
1. Internal consistency — do the goals and tactics align?
2. Feasibility — are the timelines and resource assumptions realistic?
3. Completeness — what important topics are missing or underdeveloped?
4. Market assumptions — what claims are made without evidence?

For each concern, provide:
- severity: "blocking" | "major" | "minor"
- section: which part of the document this relates to
- concern: what the problem is
- suggestion: a specific improvement

DOCUMENT:
${output}

TENANT CONTEXT:
Industry: ${tenantContext.industry}
Goals: ${tenantContext.goals.join(", ")}
`;

  const result = await claudeSonnet(prompt); // not haiku — needs reasoning
  return parseDevilsAdvocate(result);
}
```

2. Surface concerns inline in DM review
When a DM opens a strategy or context document for review, show the devil’s advocate concerns as an expandable panel alongside the document:
```
[Strategy Document]   [AI Review Concerns]
─────────────────────
⚠ MAJOR (2):
• Section 3: Timeline assumes blog posts can be published weekly,
  but no content brief pipeline is in place yet.
• Section 5: Competitor gap analysis is based on 3 competitors but
  misses the dominant player in the e-commerce segment.

ℹ MINOR (3):
• Section 1: "industry-leading" claim needs evidence...
```

The DM can dismiss each concern (marking it as “acknowledged” or “not applicable”) or act on it by sending the document back for revision with the specific concern attached.
3. DevilsAdvocateRun model
```prisma
model DevilsAdvocateRun {
  id            String   @id @default(cuid())
  agentRunId    String
  documentType  String   // "strategy" | "client_context"
  concerns      Json     // { severity, section, concern, suggestion }[]
  blockingCount Int
  majorCount    Int
  minorCount    Int
  costUsd       Float?
  durationMs    Int?
  createdAt     DateTime @default(now())

  agentRun AgentRun @relation(fields: [agentRunId], references: [id])

  @@map("devils_advocate_run")
}
```

4. Blocking gate on high severity count
If the devil’s advocate finds more than 2 blocking concerns, do not send to DM review. Instead, automatically trigger a regeneration with the blocking concerns injected as revision instructions:
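The gate relies on a `buildRevisionFromConcerns` helper that is not defined here. One plausible sketch, assuming the concern shape from the evaluator prompt (the prompt layout is an assumption):

```typescript
interface Concern {
  severity: "blocking" | "major" | "minor";
  section: string;
  concern: string;
  suggestion: string;
}

// Hypothetical: fold the blocking concerns back into the original prompt so
// the regeneration pass must address them explicitly.
export function buildRevisionFromConcerns(
  originalPrompt: string,
  previousOutput: string,
  concerns: Concern[]
): string {
  const blocking = concerns
    .filter((c) => c.severity === "blocking")
    .map((c, i) => `${i + 1}. [${c.section}] ${c.concern} Suggestion: ${c.suggestion}`)
    .join("\n");
  return [
    originalPrompt,
    "A previous draft was rejected by an automated review. Revise it to resolve every concern below.",
    "PREVIOUS DRAFT:",
    previousOutput,
    "BLOCKING CONCERNS:",
    blocking,
  ].join("\n\n");
}
```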
```ts
if (advocateResult.blockingCount > 2) {
  const revisionPrompt = buildRevisionFromConcerns(
    originalPrompt,
    output,
    advocateResult.concerns
  );
  const revisedOutput = await adapter.execute({ ...config, prompt: revisionPrompt });
  // Run devil's advocate again on the revised output (max 1 retry)
}
```

5. Consensus score for client approval
When a client approves a strategy document, also show the devil’s advocate score (if it was run) as context: “AI review: 0 blocking, 1 major, 3 minor concerns — all acknowledged by DM.” This gives the client a sense of how thoroughly the document was reviewed.
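The summary line shown to the client can be derived directly from the stored counts. A sketch of the formatter (name is hypothetical):

```typescript
// Hypothetical formatter for the client-facing review summary line,
// matching the example wording above.
export function formatAdvocateSummary(
  blocking: number,
  major: number,
  minor: number,
  allAcknowledged: boolean
): string {
  const base = `AI review: ${blocking} blocking, ${major} major, ${minor} minor concerns`;
  return allAcknowledged ? `${base} — all acknowledged by DM` : base;
}
```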
Files to Change
- New file: `packages/agents/src/lib/devil-advocate.ts`
- `packages/agents/src/workers/strategy.worker.ts` — run devil’s advocate before setting status to `dm_review`
- `packages/agents/src/workers/setup.worker.ts` — same for the context-file-writer step
- `packages/db/prisma/schema.prisma` — add `DevilsAdvocateRun` model
- DM portal strategy review page — surface concerns panel
- DM portal context review page — same
Related
- Gap 6: Critic agent quality gate (critic handles content quality; devil’s advocate handles strategic coherence — different concerns, different agents)
- Gap 5: Structured output contracts (devil’s advocate is more effective operating on structured output than raw Markdown)