
Gap 11: No Multi-Reviewer Consensus for High-Stakes Documents

Problem

The approval model is single-reviewer, single-pass for all content types — including high-stakes documents like strategy files and client context that are long-lived and hard to correct later.

  • DM reviews → approves or rejects (one person)
  • Client reviews → approves or rejects (one person)
  • No second opinion, no confidence weighting, no automated cross-check before human review

The article’s Reflexion and Tree of Thoughts frameworks both demonstrate that multiple evaluation passes significantly reduce the rate of low-quality output reaching production. ChemCrow’s evaluation methodology exposed a specific blind spot: LLM self-evaluation underestimates quality differences that human experts can see.

What this costs

For a blog post, a single reviewer missing a factual error or brand voice issue is an acceptable risk — it can be corrected quickly.

For a strategy document that governs 3 months of content planning, a missed error has a much longer blast radius. Similarly for a client context file that informs every downstream agent run.

What to Build

This improvement applies selectively — only to high-stakes documents (strategy, client context). Blog posts and social content are not worth the added complexity.

1. Devil’s advocate critic for strategy/context

Before sending to DM review, run a second LLM pass as a “devil’s advocate” evaluator:

```ts
// packages/agents/src/lib/devil-advocate.ts
export async function runDevilsAdvocate(
  output: string,
  documentType: "strategy" | "client_context",
  tenantContext: { industry: string; goals: string[] }
): Promise<DevilsAdvocateResult> {
  const prompt = `
You are a sceptical strategist reviewing a ${documentType} document.
Your job is to identify weaknesses, gaps, and assumptions that may not hold.
Be constructive but thorough.

Evaluate this document on:
1. Internal consistency — do the goals and tactics align?
2. Feasibility — are the timelines and resource assumptions realistic?
3. Completeness — what important topics are missing or underdeveloped?
4. Market assumptions — what claims are made without evidence?

For each concern, provide:
- severity: "blocking" | "major" | "minor"
- section: which part of the document this relates to
- concern: what the problem is
- suggestion: a specific improvement

DOCUMENT:
${output}

TENANT CONTEXT:
Industry: ${tenantContext.industry}
Goals: ${tenantContext.goals.join(", ")}
`;

  const result = await claudeSonnet(prompt); // not haiku — needs reasoning
  return parseDevilsAdvocate(result);
}
```
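
The snippet above references `DevilsAdvocateResult` and `parseDevilsAdvocate` without defining them. A minimal sketch of what they might look like, assuming the evaluator is prompted to reply with a JSON array of concerns; the shapes, names, and tolerant JSON extraction here are illustrative, not the actual implementation:

```typescript
export type Severity = "blocking" | "major" | "minor";

export interface AdvocateConcern {
  severity: Severity;
  section: string;
  concern: string;
  suggestion: string;
}

export interface DevilsAdvocateResult {
  concerns: AdvocateConcern[];
  blockingCount: number;
  majorCount: number;
  minorCount: number;
}

// Parse the model's reply, tolerating prose before/after the JSON array.
export function parseDevilsAdvocate(raw: string): DevilsAdvocateResult {
  const start = raw.indexOf("[");
  const end = raw.lastIndexOf("]");
  if (start === -1 || end <= start) {
    throw new Error("No JSON array found in devil's advocate response");
  }
  const concerns = JSON.parse(raw.slice(start, end + 1)) as AdvocateConcern[];
  const count = (s: Severity) =>
    concerns.filter((c) => c.severity === s).length;
  return {
    concerns,
    blockingCount: count("blocking"),
    majorCount: count("major"),
    minorCount: count("minor"),
  };
}
```

Pre-computing the severity counts here keeps the blocking gate (below) a cheap integer comparison rather than a repeated filter over the JSON.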

2. Surface concerns inline in DM review

When a DM opens a strategy or context document for review, show the devil’s advocate concerns as an expandable panel alongside the document:

```
[Strategy Document]        [AI Review Concerns]
                           ─────────────────────
                           ⚠ MAJOR (2):
                           • Section 3: Timeline assumes blog posts can be
                             published weekly, but no content brief pipeline
                             is in place yet.
                           • Section 5: Competitor gap analysis is based on
                             3 competitors but misses the dominant player in
                             the e-commerce segment.
                           ℹ MINOR (3):
                           • Section 1: "industry-leading" claim needs evidence...
```

The DM can dismiss each concern (marking it as “acknowledged” or “not applicable”) or act on it by sending back for revision with the specific concern attached.
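
The dismiss/act flow above can be modelled as a small pure state transition. A sketch with hypothetical names (`ConcernStatus`, `applyDmDecision`, and `allResolved` are assumptions, not existing code):

```typescript
// Hypothetical per-concern status the DM review UI could track.
type ConcernStatus = "open" | "acknowledged" | "not_applicable" | "sent_for_revision";

interface ReviewedConcern {
  id: string;
  severity: "blocking" | "major" | "minor";
  concern: string;
  status: ConcernStatus;
}

// Apply a DM decision to one concern; returns a new array, no mutation.
function applyDmDecision(
  concerns: ReviewedConcern[],
  concernId: string,
  decision: Exclude<ConcernStatus, "open">
): ReviewedConcern[] {
  return concerns.map((c) =>
    c.id === concernId ? { ...c, status: decision } : c
  );
}

// The document is ready to progress once no concern is still open.
function allResolved(concerns: ReviewedConcern[]): boolean {
  return concerns.every((c) => c.status !== "open");
}
```

Keeping the transition pure makes it trivial to gate the "approve" button on `allResolved` and to persist the decisions as an audit trail.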

3. DevilsAdvocateRun model

```prisma
model DevilsAdvocateRun {
  id            String   @id @default(cuid())
  agentRunId    String
  documentType  String   // "strategy" | "client_context"
  concerns      Json     // { severity, section, concern, suggestion }[]
  blockingCount Int
  majorCount    Int
  minorCount    Int
  costUsd       Float?
  durationMs    Int?
  createdAt     DateTime @default(now())

  agentRun AgentRun @relation(fields: [agentRunId], references: [id])

  @@map("devils_advocate_run")
}
```

4. Blocking gate on high severity count

If the devil’s advocate finds more than 2 blocking concerns, do not send to DM review. Instead, automatically trigger a regeneration with the blocking concerns injected as revision instructions:

```ts
if (advocateResult.blockingCount > 2) {
  const revisionPrompt = buildRevisionFromConcerns(
    originalPrompt,
    output,
    advocateResult.concerns
  );
  const revisedOutput = await adapter.execute({ ...config, prompt: revisionPrompt });
  // Run devil's advocate again on the revised output (max 1 retry)
}
```
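
`buildRevisionFromConcerns` is called above but never defined. One plausible sketch, assuming it folds only the blocking concerns into a revision prompt; the function body and prompt wording are illustrative:

```typescript
interface Concern {
  severity: "blocking" | "major" | "minor";
  section: string;
  concern: string;
  suggestion: string;
}

// Build a revision prompt from the original instructions, the rejected
// draft, and the blocking concerns (non-blocking concerns are deferred
// to the human DM review that follows).
function buildRevisionFromConcerns(
  originalPrompt: string,
  draft: string,
  concerns: Concern[]
): string {
  const issues = concerns
    .filter((c) => c.severity === "blocking")
    .map((c, i) => `${i + 1}. [${c.section}] ${c.concern}\n   Fix: ${c.suggestion}`)
    .join("\n");
  return [
    originalPrompt,
    "A reviewer found blocking issues in the previous draft. Revise the draft to resolve every issue below while preserving everything that was not flagged.",
    `PREVIOUS DRAFT:\n${draft}`,
    `BLOCKING ISSUES:\n${issues}`,
  ].join("\n\n");
}
```

Capping at one retry (as the comment above notes) bounds cost: the worst case is two generations plus two evaluator passes before a human sees the document.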

5. Consensus score for client approval

When a client approves a strategy document, also show the devil’s advocate score (if it was run) as context: “AI review: 0 blocking, 1 major, 3 minor concerns — all acknowledged by DM.” This gives the client a sense of how thoroughly the document was reviewed.
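
The quoted summary line could come from a small formatter over the stored counts. A hypothetical sketch (the function name and signature are assumptions):

```typescript
// Format the client-facing AI-review summary from a DevilsAdvocateRun's
// denormalised counts and the DM's acknowledgement state.
function formatAdvocateSummary(
  counts: { blockingCount: number; majorCount: number; minorCount: number },
  allAcknowledged: boolean
): string {
  const base =
    `AI review: ${counts.blockingCount} blocking, ` +
    `${counts.majorCount} major, ${counts.minorCount} minor concerns`;
  return allAcknowledged ? `${base} — all acknowledged by DM` : base;
}
```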

Files to Change

  • New file: packages/agents/src/lib/devil-advocate.ts
  • packages/agents/src/workers/strategy.worker.ts — run devil’s advocate before setting status to dm_review
  • packages/agents/src/workers/setup.worker.ts — same for context-file-writer step
  • packages/db/prisma/schema.prisma — add DevilsAdvocateRun model
  • DM portal strategy review page — surface concerns panel
  • DM portal context review page — same

Related Gaps

  • Gap 6: Critic agent quality gate (critic handles content quality; devil’s advocate handles strategic coherence — different concerns, different agents)
  • Gap 5: Structured output contracts (devil’s advocate is more effective operating on structured output than raw Markdown)

© 2026 Leadmetrics — Internal use only