
Gap 6: No Quality Gate Before Human Review

Problem

The brand scorer exists and runs post-generation, but it is non-blocking: content flows to DM review regardless of its score, and nothing is ever sent back. DMs spend time reviewing content that a lightweight automated check could have caught and sent back for regeneration.

Current flow:

generate → brand_score (non-blocking, informational only) → dm_review

The article’s ChemCrow architecture used expert-calibrated evaluation rubrics as a gating step, not a post-hoc annotation. Reflexion demonstrates that agents improve significantly when they receive automated feedback before human review — not instead of it.

What this costs

  • DM reviewer time spent on content that would be immediately rejected
  • Rejection cycles that could have been caught in the first automated pass
  • No low_confidence signal to tell DMs which items need extra scrutiny
  • The brand score sits on BlogPost.brandScore, but nothing tells DMs what threshold, if any, the score should clear

What to Build

1. Blocking critic pass after generation

After generation and before setting status to dm_review, run a blocking critic evaluation:

// packages/agents/src/lib/critic.ts
export interface CriticResult {
  passed: boolean;
  score: number; // 0–100
  confidence: "high" | "medium" | "low";
  issues: CriticIssue[];
  recommendation: "approve" | "regenerate" | "flag_for_review";
}

export interface CriticIssue {
  severity: "blocking" | "major" | "minor";
  category: "brand_voice" | "accuracy" | "structure" | "length" | "seo" | "relevance";
  description: string;
  suggestion: string;
}

export async function runCritic(
  output: string,
  agentRole: string,
  tenantContext: { brandVoice: string; targetAudience: string; contentBrief?: string }
): Promise<CriticResult> {
  // Fast haiku call — cheap and quick
  const result = await claudeHaiku(buildCriticPrompt(output, agentRole, tenantContext));
  return parseCriticResult(result);
}
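The helpers here (claudeHaiku, buildCriticPrompt, parseCriticResult) are assumed rather than existing code. A minimal sketch of the parsing side, which treats an unparseable reply as a failure instead of crashing the worker, could look like this:

// Hypothetical sketch of parseCriticResult: defensively parse the model's
// JSON reply into a CriticResult. Field names mirror the interfaces above.
export function parseCriticResult(raw: string): CriticResult {
  try {
    // Strip optional ```json fences before parsing.
    const jsonText = raw.replace(/^```(?:json)?\s*|\s*```$/g, "").trim();
    const parsed = JSON.parse(jsonText);
    return {
      passed: Boolean(parsed.passed),
      score: Math.min(100, Math.max(0, Number(parsed.score) || 0)),
      confidence: parsed.confidence ?? "low",
      issues: Array.isArray(parsed.issues) ? parsed.issues : [],
      recommendation: parsed.recommendation ?? "flag_for_review",
    };
  } catch {
    // An unparseable reply counts as a failed pass; the gated loop retries.
    return { passed: false, score: 0, confidence: "low", issues: [], recommendation: "flag_for_review" };
  }
}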

The critic prompt is a rubric-based evaluation, not open-ended feedback:

You are a quality reviewer for marketing content. Evaluate this content against
the rubric below. Return a JSON object matching the CriticResult schema.

RUBRIC:
- Brand voice alignment (0–25 pts): Does the tone match the brand voice guide?
- Structural completeness (0–20 pts): Are all required sections present?
- Content depth (0–20 pts): Does it provide actionable value?
- SEO basics (0–20 pts): Is there a clear H1, keyword usage, and a meta description?
- Readability (0–15 pts): Is it clear, scannable, and free of jargon?

BRAND VOICE: {brandVoice}
TARGET AUDIENCE: {targetAudience}

CONTENT TO EVALUATE:
{output}
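One way buildCriticPrompt (referenced in runCritic above) could assemble that template — a sketch, with CRITIC_RUBRIC standing in for the rubric text above:

// Hypothetical sketch of buildCriticPrompt. CRITIC_RUBRIC is assumed to
// hold the rubric text above, verbatim.
const CRITIC_RUBRIC = "..."; // rubric text from the template above

export function buildCriticPrompt(
  output: string,
  agentRole: string,
  tenantContext: { brandVoice: string; targetAudience: string; contentBrief?: string }
): string {
  return [
    `You are a quality reviewer for marketing content (agent role: ${agentRole}).`,
    "Evaluate this content against the rubric below.",
    "Return a JSON object matching the CriticResult schema.",
    "",
    CRITIC_RUBRIC,
    "",
    `BRAND VOICE: ${tenantContext.brandVoice}`,
    `TARGET AUDIENCE: ${tenantContext.targetAudience}`,
    tenantContext.contentBrief ? `CONTENT BRIEF: ${tenantContext.contentBrief}` : "",
    "",
    "CONTENT TO EVALUATE:",
    output,
  ].join("\n");
}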

2. Gated regeneration loop

Replace the current fire-and-forget flow with a gated loop (max 2 retries):

const QUALITY_THRESHOLD = 65; // out of 100
const MAX_CRITIC_RETRIES = 2;

let output = await runAgentGeneration(prompt);
let criticResult = await runCritic(output, agentRole, tenantContext);
let attempts = 0;

while (!criticResult.passed && attempts < MAX_CRITIC_RETRIES) {
  const retryPrompt = buildRetryPromptWithCriticFeedback(prompt, output, criticResult);
  output = await runAgentGeneration(retryPrompt);
  criticResult = await runCritic(output, agentRole, tenantContext);
  attempts++;
}

// Always send to dm_review, but flag low confidence
const qualityFlag =
  criticResult.score < QUALITY_THRESHOLD
    ? "low_confidence"
    : criticResult.score >= 85
      ? "high_confidence"
      : null;
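Whatever the final score, the content still reaches a human; the flag just rides along. A sketch of the hand-off, assuming the existing Prisma client, a blogPostId in scope, and the qualityFlag column proposed under "Files to Change":

// Hypothetical sketch: persist the critic outcome when handing off to
// dm_review. Assumes `prisma` is the existing client and BlogPost has a
// new qualityFlag column (see "Files to Change" below).
await prisma.blogPost.update({
  where: { id: blogPostId },
  data: {
    status: "dm_review", // content always reaches a human reviewer
    qualityFlag,         // "low_confidence" | "high_confidence" | null
  },
});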

3. Retry prompt with critic feedback injected

When regenerating after a critic failure, the retry prompt includes the critic’s specific issues:

Your previous output did not meet quality standards. The reviewer identified these issues:

BLOCKING ISSUES (must fix):
- [issue 1 from critic]

MAJOR ISSUES (should fix):
- [issue 2 from critic]

Previous output for reference (do NOT repeat these mistakes):
---
[previous output excerpt]
---

Now regenerate with these issues fixed.
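buildRetryPromptWithCriticFeedback (referenced in the loop above) is assumed; one way to fill the template is to group the critic's issues by severity:

// Hypothetical sketch of buildRetryPromptWithCriticFeedback: groups
// issues by severity and injects them into the retry template above.
export function buildRetryPromptWithCriticFeedback(
  originalPrompt: string,
  previousOutput: string,
  critic: CriticResult
): string {
  const listIssues = (severity: CriticIssue["severity"]) =>
    critic.issues
      .filter((issue) => issue.severity === severity)
      .map((issue) => `- ${issue.description} (fix: ${issue.suggestion})`)
      .join("\n") || "- none";

  return [
    originalPrompt,
    "",
    "Your previous output did not meet quality standards. The reviewer identified these issues:",
    "",
    "BLOCKING ISSUES (must fix):",
    listIssues("blocking"),
    "",
    "MAJOR ISSUES (should fix):",
    listIssues("major"),
    "",
    "Previous output for reference (do NOT repeat these mistakes):",
    "---",
    previousOutput.slice(0, 2000), // excerpt, to keep the retry prompt compact
    "---",
    "",
    "Now regenerate with these issues fixed.",
  ].join("\n");
}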

4. DM review UI quality badge

Surface the critic result in the DM portal review page:

  • Green badge: “High confidence (score: 87)” — critic passed cleanly
  • Yellow badge: “Needs review (score: 72)” — passed the threshold but below high confidence
  • Red badge: “Flagged (score: 41, 2 retries)” — failed threshold, sent after max retries

This gives DMs a prioritised queue: high-confidence items can be batch-approved faster; flagged items get more attention.
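The badge mapping itself is a few lines; a sketch consistent with the thresholds above (65 to pass, 85 for high confidence):

// Hypothetical sketch: derive the review-page badge from the stored
// critic result. Labels and colours mirror the list above.
type QualityBadge = { colour: "green" | "yellow" | "red"; label: string };

export function criticBadge(score: number, passed: boolean, retries: number): QualityBadge {
  if (!passed) {
    return { colour: "red", label: `Flagged (score: ${score}, ${retries} retries)` };
  }
  if (score >= 85) {
    return { colour: "green", label: `High confidence (score: ${score})` };
  }
  return { colour: "yellow", label: `Needs review (score: ${score})` };
}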

5. CriticRun model for analytics

model CriticRun {
  id         String   @id @default(cuid())
  agentRunId String
  attempt    Int      // 1 = first pass, 2 = after first retry, etc.
  score      Int
  passed     Boolean
  issues     Json     // CriticIssue[]
  costUsd    Float?
  durationMs Int?
  createdAt  DateTime @default(now())

  agentRun AgentRun @relation(fields: [agentRunId], references: [id])

  @@map("critic_run")
}

This enables analytics: which agents fail the critic most, which issue categories are most common, how much the retry loop improves scores.
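With the model in place, the retry-improvement question becomes a single aggregation; a sketch using Prisma's groupBy:

// Hypothetical sketch: average critic score per attempt number shows how
// much each retry pass lifts quality.
const scoreByAttempt = await prisma.criticRun.groupBy({
  by: ["attempt"],
  _avg: { score: true },
  _count: { _all: true },
  orderBy: { attempt: "asc" },
});
// Shape: [{ attempt: 1, _avg: { score: ... }, _count: { _all: ... } }, ...]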

Files to Change

  • New file: packages/agents/src/lib/critic.ts
  • packages/agents/src/workers/blog-writer.worker.ts — add gated critic loop
  • packages/agents/src/workers/social-post-writer.worker.ts — same
  • packages/agents/src/workers/landing-page-writer.worker.ts — same
  • packages/db/prisma/schema.prisma — add CriticRun model, add qualityFlag to BlogPost
  • DM portal review pages — surface quality badge
Related Gaps

  • Gap 5: Structured output contracts (critic operates on structured output for better precision)
  • Gap 1: Learning from feedback history (critic scores are a quality signal for episode retrieval)
  • Gap 3: Hallucination detection (complementary — critic catches content quality; hallucination detector catches execution failures)
