# Gap 6: No Quality Gate Before Human Review
## Problem
The brand scorer exists and runs post-generation, but it is non-blocking: content flows to DM review regardless of its score. DMs spend time reviewing content that a lightweight automated check could have caught and sent back for regeneration.
Current flow:

```
generate → brand_score (non-blocking, informational only) → dm_review
```

The article's ChemCrow architecture used expert-calibrated evaluation rubrics as a gating step, not a post-hoc annotation. Reflexion demonstrates that agents improve significantly when they receive automated feedback before human review, not instead of it.
## What this costs
- DM reviewer time spent on content that would be immediately rejected
- Rejection cycles that could have been caught in the first automated pass
- No `low_confidence` signal to tell DMs which items need extra scrutiny
- The brand scorer's result sits on `BlogPost.brandScore`, but nothing tells DMs what the score means or where the threshold lies
## What to Build
### 1. Blocking critic pass after generation
After generation, and before setting the status to `dm_review`, run a blocking critic evaluation:
```ts
// packages/agents/src/lib/critic.ts

export interface CriticResult {
  passed: boolean;
  score: number; // 0–100
  confidence: "high" | "medium" | "low";
  issues: CriticIssue[];
  recommendation: "approve" | "regenerate" | "flag_for_review";
}

export interface CriticIssue {
  severity: "blocking" | "major" | "minor";
  category: "brand_voice" | "accuracy" | "structure" | "length" | "seo" | "relevance";
  description: string;
  suggestion: string;
}

export async function runCritic(
  output: string,
  agentRole: string,
  tenantContext: { brandVoice: string; targetAudience: string; contentBrief?: string }
): Promise<CriticResult> {
  // Fast Haiku call: cheap and quick
  const result = await claudeHaiku(buildCriticPrompt(output, agentRole, tenantContext));
  return parseCriticResult(result);
}
```

The critic prompt is a rubric-based evaluation, not open-ended feedback:
```
You are a quality reviewer for marketing content. Evaluate this content against the rubric below.
Return a JSON object matching the CriticResult schema.

RUBRIC:
- Brand voice alignment (0–25 pts): Does the tone match the brand voice guide?
- Structural completeness (0–20 pts): Are all required sections present?
- Content depth (0–20 pts): Does it provide actionable value?
- SEO basics (0–20 pts): Is there a clear H1, keyword usage, and a meta description?
- Readability (0–15 pts): Is it clear, scannable, and free of jargon?

BRAND VOICE: {brandVoice}
TARGET AUDIENCE: {targetAudience}

CONTENT TO EVALUATE:
{output}
```
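The critic's reply is model-generated JSON, so parsing should be defensive and fail closed. A minimal sketch of `parseCriticResult` under that assumption; the fence-stripping and the fallback values here are illustrative, not existing behaviour:

```ts
// Defensive parse of the model's JSON reply into a CriticResult.
// Assumption: the model was instructed to return raw JSON matching the schema above.
export function parseCriticResult(raw: string): CriticResult {
  const failClosed: CriticResult = {
    passed: false,
    score: 0,
    confidence: "low",
    issues: [],
    recommendation: "regenerate",
  };
  try {
    // Models sometimes wrap JSON in markdown fences despite instructions; strip them.
    const json = raw.trim().replace(/^```(?:json)?\s*/i, "").replace(/```$/, "");
    const parsed = JSON.parse(json) as Partial<CriticResult>;
    return {
      passed: parsed.passed === true,
      // Clamp the score into the 0–100 range the rubric defines.
      score: typeof parsed.score === "number" ? Math.min(100, Math.max(0, parsed.score)) : 0,
      confidence: parsed.confidence ?? "low",
      issues: Array.isArray(parsed.issues) ? parsed.issues : [],
      // Fail closed: a missing recommendation means "regenerate", never "approve".
      recommendation: parsed.recommendation ?? "regenerate",
    };
  } catch {
    // Unparseable reply: treat as a failed critic pass rather than blocking the pipeline.
    return failClosed;
  }
}
```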
### 2. Gated regeneration loop

Replace the current fire-and-forget flow with a gated loop (max 2 retries):
```ts
const QUALITY_THRESHOLD = 65; // out of 100
const MAX_CRITIC_RETRIES = 2;

let output = await runAgentGeneration(prompt);
let criticResult = await runCritic(output, agentRole, tenantContext);
let attempts = 0;

while (!criticResult.passed && attempts < MAX_CRITIC_RETRIES) {
  const retryPrompt = buildRetryPromptWithCriticFeedback(prompt, output, criticResult);
  output = await runAgentGeneration(retryPrompt);
  criticResult = await runCritic(output, agentRole, tenantContext);
  attempts++;
}

// Always send to dm_review, but flag low confidence
const qualityFlag =
  criticResult.score < QUALITY_THRESHOLD
    ? "low_confidence"
    : criticResult.score >= 85
      ? "high_confidence"
      : null;
```
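The flag then travels with the post into review. As a sketch, persisting it might look like this, assuming Prisma and the `qualityFlag` column on `BlogPost` proposed under Files to Change (`blogPostId` is hypothetical):

```ts
// Hypothetical: store the flag so the DM portal (section 4) can render the badge.
await prisma.blogPost.update({
  where: { id: blogPostId },
  data: { status: "dm_review", qualityFlag },
});
```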
### 3. Retry prompt with critic feedback injected

When regenerating after a critic failure, the retry prompt includes the critic's specific issues:
```
Your previous output did not meet quality standards. The reviewer identified these issues:

BLOCKING ISSUES (must fix):
- [issue 1 from critic]

MAJOR ISSUES (should fix):
- [issue 2 from critic]

Previous output for reference (do NOT repeat these mistakes):
---
[previous output excerpt]
---

Now regenerate with these issues fixed.
```
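A sketch of `buildRetryPromptWithCriticFeedback` filling that template from a `CriticResult`; the 2,000-character excerpt cap is an assumption to keep the retry prompt bounded:

```ts
function buildRetryPromptWithCriticFeedback(
  originalPrompt: string,
  previousOutput: string,
  critic: CriticResult
): string {
  // Render the issues of one severity as bullet points with their suggested fixes.
  const bullets = (severity: CriticIssue["severity"]): string =>
    critic.issues
      .filter((issue) => issue.severity === severity)
      .map((issue) => `- ${issue.description} (fix: ${issue.suggestion})`)
      .join("\n") || "- none";

  return [
    originalPrompt,
    "",
    "Your previous output did not meet quality standards. The reviewer identified these issues:",
    "",
    "BLOCKING ISSUES (must fix):",
    bullets("blocking"),
    "",
    "MAJOR ISSUES (should fix):",
    bullets("major"),
    "",
    "Previous output for reference (do NOT repeat these mistakes):",
    "---",
    previousOutput.slice(0, 2000), // excerpt only, to keep the retry prompt bounded
    "---",
    "",
    "Now regenerate with these issues fixed.",
  ].join("\n");
}
```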
### 4. DM review UI quality badge

Surface the critic result in the DM portal review page:
- Green badge: “High confidence (score: 87)”. The critic passed cleanly.
- Yellow badge: “Needs review (score: 72)”. Passed the threshold but below high confidence.
- Red badge: “Flagged (score: 41, 2 retries)”. Failed the threshold and was sent on after max retries.
This gives DMs a prioritised queue: high-confidence items can be batch-approved faster; flagged items get more attention.
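A minimal sketch of the badge mapping, assuming the portal loads the stored `qualityFlag` along with the final score and retry count (the names here are illustrative, not existing code):

```ts
interface QualityBadge {
  color: "green" | "yellow" | "red";
  label: string;
}

// Map the stored qualityFlag (plus score/attempts for the label) to a badge.
function qualityBadge(post: {
  qualityFlag: "high_confidence" | "low_confidence" | null;
  score: number;
  attempts: number;
}): QualityBadge {
  switch (post.qualityFlag) {
    case "high_confidence":
      return { color: "green", label: `High confidence (score: ${post.score})` };
    case "low_confidence":
      return { color: "red", label: `Flagged (score: ${post.score}, ${post.attempts} retries)` };
    default:
      return { color: "yellow", label: `Needs review (score: ${post.score})` };
  }
}
```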
### 5. `CriticRun` model for analytics
```prisma
model CriticRun {
  id         String   @id @default(cuid())
  agentRunId String
  attempt    Int // 1 = first pass, 2 = after first retry, etc.
  score      Int
  passed     Boolean
  issues     Json // CriticIssue[]
  costUsd    Float?
  durationMs Int?
  createdAt  DateTime @default(now())

  agentRun AgentRun @relation(fields: [agentRunId], references: [id])

  @@map("critic_run")
}
```

This enables analytics: which agents fail the critic most often, which issue categories are most common, and how much the retry loop improves scores.
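As a sketch, each critic attempt could be recorded from the worker like this, assuming the Prisma client and the current `agentRunId` are in scope (the JSON round-trip simply produces a plain value for the `Json` column):

```ts
// Hypothetical: persist one CriticRun row per critic attempt for later analysis.
async function recordCriticRun(
  agentRunId: string,
  attempt: number,
  result: CriticResult,
  durationMs: number
): Promise<void> {
  await prisma.criticRun.create({
    data: {
      agentRunId,
      attempt,
      score: result.score,
      passed: result.passed,
      // Serialize to a plain JSON value for the Json column.
      issues: JSON.parse(JSON.stringify(result.issues)),
      durationMs,
    },
  });
}
```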
## Files to Change
- New file: `packages/agents/src/lib/critic.ts`
- `packages/agents/src/workers/blog-writer.worker.ts`: add the gated critic loop
- `packages/agents/src/workers/social-post-writer.worker.ts`: same
- `packages/agents/src/workers/landing-page-writer.worker.ts`: same
- `packages/db/prisma/schema.prisma`: add the `CriticRun` model and a `qualityFlag` field on `BlogPost`
- DM portal review pages: surface the quality badge
## Related
- Gap 5: Structured output contracts (critic operates on structured output for better precision)
- Gap 1: Learning from feedback history (critic scores are a quality signal for episode retrieval)
- Gap 3: Hallucination detection (complementary — critic catches content quality; hallucination detector catches execution failures)