# Gap 6: No Quality Gate Before Human Review
## Problem
The brand scorer exists and runs post-generation, but it is non-blocking: content flows to DM review regardless of its score. DMs spend time reviewing content that a lightweight automated check could have caught and sent back for regeneration.
Current flow:

```
generate → brand_score (non-blocking, informational only) → dm_review
```

The article's ChemCrow architecture used expert-calibrated evaluation rubrics as a gating step, not a post-hoc annotation. Reflexion demonstrates that agents improve significantly when they receive automated feedback before human review, not instead of it.
## What this costs
- DM reviewer time spent on content that would be immediately rejected
- Rejection cycles that could have been caught in the first automated pass
- No `low_confidence` signal to tell DMs which items need extra scrutiny
- The brand scorer's result sits on `BlogPost.brandScore`, but nothing tells DMs what the score means or where the threshold lies
## What to Build
### 1. Blocking critic pass after generation
After generation, and before setting the status to `dm_review`, run a blocking critic evaluation:
```ts
// packages/agents/src/lib/critic.ts

export interface CriticResult {
  passed: boolean;
  score: number; // 0–100
  confidence: "high" | "medium" | "low";
  issues: CriticIssue[];
  recommendation: "approve" | "regenerate" | "flag_for_review";
}

export interface CriticIssue {
  severity: "blocking" | "major" | "minor";
  category: "brand_voice" | "accuracy" | "structure" | "length" | "seo" | "relevance";
  description: string;
  suggestion: string;
}

export async function runCritic(
  output: string,
  agentRole: string,
  tenantContext: { brandVoice: string; targetAudience: string; contentBrief?: string }
): Promise<CriticResult> {
  // Fast Haiku call: cheap and quick
  const result = await claudeHaiku(buildCriticPrompt(output, agentRole, tenantContext));
  return parseCriticResult(result);
}
```

The critic prompt is a rubric-based evaluation, not open-ended feedback:
```
You are a quality reviewer for marketing content. Evaluate this content against the rubric below.
Return a JSON object matching the CriticResult schema.

RUBRIC:
- Brand voice alignment (0–25 pts): Does the tone match the brand voice guide?
- Structural completeness (0–20 pts): Are all required sections present?
- Content depth (0–20 pts): Does it provide actionable value?
- SEO basics (0–20 pts): Is there a clear H1, keyword usage, and a meta description?
- Readability (0–15 pts): Is it clear, scannable, and free of jargon?

BRAND VOICE: {brandVoice}
TARGET AUDIENCE: {targetAudience}

CONTENT TO EVALUATE:
{output}
```
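The critic's reply is model-generated JSON, so parsing should be defensive and fail closed. A minimal sketch of `parseCriticResult` under that assumption; the fence-stripping and the fallback values here are illustrative, not existing behaviour:

```ts
// Defensive parse of the model's JSON reply into a CriticResult.
// Assumption: the model was instructed to return raw JSON matching the schema above.
export function parseCriticResult(raw: string): CriticResult {
  const failClosed: CriticResult = {
    passed: false,
    score: 0,
    confidence: "low",
    issues: [],
    recommendation: "regenerate",
  };
  try {
    // Models sometimes wrap JSON in markdown fences despite instructions; strip them.
    const json = raw.trim().replace(/^```(?:json)?\s*/i, "").replace(/```$/, "");
    const parsed = JSON.parse(json) as Partial<CriticResult>;
    return {
      passed: parsed.passed === true,
      // Clamp the score into the 0–100 range the rubric defines.
      score: typeof parsed.score === "number" ? Math.min(100, Math.max(0, parsed.score)) : 0,
      confidence: parsed.confidence ?? "low",
      issues: Array.isArray(parsed.issues) ? parsed.issues : [],
      // Fail closed: a missing recommendation means "regenerate", never "approve".
      recommendation: parsed.recommendation ?? "regenerate",
    };
  } catch {
    // Unparseable reply: treat as a failed critic pass rather than blocking the pipeline.
    return failClosed;
  }
}
```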
### 2. Gated regeneration loop

Replace the current fire-and-forget flow with a gated loop (max 2 retries):
```ts
const QUALITY_THRESHOLD = 65; // out of 100
const MAX_CRITIC_RETRIES = 2;

let output = await runAgentGeneration(prompt);
let criticResult = await runCritic(output, agentRole, tenantContext);
let attempts = 0;

while (!criticResult.passed && attempts < MAX_CRITIC_RETRIES) {
  const retryPrompt = buildRetryPromptWithCriticFeedback(prompt, output, criticResult);
  output = await runAgentGeneration(retryPrompt);
  criticResult = await runCritic(output, agentRole, tenantContext);
  attempts++;
}

// Always send to dm_review, but flag low confidence
const qualityFlag =
  criticResult.score < QUALITY_THRESHOLD
    ? "low_confidence"
    : criticResult.score >= 85
      ? "high_confidence"
      : null;
```
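The flag then travels with the post into review. As a sketch, persisting it might look like this, assuming Prisma and the `qualityFlag` column on `BlogPost` proposed under Files to Change (`blogPostId` is hypothetical):

```ts
// Hypothetical: store the flag so the DM portal (section 4) can render the badge.
await prisma.blogPost.update({
  where: { id: blogPostId },
  data: { status: "dm_review", qualityFlag },
});
```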
### 3. Retry prompt with critic feedback injected

When regenerating after a critic failure, the retry prompt includes the critic's specific issues:
```
Your previous output did not meet quality standards. The reviewer identified these issues:

BLOCKING ISSUES (must fix):
- [issue 1 from critic]

MAJOR ISSUES (should fix):
- [issue 2 from critic]

Previous output for reference (do NOT repeat these mistakes):
---
[previous output excerpt]
---

Now regenerate with these issues fixed.
```
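A sketch of `buildRetryPromptWithCriticFeedback` filling that template from a `CriticResult`; the 2,000-character excerpt cap is an assumption to keep the retry prompt bounded:

```ts
function buildRetryPromptWithCriticFeedback(
  originalPrompt: string,
  previousOutput: string,
  critic: CriticResult
): string {
  // Render the issues of one severity as bullet points with their suggested fixes.
  const bullets = (severity: CriticIssue["severity"]): string =>
    critic.issues
      .filter((issue) => issue.severity === severity)
      .map((issue) => `- ${issue.description} (fix: ${issue.suggestion})`)
      .join("\n") || "- none";

  return [
    originalPrompt,
    "",
    "Your previous output did not meet quality standards. The reviewer identified these issues:",
    "",
    "BLOCKING ISSUES (must fix):",
    bullets("blocking"),
    "",
    "MAJOR ISSUES (should fix):",
    bullets("major"),
    "",
    "Previous output for reference (do NOT repeat these mistakes):",
    "---",
    previousOutput.slice(0, 2000), // excerpt only, to keep the retry prompt bounded
    "---",
    "",
    "Now regenerate with these issues fixed.",
  ].join("\n");
}
```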
### 4. DM review UI quality badge

Surface the critic result in the DM portal review page:
- Green badge: “High confidence (score: 87)”. The critic passed cleanly.
- Yellow badge: “Needs review (score: 72)”. Passed the threshold but below high confidence.
- Red badge: “Flagged (score: 41, 2 retries)”. Failed the threshold and was sent on after max retries.
This gives DMs a prioritised queue: high-confidence items can be batch-approved faster; flagged items get more attention.
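A minimal sketch of the badge mapping, assuming the portal loads the stored `qualityFlag` along with the final score and retry count (the names here are illustrative, not existing code):

```ts
interface QualityBadge {
  color: "green" | "yellow" | "red";
  label: string;
}

// Map the stored qualityFlag (plus score/attempts for the label) to a badge.
function qualityBadge(post: {
  qualityFlag: "high_confidence" | "low_confidence" | null;
  score: number;
  attempts: number;
}): QualityBadge {
  switch (post.qualityFlag) {
    case "high_confidence":
      return { color: "green", label: `High confidence (score: ${post.score})` };
    case "low_confidence":
      return { color: "red", label: `Flagged (score: ${post.score}, ${post.attempts} retries)` };
    default:
      return { color: "yellow", label: `Needs review (score: ${post.score})` };
  }
}
```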
### 5. `CriticRun` model for analytics
```prisma
model CriticRun {
  id         String   @id @default(cuid())
  agentRunId String
  attempt    Int // 1 = first pass, 2 = after first retry, etc.
  score      Int
  passed     Boolean
  issues     Json // CriticIssue[]
  costUsd    Float?
  durationMs Int?
  createdAt  DateTime @default(now())

  agentRun AgentRun @relation(fields: [agentRunId], references: [id])

  @@map("critic_run")
}
```

This enables analytics: which agents fail the critic most often, which issue categories are most common, and how much the retry loop improves scores.
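As a sketch, each critic attempt could be recorded from the worker like this, assuming the Prisma client and the current `agentRunId` are in scope (the JSON round-trip simply produces a plain value for the `Json` column):

```ts
// Hypothetical: persist one CriticRun row per critic attempt for later analysis.
async function recordCriticRun(
  agentRunId: string,
  attempt: number,
  result: CriticResult,
  durationMs: number
): Promise<void> {
  await prisma.criticRun.create({
    data: {
      agentRunId,
      attempt,
      score: result.score,
      passed: result.passed,
      // Serialize to a plain JSON value for the Json column.
      issues: JSON.parse(JSON.stringify(result.issues)),
      durationMs,
    },
  });
}
```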
## Files to Change
- New file: `packages/agents/src/lib/critic.ts`
- `packages/agents/src/workers/blog-writer.worker.ts`: add the gated critic loop
- `packages/agents/src/workers/social-post-writer.worker.ts`: same
- `packages/agents/src/workers/landing-page-writer.worker.ts`: same
- `packages/db/prisma/schema.prisma`: add the `CriticRun` model and a `qualityFlag` field on `BlogPost`
- DM portal review pages: surface the quality badge
## Related
- Gap 5: Structured output contracts (critic operates on structured output for better precision)
- Gap 1: Learning from feedback history (critic scores are a quality signal for episode retrieval)
- Gap 3: Hallucination detection (complementary — critic catches content quality; hallucination detector catches execution failures)