Gap 5: No Structured Output Contracts

Problem

All agent outputs are free-text. Parsing happens ad hoc with regex and loose string checks. Lilian Weng’s 2023 survey explicitly names this as a known reliability risk: “much of the agent demo code focuses on parsing model output” — and it manifests as a real maintenance burden.

Current state in the codebase:

blog-writer extracts title, meta description, and slug from unstructured Markdown using regex
context-file-writer validates output with validateContextOutput() — checks heading count (≥3) and character length (≥500) but not structure
strategy-writer output is unvalidated Markdown stored directly in DB
AgentRun.output is a Text field with no schema at all
Parsing failures silently produce malformed content that reaches human reviewers

Concrete example

The blog-writer prompt says “Return the title on the first line prefixed with TITLE:”. Occasionally the model writes Title: (lowercase t) or **TITLE:** (markdown bold) or omits it entirely. The regex fails silently and blogPost.title is stored as null or as a fragment of the body text.

What to Build

1. Define Zod output schemas per agent role


// packages/agents/src/schemas/outputs.ts
 
import { z } from "zod";
 
export const BlogWriterOutput = z.object({
  title: z.string().min(10).max(120),
  metaDescription: z.string().min(50).max(160),
  slug: z.string().regex(/^[a-z0-9-]+$/),
  content: z.string().min(500),
  wordCount: z.number().int().positive(),
  headings: z.array(z.string()),
  internalLinks: z.array(z.string().url()).optional(),
});
 
export const ContextFileOutput = z.object({
  companyOverview: z.string().min(100),
  targetAudience: z.string().min(50),
  coreProducts: z.array(z.string()).min(1),
  competitors: z.array(z.string()),
  brandVoice: z.string().min(50),
  keyMessages: z.array(z.string()).min(3),
});
 
export const StrategyOutput = z.object({
  executiveSummary: z.string().min(100),
  goals: z.array(z.object({
    goal: z.string(),
    metric: z.string(),
    timeline: z.string(),
  })).min(1),
  channels: z.array(z.string()).min(1),
  contentPillars: z.array(z.string()).min(3),
  keyInitiatives: z.array(z.object({
    title: z.string(),
    description: z.string(),
    priority: z.enum(["high", "medium", "low"]),
  })).min(3),
});
 
export type BlogWriterOutput = z.infer<typeof BlogWriterOutput>;
export type ContextFileOutput = z.infer<typeof ContextFileOutput>;
export type StrategyOutput = z.infer<typeof StrategyOutput>;

2. Use Claude’s tool_use mode for structured extraction

Instead of asking Claude to format output a certain way and parsing it, use tool_use to enforce structure at the model level:


// packages/agents/src/lib/structured-output.ts
 
export async function extractStructuredOutput<T>(
  rawOutput: string,
  schema: z.ZodType<T>,
  agentRole: string
): Promise<{ success: true; data: T } | { success: false; error: string }> {
  // First try: parse if the model already returned JSON
  try {
    const parsed = JSON.parse(rawOutput);
    const result = schema.safeParse(parsed);
    if (result.success) return { success: true, data: result.data };
  } catch {}
 
  // Second try: extract with a fast haiku call using tool_use
  const extractionResult = await claudeExtract(rawOutput, schema, agentRole);
  return extractionResult;
}

The prompt to haiku for extraction:


Extract the structured fields from this agent output.
Return them exactly matching the required schema.
Do not invent content — only extract what is present.

[RAW OUTPUT]
{rawOutput}

3. Store parsed and raw output separately in AgentRun


model AgentRun {
  // existing
  output         String?  // raw LLM text output
  // new
  parsedOutput   Json?    // validated structured output
  outputValid    Boolean  @default(false)
  outputErrors   String?  // Zod validation error message if parsing failed
}

4. Fail loudly on parse failure for critical agents

For context-file-writer and strategy-writer (high-stakes, human-reviewed), a parse failure should not silently produce bad output. Instead:


const parsed = await extractStructuredOutput(result.output, ContextFileOutput, "context-file-writer");
 
if (!parsed.success) {
  // Retry once with explicit format instructions injected
  const retryResult = await adapter.execute({ ...config, prompt: buildRetryPrompt(result.output, parsed.error) });
  const retryParsed = await extractStructuredOutput(retryResult.output, ContextFileOutput, "context-file-writer");
 
  if (!retryParsed.success) {
    // Store for human review with parse_failed flag
    await db.agentRun.update({ where: { id }, data: { outputValid: false, outputErrors: retryParsed.error } });
    throw new Error(`Output parsing failed after retry: ${retryParsed.error}`);
  }
}

Files to Change

New file: packages/agents/src/schemas/outputs.ts — all output schemas
New file: packages/agents/src/lib/structured-output.ts — extraction utility
packages/db/prisma/schema.prisma — add parsedOutput, outputValid, outputErrors to AgentRun
packages/agents/src/workers/blog-writer.worker.ts — replace regex parsing
packages/agents/src/workers/setup.worker.ts — replace validateContextOutput() with schema validation
packages/agents/src/workers/strategy.worker.ts — add schema validation

Gap 6: Critic agent quality gate (critic operates on parsed structured output, not raw text)
Gap 3: Hallucination detection (structured output failures are a hallucination signal)