
Gap 1: No Learning from Historical Feedback

Problem

Every agent run starts cold. The blog-writer that got rejected three times for the same tenant doesn’t know that. The strategy-writer that produced a high-scoring output last month doesn’t carry that pattern forward.

The data needed to learn already exists:

  • AgentRun.output + AgentRun.transcript — every run result and execution trace
  • BlogPost.brandScore / BlogPost.brandIssues — LLM quality scores
  • rejectionFeedback on content — human quality signals from DMs
  • BlogPost.version — how many revisions were needed before approval

None of this is fed back into future prompts for the same agent role and tenant. This is the Chain of Hindsight gap (Lilian Weng, 2023): agents improve over time by conditioning on ranked sequences of past outputs and feedback, not just the current task.

Concrete example

The blog-writer for Tenant A has been rejected twice for being too technical and once for ignoring the brand tone. The fourth run sends the same system prompt with no awareness of this history. The DM will likely reject it again.

What to Build

1. AgentEpisode retrieval layer

Before any agent generates output, fetch the last N runs for (agentRole, tenantId) from AgentRun:

```typescript
const episodes = await db.agentRun.findMany({
  where: { tenantId, agentRole, status: "completed" },
  orderBy: { completedAt: "desc" },
  take: 5,
  select: {
    inputSummary: true,
    output: true,
    brandScore: true,        // from BlogPost join
    rejectionFeedback: true, // from BlogPost join
    durationMs: true,
  },
});
```

Inject the top-scoring episode as a positive example and the most-rejected episode (if any) as a negative example into the prompt:

```
## PAST PERFORMANCE FOR THIS TENANT

### What worked well (approved, high brand score):
[excerpt from best episode]

### What was rejected (feedback from reviewer):
[rejection reason from worst episode]

Incorporate these learnings into the current output.
```
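A sketch of the selection-and-injection step, intended for the shared `episode-retrieval.ts` helper. The `Episode` shape and `buildEpisodeContext` name are illustrative, not existing code; the real fields come from the `AgentRun`/`BlogPost` join above:

```typescript
interface Episode {
  output: string;
  brandScore: number | null;
  rejectionFeedback: string | null;
}

// Pick the best-scoring approved episode and the most recent rejected one,
// then format them as the PAST PERFORMANCE prompt section.
function buildEpisodeContext(episodes: Episode[]): string {
  const approved = episodes.filter(
    (e) => e.rejectionFeedback === null && e.brandScore !== null,
  );
  const rejected = episodes.filter((e) => e.rejectionFeedback !== null);

  const best = approved.sort(
    (a, b) => (b.brandScore ?? 0) - (a.brandScore ?? 0),
  )[0];
  const worst = rejected[0]; // episodes arrive newest-first

  const parts: string[] = ["## PAST PERFORMANCE FOR THIS TENANT"];
  if (best) {
    parts.push(
      "### What worked well (approved, high brand score):",
      best.output.slice(0, 1500), // cap the excerpt to keep the prompt small
    );
  }
  if (worst) {
    parts.push(
      "### What was rejected (feedback from reviewer):",
      worst.rejectionFeedback!,
    );
  }
  parts.push("Incorporate these learnings into the current output.");
  return parts.join("\n\n");
}
```

Either section is omitted when the tenant has no approved or no rejected runs yet, so the first runs for a new tenant degrade gracefully to the plain system prompt.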

2. TenantAgentMemory model

Add a Prisma model for structured per-tenant learnings that survive beyond raw run logs:

```prisma
model TenantAgentMemory {
  id        String   @id @default(cuid())
  tenantId  String
  agentRole String
  key       String   // e.g. "preferred_tone", "rejected_topics", "avg_approved_length"
  value     Json
  updatedAt DateTime @updatedAt

  @@unique([tenantId, agentRole, key])
  @@map("tenant_agent_memory")
}
```

After each completed + approved run, extract structured learnings with a lightweight haiku call:

```
Given this approved output and its quality score, extract:
- preferred writing tone
- content length (words)
- structural patterns used
Return as JSON.
```

Store results in TenantAgentMemory. Inject top-5 relevant memories into the next run.
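The store step could look like the sketch below. It parses the extractor's JSON reply and upserts one row per learning, keyed on the `@@unique([tenantId, agentRole, key])` constraint. The `MemoryStore` interface is a hypothetical minimal slice of the generated Prisma client, and `storeLearnings` is an assumed helper name:

```typescript
// Hypothetical minimal slice of the Prisma client for tenantAgentMemory.
interface MemoryStore {
  upsert(args: {
    where: { tenantId_agentRole_key: { tenantId: string; agentRole: string; key: string } };
    update: { value: unknown };
    create: { tenantId: string; agentRole: string; key: string; value: unknown };
  }): Promise<unknown>;
}

// Parse the extractor model's JSON reply and upsert one memory row per
// learning. Returns the number of rows written.
async function storeLearnings(
  store: MemoryStore,
  tenantId: string,
  agentRole: string,
  rawJson: string, // e.g. '{"preferred_tone":"casual","avg_approved_length":900}'
): Promise<number> {
  const parsed = JSON.parse(rawJson) as Record<string, unknown>;
  const entries = Object.entries(parsed);
  for (const [key, value] of entries) {
    await store.upsert({
      where: { tenantId_agentRole_key: { tenantId, agentRole, key } },
      update: { value },
      create: { tenantId, agentRole, key, value },
    });
  }
  return entries.length;
}
```

Upserting (rather than inserting) means each key converges on the latest learning, so the memory table stays bounded at one row per `(tenantId, agentRole, key)`.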

3. Negative example injection on revision

Already partially done via rejectionFeedback. Strengthen it: when wakeReason === "rejection", include the previous output as a clearly labelled negative example, not just the feedback text. The model learns more from seeing what to avoid than from being told abstractly.
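A minimal sketch of that prompt assembly, assuming hypothetical names (`buildRevisionPrompt`, and that the worker has the previous output and feedback in hand when `wakeReason === "rejection"`):

```typescript
// Prepend the rejected output as a labelled negative example, not just the
// reviewer's feedback text, so the model sees concretely what to avoid.
function buildRevisionPrompt(
  basePrompt: string,
  previousOutput: string,
  rejectionFeedback: string,
): string {
  return [
    basePrompt,
    "## NEGATIVE EXAMPLE — THE PREVIOUS OUTPUT, REJECTED BY THE REVIEWER",
    previousOutput,
    "## REVIEWER FEEDBACK ON THE ABOVE",
    rejectionFeedback,
    "Produce a revised version that addresses the feedback and avoids the failure modes shown in the negative example.",
  ].join("\n\n");
}
```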

Files to Change

  • packages/agents/src/workers/blog-writer.worker.ts — inject episode context before LLM call
  • packages/agents/src/workers/setup.worker.ts — same for context-file-writer
  • packages/db/prisma/schema.prisma — add TenantAgentMemory model
  • New file: packages/agents/src/lib/episode-retrieval.ts — shared retrieval + injection logic
Related Gaps

  • Gap 9: Episodic memory across related tasks (broader memory architecture)
  • Gap 6: Critic agent quality gate (downstream consumer of quality signals)

© 2026 Leadmetrics — Internal use only