# Gap 1: No Learning from Historical Feedback

## Problem
Every agent run starts cold. The blog-writer that got rejected three times for the same tenant doesn’t know that. The strategy-writer that produced a high-scoring output last month doesn’t carry that pattern forward.
The data needed to learn already exists:
- `AgentRun.output` + `AgentRun.transcript` — every run result and execution trace
- `BlogPost.brandScore` / `BlogPost.brandIssues` — LLM quality scores
- `rejectionFeedback` on content — human quality signals from DMs
- `BlogPost.version` — how many revisions were needed before approval
None of this is fed back into future prompts for the same agent role and tenant. This is the Chain of Hindsight gap (Lilian Weng, 2023): agents improve over time by conditioning on ranked sequences of past outputs and feedback, not just the current task.
### Concrete example
The blog-writer for Tenant A has been rejected twice for being too technical and once for ignoring the brand tone. The fourth run sends the same system prompt with no awareness of this history. The DM will likely reject it again.
## What to Build

### 1. AgentEpisode retrieval layer
Before any agent generates output, fetch the last N runs for `(agentRole, tenantId)` from `AgentRun`:

```ts
const episodes = await db.agentRun.findMany({
  where: { tenantId, agentRole, status: "completed" },
  orderBy: { completedAt: "desc" },
  take: 5,
  select: {
    inputSummary: true,
    output: true,
    brandScore: true,        // from BlogPost join
    rejectionFeedback: true, // from BlogPost join
    durationMs: true,
  },
});
```

Inject the top-scoring episode as a positive example and the most-rejected episode (if any) as a negative example into the prompt:
```
## PAST PERFORMANCE FOR THIS TENANT

### What worked well (approved, high brand score):
[excerpt from best episode]

### What was rejected (feedback from reviewer):
[rejection reason from worst episode]

Incorporate these learnings into the current output.
```

### 2. TenantAgentMemory model
Add a Prisma model for structured per-tenant learnings that survive beyond raw run logs:
```prisma
model TenantAgentMemory {
  id        String   @id @default(cuid())
  tenantId  String
  agentRole String
  key       String   // e.g. "preferred_tone", "rejected_topics", "avg_approved_length"
  value     Json
  updatedAt DateTime @updatedAt

  @@unique([tenantId, agentRole, key])
  @@map("tenant_agent_memory")
}
```

After each completed + approved run, extract structured learnings with a lightweight haiku call:
```
Given this approved output and its quality score, extract:
- preferred writing tone
- content length (words)
- structural patterns used
Return as JSON.
```

Store the results in `TenantAgentMemory`. Inject the top 5 relevant memories into the next run.
### 3. Negative example injection on revision

This is already partially done via `rejectionFeedback`. Strengthen it: when `wakeReason === "rejection"`, include the previous output as a clearly labelled negative example, not just the feedback text. The model learns more from seeing what to avoid than from being told abstractly.
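The labelling step above can be a small pure function; a sketch under assumptions (`buildRevisionContext` and its parameter names are hypothetical, and the `"rejection"` wake reason is taken from the text above):

```typescript
// Hypothetical helper: when a run is woken by a rejection, present the
// previous output as an explicitly labelled negative example rather
// than passing only the reviewer's feedback text.
function buildRevisionContext(
  wakeReason: string,
  previousOutput: string,
  rejectionFeedback: string,
): string {
  // Only revision-after-rejection runs get the negative example.
  if (wakeReason !== "rejection") return "";
  return [
    "## NEGATIVE EXAMPLE (do not repeat this)",
    "The following previous output was rejected:",
    previousOutput,
    "### Reviewer feedback:",
    rejectionFeedback,
    "Produce a revision that addresses this feedback and avoids the patterns above.",
  ].join("\n\n");
}
```

The function returns an empty string for every other wake reason, so callers can append it unconditionally to the prompt.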
## Files to Change

- `packages/agents/src/workers/blog-writer.worker.ts` — inject episode context before LLM call
- `packages/agents/src/workers/setup.worker.ts` — same for context-file-writer
- `packages/db/prisma/schema.prisma` — add `TenantAgentMemory` model
- New file: `packages/agents/src/lib/episode-retrieval.ts` — shared retrieval + injection logic
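One possible shape for the new `episode-retrieval.ts` module, sketched under assumptions: the `Episode` field names mirror the `findMany` select in section 1, and the prompt-building core is kept database-free so it can be unit-tested without Prisma:

```typescript
// Sketch of the shared episode-retrieval module. Field names are
// assumptions mirroring the findMany select shown in section 1.
export interface Episode {
  output: string;
  brandScore: number | null;
  rejectionFeedback: string | null;
}

// Pure core: build the PAST PERFORMANCE prompt section from recent
// episodes. Returns an empty string when there is no usable history.
export function buildEpisodeContext(episodes: Episode[]): string {
  const scored = episodes.filter((e) => e.brandScore !== null);
  const best = [...scored].sort((a, b) => b.brandScore! - a.brandScore!)[0];
  const worst = episodes.find((e) => e.rejectionFeedback);
  if (!best && !worst) return ""; // cold start: nothing to inject yet

  const parts = ["## PAST PERFORMANCE FOR THIS TENANT"];
  if (best) {
    parts.push(
      "### What worked well (approved, high brand score):",
      best.output.slice(0, 500), // excerpt only, to keep the prompt small
    );
  }
  if (worst) {
    parts.push(
      "### What was rejected (feedback from reviewer):",
      worst.rejectionFeedback!,
    );
  }
  parts.push("Incorporate these learnings into the current output.");
  return parts.join("\n\n");
}
```

The workers would then be thin wrappers: run the `findMany` from section 1 and pass the rows straight into `buildEpisodeContext` before the LLM call.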
## Related
- Gap 9: Episodic memory across related tasks (broader memory architecture)
- Gap 6: Critic agent quality gate (downstream consumer of quality signals)