
Gap 1: No Learning from Historical Feedback

Problem

Every agent run starts cold. The blog-writer that got rejected three times for the same tenant doesn’t know that. The strategy-writer that produced a high-scoring output last month doesn’t carry that pattern forward.

The data needed to learn already exists:

  • AgentRun.output + AgentRun.transcript — every run result and execution trace
  • BlogPost.brandScore / BlogPost.brandIssues — LLM quality scores
  • rejectionFeedback on content — human quality signals from DMs
  • BlogPost.version — how many revisions were needed before approval

None of this is fed back into future prompts for the same agent role and tenant. This is the Chain of Hindsight gap (Lilian Weng, 2023): agents improve over time by conditioning on ranked sequences of past outputs and feedback, not just the current task.

Concrete example

The blog-writer for Tenant A has been rejected twice for being too technical and once for ignoring the brand tone. The fourth run sends the same system prompt with no awareness of this history. The DM will likely reject it again.

What to Build

1. AgentEpisode retrieval layer

Before any agent generates output, fetch the last N runs for (agentRole, tenantId) from AgentRun:

```typescript
const episodes = await db.agentRun.findMany({
  where: { tenantId, agentRole, status: "completed" },
  orderBy: { completedAt: "desc" },
  take: 5,
  select: {
    inputSummary: true,
    output: true,
    brandScore: true,        // from BlogPost join
    rejectionFeedback: true, // from BlogPost join
    durationMs: true,
  },
});
```

Inject the top-scoring episode as a positive example and the most-rejected episode (if any) as a negative example into the prompt:

```
## PAST PERFORMANCE FOR THIS TENANT

### What worked well (approved, high brand score):
[excerpt from best episode]

### What was rejected (feedback from reviewer):
[rejection reason from worst episode]

Incorporate these learnings into the current output.
```
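A sketch of the selection-and-injection step, intended for the shared `episode-retrieval.ts` helper. The `Episode` shape and `buildEpisodeContext` name are illustrative, not existing code; the real fields come from the `AgentRun`/`BlogPost` join above:

```typescript
interface Episode {
  output: string;
  brandScore: number | null;
  rejectionFeedback: string | null;
}

// Pick the best-scoring approved episode and the most recent rejected one,
// then format them as the PAST PERFORMANCE prompt section.
function buildEpisodeContext(episodes: Episode[]): string {
  const approved = episodes.filter(
    (e) => e.rejectionFeedback === null && e.brandScore !== null,
  );
  const rejected = episodes.filter((e) => e.rejectionFeedback !== null);

  const best = approved.sort(
    (a, b) => (b.brandScore ?? 0) - (a.brandScore ?? 0),
  )[0];
  const worst = rejected[0]; // episodes arrive newest-first

  const parts: string[] = ["## PAST PERFORMANCE FOR THIS TENANT"];
  if (best) {
    parts.push(
      "### What worked well (approved, high brand score):",
      best.output.slice(0, 1500), // cap the excerpt to keep the prompt small
    );
  }
  if (worst) {
    parts.push(
      "### What was rejected (feedback from reviewer):",
      worst.rejectionFeedback!,
    );
  }
  parts.push("Incorporate these learnings into the current output.");
  return parts.join("\n\n");
}
```

Either section is omitted when the tenant has no approved or no rejected runs yet, so the first runs for a new tenant degrade gracefully to the plain system prompt.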

2. TenantAgentMemory model

Add a Prisma model for structured per-tenant learnings that survive beyond raw run logs:

```prisma
model TenantAgentMemory {
  id        String   @id @default(cuid())
  tenantId  String
  agentRole String
  key       String   // e.g. "preferred_tone", "rejected_topics", "avg_approved_length"
  value     Json
  updatedAt DateTime @updatedAt

  @@unique([tenantId, agentRole, key])
  @@map("tenant_agent_memory")
}
```

After each completed + approved run, extract structured learnings with a lightweight haiku call:

```
Given this approved output and its quality score, extract:
- preferred writing tone
- content length (words)
- structural patterns used
Return as JSON.
```

Store results in TenantAgentMemory. Inject top-5 relevant memories into the next run.
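The store step could look like the sketch below. It parses the extractor's JSON reply and upserts one row per learning, keyed on the `@@unique([tenantId, agentRole, key])` constraint. The `MemoryStore` interface is a hypothetical minimal slice of the generated Prisma client, and `storeLearnings` is an assumed helper name:

```typescript
// Hypothetical minimal slice of the Prisma client for tenantAgentMemory.
interface MemoryStore {
  upsert(args: {
    where: { tenantId_agentRole_key: { tenantId: string; agentRole: string; key: string } };
    update: { value: unknown };
    create: { tenantId: string; agentRole: string; key: string; value: unknown };
  }): Promise<unknown>;
}

// Parse the extractor model's JSON reply and upsert one memory row per
// learning. Returns the number of rows written.
async function storeLearnings(
  store: MemoryStore,
  tenantId: string,
  agentRole: string,
  rawJson: string, // e.g. '{"preferred_tone":"casual","avg_approved_length":900}'
): Promise<number> {
  const parsed = JSON.parse(rawJson) as Record<string, unknown>;
  const entries = Object.entries(parsed);
  for (const [key, value] of entries) {
    await store.upsert({
      where: { tenantId_agentRole_key: { tenantId, agentRole, key } },
      update: { value },
      create: { tenantId, agentRole, key, value },
    });
  }
  return entries.length;
}
```

Upserting (rather than inserting) means each key converges on the latest learning, so the memory table stays bounded at one row per `(tenantId, agentRole, key)`.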

3. Negative example injection on revision

Already partially done via rejectionFeedback. Strengthen it: when wakeReason === "rejection", include the previous output as a clearly labelled negative example, not just the feedback text. The model learns more from seeing what to avoid than from being told abstractly.
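A minimal sketch of that prompt assembly, assuming hypothetical names (`buildRevisionPrompt`, and that the worker has the previous output and feedback in hand when `wakeReason === "rejection"`):

```typescript
// Prepend the rejected output as a labelled negative example, not just the
// reviewer's feedback text, so the model sees concretely what to avoid.
function buildRevisionPrompt(
  basePrompt: string,
  previousOutput: string,
  rejectionFeedback: string,
): string {
  return [
    basePrompt,
    "## NEGATIVE EXAMPLE — THE PREVIOUS OUTPUT, REJECTED BY THE REVIEWER",
    previousOutput,
    "## REVIEWER FEEDBACK ON THE ABOVE",
    rejectionFeedback,
    "Produce a revised version that addresses the feedback and avoids the failure modes shown in the negative example.",
  ].join("\n\n");
}
```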

Files to Change

  • packages/agents/src/workers/blog-writer.worker.ts — inject episode context before LLM call
  • packages/agents/src/workers/setup.worker.ts — same for context-file-writer
  • packages/db/prisma/schema.prisma — add TenantAgentMemory model
  • New file: packages/agents/src/lib/episode-retrieval.ts — shared retrieval + injection logic
Related Gaps

  • Gap 9: Episodic memory across related tasks (broader memory architecture)
  • Gap 6: Critic agent quality gate (downstream consumer of quality signals)

© 2026 Leadmetrics — Internal use only