
Gap 9: No Episodic Memory Across Related Tasks

Problem

Each agent run reads the same static ClientContext file. There is no accumulated knowledge of what has happened across runs for the same tenant. The blog-writer that just produced 10 posts for a tenant doesn’t know:

  • Which topics have already been covered (risk of duplicate content)
  • What writing style got approved vs. rejected
  • What content lengths the client tends to approve
  • Which keywords have already been used heavily

This is the core insight from the Generative Agents paper (Park et al., 2023): agents with a memory stream that accumulates experience and synthesises it into higher-level reflections perform significantly better than agents that start from scratch each time.

Concrete example

The blog-writer for Tenant A generates “10 Benefits of Social Media Marketing” in March. In April, without episodic memory, it generates “Why Social Media Marketing Matters for Your Business” — a near-duplicate. Both posts exist in the CMS, both go through DM review, and the DM catches the duplication manually.

What to Build

1. TenantAgentMemory model

A structured key-value store per (tenantId, agentRole) for accumulated learnings:

```prisma
model TenantAgentMemory {
  id         String   @id @default(cuid())
  tenantId   String
  agentRole  String
  key        String   // e.g. "covered_topics", "preferred_tone", "avg_approved_length"
  value      Json     // type depends on key
  confidence Float    @default(1.0) // degrades over time if contradicted
  updatedAt  DateTime @updatedAt
  createdAt  DateTime @default(now())

  tenant Tenant @relation(fields: [tenantId], references: [id])

  @@unique([tenantId, agentRole, key])
  @@map("tenant_agent_memory")
}
```

Standard memory keys per agent role:

| Agent role | Memory keys |
| --- | --- |
| blog-writer | covered_topics, preferred_length_words, approved_structures, rejected_patterns |
| social-post-writer | used_hashtags, preferred_cta_style, avg_approved_length |
| strategy-writer | previous_goals, channel_history, approved_pillars |
| context-file-writer | revision_count, last_approved_sections |
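The standard keys in the table above could be captured as a typed constant so that extraction jobs cannot write unexpected keys. This is a sketch, not an existing module; the `AgentRole` type and `STANDARD_MEMORY_KEYS` constant simply mirror the table:

```typescript
// Sketch: standard memory keys per agent role, mirroring the table above.
// AgentRole and STANDARD_MEMORY_KEYS are illustrative, not existing exports.
type AgentRole =
  | "blog-writer"
  | "social-post-writer"
  | "strategy-writer"
  | "context-file-writer";

const STANDARD_MEMORY_KEYS: Record<AgentRole, string[]> = {
  "blog-writer": ["covered_topics", "preferred_length_words", "approved_structures", "rejected_patterns"],
  "social-post-writer": ["used_hashtags", "preferred_cta_style", "avg_approved_length"],
  "strategy-writer": ["previous_goals", "channel_history", "approved_pillars"],
  "context-file-writer": ["revision_count", "last_approved_sections"],
};

// Guard used by the extraction job before upserting a memory row
export function isStandardKey(role: AgentRole, key: string): boolean {
  return STANDARD_MEMORY_KEYS[role].includes(key);
}
```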

2. Memory extraction after each approved run

When content is approved (status changes to client_approved or active), run a memory extraction job:

```ts
// packages/agents/src/lib/memory-extractor.ts
export async function extractAndStoreMemory(
  tenantId: string,
  agentRole: string,
  approvedOutput: string,
  existingMemories: TenantAgentMemory[]
): Promise<void> {
  const extractionPrompt = buildExtractionPrompt(agentRole, approvedOutput, existingMemories);

  // Fast haiku call
  const extracted = await claudeHaiku(extractionPrompt);
  const updates = parseMemoryUpdates(extracted);

  for (const [key, value] of Object.entries(updates)) {
    await db.tenantAgentMemory.upsert({
      where: { tenantId_agentRole_key: { tenantId, agentRole, key } },
      update: { value, updatedAt: new Date() },
      create: { tenantId, agentRole, key, value },
    });
  }
}
```
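The `parseMemoryUpdates` helper called above is not defined in this doc. One plausible shape (an assumption, not the actual implementation) tolerates the model wrapping its JSON reply in a markdown code fence and treats anything unparseable as "no updates" rather than failing the run:

```typescript
// Sketch of a parseMemoryUpdates helper (hypothetical, not an existing export).
export function parseMemoryUpdates(raw: string): Record<string, unknown> {
  // Strip an optional ```json ... ``` fence around the payload
  const fenced = raw.match(/```(?:json)?\s*([\s\S]*?)```/);
  const body = (fenced ? fenced[1] : raw).trim();
  try {
    const parsed = JSON.parse(body);
    // Only accept a plain object of key -> value updates
    if (parsed && typeof parsed === "object" && !Array.isArray(parsed)) {
      return parsed as Record<string, unknown>;
    }
  } catch {
    // Malformed JSON from the model: skip this extraction rather than throwing
  }
  return {};
}
```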

Extraction prompt example for blog-writer:

```text
This blog post was just approved by the client. Extract memory updates in JSON format:

{
  "covered_topics": ["append new topic title here"],
  "preferred_length_words": <word count>,
  "approved_structures": ["append observed heading structure"]
}

APPROVED BLOG POST:
{approvedOutput}

EXISTING MEMORY (do not duplicate):
{existingMemories}
```
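The `buildExtractionPrompt` call in the extractor would interpolate these pieces. A minimal sketch, assuming the signature used earlier and a simple `{key, value}` row shape (both are assumptions, not existing code):

```typescript
// Sketch of buildExtractionPrompt (hypothetical helper). Only the
// interpolation is shown; per-role prompt templates would live elsewhere.
interface MemoryRow {
  key: string;
  value: unknown;
}

export function buildExtractionPrompt(
  agentRole: string,
  approvedOutput: string,
  existingMemories: MemoryRow[]
): string {
  // Render existing memory as one line per key so the model can avoid duplicates
  const memoryLines = existingMemories
    .map((m) => `- ${m.key}: ${JSON.stringify(m.value)}`)
    .join("\n");

  return [
    `This ${agentRole} output was just approved by the client.`,
    `Extract memory updates in JSON format.`,
    ``,
    `APPROVED OUTPUT:`,
    approvedOutput,
    ``,
    `EXISTING MEMORY (do not duplicate):`,
    memoryLines || "(none)",
  ].join("\n");
}
```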

3. Memory injection before generation

Before running the agent, load relevant memories and inject them into the prompt:

```ts
// In blog-writer.worker.ts, before building the main prompt
const memories = await db.tenantAgentMemory.findMany({
  where: { tenantId, agentRole: "blog-writer" },
});

const memorySection = buildMemorySection(memories);
// Produces something like:
// ## ACCUMULATED KNOWLEDGE FOR THIS CLIENT
// - Topics already covered: [list of 10 titles]
// - Preferred article length: ~1,400 words
// - Approved content structures: [intro → 3 H2 sections → CTA]
// - Patterns to avoid: listicles without examples, generic intros
```
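`buildMemorySection` itself could be a pure function over the loaded rows. A sketch under two assumptions: the row shape matches the Prisma model, and entries below the 0.3 soft-delete threshold (from section 5) are excluded at read time:

```typescript
// Sketch of buildMemorySection (hypothetical helper): renders memory rows
// into a markdown block for prompt injection.
interface MemoryEntry {
  key: string;
  value: unknown;
  confidence: number;
}

export function buildMemorySection(memories: MemoryEntry[]): string {
  const lines = memories
    .filter((m) => m.confidence >= 0.3) // mirror the soft-delete threshold
    .map((m) => `- ${m.key.replace(/_/g, " ")}: ${JSON.stringify(m.value)}`);

  if (lines.length === 0) return ""; // nothing accumulated yet: inject nothing
  return ["## ACCUMULATED KNOWLEDGE FOR THIS CLIENT", ...lines].join("\n");
}
```

Returning an empty string for a brand-new tenant keeps the first runs identical to today's behaviour.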

4. Reflection synthesis (periodic, not per-run)

Inspired directly by the Generative Agents reflection mechanism: periodically synthesise raw memories into higher-level insights. Run this as a scheduled job (e.g., after every 5 approved posts):

```text
Given these memory entries for this tenant's blog content:

{memories}

What are the 3 most important patterns we should always apply for this tenant?
What are the 3 most common mistakes to avoid?

Return as JSON: { alwaysDo: string[], neverDo: string[] }
```

Store the synthesis result as a special synthesis key in TenantAgentMemory. Inject it at the top of future prompts as the highest-priority context.
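The stored synthesis can then be rendered as the top-of-prompt block. A sketch, assuming the JSON shape the reflection prompt asks for (the helper name is hypothetical):

```typescript
// Sketch: render the stored "synthesis" memory value as the
// highest-priority block at the top of future prompts.
interface Synthesis {
  alwaysDo: string[];
  neverDo: string[];
}

export function buildSynthesisSection(synthesis: Synthesis): string {
  return [
    "## CLIENT PLAYBOOK (highest priority)",
    "Always:",
    ...synthesis.alwaysDo.map((s) => `- ${s}`),
    "Never:",
    ...synthesis.neverDo.map((s) => `- ${s}`),
  ].join("\n");
}
```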

5. Memory confidence decay

If a rejected run contradicts an existing memory, reduce its confidence score. If confidence drops below 0.3, soft-delete the memory:

```ts
// When a run is rejected
if (wakeReason === "rejection" && rejectionFeedback) {
  const contradictedMemories = await findContradictedMemories(tenantId, agentRole, rejectionFeedback);

  for (const memory of contradictedMemories) {
    await db.tenantAgentMemory.update({
      where: { id: memory.id },
      data: { confidence: memory.confidence * 0.6 },
    });
  }
}
```
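The decay rule itself can be isolated as a pure function, which makes the threshold behaviour easy to test. The 0.6 factor and 0.3 threshold come from the text above; modelling "soft delete" as a boolean flag is an assumption:

```typescript
// Sketch: the confidence-decay rule as a pure function.
const DECAY_FACTOR = 0.6;          // per contradicting rejection
const SOFT_DELETE_THRESHOLD = 0.3; // below this, the memory is soft-deleted

export function decayConfidence(current: number): {
  confidence: number;
  softDelete: boolean;
} {
  const next = current * DECAY_FACTOR;
  return { confidence: next, softDelete: next < SOFT_DELETE_THRESHOLD };
}
```

Applied per rejection, two contradictions take a fresh memory from 1.0 to 0.36, and a third (0.216) drops it below the soft-delete threshold.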

Files to Change

  • packages/db/prisma/schema.prisma — add TenantAgentMemory model
  • New file: packages/agents/src/lib/memory-extractor.ts
  • New file: packages/agents/src/lib/memory-injector.ts
  • packages/agents/src/workers/blog-writer.worker.ts — inject memory before generation; extract after approval
  • packages/agents/src/workers/social-post-writer.worker.ts — same pattern
  • packages/api/src/routers/tenant/ — webhook/handler for client_approved status change to trigger extraction
  • New scheduled job in apps/servers/scheduler — periodic reflection synthesis
Related Gaps

  • Gap 1: Learning from feedback history (episodic memory is the structured version of episode retrieval)
  • Gap 2: RAG recency + importance scoring (memory supplements RAG for run-specific knowledge)

© 2026 Leadmetrics — Internal use only