Gap 8: No Context Window Management Strategy
Problem
Agent prompts are assembled by concatenating sections without measuring total size. There is no enforcement of a token budget, no pruning strategy when context grows too large, and no graceful degradation when the limit is approached.
Lilian Weng’s survey lists finite context length as the primary hard limitation of LLM agents: “The restricted context capacity limits the inclusion of historical information, detailed instructions, API call context, and responses.”
What gets concatenated today (blog-writer example)
| Section | Typical size |
|---|---|
| System prompt | ~800 tokens |
| Client context file | 1,000–4,000 tokens |
| RAG results (topK chunks) | 500–3,000 tokens |
| Internal pages list | 200–2,000 tokens |
| Skills CLAUDE.md | ~300 tokens |
| Content brief | 200–600 tokens |
| Reviewer feedback (if revision) | 100–500 tokens |
| Total | 3,100–11,200 tokens |
Claude Sonnet 4.6 has a 200k token context, so this rarely hits the hard limit today — but:
- Larger prompts cost more and are slower
- Research on long-context models (e.g. Liu et al., "Lost in the Middle", 2023) shows an "attention trough": content in the middle of a very long prompt is attended to less reliably than content at the start and end
- As features grow (more internal pages, richer client context, longer revision histories), this will become a real problem
- There is currently no visibility into which sections are consuming the most tokens
What to Build
1. Token budget system
Define per-section budgets that sum to a safe total:
```ts
// packages/agents/src/lib/context-budget.ts
export interface ContextBudget {
  systemPrompt: number;
  clientContext: number;
  ragResults: number;
  internalPages: number;
  skills: number;
  contentBrief: number;
  reviewerFeedback: number;
  buffer: number; // reserved for model response
}

export const DEFAULT_BUDGET: ContextBudget = {
  systemPrompt: 1_000,
  clientContext: 3_000,
  ragResults: 2_000,
  internalPages: 1_500,
  skills: 400,
  contentBrief: 800,
  reviewerFeedback: 600,
  buffer: 4_000, // for the response
};

export const TOTAL_BUDGET = Object.values(DEFAULT_BUDGET).reduce((a, b) => a + b, 0);
// = 13,300 tokens: well within 200k, but it gives discipline
```
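A cheap guard keeps the budget honest as sections are added over time. A minimal sketch, assuming a 200k-token context window (the constant name here is ours, not from the codebase):

```ts
// Fail fast at module load if the configured budget ever outgrows the
// model's context window (200k for Claude Sonnet).
const MODEL_CONTEXT_WINDOW = 200_000;

if (TOTAL_BUDGET > MODEL_CONTEXT_WINDOW) {
  throw new Error(
    `Context budget (${TOTAL_BUDGET} tokens) exceeds the model window (${MODEL_CONTEXT_WINDOW})`
  );
}
```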
2. Token counter utility

```ts
// packages/agents/src/lib/token-counter.ts
import { getEncoding } from "js-tiktoken";

// cl100k_base only approximates Anthropic's tokenizer, but it is close
// enough for budgeting; exact counts need Anthropic's own counting API.
const enc = getEncoding("cl100k_base");

export function countTokens(text: string): number {
  return enc.encode(text).length;
}

export function truncateToTokens(text: string, maxTokens: number): string {
  const tokens = enc.encode(text);
  if (tokens.length <= maxTokens) return text;
  // js-tiktoken's decode returns a string directly, so no TextDecoder is needed
  return enc.decode(tokens.slice(0, maxTokens)) + "\n[truncated]";
}
```
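Where exact numbers matter (billing, hard limits), the Anthropic SDK exposes a token-counting endpoint. A hedged sketch; since it is a network call, it suits spot-checking the local estimates rather than per-section counting (the model name is an assumption, adjust to the deployed model):

```ts
import Anthropic from "@anthropic-ai/sdk";

const anthropic = new Anthropic();

// Exact count from Anthropic's token-counting endpoint. One round-trip per
// call, so use it to validate the cl100k_base estimates, not in hot paths.
export async function countTokensExact(text: string): Promise<number> {
  const res = await anthropic.messages.countTokens({
    model: "claude-sonnet-4-5", // assumption: swap in the model actually used
    messages: [{ role: "user", content: text }],
  });
  return res.input_tokens;
}
```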
3. Budget-aware prompt builder
Replace ad hoc string concatenation with a budget-aware builder:
```ts
// packages/agents/src/lib/prompt-builder.ts
import { countTokens, truncateToTokens } from "./token-counter";
import { DEFAULT_BUDGET, TOTAL_BUDGET } from "./context-budget";

// Prompt sections may use everything except the buffer reserved for the response.
const MAX_PROMPT_TOKENS = TOTAL_BUDGET - DEFAULT_BUDGET.buffer;

export class PromptBuilder {
  private sections: { name: string; content: string; budget: number; priority: number }[] = [];
  private truncated = false;

  add(name: string, content: string, budget: number, priority: number) {
    const tokens = countTokens(content);
    if (tokens > budget) this.truncated = true;
    const fitted = tokens > budget ? truncateToTokens(content, budget) : content;
    this.sections.push({ name, content: fitted, budget, priority });
    return this;
  }

  build(): { prompt: string; tokenCounts: Record<string, number>; totalTokens: number; truncated: boolean } {
    // Sort by priority (lower = more important, kept when budget is tight)
    const sorted = [...this.sections].sort((a, b) => a.priority - b.priority);
    const tokenCounts: Record<string, number> = {};
    let total = 0;
    const parts: string[] = [];
    for (const section of sorted) {
      const tokens = countTokens(section.content);
      if (total + tokens > MAX_PROMPT_TOKENS) {
        this.truncated = true; // out of budget: drop this lower-priority section
        continue;
      }
      tokenCounts[section.name] = tokens;
      total += tokens;
      parts.push(section.content);
    }
    return { prompt: parts.join("\n\n"), tokenCounts, totalTokens: total, truncated: this.truncated };
  }
}
```
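A note on the degradation model: oversized sections are truncated to their individual budgets at add time, so every section survives in reduced form; whole sections are dropped, lowest priority first, only when the combined total still exceeds the prompt budget.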
Usage in blog-writer:

```ts
const { prompt, tokenCounts, totalTokens, truncated } = new PromptBuilder()
  .add("system", systemPrompt, DEFAULT_BUDGET.systemPrompt, 1)
  .add("client", clientContext, DEFAULT_BUDGET.clientContext, 2)
  .add("brief", contentBrief, DEFAULT_BUDGET.contentBrief, 3)
  .add("rag", ragResults, DEFAULT_BUDGET.ragResults, 4)
  .add("pages", internalPages, DEFAULT_BUDGET.internalPages, 5)
  .add("feedback", reviewerFeedback, DEFAULT_BUDGET.reviewerFeedback, 6)
  .add("skills", skillsDoc, DEFAULT_BUDGET.skills, 7)
  .build();

await db.agentRun.update({
  where: { id: agentRunId },
  data: {
    contextTokenBreakdown: tokenCounts,
    totalContextTokens: totalTokens,
    contextTruncated: truncated,
  },
});
```
4. Log context token breakdown in AgentRun

```prisma
model AgentRun {
  // existing fields
  contextTokenBreakdown Json?    // { system: 800, client: 2400, rag: 1800, ... }
  totalContextTokens    Int?
  contextTruncated      Boolean  @default(false)
}
```
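With these fields persisted, the heaviest runs are easy to pull up before any UI exists. A sketch using the Prisma client and the field names above:

```ts
// Top 20 runs by context size: a quick way to see which clients or
// features are blowing the budget.
const heaviest = await db.agentRun.findMany({
  orderBy: { totalContextTokens: "desc" },
  take: 20,
  select: {
    id: true,
    totalContextTokens: true,
    contextTokenBreakdown: true,
    contextTruncated: true,
  },
});
```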
5. Surface token usage in manage portal
On the agent run detail page, show a stacked bar: which sections consumed how many tokens. This gives admins visibility into where to optimise first.
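A minimal sketch of that bar, assuming a React manage portal; the component and its props are hypothetical and read the AgentRun fields defined above:

```tsx
// Hypothetical component: renders each prompt section as a segment of a
// horizontal bar, sized by its share of the run's total context tokens.
export function ContextTokenBar({ breakdown, total }: {
  breakdown: Record<string, number>;
  total: number;
}) {
  return (
    <div style={{ display: "flex", width: "100%", height: 16 }}>
      {Object.entries(breakdown).map(([name, tokens], i) => (
        <div
          key={name}
          title={`${name}: ${tokens} tokens`}
          style={{
            width: `${(tokens / total) * 100}%`,
            background: `hsl(${(i * 47) % 360} 70% 55%)`, // distinct hue per segment
          }}
        />
      ))}
    </div>
  );
}
```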
Files to Change
- New file: packages/agents/src/lib/context-budget.ts
- New file: packages/agents/src/lib/token-counter.ts
- New file: packages/agents/src/lib/prompt-builder.ts
- packages/agents/src/workers/blog-writer.worker.ts: replace string concat with PromptBuilder
- packages/agents/src/workers/setup.worker.ts: same
- packages/agents/src/workers/strategy.worker.ts: same
- packages/db/prisma/schema.prisma: add token fields to AgentRun
Related
- Gap 2: RAG recency + importance scoring (better ranking reduces tokens needed from RAG)
- Gap 13: Cost circuit breaker (token count informs cost estimation before sending)