Gap 8: No Context Window Management Strategy

Problem

Agent prompts are assembled by concatenating sections without measuring total size. There is no enforcement of a token budget, no pruning strategy when context grows too large, and no graceful degradation when the limit is approached.

Lilian Weng’s survey lists finite context length as the primary hard limitation of LLM agents: “The restricted context capacity limits the inclusion of historical information, detailed instructions, API call context, and responses.”

What gets concatenated today (blog-writer example)

Section                           Typical size
System prompt                     ~800 tokens
Client context file               1,000–4,000 tokens
RAG results (topK chunks)         500–3,000 tokens
Internal pages list               200–2,000 tokens
Skills CLAUDE.md                  ~300 tokens
Content brief                     200–600 tokens
Reviewer feedback (if revision)   100–500 tokens
Total                             3,100–11,200 tokens

Claude Sonnet 4.6 has a 200k token context, so this rarely hits the hard limit today — but:

  1. Larger prompts cost more and are slower
  2. Research shows LLMs have an “attention trough” — content in the middle of a very long prompt is less attended to than content at the start and end
  3. As features grow (more internal pages, richer client context, longer revision histories), this will become a real problem
  4. There is currently no visibility into which sections are consuming the most tokens

What to Build

1. Token budget system

Define per-section budgets that sum to a safe total:

```ts
// packages/agents/src/lib/context-budget.ts
export interface ContextBudget {
  systemPrompt: number;
  clientContext: number;
  ragResults: number;
  internalPages: number;
  skills: number;
  contentBrief: number;
  reviewerFeedback: number;
  buffer: number; // reserved for model response
}

export const DEFAULT_BUDGET: ContextBudget = {
  systemPrompt: 1_000,
  clientContext: 3_000,
  ragResults: 2_000,
  internalPages: 1_500,
  skills: 400,
  contentBrief: 800,
  reviewerFeedback: 600,
  buffer: 4_000, // for the response
};

export const TOTAL_BUDGET = Object.values(DEFAULT_BUDGET).reduce((a, b) => a + b, 0);
// = 13,300 tokens — well within 200k but gives discipline
```
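As a quick sanity check on the arithmetic (a standalone sketch that duplicates the literals above rather than importing them), the per-section budgets do sum to the advertised total:

```typescript
// Standalone check that the DEFAULT_BUDGET values sum to 13,300.
const budget = {
  systemPrompt: 1_000,
  clientContext: 3_000,
  ragResults: 2_000,
  internalPages: 1_500,
  skills: 400,
  contentBrief: 800,
  reviewerFeedback: 600,
  buffer: 4_000,
};

const total = Object.values(budget).reduce((a, b) => a + b, 0);
console.log(total); // 13300
```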

2. Token counter utility

```ts
// packages/agents/src/lib/token-counter.ts
import { getEncoding } from "js-tiktoken";

// cl100k_base is an approximation: Anthropic's tokenizer is not public,
// but cl100k counts are close enough for budgeting purposes.
const enc = getEncoding("cl100k_base");

export function countTokens(text: string): number {
  return enc.encode(text).length;
}

export function truncateToTokens(text: string, maxTokens: number): string {
  const tokens = enc.encode(text);
  if (tokens.length <= maxTokens) return text;
  // js-tiktoken's decode() returns a string directly
  return enc.decode(tokens.slice(0, maxTokens)) + "\n[truncated]";
}
```

Note: `encodingForModel` takes a model name (e.g. `"gpt-4"`), not an encoding name, so loading `cl100k_base` requires `getEncoding`.
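Where pulling in js-tiktoken is overkill (e.g. a rough pre-check before assembly), a common heuristic is ~4 characters per token for English text. This estimator is an illustrative extra, not part of the design above:

```typescript
// Rough token estimate: ~4 characters/token is a common English-text
// heuristic. Only an approximation; use the real tokenizer for budgets.
function estimateTokens(text: string): number {
  return Math.ceil(text.length / 4);
}

console.log(estimateTokens("a".repeat(400))); // 100
```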

3. Budget-aware prompt builder

Replace ad hoc string concatenation with a budget-aware builder:

```ts
// packages/agents/src/lib/prompt-builder.ts
import { countTokens, truncateToTokens } from "./token-counter";

export class PromptBuilder {
  private sections: { name: string; content: string; budget: number; priority: number }[] = [];

  add(name: string, content: string, budget: number, priority: number) {
    const tokens = countTokens(content);
    const truncated = tokens > budget ? truncateToTokens(content, budget) : content;
    this.sections.push({ name, content: truncated, budget, priority });
    return this;
  }

  build(): { prompt: string; tokenCounts: Record<string, number>; totalTokens: number } {
    // Sort by priority (lower = more important) so the most critical
    // sections sit at the start of the assembled prompt
    const sorted = [...this.sections].sort((a, b) => a.priority - b.priority);
    const tokenCounts: Record<string, number> = {};
    let total = 0;
    const parts: string[] = [];
    for (const section of sorted) {
      const tokens = countTokens(section.content);
      tokenCounts[section.name] = tokens;
      total += tokens;
      parts.push(section.content);
    }
    return { prompt: parts.join("\n\n"), tokenCounts, totalTokens: total };
  }
}
```

Usage in blog-writer:

```ts
const { prompt, tokenCounts, totalTokens } = new PromptBuilder()
  .add("system", systemPrompt, DEFAULT_BUDGET.systemPrompt, 1)
  .add("client", clientContext, DEFAULT_BUDGET.clientContext, 2)
  .add("brief", contentBrief, DEFAULT_BUDGET.contentBrief, 3)
  .add("rag", ragResults, DEFAULT_BUDGET.ragResults, 4)
  .add("pages", internalPages, DEFAULT_BUDGET.internalPages, 5)
  .add("feedback", reviewerFeedback, DEFAULT_BUDGET.reviewerFeedback, 6)
  .add("skills", skillsDoc, DEFAULT_BUDGET.skills, 7)
  .build();

await db.agentRun.update({
  where: { id: agentRunId },
  data: { contextTokenBreakdown: tokenCounts, totalContextTokens: totalTokens },
});
```
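One loose end in the sketch above: nothing yet derives the `contextTruncated` flag proposed in step 4. A minimal way to get it is to compare the builder's per-section counts against their budgets. `findTruncatedSections` is a hypothetical helper, and it assumes the section names passed to `add()` match the keys of the budget record supplied here:

```typescript
// Hypothetical helper: a section was (very likely) truncated if its final
// token count meets or exceeds its budget. Assumes tokenCounts and budgets
// share the same keys.
function findTruncatedSections(
  tokenCounts: Record<string, number>,
  budgets: Record<string, number>
): string[] {
  return Object.entries(tokenCounts)
    .filter(([name, tokens]) => tokens >= (budgets[name] ?? Number.POSITIVE_INFINITY))
    .map(([name]) => name);
}

// e.g. contextTruncated = findTruncatedSections(tokenCounts, budgets).length > 0
const hit = findTruncatedSections(
  { system: 1_000, client: 2_400 },
  { system: 1_000, client: 3_000 }
);
console.log(hit); // [ 'system' ]
```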

4. Log context token breakdown in AgentRun

```prisma
model AgentRun {
  // existing fields ...
  contextTokenBreakdown Json?   // { system: 800, client: 2400, rag: 1800, ... }
  totalContextTokens    Int?
  contextTruncated      Boolean @default(false)
}
```

5. Surface token usage in manage portal

On the agent run detail page, show a stacked bar: which sections consumed how many tokens. This gives admins visibility into where to optimise first.
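The stacked bar is straightforward to derive from the stored breakdown. A framework-agnostic sketch (`toBarSegments` is a hypothetical name, not existing portal code):

```typescript
// Hypothetical portal helper: turn a contextTokenBreakdown JSON blob into
// stacked-bar segments sorted largest-first, each with a percentage width.
function toBarSegments(breakdown: Record<string, number>) {
  const total = Object.values(breakdown).reduce((a, b) => a + b, 0);
  return Object.entries(breakdown)
    .sort(([, a], [, b]) => b - a) // biggest consumers first
    .map(([name, tokens]) => ({
      name,
      tokens,
      percent: total > 0 ? (tokens / total) * 100 : 0,
    }));
}

const segs = toBarSegments({ system: 800, rag: 200 });
console.log(segs[0]); // { name: 'system', tokens: 800, percent: 80 }
```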

Files to Change

  • New file: packages/agents/src/lib/context-budget.ts
  • New file: packages/agents/src/lib/token-counter.ts
  • New file: packages/agents/src/lib/prompt-builder.ts
  • packages/agents/src/workers/blog-writer.worker.ts — replace string concat with PromptBuilder
  • packages/agents/src/workers/setup.worker.ts — same
  • packages/agents/src/workers/strategy.worker.ts — same
  • packages/db/prisma/schema.prisma — add token fields to AgentRun

Related Gaps

  • Gap 2: RAG recency + importance scoring (better ranking reduces tokens needed from RAG)
  • Gap 13: Cost circuit breaker (token count informs cost estimation before sending)

© 2026 Leadmetrics — Internal use only