Gap 8: No Context Window Management Strategy
Problem
Agent prompts are assembled by concatenating sections without measuring total size. There is no enforcement of a token budget, no pruning strategy when context grows too large, and no graceful degradation when the limit is approached.
Lilian Weng’s survey lists finite context length as the primary hard limitation of LLM agents: “The restricted context capacity limits the inclusion of historical information, detailed instructions, API call context, and responses.”
What gets concatenated today (blog-writer example)
| Section | Typical size |
|---|---|
| System prompt | ~800 tokens |
| Client context file | 1,000–4,000 tokens |
| RAG results (topK chunks) | 500–3,000 tokens |
| Internal pages list | 200–2,000 tokens |
| Skills CLAUDE.md | ~300 tokens |
| Content brief | 200–600 tokens |
| Reviewer feedback (if revision) | 100–500 tokens |
| Total | 3,100–11,200 tokens |
Claude Sonnet 4.6 has a 200k token context, so this rarely hits the hard limit today — but:
- Larger prompts cost more and are slower
- Research on long-context models (e.g. Liu et al., "Lost in the Middle", 2023) shows an "attention trough": content in the middle of a very long prompt is attended to less reliably than content at the start and end
- As features grow (more internal pages, richer client context, longer revision histories), this will become a real problem
- There is currently no visibility into which sections are consuming the most tokens
What to Build
1. Token budget system
Define per-section budgets that sum to a safe total:
```ts
// packages/agents/src/lib/context-budget.ts
export interface ContextBudget {
  systemPrompt: number;
  clientContext: number;
  ragResults: number;
  internalPages: number;
  skills: number;
  contentBrief: number;
  reviewerFeedback: number;
  buffer: number; // reserved for model response
}

export const DEFAULT_BUDGET: ContextBudget = {
  systemPrompt: 1_000,
  clientContext: 3_000,
  ragResults: 2_000,
  internalPages: 1_500,
  skills: 400,
  contentBrief: 800,
  reviewerFeedback: 600,
  buffer: 4_000, // for the response
};

export const TOTAL_BUDGET = Object.values(DEFAULT_BUDGET).reduce((a, b) => a + b, 0);
// = 13,300 tokens: well within 200k, but it gives discipline
```
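A cheap guard keeps the budget honest as sections are added over time. A minimal sketch, assuming a 200k-token context window (the constant name here is ours, not from the codebase):

```ts
// Fail fast at module load if the configured budget ever outgrows the
// model's context window (200k for Claude Sonnet).
const MODEL_CONTEXT_WINDOW = 200_000;

if (TOTAL_BUDGET > MODEL_CONTEXT_WINDOW) {
  throw new Error(
    `Context budget (${TOTAL_BUDGET} tokens) exceeds the model window (${MODEL_CONTEXT_WINDOW})`
  );
}
```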
2. Token counter utility

```ts
// packages/agents/src/lib/token-counter.ts
import { getEncoding } from "js-tiktoken";

// cl100k_base only approximates Anthropic's tokenizer, but it is close
// enough for budgeting; exact counts need Anthropic's own counting API.
const enc = getEncoding("cl100k_base");

export function countTokens(text: string): number {
  return enc.encode(text).length;
}

export function truncateToTokens(text: string, maxTokens: number): string {
  const tokens = enc.encode(text);
  if (tokens.length <= maxTokens) return text;
  // js-tiktoken's decode returns a string directly, so no TextDecoder is needed
  return enc.decode(tokens.slice(0, maxTokens)) + "\n[truncated]";
}
```
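Where exact numbers matter (billing, hard limits), the Anthropic SDK exposes a token-counting endpoint. A hedged sketch; since it is a network call, it suits spot-checking the local estimates rather than per-section counting (the model name is an assumption, adjust to the deployed model):

```ts
import Anthropic from "@anthropic-ai/sdk";

const anthropic = new Anthropic();

// Exact count from Anthropic's token-counting endpoint. One round-trip per
// call, so use it to validate the cl100k_base estimates, not in hot paths.
export async function countTokensExact(text: string): Promise<number> {
  const res = await anthropic.messages.countTokens({
    model: "claude-sonnet-4-5", // assumption: swap in the model actually used
    messages: [{ role: "user", content: text }],
  });
  return res.input_tokens;
}
```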
3. Budget-aware prompt builder
Replace ad hoc string concatenation with a budget-aware builder:
```ts
// packages/agents/src/lib/prompt-builder.ts
import { countTokens, truncateToTokens } from "./token-counter";
import { DEFAULT_BUDGET, TOTAL_BUDGET } from "./context-budget";

// Prompt sections may use everything except the buffer reserved for the response.
const MAX_PROMPT_TOKENS = TOTAL_BUDGET - DEFAULT_BUDGET.buffer;

export class PromptBuilder {
  private sections: { name: string; content: string; budget: number; priority: number }[] = [];
  private truncated = false;

  add(name: string, content: string, budget: number, priority: number) {
    const tokens = countTokens(content);
    if (tokens > budget) this.truncated = true;
    const fitted = tokens > budget ? truncateToTokens(content, budget) : content;
    this.sections.push({ name, content: fitted, budget, priority });
    return this;
  }

  build(): { prompt: string; tokenCounts: Record<string, number>; totalTokens: number; truncated: boolean } {
    // Sort by priority (lower = more important, kept when budget is tight)
    const sorted = [...this.sections].sort((a, b) => a.priority - b.priority);
    const tokenCounts: Record<string, number> = {};
    let total = 0;
    const parts: string[] = [];
    for (const section of sorted) {
      const tokens = countTokens(section.content);
      if (total + tokens > MAX_PROMPT_TOKENS) {
        this.truncated = true; // out of budget: drop this lower-priority section
        continue;
      }
      tokenCounts[section.name] = tokens;
      total += tokens;
      parts.push(section.content);
    }
    return { prompt: parts.join("\n\n"), tokenCounts, totalTokens: total, truncated: this.truncated };
  }
}
```
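A note on the degradation model: oversized sections are truncated to their individual budgets at add time, so every section survives in reduced form; whole sections are dropped, lowest priority first, only when the combined total still exceeds the prompt budget.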
Usage in blog-writer:

```ts
const { prompt, tokenCounts, totalTokens, truncated } = new PromptBuilder()
  .add("system", systemPrompt, DEFAULT_BUDGET.systemPrompt, 1)
  .add("client", clientContext, DEFAULT_BUDGET.clientContext, 2)
  .add("brief", contentBrief, DEFAULT_BUDGET.contentBrief, 3)
  .add("rag", ragResults, DEFAULT_BUDGET.ragResults, 4)
  .add("pages", internalPages, DEFAULT_BUDGET.internalPages, 5)
  .add("feedback", reviewerFeedback, DEFAULT_BUDGET.reviewerFeedback, 6)
  .add("skills", skillsDoc, DEFAULT_BUDGET.skills, 7)
  .build();

await db.agentRun.update({
  where: { id: agentRunId },
  data: {
    contextTokenBreakdown: tokenCounts,
    totalContextTokens: totalTokens,
    contextTruncated: truncated,
  },
});
```
4. Log context token breakdown in AgentRun

```prisma
model AgentRun {
  // existing fields
  contextTokenBreakdown Json?    // { system: 800, client: 2400, rag: 1800, ... }
  totalContextTokens    Int?
  contextTruncated      Boolean  @default(false)
}
```
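With these fields persisted, the heaviest runs are easy to pull up before any UI exists. A sketch using the Prisma client and the field names above:

```ts
// Top 20 runs by context size: a quick way to see which clients or
// features are blowing the budget.
const heaviest = await db.agentRun.findMany({
  orderBy: { totalContextTokens: "desc" },
  take: 20,
  select: {
    id: true,
    totalContextTokens: true,
    contextTokenBreakdown: true,
    contextTruncated: true,
  },
});
```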
5. Surface token usage in manage portal
On the agent run detail page, show a stacked bar: which sections consumed how many tokens. This gives admins visibility into where to optimise first.
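A minimal sketch of that bar, assuming a React manage portal; the component and its props are hypothetical and read the AgentRun fields defined above:

```tsx
// Hypothetical component: renders each prompt section as a segment of a
// horizontal bar, sized by its share of the run's total context tokens.
export function ContextTokenBar({ breakdown, total }: {
  breakdown: Record<string, number>;
  total: number;
}) {
  return (
    <div style={{ display: "flex", width: "100%", height: 16 }}>
      {Object.entries(breakdown).map(([name, tokens], i) => (
        <div
          key={name}
          title={`${name}: ${tokens} tokens`}
          style={{
            width: `${(tokens / total) * 100}%`,
            background: `hsl(${(i * 47) % 360} 70% 55%)`, // distinct hue per segment
          }}
        />
      ))}
    </div>
  );
}
```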
Files to Change
- New file: packages/agents/src/lib/context-budget.ts
- New file: packages/agents/src/lib/token-counter.ts
- New file: packages/agents/src/lib/prompt-builder.ts
- packages/agents/src/workers/blog-writer.worker.ts: replace string concat with PromptBuilder
- packages/agents/src/workers/setup.worker.ts: same
- packages/agents/src/workers/strategy.worker.ts: same
- packages/db/prisma/schema.prisma: add token fields to AgentRun
Related
- Gap 2: RAG recency + importance scoring (better ranking reduces tokens needed from RAG)
- Gap 13: Cost circuit breaker (token count informs cost estimation before sending)