Agent Architecture Improvements
This index lists the gaps identified by mapping the Leadmetrics agent architecture against the patterns in "LLM Powered Autonomous Agents" (Lilian Weng, 2023).
Each document describes the problem in terms of the actual codebase, then proposes a concrete fix with file paths.
Index
| # | Document | Area | Priority |
|---|---|---|---|
| 01 | Learning from Feedback History | Memory / Learning | P2 |
| 02 | RAG Recency + Importance Scoring | Retrieval | P3 |
| 03 | Hallucination Detection | Reliability | P3 |
| 04 | Bridge BullMQ ↔ LangGraph | Architecture | P2 |
| 05 | Structured Output Contracts | Reliability | P1 |
| 06 | Critic Agent Quality Gate | Quality | P1 |
| 07 | Priority Queue Differentiation | Performance | P2 |
| 08 | Context Window Management | Reliability | P1 |
| 09 | Episodic Memory Per Tenant | Memory | P2 |
| 10 | Dynamic Model Routing | Cost / Quality | P3 |
| 11 | Multi-Reviewer Consensus | Quality | P4 |
| 12 | Tool Usage Analytics | Observability | P3 |
| 13 | Cost Circuit Breaker | Safety | P3 |
Priority grouping
P1 — Defensive (implement first, no architecture changes required)
These prevent bad output and silent failures in the existing pipeline.
- 05 Structured output contracts — replace regex parsing with Zod schemas + Claude tool_use extraction
- 06 Critic agent quality gate — blocking haiku pass before content reaches DM review
- 08 Context window management — token budget system, prompt builder, truncation strategy
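To make the token budget idea in item 08 concrete, here is a minimal sketch of a prompt builder that keeps sections in priority order and truncates once the budget runs out. Everything in it is an assumption for illustration: the `PromptSection` type, the `buildPrompt` function, and the 4-characters-per-token estimate are hypothetical and not taken from the Leadmetrics codebase. A real implementation would likely use the model's tokenizer.

```typescript
// Hypothetical types and names; none of these come from the Leadmetrics codebase.
interface PromptSection {
  name: string;
  text: string;
  priority: number; // lower number = more important, kept first
}

// Rough heuristic (assumption): about 4 characters per token.
const CHARS_PER_TOKEN = 4;

function estimateTokens(text: string): number {
  return Math.ceil(text.length / CHARS_PER_TOKEN);
}

// Keep sections in priority order. The first section that does not fit
// is truncated to the remaining budget, and everything after it is dropped.
function buildPrompt(sections: PromptSection[], budgetTokens: number): string {
  const ordered = sections.slice().sort((a, b) => a.priority - b.priority);
  const kept: string[] = [];
  let remaining = budgetTokens;

  for (const section of ordered) {
    if (remaining <= 0) break;
    const cost = estimateTokens(section.text);
    if (cost <= remaining) {
      kept.push(section.text);
      remaining -= cost;
    } else {
      kept.push(section.text.slice(0, remaining * CHARS_PER_TOKEN));
      remaining = 0;
    }
  }
  return kept.join("\n\n");
}
```

The greedy strategy is the simplest option. Summarising the overflowing section instead of cutting it is a plausible alternative, but it costs an extra model call.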
P2 — Compound improvement (meaningful quality gains, moderate effort)
- 07 Priority queue differentiation — BullMQ priority field, rejection re-runs get CRITICAL priority
- 01 Learning from feedback history — episode retrieval layer, TenantAgentMemory model
- 09 Episodic memory per tenant — accumulate approved-run learnings, inject into future runs
- 04 Bridge BullMQ ↔ LangGraph — triggerAgentJob tool for executor agent (unblocks phase 3)
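The priority differentiation in item 07 can be sketched as a small mapping from a job tier to BullMQ's numeric `priority` option. In BullMQ a lower number means a higher priority. The tier names other than CRITICAL, the numeric values, and the `jobOptionsFor` helper are assumptions made for illustration only.

```typescript
// Hypothetical priority tiers; only CRITICAL is named in the list above.
type JobTier = "CRITICAL" | "HIGH" | "NORMAL" | "LOW";

// BullMQ treats a lower number as a higher priority.
// The specific values are illustrative assumptions.
const TIER_PRIORITY: Record<JobTier, number> = {
  CRITICAL: 1,
  HIGH: 5,
  NORMAL: 10,
  LOW: 20,
};

// Hypothetical description of an agent job about to be enqueued.
interface AgentJobInput {
  isRejectionRerun: boolean;
  requestedTier?: JobTier;
}

// Returns an options object with the same shape as the `priority`
// field of BullMQ's job options.
function jobOptionsFor(input: AgentJobInput): { priority: number } {
  // Rejection re-runs always jump the queue, whatever tier was requested.
  const tier: JobTier = input.isRejectionRerun
    ? "CRITICAL"
    : (input.requestedTier ?? "NORMAL");
  return { priority: TIER_PRIORITY[tier] };
}
```

The returned object would be passed as the third argument to `queue.add(name, data, opts)`. Keeping the mapping in one pure function makes the policy testable without a Redis connection.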
P3 — Optimisation (improve over time with data)
- 02 RAG recency + importance scoring — weighted retrieval formula
- 03 Hallucination detection — transcript analyser, repeated-tool-call detection
- 10 Dynamic model routing — task complexity classifier, haiku for simple tasks
- 12 Tool usage analytics — AgentToolCall model, citation detection, skill effectiveness dashboard
- 13 Cost circuit breaker — per-run cost cap, daily platform limit, anomaly alerting
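As an illustration of the weighted retrieval formula in item 02, the sketch below combines relevance, recency, and importance into one score, in the spirit of the retrieval scoring discussed in the Weng article. The field names, the equal default weights, and the 0.995 hourly decay are assumptions. The actual formula in document 02 may differ.

```typescript
// Hypothetical shape of a retrieved chunk; field names are assumptions.
interface MemoryChunk {
  id: string;
  relevance: number;  // similarity to the query, in [0, 1]
  importance: number; // assigned importance, in [0, 1]
  ageHours: number;   // hours since the chunk was created or last used
}

interface ScoreWeights {
  relevance: number;
  recency: number;
  importance: number;
  decayPerHour: number; // recency = decayPerHour ^ ageHours
}

// Illustrative defaults, not tuned values.
const DEFAULT_WEIGHTS: ScoreWeights = {
  relevance: 1,
  recency: 1,
  importance: 1,
  decayPerHour: 0.995,
};

// Weighted sum of the three signals; recency decays exponentially with age.
function scoreChunk(chunk: MemoryChunk, w: ScoreWeights = DEFAULT_WEIGHTS): number {
  const recency = Math.pow(w.decayPerHour, chunk.ageHours);
  return (
    w.relevance * chunk.relevance +
    w.recency * recency +
    w.importance * chunk.importance
  );
}

// Returns a new array sorted from highest to lowest score.
function rankChunks(chunks: MemoryChunk[], w: ScoreWeights = DEFAULT_WEIGHTS): MemoryChunk[] {
  return chunks.slice().sort((a, b) => scoreChunk(b, w) - scoreChunk(a, w));
}
```

With these defaults, a fresh and moderately relevant chunk can outrank an old, highly relevant one. That is the behaviour recency weighting is meant to produce, and the weights control how strongly it applies.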
P4 — High-stakes only (significant effort, narrow application)
- 11 Multi-reviewer consensus — devil’s advocate critic for strategy and context documents only
Cross-cutting dependencies
- 05 (structured output) ──→ 06 (critic gate)
- 05 (structured output) ──→ 11 (multi-reviewer)
- 07 (priority queues) ──→ 04 (BullMQ↔LangGraph bridge)
- 07 (priority queues) ──→ 13 (circuit breaker)
- 08 (context budget) ──→ 13 (circuit breaker)
- 08 (context budget) ──→ 02 (RAG scoring)
- 01 (feedback history) ──→ 09 (episodic memory)
- 12 (tool analytics) ──→ 03 (hallucination detection)
- 12 (tool analytics) ──→ 08 (context budget — prune unused skills)
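One way to check these dependencies for cycles and derive a valid build order is a topological sort. The sketch below applies Kahn's algorithm to the edges listed above, assuming each arrow reads "prerequisite ──→ dependent". That reading, and the `topoOrder` helper itself, are assumptions for illustration. Document 10 has no listed dependencies, so it does not appear in the graph.

```typescript
// Edges copied from the dependency list above as [prerequisite, dependent].
// The direction of each arrow is an assumed reading.
const EDGES: Array<[string, string]> = [
  ["05", "06"], ["05", "11"],
  ["07", "04"], ["07", "13"],
  ["08", "13"], ["08", "02"],
  ["01", "09"],
  ["12", "03"], ["12", "08"],
];

// Kahn's algorithm: repeatedly emit a node with no unmet prerequisites.
function topoOrder(edges: Array<[string, string]>): string[] {
  const inDegree = new Map<string, number>();
  const dependents = new Map<string, string[]>();

  for (const [from, to] of edges) {
    inDegree.set(from, inDegree.get(from) ?? 0);
    inDegree.set(to, (inDegree.get(to) ?? 0) + 1);
    dependents.set(from, (dependents.get(from) ?? []).concat(to));
  }

  // Start with every node that has no prerequisites, in a stable order.
  const ready = Array.from(inDegree.keys())
    .filter((node) => inDegree.get(node) === 0)
    .sort();
  const order: string[] = [];

  while (ready.length > 0) {
    const node = ready.shift() as string;
    order.push(node);
    for (const dep of dependents.get(node) ?? []) {
      const left = (inDegree.get(dep) ?? 0) - 1;
      inDegree.set(dep, left);
      if (left === 0) ready.push(dep);
    }
  }

  // If some nodes were never emitted, the graph contains a cycle.
  if (order.length !== inDegree.size) {
    throw new Error("dependency cycle detected");
  }
  return order;
}
```

Calling `topoOrder(EDGES)` yields one valid ordering. Several orderings usually satisfy the same constraints, so the priority grouping above still decides which independent items go first.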