Agent Architecture Improvements

Gaps identified by mapping the Leadmetrics agent architecture against the patterns in "LLM Powered Autonomous Agents" (Lilian Weng, 2023).

Each document describes the problem in terms of the actual codebase, then proposes a concrete fix with file paths.

Index

#	Document	Area	Priority
01	Learning from Feedback History	Memory / Learning	P2
02	RAG Recency + Importance Scoring	Retrieval	P3
03	Hallucination Detection	Reliability	P3
04	Bridge BullMQ ↔ LangGraph	Architecture	P2
05	Structured Output Contracts	Reliability	P1
06	Critic Agent Quality Gate	Quality	P1
07	Priority Queue Differentiation	Performance	P2
08	Context Window Management	Reliability	P1
09	Episodic Memory Per Tenant	Memory	P2
10	Dynamic Model Routing	Cost / Quality	P3
11	Multi-Reviewer Consensus	Quality	P4
12	Tool Usage Analytics	Observability	P3
13	Cost Circuit Breaker	Safety	P3

Priority grouping

P1 — Defensive (implement first, no architecture changes required)

These prevent bad output and silent failures in the existing pipeline.

05 Structured output contracts — replace regex parsing with Zod schemas + Claude tool_use extraction
06 Critic agent quality gate — blocking haiku pass before content reaches DM review
08 Context window management — token budget system, prompt builder, truncation strategy
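
As a rough illustration of the token budget idea in item 08, the sketch below keeps prompt sections in priority order until a budget is exhausted and reports what was dropped. Everything in it is an assumption made for illustration: the names PromptSection, estimateTokens and buildPrompt are hypothetical, the 4-characters-per-token estimate is a crude stand-in for a real tokenizer, and dropping whole sections is only one possible truncation strategy.

```typescript
// Hypothetical types and names; none of this is taken from the Leadmetrics codebase.
interface PromptSection {
  name: string;     // assumed unique per prompt
  text: string;
  priority: number; // lower number = more important
}

interface BuiltPrompt {
  prompt: string;
  usedTokens: number;
  dropped: string[]; // names of sections that did not fit
}

// Crude estimate of roughly 4 characters per token (an assumption).
// A real tokenizer should replace this before the numbers are trusted.
function estimateTokens(text: string): number {
  return Math.ceil(text.length / 4);
}

// Greedy strategy: admit sections in priority order while they fit the budget,
// then emit the survivors in their original order so the prompt layout is stable.
// The "\n\n" separators are not counted against the budget in this sketch.
function buildPrompt(sections: PromptSection[], budgetTokens: number): BuiltPrompt {
  const byPriority = [...sections].sort((a, b) => a.priority - b.priority);
  const kept = new Set<string>();
  const dropped: string[] = [];
  let usedTokens = 0;

  for (const section of byPriority) {
    const cost = estimateTokens(section.text);
    if (usedTokens + cost <= budgetTokens) {
      kept.add(section.name);
      usedTokens += cost;
    } else {
      dropped.push(section.name);
    }
  }

  const prompt = sections
    .filter((s) => kept.has(s.name))
    .map((s) => s.text)
    .join("\n\n");

  return { prompt, usedTokens, dropped };
}
```

Returning the dropped section names makes truncation visible to the caller, which fits the goal of avoiding silent failures.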

P2 — Compound improvement (meaningful quality gains, moderate effort)

07 Priority queue differentiation — BullMQ priority field, rejection re-runs get CRITICAL priority
01 Learning from feedback history — episode retrieval layer, TenantAgentMemory model
09 Episodic memory per tenant — accumulate approved-run learnings, inject into future runs
04 Bridge BullMQ ↔ LangGraph — triggerAgentJob tool for executor agent (unblocks phase 3)
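
The priority differentiation in item 07 could be expressed as a small mapping from job kind to a numeric priority. This is a sketch under stated assumptions: only the rule that rejection re-runs get CRITICAL priority comes from the list above, while the other job kinds, the numeric values and the function names are hypothetical. BullMQ treats a lower priority number as more urgent, so CRITICAL maps to the smallest value here.

```typescript
// Hypothetical job kinds; only "rejection_rerun" is named in this document.
type JobKind = "rejection_rerun" | "user_triggered" | "scheduled" | "backfill";

// Illustrative values. In BullMQ a lower number means higher priority.
const PRIORITY = {
  CRITICAL: 1,
  HIGH: 2,
  NORMAL: 5,
  LOW: 10,
} as const;

function priorityFor(kind: JobKind): number {
  switch (kind) {
    case "rejection_rerun":
      return PRIORITY.CRITICAL; // a reviewer is waiting on the re-run
    case "user_triggered":
      return PRIORITY.HIGH;
    case "scheduled":
      return PRIORITY.NORMAL;
    case "backfill":
      return PRIORITY.LOW;
  }
}

// Shape intended to be passed as the options argument when enqueueing a job.
function jobOptionsFor(kind: JobKind): { priority: number } {
  return { priority: priorityFor(kind) };
}
```

With a BullMQ queue, the result would be passed as the third argument to Queue.add, for example queue.add("agent-run", payload, jobOptionsFor("rejection_rerun")), where the job name and payload are placeholders.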

P3 — Optimisation (improve over time with data)

02 RAG recency + importance scoring — weighted retrieval formula
03 Hallucination detection — transcript analyser, repeated-tool-call detection
10 Dynamic model routing — task complexity classifier, haiku for simple tasks
12 Tool usage analytics — AgentToolCall model, citation detection, skill effectiveness dashboard
13 Cost circuit breaker — per-run cost cap, daily platform limit, anomaly alerting
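
One possible shape for the weighted retrieval formula in item 02 is a linear combination of relevance, recency and importance, with recency decaying exponentially over time. The sketch below is illustrative only: the field names, the weights, the half-life parameter and the assumption that similarity and importance are already normalised to the range 0 to 1 are all hypothetical, and the formula the detailed document settles on may differ.

```typescript
// Hypothetical shapes; similarity and importance are assumed to lie in [0, 1].
interface RetrievedChunk {
  id: string;
  similarity: number;  // e.g. cosine similarity from the vector store
  importance: number;  // e.g. an editorial or model-assigned weight
  lastAccessed: Date;
}

interface Weights {
  relevance: number;
  recency: number;
  importance: number;
}

const MS_PER_DAY = 86400000;

// Exponential decay: the score halves every halfLifeDays. Future dates clamp to age 0.
function recencyScore(lastAccessed: Date, now: Date, halfLifeDays: number): number {
  const ageDays = Math.max(0, (now.getTime() - lastAccessed.getTime()) / MS_PER_DAY);
  return Math.pow(0.5, ageDays / halfLifeDays);
}

function scoreChunk(chunk: RetrievedChunk, now: Date, w: Weights, halfLifeDays = 30): number {
  return (
    w.relevance * chunk.similarity +
    w.recency * recencyScore(chunk.lastAccessed, now, halfLifeDays) +
    w.importance * chunk.importance
  );
}

// Returns a new array sorted from highest to lowest combined score.
function rankChunks(
  chunks: RetrievedChunk[],
  now: Date,
  w: Weights,
  halfLifeDays = 30
): RetrievedChunk[] {
  return [...chunks].sort(
    (a, b) => scoreChunk(b, now, w, halfLifeDays) - scoreChunk(a, now, w, halfLifeDays)
  );
}
```

Passing now explicitly keeps the scoring deterministic in tests, and setting a weight to 0 switches that signal off without changing the code path.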

P4 — High-stakes only (significant effort, narrow application)

11 Multi-reviewer consensus — devil’s advocate critic for strategy and context documents only

Cross-cutting dependencies


05 (structured output) ──→ 06 (critic gate)
                      ──→ 11 (multi-reviewer)

07 (priority queues)  ──→ 04 (BullMQ↔LangGraph bridge)
                      ──→ 13 (circuit breaker)

08 (context budget)   ──→ 13 (circuit breaker)
                      ──→ 02 (RAG scoring)

01 (feedback history) ──→ 09 (episodic memory)

12 (tool analytics)   ──→ 03 (hallucination detection)
                      ──→ 08 (context budget — prune unused skills)
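
The diagram above can be encoded as an edge list and checked mechanically. The sketch below assumes every arrow means "the left item should land before the right item", which may be stricter than intended for softer links such as 12 feeding 08. It uses Kahn's algorithm to produce one valid order and to detect cycles. Items with no arrows (for example 10) do not appear in the graph and can be scheduled independently. The function name topoOrder is hypothetical.

```typescript
// Edges transcribed from the diagram: [prerequisite, dependent].
const edges: Array<[string, string]> = [
  ["05", "06"],
  ["05", "11"],
  ["07", "04"],
  ["07", "13"],
  ["08", "13"],
  ["08", "02"],
  ["01", "09"],
  ["12", "03"],
  ["12", "08"],
];

// Kahn's algorithm: repeatedly emit a node with no remaining prerequisites.
function topoOrder(graph: Array<[string, string]>): string[] {
  const indegree = new Map<string, number>();
  const outgoing = new Map<string, string[]>();

  for (const [from, to] of graph) {
    if (!indegree.has(from)) indegree.set(from, 0);
    indegree.set(to, (indegree.get(to) ?? 0) + 1);
    outgoing.set(from, [...(outgoing.get(from) ?? []), to]);
  }

  // Start with nodes that have no prerequisites, sorted for a stable result.
  const ready = Array.from(indegree.entries())
    .filter(([, degree]) => degree === 0)
    .map(([node]) => node)
    .sort();

  const order: string[] = [];
  while (ready.length > 0) {
    const node = ready.shift()!;
    order.push(node);
    for (const next of outgoing.get(node) ?? []) {
      const remaining = (indegree.get(next) ?? 0) - 1;
      indegree.set(next, remaining);
      if (remaining === 0) ready.push(next);
    }
  }

  // If some node was never emitted, the graph contains a cycle.
  if (order.length !== indegree.size) {
    throw new Error("dependency cycle detected");
  }
  return order;
}
```

Running the check whenever an arrow is added would catch a circular dependency before it reaches planning.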