
Gap 12: No Tool Usage Analytics

Problem

AgentRun.toolsUsed is an integer counter. There is no record of which tools were called, with what arguments, what they returned, or whether the returned content was actually used in the final output.

From the Toolformer and TALM papers, meaningful improvement in tool use requires knowing:

  • Which tools are called and how often
  • Whether tool calls succeed (vs. returning empty results or errors)
  • Whether tool results are cited in the output — the key signal for whether the tool is actually useful

Without this data, it is impossible to improve agent prompts, prune unused skills, or identify when a tool is consistently returning low-value results.

Concrete examples of what’s invisible today

  • The blog-writer for a tenant with thin RAG content calls search_knowledge.js 8 times and gets near-empty results every time, but still completes. The admin has no visibility into this.
  • search_knowledge.js is called with a very broad query (“marketing content”) and returns 3 loosely relevant chunks. The final blog post doesn’t reference any of them. The tool was wasted.
  • A mapped Skill row (e.g., a competitor analysis script) is never actually called by any agent, but it is injected into every blog-writer run, consuming prompt tokens.

What to Build

1. AgentToolCall model

Replace the integer counter with a structured per-call log:

```prisma
model AgentToolCall {
  id         String   @id @default(cuid())
  agentRunId String
  toolName   String   // "search_knowledge", "competitor_analysis", etc.
  args       Json     // arguments passed to the tool
  result     Json?    // tool return value (truncated to 500 chars if large)
  success    Boolean
  errorMsg   String?
  durationMs Int?
  cited      Boolean  @default(false) // did the final output reference this result?
  createdAt  DateTime @default(now())

  agentRun AgentRun @relation(fields: [agentRunId], references: [id])

  @@map("agent_tool_call")
}
```
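Once the model exists, extracted calls need to be shaped into rows for persistence. A minimal sketch, assuming the field set above; the `ToolCallRow` interface and `toToolCallRow` helper are illustrative names, not existing code, and the 500-character truncation mirrors the comment on `result`:

```typescript
// Illustrative row shape mirroring the AgentToolCall model fields.
interface ToolCallRow {
  agentRunId: string;
  toolName: string;
  args: unknown;
  result: string | null;
  success: boolean;
  errorMsg: string | null;
  durationMs: number | null;
  cited: boolean;
}

// Truncate large tool results before storing, per the schema comment.
function truncateResult(text: string | null, max = 500): string | null {
  if (text === null) return null;
  return text.length <= max ? text : text.slice(0, max);
}

// Map one extracted call to a DB row for a given agent run.
function toToolCallRow(
  agentRunId: string,
  call: Omit<ToolCallRow, "agentRunId">
): ToolCallRow {
  return { agentRunId, ...call, result: truncateResult(call.result) };
}
```

The resulting rows could then be written in one round trip, e.g. `prisma.agentToolCall.createMany({ data: rows })`.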

2. Parse tool events from transcript

The AgentRun.transcript already stores tool events. Parse them post-run to populate AgentToolCall:

```typescript
// packages/agents/src/lib/tool-call-extractor.ts
export function extractToolCalls(
  transcript: TranscriptEvent[],
  finalOutput: string
): AgentToolCallCreate[] {
  const calls: AgentToolCallCreate[] = [];
  for (let i = 0; i < transcript.length; i++) {
    const event = transcript[i];
    if (event.type !== "tool_use") continue;

    const resultEvent = transcript[i + 1]; // tool_result follows tool_use
    const result =
      resultEvent?.type === "tool_result" ? resultEvent.content : null;

    // Check if the result text appears in the final output (citation detection)
    const resultText =
      typeof result === "string" ? result : JSON.stringify(result);
    const cited =
      resultText.length > 20 &&
      finalOutput.toLowerCase().includes(resultText.slice(0, 60).toLowerCase());

    calls.push({
      toolName: event.metadata?.toolName ?? "unknown",
      args: parseArgs(event.metadata?.args),
      result: truncate(resultText, 500),
      success: !event.metadata?.error,
      errorMsg: event.metadata?.error ?? null,
      durationMs: event.metadata?.durationMs ?? null,
      cited,
    });
  }
  return calls;
}
```

3. Citation detection (improved)

Simple substring matching is a rough proxy for citation. A more reliable approach uses semantic similarity:

```typescript
// After extracting tool calls, run a fast citation check
async function detectCitations(
  toolResults: { id: string; text: string }[],
  finalOutput: string
): Promise<Map<string, boolean>> {
  if (toolResults.length === 0) return new Map();

  const prompt = `
For each tool result below, answer YES or NO: Is the substance of this
result reflected in the final output?

FINAL OUTPUT:
${finalOutput.slice(0, 2000)}

TOOL RESULTS:
${toolResults.map((r, i) => `[${i}] ${r.text.slice(0, 200)}`).join("\n")}

Return JSON: { "0": true/false, "1": true/false, ... }
`;

  const result = await claudeHaiku(prompt);

  // Parse the model's JSON answer and key it by tool result id.
  const parsed: Record<string, boolean> = JSON.parse(result);
  const cited = new Map<string, boolean>();
  toolResults.forEach((r, i) => cited.set(r.id, parsed[String(i)] ?? false));
  return cited;
}
```

4. Skill effectiveness dashboard

In the manage portal’s agent detail page (/agents/:agentId), add a Skills tab showing:

| Skill | Times Called | Success Rate | Citation Rate | Avg Duration |
| --- | --- | --- | --- | --- |
| search_knowledge | 1,240 | 94% | 61% | 420ms |
| competitor_analysis | 87 | 89% | 44% | 1,200ms |
| seo_checker | 203 | 71% | 38% | 890ms |

Skills with citation rate < 30% are candidates for prompt improvement or removal.
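The table's numbers reduce to a straightforward rollup over `AgentToolCall` rows. A sketch of the aggregation; the `aggregateSkillStats` helper and its interfaces are hypothetical (in production this would more likely be a grouped database query than an in-memory pass):

```typescript
// Minimal projection of an AgentToolCall row needed for the rollup.
interface ToolCallStat {
  toolName: string;
  success: boolean;
  cited: boolean;
  durationMs: number | null;
}

interface SkillStats {
  calls: number;
  successRate: number;
  citationRate: number;
  avgDurationMs: number | null;
}

// Group calls by tool name and compute the dashboard columns.
function aggregateSkillStats(rows: ToolCallStat[]): Map<string, SkillStats> {
  const acc = new Map<
    string,
    { calls: number; ok: number; cited: number; durSum: number; durN: number }
  >();
  for (const r of rows) {
    const a = acc.get(r.toolName) ?? { calls: 0, ok: 0, cited: 0, durSum: 0, durN: 0 };
    a.calls += 1;
    if (r.success) a.ok += 1;
    if (r.cited) a.cited += 1;
    if (r.durationMs !== null) {
      a.durSum += r.durationMs;
      a.durN += 1;
    }
    acc.set(r.toolName, a);
  }
  const out = new Map<string, SkillStats>();
  for (const [name, a] of acc) {
    out.set(name, {
      calls: a.calls,
      successRate: a.ok / a.calls,
      citationRate: a.cited / a.calls,
      avgDurationMs: a.durN > 0 ? a.durSum / a.durN : null,
    });
  }
  return out;
}
```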

5. Automatic skill pruning suggestion

If a mapped Skill has a citation rate below 20% over 50+ calls, surface a warning in the admin UI:

⚠ Skill "seo_checker" has a 17% citation rate over 142 runs for blog-writer. Consider reviewing whether this skill is providing useful results, or update the agent's system prompt to better direct usage of this tool.
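This warning is a simple threshold check over the per-skill stats. A sketch using the 20% citation-rate and 50-call thresholds stated above; the function name and exact message wording are illustrative:

```typescript
const MIN_CALLS = 50;          // minimum sample size, per the text above
const MAX_CITATION_RATE = 0.2; // 20% threshold, per the text above

// Returns a warning string when a skill's citation rate is below the
// threshold over a meaningful sample, or null when no warning applies.
function pruningWarning(
  skill: string,
  agent: string,
  calls: number,
  citedCalls: number
): string | null {
  if (calls < MIN_CALLS) return null; // not enough data yet
  const rate = citedCalls / calls;
  if (rate >= MAX_CITATION_RATE) return null;
  const pct = Math.round(rate * 100);
  return (
    `Skill "${skill}" has a ${pct}% citation rate over ${calls} runs for ` +
    `${agent}. Consider reviewing whether this skill is providing useful ` +
    `results, or update the agent's system prompt to better direct usage ` +
    `of this tool.`
  );
}
```

With the figures from the example above, `pruningWarning("seo_checker", "blog-writer", 142, 24)` produces the 17% warning, while a skill with under 50 calls is skipped regardless of its rate.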

Files to Change

  • packages/db/prisma/schema.prisma — add AgentToolCall model
  • New file: packages/agents/src/lib/tool-call-extractor.ts
  • packages/agents/src/workers/blog-writer.worker.ts — call extractor post-run, save to DB
  • packages/agents/src/workers/setup.worker.ts — same
  • apps/api/src/routers/admin/agents.ts — new endpoint GET /admin/v1/agents/:id/tool-calls
  • Manage portal agent detail page — add Skills analytics tab
Related Gaps

  • Gap 3: Hallucination detection (repeated tool calls with identical args are detectable from this data)
  • Gap 8: Context window management (skills with low citation rates should be deprioritised in the prompt budget)

© 2026 Leadmetrics — Internal use only