Gap 12: No Tool Usage Analytics
Problem
AgentRun.toolsUsed is an integer counter. There is no record of which tools were called, with what arguments, what they returned, or whether the returned content was actually used in the final output.
From the Toolformer and TALM papers, meaningful improvement in tool use requires knowing:
- Which tools are called and how often
- Whether tool calls succeed (vs. returning empty results or errors)
- Whether tool results are cited in the output — the key signal for whether the tool is actually useful
Without this data, it is impossible to improve agent prompts, prune unused skills, or identify when a tool is consistently returning low-value results.
Concrete examples of what’s invisible today
- The blog-writer for a tenant with thin RAG content calls `search_knowledge.js` 8 times and gets near-empty results every time, but still completes. The admin has no visibility into this.
- `search_knowledge.js` is called with a very broad query ("marketing content") and returns 3 loosely relevant chunks. The final blog post doesn't reference any of them. The tool call was wasted.
- A mapped Skill row (e.g., a competitor analysis script) is never actually called by any agent, but it is injected into every blog-writer run, consuming prompt tokens.
What to Build
1. AgentToolCall model
Replace the integer counter with a structured per-call log:
```prisma
model AgentToolCall {
  id         String   @id @default(cuid())
  agentRunId String
  toolName   String   // "search_knowledge", "competitor_analysis", etc.
  args       Json     // arguments passed to the tool
  result     Json?    // tool return value (truncated to 500 chars if large)
  success    Boolean
  errorMsg   String?
  durationMs Int?
  cited      Boolean  @default(false) // did the final output reference this result?
  createdAt  DateTime @default(now())

  agentRun AgentRun @relation(fields: [agentRunId], references: [id])

  @@map("agent_tool_call")
}
```

2. Parse tool events from transcript
The `AgentRun.transcript` field already stores tool events. Parse them post-run to populate `AgentToolCall`:
```typescript
// packages/agents/src/lib/tool-call-extractor.ts

function truncate(text: string, max: number): string {
  return text.length > max ? text.slice(0, max) : text;
}

function parseArgs(raw: unknown): Record<string, unknown> {
  if (typeof raw === "string") {
    try { return JSON.parse(raw); } catch { return { raw }; }
  }
  return (raw as Record<string, unknown>) ?? {};
}

export function extractToolCalls(
  transcript: TranscriptEvent[],
  finalOutput: string
): AgentToolCallCreate[] {
  const calls: AgentToolCallCreate[] = [];

  for (let i = 0; i < transcript.length; i++) {
    const event = transcript[i];
    if (event.type !== "tool_use") continue;

    const resultEvent = transcript[i + 1]; // tool_result follows tool_use
    const result = resultEvent?.type === "tool_result" ? resultEvent.content : null;

    // Check if the result text appears in the final output (citation detection)
    const resultText = typeof result === "string" ? result : JSON.stringify(result);
    const cited =
      resultText.length > 20 &&
      finalOutput.toLowerCase().includes(resultText.slice(0, 60).toLowerCase());

    calls.push({
      toolName: event.metadata?.toolName ?? "unknown",
      args: parseArgs(event.metadata?.args),
      result: truncate(resultText, 500),
      success: !event.metadata?.error,
      errorMsg: event.metadata?.error ?? null,
      durationMs: event.metadata?.durationMs ?? null,
      cited,
    });
  }

  return calls;
}
```

3. Citation detection (improved)
Simple substring matching is a rough proxy for citation. A more reliable approach asks a small, cheap model to judge whether the substance of each result is reflected in the output:
```typescript
// After extracting tool calls, run a fast citation check.
// `claudeHaiku` is assumed to be an existing helper that sends a prompt
// to a small model and returns its text completion.
async function detectCitations(
  toolResults: { id: string; text: string }[],
  finalOutput: string
): Promise<Map<string, boolean>> {
  if (toolResults.length === 0) return new Map();

  const prompt = `
For each tool result below, answer YES or NO:
Is the substance of this result reflected in the final output?
FINAL OUTPUT:
${finalOutput.slice(0, 2000)}
TOOL RESULTS:
${toolResults.map((r, i) => `[${i}] ${r.text.slice(0, 200)}`).join("\n")}
Return JSON: { "0": true/false, "1": true/false, ... }
`;

  const raw = await claudeHaiku(prompt);

  // Map the model's index-keyed answers back to tool-result ids.
  const cited = new Map<string, boolean>();
  try {
    const parsed = JSON.parse(raw) as Record<string, boolean>;
    toolResults.forEach((r, i) => cited.set(r.id, parsed[String(i)] === true));
  } catch {
    // On a malformed response, default every result to "not cited".
    toolResults.forEach((r) => cited.set(r.id, false));
  }
  return cited;
}
```

4. Skill effectiveness dashboard
In the manage portal’s agent detail page (/agents/:agentId), add a Skills tab showing:
| Skill | Times Called | Success Rate | Citation Rate | Avg Duration |
|---|---|---|---|---|
| search_knowledge | 1,240 | 94% | 61% | 420ms |
| competitor_analysis | 87 | 89% | 44% | 1,200ms |
| seo_checker | 203 | 71% | 38% | 890ms |
Skills with citation rate < 30% are candidates for prompt improvement or removal.
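The aggregation behind this table can be sketched as a pure function over `AgentToolCall`-style rows (in production this would likely be a `groupBy` at the database layer; the row and stats shapes here are illustrative):

```typescript
// Per-skill rollup matching the dashboard columns above.
interface ToolCallRow {
  toolName: string;
  success: boolean;
  cited: boolean;
  durationMs: number | null;
}

interface SkillStats {
  timesCalled: number;
  successRate: number;  // 0..1
  citationRate: number; // 0..1
  avgDurationMs: number;
}

function skillStats(rows: ToolCallRow[]): Map<string, SkillStats> {
  // Group calls by tool name.
  const grouped = new Map<string, ToolCallRow[]>();
  for (const row of rows) {
    const bucket = grouped.get(row.toolName) ?? [];
    bucket.push(row);
    grouped.set(row.toolName, bucket);
  }

  // Compute rates per tool; average duration only over calls that reported one.
  const stats = new Map<string, SkillStats>();
  for (const [toolName, calls] of grouped) {
    const timed = calls.filter((c) => c.durationMs !== null);
    stats.set(toolName, {
      timesCalled: calls.length,
      successRate: calls.filter((c) => c.success).length / calls.length,
      citationRate: calls.filter((c) => c.cited).length / calls.length,
      avgDurationMs:
        timed.reduce((sum, c) => sum + (c.durationMs ?? 0), 0) /
        Math.max(timed.length, 1),
    });
  }
  return stats;
}
```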
5. Automatic skill pruning suggestion
If a mapped Skill has a citation rate below 20% over 50+ calls, surface a warning in the admin UI:
> ⚠ Skill "seo_checker" has a 17% citation rate over 142 runs for blog-writer.
> Consider reviewing whether this skill is providing useful results, or update the
> agent's system prompt to better direct usage of this tool.

Files to Change
- `packages/db/prisma/schema.prisma` — add `AgentToolCall` model
- New file: `packages/agents/src/lib/tool-call-extractor.ts`
- `packages/agents/src/workers/blog-writer.worker.ts` — call extractor post-run, save to DB
- `packages/agents/src/workers/setup.worker.ts` — same
- `apps/api/src/routers/admin/agents.ts` — new endpoint `GET /admin/v1/agents/:id/tool-calls`
- Manage portal agent detail page — add Skills analytics tab
Related
- Gap 3: Hallucination detection (repeated tool calls with identical args is detectable from this data)
- Gap 8: Context window management (skills with low citation rates should be deprioritised in prompt budget)
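As a sketch of the Gap 3 linkage, repeated identical calls fall out of the `AgentToolCall` rows directly (the threshold and keying scheme below are assumptions, not part of this gap's spec):

```typescript
// Flag tools called repeatedly with identical arguments within one run,
// a common loop/hallucination symptom. Keys on tool name + serialized args.
interface CallRecord {
  toolName: string;
  args: unknown;
}

function repeatedIdenticalCalls(calls: CallRecord[], threshold = 3): string[] {
  const counts = new Map<string, number>();
  for (const call of calls) {
    // Identical payloads produce identical keys and accumulate.
    const key = `${call.toolName}:${JSON.stringify(call.args)}`;
    counts.set(key, (counts.get(key) ?? 0) + 1);
  }
  return [...counts.entries()]
    .filter(([, n]) => n >= threshold)
    .map(([key]) => key);
}
```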