Gap 12: No Tool Usage Analytics
Problem
AgentRun.toolsUsed is an integer counter. There is no record of which tools were called, with what arguments, what they returned, or whether the returned content was actually used in the final output.
From the Toolformer and TALM papers, meaningful improvement in tool use requires knowing:
- Which tools are called and how often
- Whether tool calls succeed (vs. returning empty results or errors)
- Whether tool results are cited in the output — the key signal for whether the tool is actually useful
Without this data, it is impossible to improve agent prompts, prune unused skills, or identify when a tool is consistently returning low-value results.
Concrete examples of what’s invisible today
- The blog-writer for a tenant with thin RAG content calls `search_knowledge.js` 8 times and gets near-empty results every time, but still completes. The admin has no visibility into this.
- `search_knowledge.js` is called with a very broad query ("marketing content") and returns 3 loosely relevant chunks. The final blog post doesn't reference any of them. The tool call was wasted.
- A mapped Skill row (e.g., a competitor analysis script) is never actually called by any agent, but it is injected into every blog-writer run, consuming prompt tokens.
What to Build
1. AgentToolCall model
Replace the integer counter with a structured per-call log:
```prisma
model AgentToolCall {
  id         String   @id @default(cuid())
  agentRunId String
  toolName   String   // "search_knowledge", "competitor_analysis", etc.
  args       Json     // arguments passed to the tool
  result     Json?    // tool return value (truncated to 500 chars if large)
  success    Boolean
  errorMsg   String?
  durationMs Int?
  cited      Boolean  @default(false) // did the final output reference this result?
  createdAt  DateTime @default(now())

  agentRun AgentRun @relation(fields: [agentRunId], references: [id])

  @@map("agent_tool_call")
}
```

2. Parse tool events from transcript
The `AgentRun.transcript` field already stores tool events. Parse them post-run to populate `AgentToolCall`:
```typescript
// packages/agents/src/lib/tool-call-extractor.ts

function truncate(text: string, max: number): string {
  return text.length > max ? text.slice(0, max) : text;
}

function parseArgs(raw: unknown): Record<string, unknown> {
  if (typeof raw === "string") {
    try { return JSON.parse(raw); } catch { return { raw }; }
  }
  return (raw as Record<string, unknown>) ?? {};
}

export function extractToolCalls(
  transcript: TranscriptEvent[],
  finalOutput: string
): AgentToolCallCreate[] {
  const calls: AgentToolCallCreate[] = [];

  for (let i = 0; i < transcript.length; i++) {
    const event = transcript[i];
    if (event.type !== "tool_use") continue;

    const resultEvent = transcript[i + 1]; // tool_result follows tool_use
    const result = resultEvent?.type === "tool_result" ? resultEvent.content : null;

    // Check if the result text appears in the final output (citation detection)
    const resultText = typeof result === "string" ? result : JSON.stringify(result);
    const cited =
      resultText.length > 20 &&
      finalOutput.toLowerCase().includes(resultText.slice(0, 60).toLowerCase());

    calls.push({
      toolName: event.metadata?.toolName ?? "unknown",
      args: parseArgs(event.metadata?.args),
      result: truncate(resultText, 500),
      success: !event.metadata?.error,
      errorMsg: event.metadata?.error ?? null,
      durationMs: event.metadata?.durationMs ?? null,
      cited,
    });
  }

  return calls;
}
```

3. Citation detection (improved)
Simple substring matching is a rough proxy for citation. A more reliable approach asks a small, cheap model to judge whether the substance of each result is reflected in the output:
```typescript
// After extracting tool calls, run a fast citation check.
// `claudeHaiku` is assumed to be an existing helper that sends a prompt
// to a small model and returns its text completion.
async function detectCitations(
  toolResults: { id: string; text: string }[],
  finalOutput: string
): Promise<Map<string, boolean>> {
  if (toolResults.length === 0) return new Map();

  const prompt = `
For each tool result below, answer YES or NO:
Is the substance of this result reflected in the final output?
FINAL OUTPUT:
${finalOutput.slice(0, 2000)}
TOOL RESULTS:
${toolResults.map((r, i) => `[${i}] ${r.text.slice(0, 200)}`).join("\n")}
Return JSON: { "0": true/false, "1": true/false, ... }
`;

  const raw = await claudeHaiku(prompt);

  // Map the model's index-keyed answers back to tool-result ids.
  const cited = new Map<string, boolean>();
  try {
    const parsed = JSON.parse(raw) as Record<string, boolean>;
    toolResults.forEach((r, i) => cited.set(r.id, parsed[String(i)] === true));
  } catch {
    // On a malformed response, default every result to "not cited".
    toolResults.forEach((r) => cited.set(r.id, false));
  }
  return cited;
}
```

4. Skill effectiveness dashboard
In the manage portal’s agent detail page (/agents/:agentId), add a Skills tab showing:
| Skill | Times Called | Success Rate | Citation Rate | Avg Duration |
|---|---|---|---|---|
| search_knowledge | 1,240 | 94% | 61% | 420ms |
| competitor_analysis | 87 | 89% | 44% | 1,200ms |
| seo_checker | 203 | 71% | 38% | 890ms |
Skills with citation rate < 30% are candidates for prompt improvement or removal.
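The aggregation behind this table can be sketched as a pure function over `AgentToolCall`-style rows (in production this would likely be a `groupBy` at the database layer; the row and stats shapes here are illustrative):

```typescript
// Per-skill rollup matching the dashboard columns above.
interface ToolCallRow {
  toolName: string;
  success: boolean;
  cited: boolean;
  durationMs: number | null;
}

interface SkillStats {
  timesCalled: number;
  successRate: number;  // 0..1
  citationRate: number; // 0..1
  avgDurationMs: number;
}

function skillStats(rows: ToolCallRow[]): Map<string, SkillStats> {
  // Group calls by tool name.
  const grouped = new Map<string, ToolCallRow[]>();
  for (const row of rows) {
    const bucket = grouped.get(row.toolName) ?? [];
    bucket.push(row);
    grouped.set(row.toolName, bucket);
  }

  // Compute rates per tool; average duration only over calls that reported one.
  const stats = new Map<string, SkillStats>();
  for (const [toolName, calls] of grouped) {
    const timed = calls.filter((c) => c.durationMs !== null);
    stats.set(toolName, {
      timesCalled: calls.length,
      successRate: calls.filter((c) => c.success).length / calls.length,
      citationRate: calls.filter((c) => c.cited).length / calls.length,
      avgDurationMs:
        timed.reduce((sum, c) => sum + (c.durationMs ?? 0), 0) /
        Math.max(timed.length, 1),
    });
  }
  return stats;
}
```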
5. Automatic skill pruning suggestion
If a mapped Skill has a citation rate below 20% over 50+ calls, surface a warning in the admin UI:
> ⚠ Skill "seo_checker" has a 17% citation rate over 142 runs for blog-writer.
> Consider reviewing whether this skill is providing useful results, or update the
> agent's system prompt to better direct usage of this tool.

Files to Change
- `packages/db/prisma/schema.prisma` — add `AgentToolCall` model
- New file: `packages/agents/src/lib/tool-call-extractor.ts`
- `packages/agents/src/workers/blog-writer.worker.ts` — call extractor post-run, save to DB
- `packages/agents/src/workers/setup.worker.ts` — same
- `apps/api/src/routers/admin/agents.ts` — new endpoint `GET /admin/v1/agents/:id/tool-calls`
- Manage portal agent detail page — add Skills analytics tab
Related
- Gap 3: Hallucination detection (repeated tool calls with identical args is detectable from this data)
- Gap 8: Context window management (skills with low citation rates should be deprioritised in prompt budget)
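As a sketch of the Gap 3 linkage, repeated identical calls fall out of the `AgentToolCall` rows directly (the threshold and keying scheme below are assumptions, not part of this gap's spec):

```typescript
// Flag tools called repeatedly with identical arguments within one run,
// a common loop/hallucination symptom. Keys on tool name + serialized args.
interface CallRecord {
  toolName: string;
  args: unknown;
}

function repeatedIdenticalCalls(calls: CallRecord[], threshold = 3): string[] {
  const counts = new Map<string, number>();
  for (const call of calls) {
    // Identical payloads produce identical keys and accumulate.
    const key = `${call.toolName}:${JSON.stringify(call.args)}`;
    counts.set(key, (counts.get(key) ?? 0) + 1);
  }
  return [...counts.entries()]
    .filter(([, n]) => n >= threshold)
    .map(([key]) => key);
}
```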