Gap 13: No Cost Circuit Breaker

Problem

AgentRun.costUsd is tracked per run but there is no alerting, rate limiting, or automatic circuit-breaking if costs spike. A runaway agent — one stuck in a tool-call loop, processing an unexpectedly large document, or retrying repeatedly — would exhaust a tenant’s credit budget or the platform’s API quota before any human notices.

The current BullMQ retry configuration (4 attempts, exponential backoff from 5s) means a failing job can run four full attempts without any cost or quality check between them.
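To make the exposure concrete, the sketch below computes the delay before each retry and the worst-case spend for a job that fails on every attempt. It assumes the doubling schedule BullMQ documents for its built-in exponential backoff (delay × 2^(attemptsMade − 1)). `retryDelaysMs` and `worstCaseRetryCostUsd` are hypothetical helper names, not existing code.

```typescript
// Mirrors the job options described above: 4 attempts, exponential backoff from 5s.
const retryOptions = {
  attempts: 4,
  backoff: { type: "exponential", delay: 5000 },
};

// Delay before each retry, assuming delay * 2^(attemptsMade - 1).
function retryDelaysMs(attempts: number, baseDelayMs: number): number[] {
  const delays: number[] = [];
  for (let attemptsMade = 1; attemptsMade < attempts; attemptsMade++) {
    delays.push(baseDelayMs * Math.pow(2, attemptsMade - 1));
  }
  return delays;
}

// Worst case: every attempt rebuilds the prompt and is billed in full.
function worstCaseRetryCostUsd(attempts: number, costPerAttemptUsd: number): number {
  return attempts * costPerAttemptUsd;
}

const scheduledDelays = retryDelaysMs(retryOptions.attempts, retryOptions.backoff.delay);
console.log(scheduledDelays, worstCaseRetryCostUsd(retryOptions.attempts, 0.2));
```

Under these assumptions the whole retry sequence finishes within roughly 35 seconds of waiting, so there is little time for a human to intervene between attempts.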

Concrete failure scenarios

  1. Tool call loop: An agent calls search_knowledge.js in a loop (see Gap 3 — hallucination). Each loop iteration uses input tokens. By the time the 900-second timeout kills the job, it has consumed 3× the expected token count.

  2. Large document processing: A tenant uploads a 200-page PDF to their knowledge base. The context-file-writer ingests the entire thing into its prompt (no token budget — see Gap 8). One run costs $4 instead of the expected $0.40.

  3. Retry storm: A transient error causes the blog-writer to fail at the last step. With attempts set to 4, BullMQ runs the job up to 4 times in total (the first run plus 3 retries). Each attempt rebuilds the full prompt and calls the adapter. 4 failed runs at $0.20 each = $0.80 wasted.

  4. Tenant credit exhaustion: A tenant with 100 credits remaining has 8 background jobs enqueued. All 8 start simultaneously. Credits run out mid-batch; some jobs complete, some fail, some are partially completed.

What to Build

1. Per-run cost cap in AgentConfig

Add a maximum cost threshold per agent role:

    model AgentConfig {
      // existing fields

      maxCostUsdPerRun Float? // null = no limit; e.g. 2.0 for expensive agents
      maxTokensPerRun  Int?   // hard token limit at adapter level
    }

Default caps by role (suggested):

| Agent role | maxCostUsdPerRun | maxTokensPerRun |
| --- | --- | --- |
| blog-writer | $1.00 | 50,000 |
| strategy-writer | $3.00 | 100,000 |
| context-file-writer | $2.00 | 80,000 |
| social-post-writer | $0.20 | 10,000 |
| insight workers | $0.50 | 20,000 |
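One way to keep these suggested defaults in code is a single typed map that a seed script or migration could read. This is a sketch only: `DEFAULT_RUN_CAPS` and `defaultCapsFor` are hypothetical names, and the `insight-worker` key is a stand-in for whatever role identifiers the insight workers actually use.

```typescript
interface RunCaps {
  maxCostUsdPerRun: number | null; // null = no limit, matching the schema comment
  maxTokensPerRun: number | null;
}

// Suggested defaults from the table above, keyed by agent role.
const DEFAULT_RUN_CAPS: Record<string, RunCaps> = {
  "blog-writer": { maxCostUsdPerRun: 1.0, maxTokensPerRun: 50000 },
  "strategy-writer": { maxCostUsdPerRun: 3.0, maxTokensPerRun: 100000 },
  "context-file-writer": { maxCostUsdPerRun: 2.0, maxTokensPerRun: 80000 },
  "social-post-writer": { maxCostUsdPerRun: 0.2, maxTokensPerRun: 10000 },
  "insight-worker": { maxCostUsdPerRun: 0.5, maxTokensPerRun: 20000 }, // hypothetical key
};

// Roles without an entry fall back to "no limit" rather than an arbitrary cap.
function defaultCapsFor(role: string): RunCaps {
  const caps = DEFAULT_RUN_CAPS[role];
  return caps ?? { maxCostUsdPerRun: null, maxTokensPerRun: null };
}
```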

2. Running cost estimator in adapter progress callback

The adapter already has a progress callback that fires on tool_use events and increments toolsUsed. Extend it to track a running cost estimate. The snippet below assumes the adapter also emits usage events that carry token counts:

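    // In setup.worker.ts / blog-writer.worker.ts
    // Pass abortController.signal to the adapter call so the abort actually stops the run.
    const abortController = new AbortController();
    let runningCostEstimate = 0;
    const MAX_COST = agentConfig.maxCostUsdPerRun ?? Infinity; // null = no limit

    const progressCallback = (event: AdapterProgressEvent) => {
      // Ignore non-usage events, and stop counting once the run has been aborted.
      if (event.type !== "usage" || abortController.signal.aborted) return;

      runningCostEstimate += estimateCost(
        agentConfig.model,
        event.inputTokens ?? 0,
        event.outputTokens ?? 0
      );

      if (runningCostEstimate > MAX_COST) {
        // Signal the adapter to abort
        abortController.abort(
          `Cost limit exceeded: $${runningCostEstimate.toFixed(4)} > $${MAX_COST}`
        );
      }
    };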

When the abort fires, the job fails with a structured cost_limit_exceeded error (see Gap — structured errors).
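The callback above relies on an estimateCost helper that this section does not define. A minimal sketch follows, assuming per-million-token pricing. The model names and prices are placeholders, not real vendor prices, and `CostLimitExceededError` is a hypothetical shape for the structured error.

```typescript
// Placeholder prices in USD per million tokens. These are NOT real vendor
// prices; the real table would come from configuration.
interface ModelPricing {
  inputUsdPerMTok: number;
  outputUsdPerMTok: number;
}

const PRICING: Record<string, ModelPricing> = {
  "example-large": { inputUsdPerMTok: 3, outputUsdPerMTok: 15 },
  "example-small": { inputUsdPerMTok: 1, outputUsdPerMTok: 5 },
};

function estimateCost(model: string, inputTokens: number, outputTokens: number): number {
  const pricing = PRICING[model];
  if (!pricing) {
    // Fail loudly: silently estimating $0 would disable the cap.
    throw new Error(`No pricing configured for model "${model}"`);
  }
  return (
    (inputTokens / 1000000) * pricing.inputUsdPerMTok +
    (outputTokens / 1000000) * pricing.outputUsdPerMTok
  );
}

// Hypothetical structured error carrying the cost_limit_exceeded code.
class CostLimitExceededError extends Error {
  readonly code = "cost_limit_exceeded";
  readonly estimatedCostUsd: number;
  readonly limitUsd: number;

  constructor(estimatedCostUsd: number, limitUsd: number) {
    super(`Cost limit exceeded: $${estimatedCostUsd.toFixed(4)} > $${limitUsd}`);
    this.name = "CostLimitExceededError";
    this.estimatedCostUsd = estimatedCostUsd;
    this.limitUsd = limitUsd;
  }
}
```

Throwing on an unknown model is a deliberate choice in this sketch: a missing pricing entry should surface as a configuration error, not as a run that can never reach its cap.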

3. Tenant-level credit pre-check before enqueue

Before adding any job to the queue, check that the tenant has sufficient credits to cover the estimated cost:

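    // packages/queue/src/enqueue-guard.ts
    import { Queue } from "bullmq";
    // getCreditBalance and InsufficientCreditsError are assumed to come from existing modules.

    const SAFETY_MARGIN = 1.5; // require 50% headroom over the estimate

    export async function enqueueWithCreditCheck(
      queue: Queue,
      jobName: string,
      jobData: unknown,
      opts: { estimatedCostUsd: number; tenantId: string; priority?: number }
    ) {
      // Assumes the balance is expressed in the same USD units as estimatedCostUsd.
      const balance = await getCreditBalance(opts.tenantId);
      const required = opts.estimatedCostUsd * SAFETY_MARGIN;

      if (balance < required) {
        throw new InsufficientCreditsError(
          `Tenant ${opts.tenantId} has $${balance} in credits but job requires ~$${opts.estimatedCostUsd} ` +
            `(with 50% safety margin: $${required.toFixed(2)})`
        );
      }

      return queue.add(jobName, jobData, { priority: opts.priority });
    }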
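A balance check followed by an enqueue is not atomic: in the scenario where 8 jobs are enqueued at once, each pre-check could pass against the same balance. One possible mitigation is to reserve the estimated cost at enqueue time and settle it when the job finishes. The sketch below is in-memory and single-process, purely to illustrate the bookkeeping; `CreditLedger` is a hypothetical name, and a real version would need a database transaction or an atomic store operation.

```typescript
class CreditLedger {
  private balances = new Map<string, number>();
  private reserved = new Map<string, number>();

  setBalance(tenantId: string, amount: number): void {
    this.balances.set(tenantId, amount);
  }

  // Balance minus everything already promised to queued or running jobs.
  available(tenantId: string): number {
    return (this.balances.get(tenantId) ?? 0) - (this.reserved.get(tenantId) ?? 0);
  }

  // Reserve the estimate up front; returns false if it does not fit.
  tryReserve(tenantId: string, amount: number): boolean {
    if (amount > this.available(tenantId)) return false;
    this.reserved.set(tenantId, (this.reserved.get(tenantId) ?? 0) + amount);
    return true;
  }

  // On job completion or failure: drop the reservation and charge the actual cost.
  settle(tenantId: string, reservedAmount: number, actualCost: number): void {
    this.reserved.set(tenantId, (this.reserved.get(tenantId) ?? 0) - reservedAmount);
    this.balances.set(tenantId, (this.balances.get(tenantId) ?? 0) - actualCost);
  }
}
```

With a balance of 100 and eight jobs estimated at 20 each, the first five reservations succeed and the remaining three are rejected before they reach the queue, instead of failing mid-batch.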

4. Platform-level daily cost circuit breaker

A global circuit breaker that trips if the platform’s total LLM spend exceeds a daily threshold:

    // packages/agents/src/lib/cost-circuit-breaker.ts
    import { startOfDay } from "date-fns"; // assumed source of startOfDay
    // db, pauseLowPriorityQueues, sendAdminAlert and CircuitBreakerError are
    // assumed to come from existing modules.

    const DAILY_PLATFORM_LIMIT_USD = 500; // configurable via PlatformSetting

    export async function checkPlatformCircuitBreaker(): Promise<void> {
      // Sums finished runs only; in-flight runs are not counted until they complete or fail.
      const todaySpend = await db.agentRun.aggregate({
        _sum: { costUsd: true },
        where: {
          startedAt: { gte: startOfDay(new Date()) }, // midnight in the server's local time zone
          status: { in: ["completed", "failed"] },
        },
      });
      const spend = todaySpend._sum.costUsd ?? 0;

      if (spend > DAILY_PLATFORM_LIMIT_USD) {
        // Trip the breaker: pause all LOW and BACKGROUND priority queues
        await pauseLowPriorityQueues();
        await sendAdminAlert({
          type: "cost_circuit_breaker_tripped",
          message: `Platform daily spend $${spend.toFixed(2)} exceeded limit $${DAILY_PLATFORM_LIMIT_USD}`,
        });
        throw new CircuitBreakerError(`Platform daily cost limit reached: $${spend.toFixed(2)}`);
      }
    }

Only pause LOW/BACKGROUND queues — CRITICAL and HIGH priority jobs (user-facing, rejection re-runs) are allowed to continue.
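The pauseLowPriorityQueues helper is not defined in this section. The sketch below shows one way it might work, assuming one queue per priority tier. It is written against a small interface (`name`, `pause()`, `resume()`, which BullMQ's Queue provides) so it can run without Redis. `CostCircuitBreaker` and the tier wiring are hypothetical. It also alerts only on the transition into the tripped state, so repeated checks after the limit is hit do not re-send the admin alert.

```typescript
type Tier = "CRITICAL" | "HIGH" | "LOW" | "BACKGROUND";

// Subset of the queue surface this sketch relies on.
interface PausableQueue {
  name: string;
  pause(): Promise<void>;
  resume(): Promise<void>;
}

interface TieredQueue {
  tier: Tier;
  queue: PausableQueue;
}

const PAUSABLE_TIERS: Tier[] = ["LOW", "BACKGROUND"];

class CostCircuitBreaker {
  private tripped = false;
  private readonly queues: TieredQueue[];
  private readonly onTrip: (pausedQueueNames: string[]) => void;

  constructor(queues: TieredQueue[], onTrip: (pausedQueueNames: string[]) => void) {
    this.queues = queues;
    this.onTrip = onTrip;
  }

  get isTripped(): boolean {
    return this.tripped;
  }

  private pausable(): TieredQueue[] {
    return this.queues.filter((q) => PAUSABLE_TIERS.indexOf(q.tier) !== -1);
  }

  // Pauses LOW/BACKGROUND queues; calls onTrip only on the first trip.
  async trip(): Promise<string[]> {
    if (this.tripped) return [];
    const targets = this.pausable();
    await Promise.all(targets.map((q) => q.queue.pause()));
    this.tripped = true;
    const names = targets.map((q) => q.queue.name);
    this.onTrip(names);
    return names;
  }

  // Manual or start-of-day reset: resume whatever trip() paused.
  async reset(): Promise<void> {
    if (!this.tripped) return;
    await Promise.all(this.pausable().map((q) => q.queue.resume()));
    this.tripped = false;
  }
}
```

The `isTripped` flag and the list returned by `trip()` are also what a dashboard would need in order to show which queues are currently paused.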

5. Cost anomaly alerting

Send an alert when any single run costs more than 3× the rolling average for that agent role:

    const avgCost = await getAvgRunCost(agentRole, 30); // 30-day rolling average

    // Skip the check when there is no history yet: an average of 0 would flag every run.
    if (avgCost > 0 && completedRun.costUsd > avgCost * 3) {
      await sendAdminAlert({
        type: "cost_anomaly",
        agentRole,
        runId: completedRun.id,
        tenantId: completedRun.tenantId,
        costUsd: completedRun.costUsd,
        avgCostUsd: avgCost,
        message: `Run cost $${completedRun.costUsd.toFixed(4)} is ${(completedRun.costUsd / avgCost).toFixed(1)}× the 30-day average`,
      });
    }
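The getAvgRunCost helper is not shown in this section. As a rough illustration of the rolling-average logic, the sketch below works over in-memory records; a real implementation would more likely be a database aggregate filtered by agent role. `rollingAvgCost` and `isCostAnomaly` are hypothetical names. Returning null for an empty window makes the "no history" case explicit.

```typescript
interface RunCostRecord {
  costUsd: number;
  completedAt: Date;
}

const DAY_MS = 24 * 60 * 60 * 1000;

// Mean cost of runs completed within the last `windowDays`, or null if there are none.
function rollingAvgCost(runs: RunCostRecord[], windowDays: number, now: Date): number | null {
  const cutoff = now.getTime() - windowDays * DAY_MS;
  const inWindow = runs.filter((r) => r.completedAt.getTime() >= cutoff);
  if (inWindow.length === 0) return null;
  const total = inWindow.reduce((sum, r) => sum + r.costUsd, 0);
  return total / inWindow.length;
}

// True when the run costs more than `multiplier` times the average.
// With no usable average there is nothing to compare against, so never flag.
function isCostAnomaly(costUsd: number, avgCostUsd: number | null, multiplier = 3): boolean {
  if (avgCostUsd === null || avgCostUsd <= 0) return false;
  return costUsd > avgCostUsd * multiplier;
}
```

A plain mean is sensitive to earlier outliers: one very expensive run raises the average and can mask later anomalies, so a median or trimmed mean may be worth considering.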

6. Expose circuit breaker status in Execution Queue dashboard

The /dashboards/execution-queue page should show:

  • Today’s platform spend vs. daily limit (progress bar)
  • Which queues are currently paused (if circuit breaker tripped)
  • Per-tenant spend ranking (top 10 spenders today)
  • Anomalous runs flagged in the last 24 hours
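The spend ranking and progress bar both reduce to simple aggregations over today's runs. The sketch below shows the shape of that computation in memory; in practice it would probably be a grouped database query. `topSpenders` and `spendProgress` are hypothetical names.

```typescript
interface RunSpend {
  tenantId: string;
  costUsd: number;
}

interface TenantSpend {
  tenantId: string;
  totalUsd: number;
}

// Sum cost per tenant and return the `limit` highest spenders, largest first.
function topSpenders(runs: RunSpend[], limit = 10): TenantSpend[] {
  const totals = new Map<string, number>();
  for (const run of runs) {
    totals.set(run.tenantId, (totals.get(run.tenantId) ?? 0) + run.costUsd);
  }
  return Array.from(totals.entries())
    .map(([tenantId, totalUsd]) => ({ tenantId, totalUsd }))
    .sort((a, b) => b.totalUsd - a.totalUsd)
    .slice(0, limit);
}

// Fraction of the daily limit used, clamped to [0, 1] for a progress bar.
function spendProgress(spendUsd: number, limitUsd: number): number {
  if (limitUsd <= 0) return 1;
  return Math.min(1, Math.max(0, spendUsd / limitUsd));
}
```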

Files to Change

  • packages/db/prisma/schema.prisma — add maxCostUsdPerRun, maxTokensPerRun to AgentConfig
  • New file: packages/agents/src/lib/cost-circuit-breaker.ts
  • New file: packages/queue/src/enqueue-guard.ts
  • packages/agents/src/workers/blog-writer.worker.ts — add running cost tracker to progress callback
  • packages/agents/src/workers/setup.worker.ts — same
  • apps/api/src/routers/admin/agents.ts — expose cost cap settings in PUT endpoint
  • apps/dashboard/src/app/(dashboard)/dashboards/execution-queue/ — circuit breaker status panel

Related Gaps

  • Gap 7: Priority queue differentiation (circuit breaker only pauses LOW/BACKGROUND — requires priority to be set)
  • Gap 8: Context window management (token limits at prompt-build time prevent runaway costs before the run starts)
  • Gap 10: Dynamic model routing (routing to cheaper models reduces cost before circuit breaker is needed)

© 2026 Leadmetrics — Internal use only