Activity Planner: Slow Runtime (~8 min) — Root Cause Investigation

Status: Open
Severity: Medium-High — job completes but takes ~8 min; can exceed 16 min with internal retry, causing BullMQ lock expiry and duplicate execution
File: packages/agents/src/workers/activity.worker.ts
Observed: 2026-05-06 00:20 → 00:28 (7 min 41 sec Claude execution)

Timeline from Logs

    00:20:23  Activity planner job started
    00:20:23  Activity planner prompt built — invoking Claude   (instant — DB queries fast)
    00:28:04  Claude execution finished                          (7 min 41 sec later)
    00:28:04  JSON parsed successfully
    00:28:04  requireActivityApproval=true — period set to dm_review
    00:28:04  Activity planner job completed

At the logs' one-second resolution, the entire runtime is Claude execution. DB queries, JSON parsing, and writes each complete within the same second as the step before them.
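
The arithmetic behind that claim can be checked with a small sketch. The timestamps are copied from the timeline; the helper names (`toSeconds`, `formatDuration`) are illustrative and do not come from the worker code.

```typescript
// Parse an "HH:MM:SS" log timestamp into seconds since midnight.
function toSeconds(hms: string): number {
  const [h = 0, m = 0, s = 0] = hms.split(":").map(Number);
  return h * 3600 + m * 60 + s;
}

// Format a duration in seconds as "M min S sec".
function formatDuration(totalSec: number): string {
  return `${Math.floor(totalSec / 60)} min ${totalSec % 60} sec`;
}

// Timestamps copied from the log timeline.
const jobStarted = toSeconds("00:20:23");
const claudeInvoked = toSeconds("00:20:23");
const claudeFinished = toSeconds("00:28:04");
const jobCompleted = toSeconds("00:28:04");

const claudeSec = claudeFinished - claudeInvoked; // time spent inside Claude
const jobSec = jobCompleted - jobStarted;         // total job time
const claudeShare = claudeSec / jobSec;           // fraction of the job spent in Claude

console.log(formatDuration(claudeSec), claudeShare);
```

Because the log timestamps carry no sub-second precision, a share of 1 only means that everything outside Claude took less than a second in total.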

Root Cause Detail (from code investigation)

Finding 1: CLAUDE.md in skills dir MANDATES RAG searches (biggest hidden cause)

createSkillsDir() writes a CLAUDE.md into the skills directory that says:

    ## IMPORTANT: Call this tool before writing any content

    Before generating your output, you MUST call search_knowledge to retrieve relevant context from the tenant's indexed documents. This is mandatory — do not skip this step.

    Step 1 — Search brand docs first
    Step 2 — Search for the specific topic
    Step 3 — Check published content to avoid duplication

The activity planner passes skillsDir (lines 427 and 448 in activity.worker.ts) but sets no allowedTools: [] restriction. Claude therefore reads CLAUDE.md, obeys the “MUST call” instruction, and runs 3+ search_knowledge calls through the Bash tool before generating any JSON.

Each tool call is one turn, and each turn is a full API round-trip to Anthropic. That adds at least 3 extra turns before the first token of the JSON output is generated.

The deliverable planner never passes skillsDir at all, so Claude never sees this file. The activity planner is the only pure-JSON planner that passes skillsDir. This CLAUDE.md was written for content-generating agents (blog writer, social post writer) — not for planning agents that do structured JSON generation.

Finding 2: Missing allowedTools: [] and maxTurnsPerRun

The execute config (lines 441-453):

    config: {
      cwd,
      model: "claude-sonnet-4-6",
      dangerouslySkipPermissions: true,
      timeoutSec: 900,
      // NO allowedTools
      // NO maxTurnsPerRun
    }

Without allowedTools: [], Claude Code CLI runs with full tool access. Claude can and does use Bash to run search_knowledge.js. Without maxTurnsPerRun, there is no cap on the number of turns — tool calls + continuation turns can inflate to 5-8 turns total.
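
A guard along the following lines could catch this class of misconfiguration for any planner that should only emit JSON. This is a minimal sketch: the `ExecuteConfig` interface lists only the fields quoted in this report, and `checkPureJsonConfig` and the placeholder paths are hypothetical, not part of the adapter.

```typescript
// Hypothetical shape: only the config fields quoted in this report.
interface ExecuteConfig {
  cwd: string;
  model: string;
  dangerouslySkipPermissions?: boolean;
  timeoutSec: number;
  allowedTools?: string[];
  maxTurnsPerRun?: number;
}

// Returns a list of problems for a planner that should emit JSON only.
function checkPureJsonConfig(config: ExecuteConfig, skillsDir?: string): string[] {
  const problems: string[] = [];
  if (config.allowedTools === undefined) {
    problems.push("allowedTools is missing: Claude runs with full tool access");
  } else if (config.allowedTools.length > 0) {
    problems.push("allowedTools is not empty");
  }
  if (config.maxTurnsPerRun === undefined) {
    problems.push("maxTurnsPerRun is missing: turn count is uncapped");
  }
  if (skillsDir !== undefined) {
    problems.push("skillsDir is passed: Claude will read its CLAUDE.md");
  }
  return problems;
}

// The activity planner config as described in Finding 2 (paths are placeholders).
const currentConfig: ExecuteConfig = {
  cwd: "/tmp/activity-planner",
  model: "claude-sonnet-4-6",
  dangerouslySkipPermissions: true,
  timeoutSec: 900,
};

const currentProblems = checkPureJsonConfig(currentConfig, "/tmp/skills");
console.log(currentProblems);
```

Run against the current settings, the check reports all three gaps; a config with `allowedTools: []`, a `maxTurnsPerRun` value, and no `skillsDir` passes cleanly.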

Finding 3: Internal retry loop can double the runtime

Lines 431-508: The worker has an internal retry loop (MAX_INTERNAL_RETRIES = 2). If JSON parsing fails on attempt 1 (which happens when tool call output gets interleaved with Claude’s JSON response), the entire execute() is called again — a second full invocation.

The parse failures are caused by the tool-use pollution in Finding 1. Fix tool use, fix the retries. But if the retry does fire, runtime becomes 16+ min, which exceeds the 16 min lockDuration and causes BullMQ lock expiry + duplicate execution.
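
The shape of such a loop, and why one polluted response costs a second full invocation, can be sketched as follows. This is not the worker's actual code: it is synchronous for brevity (the real adapter call is awaited), `runWithInternalRetry` is a hypothetical name, and it assumes MAX_INTERNAL_RETRIES = 2 means at most two execute() calls in total.

```typescript
// Assumption: 2 means "at most two full execute() calls in total".
const MAX_INTERNAL_RETRIES = 2;

// Hypothetical stand-in for the Claude adapter: returns the raw text output.
type Execute = () => string;

function runWithInternalRetry(execute: Execute): { plan: unknown; invocations: number } {
  let lastError: unknown = new Error("no attempts made");
  for (let attempt = 1; attempt <= MAX_INTERNAL_RETRIES; attempt++) {
    const raw = execute(); // each attempt is a complete Claude run
    try {
      return { plan: JSON.parse(raw), invocations: attempt };
    } catch (err) {
      lastError = err; // parse failed, so the loop pays for another full run
    }
  }
  throw lastError;
}

// Simulated outputs: the first is polluted by tool output, the second is clean.
const simulatedOutputs = [
  'search_knowledge: 3 results\n{"activities": []}',
  '{"activities": []}',
];
let executeCalls = 0;
const retryResult = runWithInternalRetry(() => simulatedOutputs[executeCalls++] ?? "");
console.log(retryResult.invocations);
```

In the simulation the first output fails to parse because of the leading tool text, so the job only succeeds on the second invocation.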

Finding 4: Full client context injected verbatim (30-50K chars)

contextText at line 399 is the entire ClientContext.content blob — same issue as the deliverable planner. The structured inputs (goalsTable, templatesTable) are already compact because they are built directly from DB fields. The only fat in the prompt is the context file.

From the context, the activity planner needs the business overview and target audience (for inputHints content direction). It does NOT need brand voice details, competitor analysis, key page breakdowns, or technical website details.

Finding 5: lockDuration too tight and gets worse with retry

lockDuration: 960_000 = 16 min (only 1 min buffer over 900s timeout). With tool-use inflation + internal retry, actual runtime easily exceeds 16 min, causing BullMQ lock expiry and duplicate execution. The deliverable planner uses a 3 min buffer (1,080,000).
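
The buffer arithmetic can be made explicit with a short sketch. The lockDuration and timeout values are the ones quoted in this report; the helper names are illustrative, and the deliverable planner's timeout is assumed to be 900 s as well.

```typescript
const MS_PER_MIN = 60_000;

// Values quoted in this report.
const timeoutSec = 900;              // execute timeout (assumed the same for both planners)
const activityLockMs = 960_000;      // activity planner lockDuration
const deliverableLockMs = 1_080_000; // deliverable planner lockDuration

// Buffer between the execute timeout and the lock, in minutes.
function lockBufferMin(lockMs: number, timeoutSeconds: number): number {
  return (lockMs - timeoutSeconds * 1000) / MS_PER_MIN;
}

// Upper bound in minutes if every internal attempt runs all the way to the timeout.
function worstCaseMin(attempts: number, timeoutSeconds: number): number {
  return (attempts * timeoutSeconds) / 60;
}

const activityBuffer = lockBufferMin(activityLockMs, timeoutSec);
const deliverableBuffer = lockBufferMin(deliverableLockMs, timeoutSec);
const twoAttemptWorstCase = worstCaseMin(2, timeoutSec);

console.log(activityBuffer, deliverableBuffer, twoAttemptWorstCase);
```

The buffer only covers a single invocation. Two attempts at the observed ~8 min each already reach the 16 min lock, and two attempts that both run to the timeout would take 30 min.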

Comparison: Activity vs Deliverable Planner Config

| Setting | Deliverable Planner | Activity Planner |
|---|---|---|
| allowedTools | [] | missing |
| maxTurnsPerRun | 3 | missing |
| skillsDir | not passed ✅ | passed; CLAUDE.md mandates RAG ❌ |
| lockDuration | 1,080,000 (3 min buffer) | 960,000 (1 min buffer) |
| Internal retry | no | yes, up to 2× invocations |
| Estimated turn count | 2-3 | 5-8 (3+ RAG + 2-3 output) |
| Observed runtime | ~7.5 min | ~8 min (can be 16+ with retry) |

Proposed Fixes

Fix 1: Add allowedTools: [], remove skillsDir, add maxTurnsPerRun: 2 (immediate, biggest gain)

    const execute = await getClaudeAdapter();
    // remove skillsDir: not needed
    const result = await execute({
      config: {
        cwd,
        model: "claude-sonnet-4-6",
        dangerouslySkipPermissions: true,
        allowedTools: [],    // pure JSON generation, no tools needed
        maxTurnsPerRun: 2,   // 1 turn for output + 1 continuation buffer
        timeoutSec: 600,     // 10 min (generous; tools can no longer inflate this)
      },
      prompt: activePrompt,
      // skillsDir removed
      agentId: "activity-planner",
      ...
    });

Removing skillsDir means Claude never sees the CLAUDE.md that mandates RAG searches. allowedTools: [] is added as a belt-and-suspenders guard. Expected impact: removes 3+ tool-call turns and saves 4-6 min. With no tool use, the first-attempt JSON parse should succeed, making the internal retry loop a true last resort rather than a regular occurrence.

Fix 2: Increase lockDuration buffer (safety fix, zero risk)

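    lockDuration: 1_080_000, // 18 min (15 min timeout + 3 min buffer, matches deliverable planner)

The 3 min buffer assumes the current 900 s timeout. If Fix 1's 600 s timeout is also applied, the same lockDuration leaves an 8 min buffer.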

Fix 3: Extract planning-relevant sections from client context (shared with deliverable planner)

Same approach as the deliverable planner Option 1 — parse the context markdown by # headings at prompt-build time and include only: overview, goals, audience, channels.

Estimated impact: reduces context from ~30-50K chars to ~4-6K chars, saves 1-2 min per turn.
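
A minimal sketch of heading-based extraction, assuming the context file uses top-level `# ` headings and that the wanted sections can be recognised by keyword. `extractPlanningSections`, the keyword list, and the sample document are illustrative; the real heading names in ClientContext.content may differ.

```typescript
// Keywords for the sections to keep. The exact heading text in real
// context files is an assumption.
const PLANNING_SECTIONS = ["overview", "goals", "audience", "channels"];

// Split markdown on top-level "# " headings and keep the sections whose
// heading contains one of the wanted keywords (case-insensitive).
// Subsections ("## ...") stay with their parent section.
function extractPlanningSections(markdown: string, wanted: string[] = PLANNING_SECTIONS): string {
  const kept: string[] = [];
  let current: string[] = [];
  let keep = false;

  const flush = (): void => {
    if (keep && current.length > 0) kept.push(current.join("\n").trim());
  };

  for (const line of markdown.split(/\r?\n/)) {
    const match = /^#\s+(.*)$/.exec(line);
    if (match) {
      flush(); // close the previous section before starting a new one
      const heading = (match[1] ?? "").toLowerCase();
      keep = wanted.some((keyword) => heading.includes(keyword));
      current = [line];
    } else {
      current.push(line);
    }
  }
  flush();
  return kept.join("\n\n");
}

// Made-up sample context, for illustration only.
const sampleContext = [
  "# Business Overview",
  "Acme sells widgets.",
  "# Brand Voice",
  "Playful, concise.",
  "# Target Audience",
  "SMB owners.",
  "## Segments",
  "Retail, services.",
  "# Competitor Analysis",
  "Long text.",
].join("\n");

const trimmedContext = extractPlanningSections(sampleContext);
console.log(trimmedContext);
```

Keyword matching on headings is deliberately loose so that "Business Overview" and "Target Audience" both match. A line starting with `# ` inside a fenced code block would be misread as a heading, so a real implementation may need a proper markdown parser.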

Fix 4: Switch model to claude-haiku-4-5 (optional, after Fix 1 validated)

The job is pure structured JSON generation: known templates in, activities out. Haiku is expected to perform equivalently at ~5x the speed.

    model: "claude-haiku-4-5",
    timeoutSec: 300,
    lockDuration: 480_000,

Recommended Order

  1. Fix 1 (allowedTools + remove skillsDir + maxTurnsPerRun): zero quality risk, expected 4-6 min gain, also eliminates the retry-loop double-invocation risk.
  2. Fix 2 (lockDuration buffer): safety fix, zero risk.
  3. Fix 3 (context section extraction): share implementation with deliverable planner.
  4. Fix 4 (Haiku): after validating Fix 1+3 on a live run.

© 2026 Leadmetrics — Internal use only