Activity Planner: Slow Runtime (~8 min) — Root Cause Investigation

Status: Open
Severity: Medium-High — job completes but takes ~8 min; can exceed 16 min with internal retry, causing BullMQ lock expiry and duplicate execution
File: packages/agents/src/workers/activity.worker.ts
Observed: 2026-05-06 00:20 → 00:28 (7 min 41 sec Claude execution)

Timeline from Logs


00:20:23  Activity planner job started
00:20:23  Activity planner prompt built — invoking Claude    (instant — DB queries fast)
00:28:04  Claude execution finished                          (7 min 41 sec later)
00:28:04  JSON parsed successfully
00:28:04  requireActivityApproval=true — period set to dm_review
00:28:04  Activity planner job completed

100% of the runtime is Claude execution. All DB queries, JSON parsing, and writes are instant.

Root Cause Detail (from code investigation)

Finding 1: CLAUDE.md in skills dir MANDATES RAG searches (biggest hidden cause)

createSkillsDir() writes a CLAUDE.md into the skills directory that says:


## IMPORTANT: Call this tool before writing any content

Before generating your output, you MUST call search_knowledge to retrieve relevant context
from the tenant's indexed documents. This is mandatory — do not skip this step.

Step 1 — Search brand docs first
Step 2 — Search for the specific topic
Step 3 — Check published content to avoid duplication

The activity planner passes skillsDir (line 427, 448 in activity.worker.ts) but has no allowedTools: [] restriction. So Claude reads CLAUDE.md, obeys the “MUST call” instruction, and executes 3+ search_knowledge Bash tool calls before generating any JSON.

Each tool call = one turn = one full API round-trip to Anthropic. That is 3 extra turns before a single output token is generated.

The deliverable planner never passes skillsDir at all, so Claude never sees this file. The activity planner is the only pure-JSON planner that passes skillsDir. This CLAUDE.md was written for content-generating agents (blog writer, social post writer) — not for planning agents that do structured JSON generation.

Finding 2: Missing `allowedTools: []` and `maxTurnsPerRun`

The execute config (lines 441-453):


config: {
  cwd,
  model:                      "claude-sonnet-4-6",
  dangerouslySkipPermissions: true,
  timeoutSec:                 900,
  // NO allowedTools
  // NO maxTurnsPerRun
}

Without allowedTools: [], Claude Code CLI runs with full tool access. Claude can and does use Bash to run search_knowledge.js. Without maxTurnsPerRun, there is no cap on the number of turns — tool calls + continuation turns can inflate to 5-8 turns total.

Finding 3: Internal retry loop can double the runtime

Lines 431-508: The worker has an internal retry loop (MAX_INTERNAL_RETRIES = 2). If JSON parsing fails on attempt 1 (which happens when tool call output gets interleaved with Claude’s JSON response), the entire execute() is called again — a second full invocation.

The parse failures are caused by the tool-use pollution in Finding 1. Fix tool use, fix the retries. But if the retry does fire, runtime becomes 16+ min, which exceeds the 16 min lockDuration and causes BullMQ lock expiry + duplicate execution.

Finding 4: Full client context injected verbatim (30-50K chars)

contextText at line 399 is the entire ClientContext.content blob — same issue as the deliverable planner. The structured inputs (goalsTable, templatesTable) are already compact because they are built directly from DB fields. The only fat in the prompt is the context file.

The activity planner needs from the context: business overview and target audience (for inputHints content direction). It does NOT need brand voice details, competitor analysis, key page breakdowns, or technical website details.

Finding 5: `lockDuration` too tight and gets worse with retry

lockDuration: 960_000 = 16 min (only 1 min buffer over 900s timeout). With tool-use inflation + internal retry, actual runtime easily exceeds 16 min, causing BullMQ lock expiry and duplicate execution. The deliverable planner uses a 3 min buffer (1,080,000).

Comparison: Activity vs Deliverable Planner Config

Setting	Deliverable Planner	Activity Planner
`allowedTools`	`[]` ✅	missing ❌
`maxTurnsPerRun`	3	missing ❌
`skillsDir`	not passed ✅	passed — CLAUDE.md mandates RAG ❌❌
`lockDuration`	1,080,000 (3 min buffer)	960,000 (1 min buffer) ❌
Internal retry	no	yes — up to 2× invocations ❌
Estimated turn count	2-3	5-8 (3 RAG + 2-3 output)
Observed runtime	~7.5 min	~8 min (can be 16+ with retry)

Proposed Fixes

Fix 1: Add `allowedTools: []`, remove `skillsDir`, add `maxTurnsPerRun: 2` (immediate, biggest gain)


const execute = await getClaudeAdapter();   // remove skillsDir — not needed
 
const result = await execute({
  config: {
    cwd,
    model:                      "claude-sonnet-4-6",
    dangerouslySkipPermissions: true,
    allowedTools:               [],   // pure JSON generation — no tools needed
    maxTurnsPerRun:             2,    // 1 turn for output + 1 continuation buffer
    timeoutSec:                 600,  // 10 min (generous; tools can no longer inflate this)
  },
  prompt: activePrompt,
  // skillsDir removed
  agentId:    "activity-planner",
  ...
});

Removing skillsDir means Claude never sees the mandatory RAG CLAUDE.md. Adding allowedTools: [] as a belt-and-suspenders guard. Expected impact: removes 3+ tool-call turns, saves 4-6 min. With no tool use the first-attempt JSON parse will succeed, making the internal retry loop a true last-resort rather than a regular occurrence.

Fix 2: Increase `lockDuration` buffer (safety fix, zero risk)


lockDuration: 1_080_000,  // 18 min (15 min timeout + 3 min buffer, matches deliverable planner)

Fix 3: Extract planning-relevant sections from client context (shared with deliverable planner)

Same approach as the deliverable planner Option 1 — parse the context markdown by # headings at prompt-build time and include only: overview, goals, audience, channels.

Estimated impact: reduces context from ~30-50K chars to ~4-6K chars, saves 1-2 min per turn.

Fix 4: Switch model to claude-haiku-4-5 (optional, after Fix 1 validated)

Pure structured JSON from known templates → activity pipeline. Haiku performs equivalently at ~5x the speed.


model:      "claude-haiku-4-5",
timeoutSec: 300,
lockDuration: 480_000,

Recommended Fix Order

Fix 1 (allowedTools + remove skillsDir + maxTurnsPerRun) — zero quality risk, expected 4-6 min gain, also eliminates the retry-loop double-invocation risk.
Fix 2 (lockDuration buffer) — safety fix, zero risk.
Fix 3 (context section extraction) — share implementation with deliverable planner.
Fix 4 (Haiku) — after validating Fix 1+3 on a live run.