Client Researcher
[Live] · `agent__client-researcher` · Claude Sonnet 4.6
Researches the client’s website and produces structured Markdown research notes that seed the Context File Writer and the entire downstream agent pipeline.
Overview
| Attribute | Value |
|---|---|
| Function | Fetch up to 5 pages from the client’s website and write structured research notes |
| Type | Setup (part of the client context pipeline) |
| Model | Claude Sonnet 4.6 |
| Queue | `agent__client-researcher` |
| Concurrency | 3 |
| Timeout | 15 min (`timeoutSec: 900`) |
| Max turns | 20 (hard cap to prevent runaway page fetching) |
| Est. cost / task | ~$0.12–0.15 |
| Plan | Free+ (all tenants) |
Triggers
| Trigger type | When | Who initiates |
|---|---|---|
| Auto (setup chain) | On tenant onboarding completion — first agent in the setup chain | Platform (triggered by `completeOnboarding()`) |
| Human on-demand | "Refresh Context" revision flow in Dashboard → Client Context | Tenant admin |
Input
```ts
interface SetupJobData {
  tenantId: string;
  tenantName: string;
  country: string;
  plan: string;
  revisionNotes?: string; // present on revision runs; focuses the research
  wakeReason: WakeReason;
  enableWebCrawl: boolean;
  ragContext?: string; // pre-fetched website_content RAG results (server-side pre-fetch)
}
```

The worker does a server-side RAG pre-fetch against the `website_content` dataset before spawning the agent and injects the results into the prompt via `ragContext`. The agent therefore receives any already-crawled website content without needing to call `search_knowledge.js` itself.
Output
The agent writes structured Markdown (not JSON). Output is passed as `clientResearchOutput` to the Competitor Researcher and eventually the Context File Writer.
Structure: clearly labelled sections covering products/services, target audience, brand voice, content presence, geographic focus, social profiles, and any technical notes.
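For orientation, a hypothetical skeleton of the notes. Heading names here are illustrative: the prompt mandates clearly labelled sections but does not prescribe these exact titles.

```md
# Client Research Notes: Acme Plumbing

## Products & Services
## USPs & Differentiators
## Target Audience
## Brand Tone & Voice
## Content Presence & Publishing Frequency
## Geographic Focus
## Social Profiles
## Technical Notes
## Observed Content Gaps
```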
How It Works
- Receive job — BullMQ dequeues the job; the worker builds the prompt using `buildClientResearcherPrompt()` (exported from `setup.worker.ts`) with `tenantName`, `country`, `plan`, optional `ragContext`, and optional `revisionNotes`.
- Skills dir created — `createSkillsDir()` writes `search_knowledge.js` + `CLAUDE.md` + any DB-mapped skill files into a temp directory passed to the Claude Code CLI via `--add-dir`.
- Agentic loop — the Claude Code CLI runs with `--dangerously-skip-permissions` and a cap of 20 turns. The agent:
  - fetches at most 5 pages: homepage, services/products, about, contact, and optionally the blog index
  - does not follow links to individual blog posts or case studies
  - does not run web searches for news or press coverage — only uses what is visible on the fetched pages
  - notes any awards, press mentions, or milestones visible in the fetched content
- Chain forward — on success, the worker loads any existing competitors from the `Competitor` table and enqueues a `competitor-researcher` job (or jumps to `context-file-writer` if `enableCompetitorResearch = false`); see the sketch after this list.
- DB log — a `ClientContextLog` record is written with `action: "research_completed"`.
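A minimal sketch of the chain-forward and DB-log steps, assuming Prisma models `Competitor` and `ClientContextLog` keyed by `tenantId`, and BullMQ queues named after the downstream agents. Queue names, job names, and payload shape are illustrative, not the actual `setup.worker.ts` code.

```ts
import { Queue } from "bullmq";
import { PrismaClient } from "@prisma/client";

const prisma = new PrismaClient();
const connection = { host: "localhost", port: 6379 }; // assumed Redis config

// Hypothetical queue handles for the two possible downstream agents.
const competitorQueue = new Queue("agent__competitor-researcher", { connection });
const contextWriterQueue = new Queue("agent__context-file-writer", { connection });

async function chainForward(
  tenantId: string,
  clientResearchOutput: string,
  enableCompetitorResearch: boolean,
): Promise<void> {
  if (enableCompetitorResearch) {
    // Load any competitors already recorded for this tenant.
    const competitors = await prisma.competitor.findMany({ where: { tenantId } });
    await competitorQueue.add("competitor-research", {
      tenantId,
      clientResearchOutput,
      competitors,
    });
  } else {
    // Competitor research disabled: jump straight to the Context File Writer.
    await contextWriterQueue.add("context-file-write", { tenantId, clientResearchOutput });
  }

  // Record completion of the research step.
  await prisma.clientContextLog.create({
    data: { tenantId, action: "research_completed" },
  });
}
```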
System Prompt (live in AgentConfig table)
```
You are the Client Researcher agent for Leadmetrics, a digital marketing agency AI platform.

Your task is to research a client's business and produce detailed research notes that will be used to generate their marketing context file.

RESEARCH REQUIREMENTS:
1. Fetch at most 5 pages from the client's website: prioritise homepage, services/products, about, and contact. If a blog index is present, fetch it too — but do not follow individual blog post links.
2. Identify and document:
   - Core products or services offered
   - Unique selling propositions (USPs) and differentiators
   - Target audience signals (language used, testimonials, case studies)
   - Geographic focus (local, national, global)
   - Brand tone and voice
   - Current content types and publishing frequency
   - Any pricing tiers or packages visible
   - Social media presence (platforms linked from site)
   - Technical indicators (e-commerce, booking system, lead gen forms)
   - Any awards, press mentions, or notable milestones visible on the pages you fetch
3. Identify the primary industry vertical and sub-niche.
4. Note any obvious content gaps or marketing weaknesses observed on the site.

DO NOT:
- Fetch more than 5 pages total
- Follow links to individual blog posts or case studies
- Run web searches for news or press coverage — only use what is visible on the website

OUTPUT FORMAT:
Write structured Markdown with clearly labelled sections. Be factual — only include information you can verify from the website. Do not speculate or invent details.

Output ONLY the research notes. No preamble, no explanation.
```

The system prompt is stored in `AgentConfig.systemPrompt` (role `client-researcher`) and can be updated via Manage → Agents without a redeploy. The seed in `packages/db/src/seed.ts` mirrors this value.
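For illustration, the worker-side lookup might look like this. A sketch assuming a Prisma `AgentConfig` model with a unique `role` column; field names beyond `systemPrompt` and `model` are assumptions.

```ts
import { PrismaClient } from "@prisma/client";

const prisma = new PrismaClient();

// Fetch the live prompt and model for this role; edits made in
// Manage → Agents take effect on the next run, with no redeploy.
const config = await prisma.agentConfig.findUnique({
  where: { role: "client-researcher" },
});
if (!config) throw new Error("No AgentConfig row for client-researcher");

const { systemPrompt, model } = config;
```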
Skills Injected
| Skill file | Purpose |
|---|---|
| `search_knowledge.js` | Query the tenant RAG knowledge base (all datasets) |
| `CLAUDE.md` | Instructions for when/how to call `search_knowledge` |
Tools Used (actual)
| Tool | Calls per run | Purpose |
|---|---|---|
| WebFetch | max 5 | Fetch client website pages |
| WebSearch | 0 | Not used — press/news searches removed April 2026 |
| Bash (`search_knowledge.js`) | 0–1 | Optional RAG lookup if `ragContext` was insufficient |
Total tool calls target: 5–7. Hard cap via `--max-turns 20`.
Web searches for press coverage were removed in April 2026. For SMB clients these searches returned no useful results and added 3 wasted tool calls per run.
Cost Profile
| Metric | Value |
|---|---|
| Typical tool calls | 5–7 |
| Est. cost / task | ~$0.12–0.15 |
| Est. duration | ~1 min |
Before April 2026 tuning: “thoroughly… at minimum” language in the system prompt caused 10+ WebFetch calls (blog posts, sub-pages) + 3 WebSearch calls for press coverage, totalling ~15 tool calls, ~2 min, ~$0.29/run.
Adapter Config
Set in `setup.worker.ts` for the `client-researcher` role:

```ts
{
  cwd,
  model, // from AgentConfig.model
  dangerouslySkipPermissions: true,
  timeoutSec: 900,
  maxTurnsPerRun: 20, // hard cap added April 2026
}
```

RAG Pre-fetch
The worker performs a server-side RAG pre-fetch before spawning the agent (in `setup.worker.ts`, not in the agentic loop). It queries the `website_content` dataset and injects the results into the prompt as `ragContext` (sketched after this list). This means:
- If the tenant’s website has already been crawled (via the Website Crawler), those results are used immediately
- The agent gets structured website content without needing a live `WebFetch` for pages already indexed
- Re-runs after a web crawl are faster and more consistent
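A minimal sketch of the pre-fetch, assuming a vector-store helper `searchDataset(tenantId, dataset, query, topK)` that returns scored text chunks. The helper name, hit shape, and prompt-injection format are all illustrative.

```ts
// Assumed shape of the vector-store helper (illustrative).
type RagHit = { source: string; text: string; score: number };
declare function searchDataset(
  tenantId: string,
  dataset: string,
  query: string,
  topK: number,
): Promise<RagHit[]>;

// Hypothetical pre-fetch in setup.worker.ts, run before the agent is spawned.
async function prefetchRagContext(
  tenantId: string,
  tenantName: string,
): Promise<string | undefined> {
  // Query only the website_content dataset; other datasets are left to the
  // agent's own search_knowledge.js skill.
  const hits = await searchDataset(tenantId, "website_content", tenantName, 8);
  if (hits.length === 0) return undefined; // agent proceeds without RAG context

  // Flatten hits into a plain-text block the prompt builder can embed.
  return hits.map((h) => `[${h.source}]\n${h.text}`).join("\n\n---\n\n");
}
```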
Error Handling
| Error | Response |
|---|---|
| WebFetch returns 404 or timeout on all pages | Output is partial; Context File Writer still runs with whatever was found |
| WebFetch succeeds on homepage only | Agent continues with partial data; notes gaps in output |
| Max turns reached (20) | BullMQ job fails; setup chain retries per BullMQ config |
| RAG pre-fetch returns no results | Agent proceeds without RAG context |
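Retries on a max-turns failure come from the BullMQ job options set when the job is enqueued. A hedged sketch of what that might look like; the attempt count and backoff values are assumptions, not the platform's actual settings.

```ts
import { Queue } from "bullmq";

const setupQueue = new Queue("agent__client-researcher", {
  connection: { host: "localhost", port: 6379 }, // assumed Redis config
});

// Hypothetical retry policy attached at enqueue time.
await setupQueue.add(
  "client-research",
  { tenantId: "tn_123" /* ...rest of SetupJobData */ },
  {
    attempts: 3,                                     // retry failed runs up to 3 times
    backoff: { type: "exponential", delay: 30_000 }, // 30s, 60s, 120s between attempts
  },
);
```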
Tenant Settings Used
| Setting | How it’s used |
|---|---|
| `tenantName` | Used in job display names |
| `country` | Passed in prompt to frame geographic context |
| `plan` | Passed in prompt |
| `enableWebCrawl` | Passed in job data; controls whether the `website_content` RAG pre-fetch is attempted |