Client Researcher
[Live] · `agent__client-researcher` · Claude Sonnet 4.6
Researches the client’s website and produces structured Markdown research notes that seed the Context File Writer and the entire downstream agent pipeline.
Overview
| Attribute | Value |
|---|---|
| Function | Fetch up to 5 pages from the client’s website and write structured research notes |
| Type | Setup (part of the client context pipeline) |
| Model | Claude Sonnet 4.6 |
| Queue | `agent__client-researcher` |
| Concurrency | 3 |
| Timeout | 15 min (`timeoutSec: 900`) |
| Max turns | 20 (hard cap to prevent runaway page fetching) |
| Est. cost / task | ~$0.12–0.15 |
| Plan | Free+ (all tenants) |
Triggers
| Trigger type | When | Who initiates |
|---|---|---|
| Auto (setup chain) | On tenant onboarding completion — first agent in the setup chain | Platform (triggered by `completeOnboarding()`) |
| Human on-demand | "Refresh Context" revision flow in Dashboard → Client Context | Tenant admin |
Input
```ts
interface SetupJobData {
  tenantId: string;
  tenantName: string;
  country: string;
  plan: string;
  revisionNotes?: string; // present on revision runs; focuses the research
  wakeReason: WakeReason;
  enableWebCrawl: boolean;
  ragContext?: string; // pre-fetched website_content RAG results (server-side pre-fetch)
}
```

The worker does a server-side RAG pre-fetch against the `website_content` dataset before spawning the agent and injects the results into the prompt via `ragContext`. The agent therefore receives any already-crawled website content without needing to call `search_knowledge.js` itself.
Output
The agent writes structured Markdown (not JSON). Output is passed as `clientResearchOutput` to the Competitor Researcher and eventually the Context File Writer.
Structure: clearly labelled sections covering products/services, target audience, brand voice, content presence, geographic focus, social profiles, and any technical notes.
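For orientation, a hypothetical skeleton of the notes. Heading names here are illustrative: the prompt mandates clearly labelled sections but does not prescribe these exact titles.

```md
# Client Research Notes: Acme Plumbing

## Products & Services
## USPs & Differentiators
## Target Audience
## Brand Tone & Voice
## Content Presence & Publishing Frequency
## Geographic Focus
## Social Profiles
## Technical Notes
## Observed Content Gaps
```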
How It Works
- Receive job — BullMQ dequeues the job; the worker builds the prompt using `buildClientResearcherPrompt()` (exported from `setup.worker.ts`) with `tenantName`, `country`, `plan`, optional `ragContext`, and optional `revisionNotes`.
- Skills dir created — `createSkillsDir()` writes `search_knowledge.js` + `CLAUDE.md` + any DB-mapped skill files into a temp directory passed to the Claude Code CLI via `--add-dir`.
- Agentic loop — the Claude Code CLI runs with `--dangerously-skip-permissions` and a cap of 20 turns. The agent:
  - fetches at most 5 pages: homepage, services/products, about, contact, and optionally the blog index
  - does not follow links to individual blog posts or case studies
  - does not run web searches for news or press coverage — only uses what is visible on the fetched pages
  - notes any awards, press mentions, or milestones visible in the fetched content
- Chain forward — on success, the worker loads any existing competitors from the `Competitor` table and enqueues a `competitor-researcher` job (or jumps to `context-file-writer` if `enableCompetitorResearch = false`); see the sketch after this list.
- DB log — a `ClientContextLog` record is written with `action: "research_completed"`.
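A minimal sketch of the chain-forward and DB-log steps, assuming Prisma models `Competitor` and `ClientContextLog` keyed by `tenantId`, and BullMQ queues named after the downstream agents. Queue names, job names, and payload shape are illustrative, not the actual `setup.worker.ts` code.

```ts
import { Queue } from "bullmq";
import { PrismaClient } from "@prisma/client";

const prisma = new PrismaClient();
const connection = { host: "localhost", port: 6379 }; // assumed Redis config

// Hypothetical queue handles for the two possible downstream agents.
const competitorQueue = new Queue("agent__competitor-researcher", { connection });
const contextWriterQueue = new Queue("agent__context-file-writer", { connection });

async function chainForward(
  tenantId: string,
  clientResearchOutput: string,
  enableCompetitorResearch: boolean,
): Promise<void> {
  if (enableCompetitorResearch) {
    // Load any competitors already recorded for this tenant.
    const competitors = await prisma.competitor.findMany({ where: { tenantId } });
    await competitorQueue.add("competitor-research", {
      tenantId,
      clientResearchOutput,
      competitors,
    });
  } else {
    // Competitor research disabled: jump straight to the Context File Writer.
    await contextWriterQueue.add("context-file-write", { tenantId, clientResearchOutput });
  }

  // Record completion of the research step.
  await prisma.clientContextLog.create({
    data: { tenantId, action: "research_completed" },
  });
}
```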
System Prompt (live in AgentConfig table)
```
You are the Client Researcher agent for Leadmetrics, a digital marketing agency AI platform.

Your task is to research a client's business and produce detailed research notes that will be used to generate their marketing context file.

RESEARCH REQUIREMENTS:
1. Fetch at most 5 pages from the client's website: prioritise homepage, services/products, about, and contact. If a blog index is present, fetch it too — but do not follow individual blog post links.
2. Identify and document:
   - Core products or services offered
   - Unique selling propositions (USPs) and differentiators
   - Target audience signals (language used, testimonials, case studies)
   - Geographic focus (local, national, global)
   - Brand tone and voice
   - Current content types and publishing frequency
   - Any pricing tiers or packages visible
   - Social media presence (platforms linked from site)
   - Technical indicators (e-commerce, booking system, lead gen forms)
   - Any awards, press mentions, or notable milestones visible on the pages you fetch
3. Identify the primary industry vertical and sub-niche.
4. Note any obvious content gaps or marketing weaknesses observed on the site.

DO NOT:
- Fetch more than 5 pages total
- Follow links to individual blog posts or case studies
- Run web searches for news or press coverage — only use what is visible on the website

OUTPUT FORMAT:
Write structured Markdown with clearly labelled sections. Be factual — only include information you can verify from the website. Do not speculate or invent details.

Output ONLY the research notes. No preamble, no explanation.
```

The system prompt is stored in `AgentConfig.systemPrompt` (role `client-researcher`) and can be updated via Manage → Agents without a redeploy. The seed in `packages/db/src/seed.ts` mirrors this value.
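For illustration, the worker-side lookup might look like this. A sketch assuming a Prisma `AgentConfig` model with a unique `role` column; field names beyond `systemPrompt` and `model` are assumptions.

```ts
import { PrismaClient } from "@prisma/client";

const prisma = new PrismaClient();

// Fetch the live prompt and model for this role; edits made in
// Manage → Agents take effect on the next run, with no redeploy.
const config = await prisma.agentConfig.findUnique({
  where: { role: "client-researcher" },
});
if (!config) throw new Error("No AgentConfig row for client-researcher");

const { systemPrompt, model } = config;
```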
Skills Injected
| Skill file | Purpose |
|---|---|
| `search_knowledge.js` | Query the tenant RAG knowledge base (all datasets) |
| `CLAUDE.md` | Instructions for when/how to call `search_knowledge` |
Tools Used (actual)
| Tool | Calls per run | Purpose |
|---|---|---|
| WebFetch | max 5 | Fetch client website pages |
| WebSearch | 0 | Not used — press/news searches removed April 2026 |
| Bash (`search_knowledge.js`) | 0–1 | Optional RAG lookup if `ragContext` was insufficient |
Total tool calls target: 5–7. Hard cap via `--max-turns 20`.
Web searches for press coverage were removed in April 2026. For SMB clients these searches returned no useful results and added 3 wasted tool calls per run.
Cost Profile
| Metric | Value |
|---|---|
| Typical tool calls | 5–7 |
| Est. cost / task | ~$0.12–0.15 |
| Est. duration | ~1 min |
Before April 2026 tuning: “thoroughly… at minimum” language in the system prompt caused 10+ WebFetch calls (blog posts, sub-pages) + 3 WebSearch calls for press coverage, totalling ~15 tool calls, ~2 min, ~$0.29/run.
Adapter Config
Set in `setup.worker.ts` for the `client-researcher` role:

```ts
{
  cwd,
  model, // from AgentConfig.model
  dangerouslySkipPermissions: true,
  timeoutSec: 900,
  maxTurnsPerRun: 20, // hard cap added April 2026
}
```

RAG Pre-fetch
The worker performs a server-side RAG pre-fetch before spawning the agent (in `setup.worker.ts`, not in the agentic loop). It queries the `website_content` dataset and injects the results into the prompt as `ragContext` (sketched after this list). This means:
- If the tenant’s website has already been crawled (via the Website Crawler), those results are used immediately
- The agent gets structured website content without needing a live `WebFetch` for pages already indexed
- Re-runs after a web crawl are faster and more consistent
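A minimal sketch of the pre-fetch, assuming a vector-store helper `searchDataset(tenantId, dataset, query, topK)` that returns scored text chunks. The helper name, hit shape, and prompt-injection format are all illustrative.

```ts
// Assumed shape of the vector-store helper (illustrative).
type RagHit = { source: string; text: string; score: number };
declare function searchDataset(
  tenantId: string,
  dataset: string,
  query: string,
  topK: number,
): Promise<RagHit[]>;

// Hypothetical pre-fetch in setup.worker.ts, run before the agent is spawned.
async function prefetchRagContext(
  tenantId: string,
  tenantName: string,
): Promise<string | undefined> {
  // Query only the website_content dataset; other datasets are left to the
  // agent's own search_knowledge.js skill.
  const hits = await searchDataset(tenantId, "website_content", tenantName, 8);
  if (hits.length === 0) return undefined; // agent proceeds without RAG context

  // Flatten hits into a plain-text block the prompt builder can embed.
  return hits.map((h) => `[${h.source}]\n${h.text}`).join("\n\n---\n\n");
}
```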
Error Handling
| Error | Response |
|---|---|
| WebFetch returns 404 or timeout on all pages | Output is partial; Context File Writer still runs with whatever was found |
| WebFetch succeeds on homepage only | Agent continues with partial data; notes gaps in output |
| Max turns reached (20) | BullMQ job fails; setup chain retries per BullMQ config |
| RAG pre-fetch returns no results | Agent proceeds without RAG context |
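Retries on a max-turns failure come from the BullMQ job options set when the job is enqueued. A hedged sketch of what that might look like; the attempt count and backoff values are assumptions, not the platform's actual settings.

```ts
import { Queue } from "bullmq";

const setupQueue = new Queue("agent__client-researcher", {
  connection: { host: "localhost", port: 6379 }, // assumed Redis config
});

// Hypothetical retry policy attached at enqueue time.
await setupQueue.add(
  "client-research",
  { tenantId: "tn_123" /* ...rest of SetupJobData */ },
  {
    attempts: 3,                                     // retry failed runs up to 3 times
    backoff: { type: "exponential", delay: 30_000 }, // 30s, 60s, 120s between attempts
  },
);
```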
Tenant Settings Used
| Setting | How it’s used |
|---|---|
| `tenantName` | Used in job display names |
| `country` | Passed in prompt to frame geographic context |
| `plan` | Passed in prompt |
| `enableWebCrawl` | Passed in job data; controls whether the `website_content` RAG pre-fetch is attempted |