Keyword Researcher
[Live] · agent__keyword-researcher · Claude Sonnet 4.6
Finds keyword clusters for a given topic — primary keyword, secondary keywords, search intent, volume, difficulty, and LSI terms — to feed the Content Brief Writer and Blog Writer pipeline.
Overview
| Property | Value |
|---|---|
| Function | Discover and cluster target keywords for a topic, avoiding cannibalisation of existing rankings |
| Type | Worker — SEO |
| Model | Claude Sonnet 4.6 |
| Queue | agent__keyword-researcher |
| Concurrency | 4 |
| Timeout | 5 min |
| Est. cost / task | ~$0.30 |
| Plan | Free+ |
Triggers
| Trigger type | When | Who initiates |
|---|---|---|
| Activity Planner dispatch | Start of blog pipeline — enqueued when Activity Planner decomposes a content campaign that includes blog posts or landing pages | Activity Planner |
| Human on-demand | User clicks “Research keywords” in DM Portal for a specific topic or campaign | Tenant admin / DM reviewer |
| Scheduled / cron | Monthly — refreshes keyword targets for the upcoming content period, typically run on the 1st of each month | Platform scheduler |
Input
interface KeywordResearcherInput {
  tenantId: string;
  topic: string; // e.g. "B2B SaaS onboarding"
  clientDomain: string; // e.g. "acme.com"
  targetAudience: string; // e.g. "HR managers at mid-market SaaS companies"
  existingRankedKeywords: ExistingKeyword[]; // from GSC; used to avoid cannibalisation
  contentGoal?: 'new_blog_post' | 'optimise_existing' | 'landing_page';
  campaignId?: string;
}

interface ExistingKeyword {
  keyword: string;
  position: number; // current average position in GSC
  url: string; // the page already ranking for this keyword
}
Output
The agent outputs strict JSON — no markdown fences, no explanation. After the worker saves outputPayload, it also parses the JSON and creates structured DB rows.
JSON format
{
  "groups": [
    {
      "name": "Core Service Keywords",
      "purpose": "content",
      "keywords": [
        {
          "keyword": "digital marketing agency",
          "searchVolume": "10K-100K",
          "difficulty": "high",
          "intent": "commercial",
          "category": "core"
        }
      ]
    }
  ]
}
DB rows created by the worker
Keyword — one row per keyword across all groups:
| Field | Type | Notes |
|---|---|---|
| keyword | string | Exact search term |
| searchVolume | string | Range estimate, e.g. "1K-10K" |
| difficulty | enum | "low" / "medium" / "high" |
| intent | enum | "informational" / "commercial" / "transactional" / "navigational" |
| category | enum | "branded" / "core" / "long_tail" / "local" / "competitor_gap" |
| source | string | Always "agent" |
| status | string | Always "active" |
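Because searchVolume is stored as a range string rather than a number, any numeric comparison (for example against a monthly volume threshold) needs a parsing step first. The following is a minimal sketch, assuming ranges always take the form "min-max" with an optional K or M suffix, as in the examples on this page. The names parseSearchVolume, parseVolumeToken, and VolumeRange are hypothetical and not part of the platform.

```typescript
// Hypothetical helper types and functions; not part of the documented worker.
interface VolumeRange {
  min: number;
  max: number;
}

// Convert a single token such as "100", "1K" or "2.5M" to a number.
function parseVolumeToken(token: string): number | null {
  const match = /^(\d+(?:\.\d+)?)([KM])?$/i.exec(token.trim());
  if (!match) return null;
  const base = Number(match[1]);
  const suffix = (match[2] ?? "").toUpperCase();
  const multiplier = suffix === "K" ? 1_000 : suffix === "M" ? 1_000_000 : 1;
  return base * multiplier;
}

// Parse a range string such as "1K-10K". Returns null for anything
// that does not look like "min-max" (assumed format).
function parseSearchVolume(range: string): VolumeRange | null {
  const parts = range.split("-");
  if (parts.length !== 2) return null;
  const min = parseVolumeToken(parts[0]);
  const max = parseVolumeToken(parts[1]);
  if (min === null || max === null || min > max) return null;
  return { min, max };
}

// Usage
const parsed = parseSearchVolume("10K-100K"); // { min: 10000, max: 100000 }
```

Returning null instead of throwing lets a caller decide whether an unparseable range should block the row or simply be skipped in numeric checks.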
KeywordGroup — one row per group:
| Field | Type | Notes |
|---|---|---|
| name | string | Group label from JSON |
| purpose | enum | "research" / "content" / "hot" / "seo_onpage" / "google_ads" / "custom" |
| activityId | string | Links to the parent Activity |
| status | string | Always "pending_review" |
KeywordGroupItem — join table linking groups to keywords:
| Field | Notes |
|---|---|
| isPrimary | true for the first keyword in each group; false for all others |
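The mapping from the agent's JSON to the three row types can be sketched as below. This is an illustration under stated assumptions, not the worker's actual code: the function name toRows and the row interfaces are hypothetical, the join rows reference groups and keywords by name rather than by database ID, and deduplication of keywords that appear in more than one group is not handled.

```typescript
type Purpose = "research" | "content" | "hot" | "seo_onpage" | "google_ads" | "custom";

interface AgentKeyword {
  keyword: string;
  searchVolume: string;
  difficulty: "low" | "medium" | "high";
  intent: "informational" | "commercial" | "transactional" | "navigational";
  category: "branded" | "core" | "long_tail" | "local" | "competitor_gap";
}

interface AgentOutput {
  groups: { name: string; purpose: Purpose; keywords: AgentKeyword[] }[];
}

// Hypothetical row shapes mirroring the tables above.
interface KeywordRow extends AgentKeyword {
  source: "agent";
  status: "active";
}
interface KeywordGroupRow {
  name: string;
  purpose: Purpose;
  activityId: string;
  status: "pending_review";
}
interface KeywordGroupItemRow {
  groupName: string; // assumption: a real schema would use IDs
  keyword: string;
  isPrimary: boolean;
}

function toRows(outputPayload: string, activityId: string) {
  const parsed = JSON.parse(outputPayload) as AgentOutput;
  const keywords: KeywordRow[] = [];
  const groups: KeywordGroupRow[] = [];
  const items: KeywordGroupItemRow[] = [];

  for (const group of parsed.groups) {
    groups.push({
      name: group.name,
      purpose: group.purpose,
      activityId,
      status: "pending_review",
    });
    group.keywords.forEach((kw, index) => {
      keywords.push({ ...kw, source: "agent", status: "active" });
      // The first keyword in each group is the primary keyword.
      items.push({ groupName: group.name, keyword: kw.keyword, isPrimary: index === 0 });
    });
  }
  return { keywords, groups, items };
}
```

JSON.parse throws on malformed input, so a caller would normally wrap toRows in its own error handling before writing anything to the database.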
Status flow
After the worker runs, the Deliverable moves to pending_review. The DM must approve each keyword group individually in the DM portal at /seo/keyword-groups before the cluster is used downstream.
How It Works
1. Load client context. The Client Context File is injected into the system prompt. Tenant settings provide industry, targetAudience, and connectedChannels. The existingRankedKeywords input is pre-loaded from Google Search Console via the scheduled GSC sync.
2. RAG: existing rankings and topic coverage. Query Website Content for pages related to the topic to identify what the site already covers. Query Competitor Research for competitor keywords and content in the topic area to identify gaps and proven demand.
3. Primary keyword discovery. Call semrush_keyword_overview with the raw topic string and the top 2–3 seed variants. Extract volume, difficulty, intent, CPC, and SERP features. Select the best primary keyword based on a difficulty–volume ratio appropriate to the client’s domain rating.
4. Keyword expansion. Call semrush_keyword_magic_tool with the chosen primary keyword as seed. Request 50 suggestions filtered by: volume ≥ 100/mo, difficulty ≤ 60, same or related intent. Cluster suggestions by semantic similarity.
5. Cannibalisation check. Cross-reference every candidate keyword against the existingRankedKeywords input. Any keyword where the client already holds a position 1–20 on a different URL is flagged as a cannibalisation risk. GSC rankings are fetched via google_search_console.getKeywordRankings for the client domain.
6. Select final cluster. From the expanded list, select 5–8 secondary keywords that: (a) are not already ranked on a different page, (b) represent distinct facets of the topic (volume/difficulty/intent variety), (c) include at least one commercial-intent keyword.
7. Produce output. Write the full KeywordResearcherOutput JSON. The contentGapOpportunity field summarises the single most actionable gap found. The serpFeatures field captures rich result opportunities for the Content Brief Writer to exploit.
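The cannibalisation check and final selection described in the steps above can be sketched as follows. This is a simplified illustration, not the agent's implementation: the Candidate shape, the function names, and the plannedUrl parameter are hypothetical, matching is by normalised exact string, and the "distinct facets" criterion is approximated by simply preferring higher volume. The minimum of 5 secondary keywords is not enforced here.

```typescript
interface ExistingKeyword {
  keyword: string;
  position: number; // average GSC position
  url: string;
}

// Hypothetical candidate shape with numeric metrics from the tool calls.
interface Candidate {
  keyword: string;
  volume: number;
  difficulty: number;
  intent: "informational" | "commercial" | "transactional" | "navigational";
}

const normalise = (s: string): string => s.trim().toLowerCase().replace(/\s+/g, " ");

// Flag candidates the client already ranks for in positions 1-20 on a different URL.
function splitByCannibalisation(
  candidates: Candidate[],
  existing: ExistingKeyword[],
  plannedUrl?: string, // undefined for new content: every ranking URL counts as "different"
) {
  const ranked = new Map<string, ExistingKeyword>();
  for (const e of existing) {
    const inTop20 = e.position >= 1 && e.position <= 20;
    const differentUrl = plannedUrl === undefined || e.url !== plannedUrl;
    if (inTop20 && differentUrl) ranked.set(normalise(e.keyword), e);
  }
  const safe: Candidate[] = [];
  const flagged: { candidate: Candidate; rankingUrl: string }[] = [];
  for (const c of candidates) {
    const hit = ranked.get(normalise(c.keyword));
    if (hit) flagged.push({ candidate: c, rankingUrl: hit.url });
    else safe.push(c);
  }
  return { safe, flagged };
}

// Pick up to `max` secondary keywords, guaranteeing one commercial-intent
// keyword whenever the safe list contains one.
function selectSecondary(safe: Candidate[], max = 8): Candidate[] {
  const byVolume = [...safe].sort((a, b) => b.volume - a.volume);
  const picked = byVolume.slice(0, max);
  if (!picked.some((c) => c.intent === "commercial")) {
    const commercial = byVolume.find((c) => c.intent === "commercial");
    if (commercial) {
      if (picked.length >= max) picked.pop(); // drop the lowest-volume pick
      picked.push(commercial);
    }
  }
  return picked;
}
```

Exact string matching misses close variants (plural forms, word order), so a production check would likely need fuzzier matching than this sketch provides.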
System Prompt
You are the Keyword Researcher agent for Leadmetrics, a digital marketing agency AI platform.
Your task is to build comprehensive keyword groups for the client based on their business, industry, and competitive landscape.
RESEARCH REQUIREMENTS:
1. Use web search to find high-value keywords in the client's niche.
2. Organise keywords into named groups. Each group must have one of these purpose values:
- "research" — general keyword intelligence and discovery
- "content" — blog post and content marketing targets
- "hot" — trending or time-sensitive keywords to act on immediately
- "seo_onpage" — on-page SEO optimisation targets
- "google_ads" — paid search / PPC campaign keywords
- "custom" — any other specific use case
3. For each group, include 8–15 keywords.
4. For each keyword, provide:
- keyword: the exact search term
- searchVolume: rough estimate string (e.g. "100-1K", "1K-10K", "10K-100K")
- difficulty: "low" | "medium" | "high"
- intent: "informational" | "commercial" | "transactional" | "navigational"
- category: "branded" | "core" | "long_tail" | "local" | "competitor_gap"
5. The first keyword in each group is treated as the primary keyword.
OUTPUT FORMAT:
Return ONLY a single valid JSON object — no markdown, no code fences, no explanation:
{
  "groups": [
    {
      "name": "Group Name",
      "purpose": "content",
      "keywords": [
        {
          "keyword": "example keyword",
          "searchVolume": "1K-10K",
          "difficulty": "medium",
          "intent": "informational",
          "category": "core"
        }
      ]
    }
  ]
}
Skills Injected
| Skill file | Purpose |
|---|---|
| client-context-file.md | Always injected: company, brand, audience, competitors |
| keyword-research-sop.md | Step-by-step keyword selection methodology, difficulty-to-volume ratio guide, intent classification rules |
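As an illustration of the kind of rule the SOP encodes, the intent modifiers listed in its Step 3 could be applied with a simple lookup. This is only a sketch: the agent itself classifies intent through the prompt, not through code like this. The function name classifyIntent, the brandNames parameter, and the precedence order (navigational, then transactional, commercial, informational) are assumptions.

```typescript
type Intent = "informational" | "commercial" | "transactional" | "navigational";

// Modifier lists copied from Step 3 of the SOP; the ordering is an assumption.
const INTENT_MODIFIERS: Array<[Intent, string[]]> = [
  ["transactional", ["buy", "pricing", "hire", "get started", "free trial"]],
  ["commercial", ["best", "vs", "review", "comparison", "alternatives"]],
  ["informational", ["how", "what", "why", "guide", "tips", "examples"]],
];

// Lowercase and collapse punctuation so matching works on whole words.
const normalisePhrase = (text: string): string =>
  text.toLowerCase().replace(/[^a-z0-9]+/g, " ").trim();

// Returns null when no modifier matches, leaving the decision to a reviewer or model.
function classifyIntent(keyword: string, brandNames: string[] = []): Intent | null {
  const padded = ` ${normalisePhrase(keyword)} `;
  const contains = (phrase: string): boolean => {
    const p = normalisePhrase(phrase);
    return p.length > 0 && padded.includes(` ${p} `);
  };
  if (brandNames.some(contains)) return "navigational";
  for (const [intent, phrases] of INTENT_MODIFIERS) {
    if (phrases.some(contains)) return intent;
  }
  return null;
}

// Usage
classifyIntent("best saas onboarding tools"); // "commercial"
classifyIntent("saas onboarding"); // null
```

Padding with spaces keeps "how" from matching inside words such as "showcase", which a plain substring test would get wrong.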
keyword-research-sop.md — content
# Keyword Research SOP
## Step 1 — Seed Keyword Selection
Start with the raw topic. Generate 3–5 seed variants:
- Exact phrase (e.g. "saas onboarding")
- Question format (e.g. "how to improve saas onboarding")
- Modifier variants (e.g. "saas onboarding best practices", "saas onboarding checklist")
Run semrush_keyword_overview on all seeds. Keep the one with the best difficulty-to-volume ratio.
## Step 2 — Keyword Difficulty Targets by Domain Rating
Match difficulty targets to the client's domain authority:
| Domain Rating | Max KD for Primary | Max KD for Secondary |
|---|---|---|
| 0–20 (new/low-authority) | 25 | 35 |
| 21–40 (emerging) | 35 | 50 |
| 41–60 (established) | 50 | 65 |
| 61+ (high authority) | 70 | No cap |
## Step 3 — Intent Classification
- **Informational:** How, what, why, guide, tips, examples — top-of-funnel
- **Commercial:** Best, vs, review, comparison, alternatives — mid-funnel
- **Transactional:** Buy, pricing, hire, get started, free trial — bottom-of-funnel
- **Navigational:** Brand name + feature — brand awareness only, rarely a content opportunity
## Step 4 — Cannibalisation Check
A cannibalisation risk exists when:
- The client ranks 1–20 for a keyword variation on a DIFFERENT URL than the one being planned
- Two pages share the same primary keyword
Resolution: Always recommend optimising the existing page over creating new content when a page
already ranks in positions 1–15. For positions 16–30, creating new content is acceptable if the
existing page is thin (< 600 words) or poorly optimised.
## Step 5 — Cluster Composition Rules
A well-formed keyword cluster has:
- 1 primary keyword: the highest-volume, best-fit keyword the entire piece targets
- 2–3 informational secondary keywords: topic facets and long-tail variants
- 1–2 commercial secondary keywords: validates business case for the content
- 5–15 LSI terms: semantically related terms that signal topical depth to search engines
- SERP features identified: featured snippet format, PAA questions, schema opportunities
## Keyword Volume Thresholds
- Below 50/month: only include if CPC > $5 (strong commercial signal)
- 50–200/month: good long-tail targets, ideal for new/low-authority domains
- 200–1,000/month: primary targets for most clients
- 1,000+/month: high-volume targets; check difficulty carefully before committing
RAG Usage
| Dataset | Query example | When used |
|---|---|---|
| Website Content | "existing blog posts pages about [topic]" | Step 2 — identify what the site already covers to avoid duplication and flag cannibalisation risks |
| Competitor Research | "competitor keywords rankings [topic area] [industry]" | Step 2 — find keywords competitors rank for that the client does not, validating gap opportunities |
| Published Content | "published articles [topic] keyword targets" | Step 2 — cross-reference recently published content to avoid creating duplicate targets |
| Client Documents | "keyword strategy target keywords" | Optional — check for any client-specified keyword priorities or exclusions |
Tools Required
| Tool | Method | Purpose | Required? |
|---|---|---|---|
| rag_search | search | Query tenant knowledge base for existing rankings and competitor keywords | Yes |
| semrush_keyword_overview | GET | Retrieve volume, difficulty, CPC, SERP features for seed keywords | Yes |
| semrush_keyword_magic_tool | GET | Expand seed keyword into a full cluster of related keywords | Yes |
| google_search_console | getKeywordRankings | Pull current GSC rankings for the client domain to support cannibalisation check | Yes |
HITL Gates
- Review type: keyword_group_review
- Risk level: low
- Trigger: Always. After the worker saves keyword groups, the Deliverable moves to pending_review. Each KeywordGroup starts at status = "pending_review".
- Reviewer: DM portal at /seo/keyword-groups.
- Reviewer action: Approve or reject each keyword group individually. Groups move from pending_review → approved one at a time. The DM can review, edit, or discard individual groups before approving.
- Editing: Reviewer can edit keyword data within a group before approving. Edits are saved to the Keyword records directly.
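The per-group review flow above amounts to a small state machine. A minimal sketch follows, assuming a "rejected" status for declined groups (this page only names pending_review and approved) and using hypothetical names reviewGroup and approvedGroups.

```typescript
// "rejected" is an assumed status name; only the other two appear on this page.
type GroupStatus = "pending_review" | "approved" | "rejected";

interface KeywordGroup {
  id: string;
  name: string;
  status: GroupStatus;
}

// Apply a reviewer decision to a single group. Only groups still in
// pending_review can be reviewed; anything else is treated as an error.
function reviewGroup(group: KeywordGroup, decision: "approve" | "reject"): KeywordGroup {
  if (group.status !== "pending_review") {
    throw new Error(`Group ${group.id} is already ${group.status}`);
  }
  return { ...group, status: decision === "approve" ? "approved" : "rejected" };
}

// Downstream agents should only see groups the DM has approved.
function approvedGroups(groups: KeywordGroup[]): KeywordGroup[] {
  return groups.filter((g) => g.status === "approved");
}
```

Returning a new object rather than mutating the input keeps each decision independent, which matches the one-group-at-a-time approval described above.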
Guardrails
| Rule | Enforcement |
|---|---|
| Primary keyword volume must be ≥ 50/month | Hard check — if no keyword meets threshold, output a warning and select best available with a note |
| No cannibalised keywords in the final cluster | String-match check against existingRankedKeywords; any match is moved to cannibalisation array and removed from primary/secondary lists |
| Secondary keyword count must be 5–8 | Count check; if fewer than 5 after deduplication, agent retries the magic tool with a broader seed |
| All volume and difficulty figures must come from tool calls | Agent is instructed not to hallucinate metrics; output validator checks that every keyword has non-null volume and difficulty values |
| LSI terms must be ≥ 5 and ≤ 20 | Count check on output |
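The count and null checks in this table could be expressed as a single validation pass. The sketch below assumes a cluster shape with primary, secondary, and lsiTerms fields, which this page references but does not define, so the Cluster and ClusterKeyword interfaces and the checkGuardrails function are hypothetical. The cannibalisation rule is left out because it depends on the existingRankedKeywords input.

```typescript
// Hypothetical shapes; the real output schema is not specified on this page.
interface ClusterKeyword {
  keyword: string;
  volume: number | null; // monthly searches, from tool calls
  difficulty: number | null;
}

interface Cluster {
  primary: ClusterKeyword;
  secondary: ClusterKeyword[];
  lsiTerms: string[];
}

// Returns a list of human-readable findings; an empty list means all checks passed.
function checkGuardrails(cluster: Cluster): string[] {
  const findings: string[] = [];

  // Every keyword must carry metrics sourced from a tool call.
  for (const kw of [cluster.primary, ...cluster.secondary]) {
    if (kw.volume === null || kw.difficulty === null) {
      findings.push(`"${kw.keyword}" is missing a volume or difficulty value`);
    }
  }

  // Primary keyword volume must be at least 50/month.
  if (cluster.primary.volume !== null && cluster.primary.volume < 50) {
    findings.push("Primary keyword volume is below 50/month");
  }

  // Secondary keyword count must be 5 to 8.
  if (cluster.secondary.length < 5 || cluster.secondary.length > 8) {
    findings.push(`Secondary keyword count is ${cluster.secondary.length}; expected 5 to 8`);
  }

  // LSI term count must be 5 to 20.
  if (cluster.lsiTerms.length < 5 || cluster.lsiTerms.length > 20) {
    findings.push(`LSI term count is ${cluster.lsiTerms.length}; expected 5 to 20`);
  }

  return findings;
}
```

Collecting findings instead of failing on the first one lets the caller decide which rules trigger a retry and which only attach a note to the output.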
Tenant Settings Used
| Setting | How it’s used |
|---|---|
| industry | Scopes RAG queries and SEMrush category filters |
| targetAudience | Informs intent prioritisation: B2B audiences skew informational/commercial; B2C audiences skew informational/transactional |
| connectedChannels | If Google Search Console is not connected, existingRankedKeywords input will be empty; agent notes this in output and skips cannibalisation check |
| plan | Free plan limits to 1 keyword research run per campaign; Pro+ allows on-demand and scheduled runs |
Cost Profile
| Metric | Value |
|---|---|
| Avg input tokens | ~5,500 (system prompt + client context + RAG results + tool responses) |
| Avg output tokens | ~1,200 (keyword cluster JSON) |
| Est. cost / task | ~$0.30 |
Error Handling
| Error | Response |
|---|---|
| SEMrush keyword overview returns no data | Retry with a broader seed keyword; if still empty, flag as “niche topic — limited SEMrush data” and proceed with RAG-only insights |
| SEMrush magic tool returns < 5 results | Retry with a different seed variant; if still < 5, include all available results with a note |
| Google Search Console not connected | Skip cannibalisation check; note “GSC not connected — cannibalisation check skipped” in output |
| All candidate keywords exceed difficulty target | Relax the difficulty cap by 10 points and retry; if still over threshold, include best available with a note recommending the client build domain authority first |
| RAG returns no competitor keyword data | Proceed without competitor gap analysis; set contentGapOpportunity to “Competitor data unavailable — manual gap analysis recommended” |
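The difficulty fallback in the table above, combined with the domain-rating caps from Step 2 of the SOP, can be sketched as follows. The function names and the Scored shape are hypothetical, and treating "best available" as the lowest-difficulty candidate is an assumption.

```typescript
interface KdCaps {
  primary: number;
  secondary: number; // Infinity means "No cap"
}

// Caps taken from the SOP's "Keyword Difficulty Targets by Domain Rating" table.
function kdCapsFor(domainRating: number): KdCaps {
  if (domainRating <= 20) return { primary: 25, secondary: 35 };
  if (domainRating <= 40) return { primary: 35, secondary: 50 };
  if (domainRating <= 60) return { primary: 50, secondary: 65 };
  return { primary: 70, secondary: Infinity };
}

interface Scored {
  keyword: string;
  difficulty: number;
}

interface FilterResult {
  kept: Scored[];
  capUsed: number;
  note?: string;
}

// Apply the cap; if nothing passes, relax by 10 points once; if still
// nothing passes, fall back to the single easiest candidate with a note.
function filterWithRelaxation(candidates: Scored[], cap: number): FilterResult {
  const strict = candidates.filter((c) => c.difficulty <= cap);
  if (strict.length > 0) return { kept: strict, capUsed: cap };

  const relaxedCap = cap + 10;
  const relaxed = candidates.filter((c) => c.difficulty <= relaxedCap);
  if (relaxed.length > 0) {
    return { kept: relaxed, capUsed: relaxedCap, note: "Difficulty cap relaxed by 10 points" };
  }

  const best = [...candidates].sort((a, b) => a.difficulty - b.difficulty).slice(0, 1);
  return {
    kept: best,
    capUsed: relaxedCap,
    note: "All candidates exceed the difficulty target; recommend building domain authority first",
  };
}

// Usage: a new site (domain rating 15) gets a primary cap of 25.
const caps = kdCapsFor(15); // { primary: 25, secondary: 35 }
```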