
Keyword Researcher

[Live] · agent__keyword-researcher · Claude Sonnet 4.6

Finds keyword clusters for a given topic — primary keyword, secondary keywords, search intent, volume, difficulty, and LSI terms — to feed the Content Brief Writer and Blog Writer pipeline.


Overview

| Field | Value |
|---|---|
| Function | Discover and cluster target keywords for a topic, avoiding cannibalisation of existing rankings |
| Type | Worker (SEO) |
| Model | Claude Sonnet 4.6 |
| Queue | agent__keyword-researcher |
| Concurrency | 4 |
| Timeout | 5 min |
| Est. cost / task | ~$0.30 |
| Plan | Free+ |

Triggers

| Trigger type | When | Who initiates |
|---|---|---|
| Activity Planner dispatch | Start of the blog pipeline: enqueued when the Activity Planner decomposes a content campaign that includes blog posts or landing pages | Activity Planner |
| Human on-demand | User clicks “Research keywords” in the DM Portal for a specific topic or campaign | Tenant admin / DM reviewer |
| Scheduled / cron | Monthly: refreshes keyword targets for the upcoming content period, typically run on the 1st of each month | Platform scheduler |

Input

interface KeywordResearcherInput {
  tenantId: string;
  topic: string;                              // e.g. "B2B SaaS onboarding"
  clientDomain: string;                       // e.g. "acme.com"
  targetAudience: string;                     // e.g. "HR managers at mid-market SaaS companies"
  existingRankedKeywords: ExistingKeyword[];  // from GSC; used to avoid cannibalisation
  contentGoal?: 'new_blog_post' | 'optimise_existing' | 'landing_page';
  campaignId?: string;
}

interface ExistingKeyword {
  keyword: string;
  position: number;  // current average position in GSC
  url: string;       // the page already ranking for this keyword
}
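For illustration, a populated input could look like the object below. This is a sketch, not a real payload. The `tenantId`, `campaignId`, ranked keyword, position and URL are hypothetical values. The other fields reuse the examples from the interface comments. The interfaces are repeated so the snippet compiles on its own.

```typescript
interface ExistingKeyword {
  keyword: string;
  position: number; // current average position in GSC
  url: string;      // the page already ranking for this keyword
}

interface KeywordResearcherInput {
  tenantId: string;
  topic: string;
  clientDomain: string;
  targetAudience: string;
  existingRankedKeywords: ExistingKeyword[];
  contentGoal?: 'new_blog_post' | 'optimise_existing' | 'landing_page';
  campaignId?: string;
}

// Hypothetical example payload: IDs and the ranked keyword below are illustrative only.
const exampleInput: KeywordResearcherInput = {
  tenantId: 'tenant_123',
  topic: 'B2B SaaS onboarding',
  clientDomain: 'acme.com',
  targetAudience: 'HR managers at mid-market SaaS companies',
  existingRankedKeywords: [
    { keyword: 'saas onboarding checklist', position: 12, url: 'https://acme.com/blog/onboarding-checklist' },
  ],
  contentGoal: 'new_blog_post',
  campaignId: 'campaign_456',
};

console.log(JSON.stringify(exampleInput, null, 2));
```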

Output

The agent outputs strict JSON — no markdown fences, no explanation. After the worker saves outputPayload, it also parses the JSON and creates structured DB rows.

JSON format

{
  "groups": [
    {
      "name": "Core Service Keywords",
      "purpose": "content",
      "keywords": [
        {
          "keyword": "digital marketing agency",
          "searchVolume": "10K-100K",
          "difficulty": "high",
          "intent": "commercial",
          "category": "core"
        }
      ]
    }
  ]
}
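The worker parses the agent's raw output before it creates any rows, so malformed JSON is best rejected at that point. The following minimal TypeScript sketch shows one way to do this, assuming the worker receives the model output as a string. `parseKeywordOutput` is a hypothetical name. The allowed values come from the system prompt and the field tables below. The real worker's validation logic is not documented here.

```typescript
type Difficulty = 'low' | 'medium' | 'high';
type Intent = 'informational' | 'commercial' | 'transactional' | 'navigational';
type Category = 'branded' | 'core' | 'long_tail' | 'local' | 'competitor_gap';
type Purpose = 'research' | 'content' | 'hot' | 'seo_onpage' | 'google_ads' | 'custom';

interface KeywordEntry {
  keyword: string;
  searchVolume: string; // range estimate, e.g. "1K-10K"
  difficulty: Difficulty;
  intent: Intent;
  category: Category;
}

interface KeywordGroupEntry {
  name: string;
  purpose: Purpose;
  keywords: KeywordEntry[];
}

interface KeywordOutput {
  groups: KeywordGroupEntry[];
}

// Allowed values, copied from the system prompt and the DB field tables.
const DIFFICULTIES: readonly string[] = ['low', 'medium', 'high'];
const INTENTS: readonly string[] = ['informational', 'commercial', 'transactional', 'navigational'];
const CATEGORIES: readonly string[] = ['branded', 'core', 'long_tail', 'local', 'competitor_gap'];
const PURPOSES: readonly string[] = ['research', 'content', 'hot', 'seo_onpage', 'google_ads', 'custom'];

// Hypothetical helper: parses raw model text and throws on anything outside the documented schema.
function parseKeywordOutput(raw: string): KeywordOutput {
  const data: unknown = JSON.parse(raw); // throws SyntaxError if the model returned non-JSON text
  if (typeof data !== 'object' || data === null || !Array.isArray((data as { groups?: unknown }).groups)) {
    throw new Error('Output must be an object with a "groups" array');
  }
  const groups = (data as { groups: unknown[] }).groups;
  groups.forEach((g, gi) => {
    const group = (g ?? {}) as Partial<KeywordGroupEntry>;
    if (typeof group.name !== 'string' || !PURPOSES.includes(String(group.purpose))) {
      throw new Error(`Group ${gi}: missing name or invalid purpose`);
    }
    if (!Array.isArray(group.keywords) || group.keywords.length === 0) {
      throw new Error(`Group ${gi}: keywords must be a non-empty array`);
    }
    group.keywords.forEach((k, ki) => {
      const valid =
        k != null &&
        typeof k.keyword === 'string' &&
        typeof k.searchVolume === 'string' &&
        DIFFICULTIES.includes(k.difficulty) &&
        INTENTS.includes(k.intent) &&
        CATEGORIES.includes(k.category);
      if (!valid) throw new Error(`Group ${gi}, keyword ${ki}: invalid or missing field`);
    });
  });
  return data as KeywordOutput;
}
```

For the JSON example above, `parseKeywordOutput` returns the typed object. An unknown `purpose` such as `"blog"` throws an error, so no row is created with an invalid enum value.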

DB rows created by the worker

Keyword — one row per keyword across all groups:

| Field | Type | Notes |
|---|---|---|
| keyword | string | Exact search term |
| searchVolume | string | Range estimate, e.g. "1K-10K" |
| difficulty | enum | "low" / "medium" / "high" |
| intent | enum | "informational" / "commercial" / "transactional" / "navigational" |
| category | enum | "branded" / "core" / "long_tail" / "local" / "competitor_gap" |
| source | string | Always "agent" |
| status | string | Always "active" |

KeywordGroup — one row per group:

| Field | Type | Notes |
|---|---|---|
| name | string | Group label from JSON |
| purpose | enum | "research" / "content" / "hot" / "seo_onpage" / "google_ads" / "custom" |
| activityId | string | Links to the parent Activity |
| status | string | Always "pending_review" |

KeywordGroupItem — join table linking groups to keywords:

| Field | Notes |
|---|---|
| isPrimary | true for the first keyword in each group; false for all others |
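The row-creation rules can be read as a pure mapping from the parsed groups to three row lists. In this sketch, the `id`, `groupId` and `keywordId` fields and the `newId` generator are hypothetical, because the tables above do not name the keys that link the join table. The sketch also creates one Keyword row per keyword entry and does not deduplicate across groups.

```typescript
interface KeywordJson {
  keyword: string;
  searchVolume: string;
  difficulty: string;
  intent: string;
  category: string;
}

interface GroupJson {
  name: string;
  purpose: string;
  keywords: KeywordJson[];
}

interface KeywordRow extends KeywordJson {
  id: string; // hypothetical primary key
  source: 'agent';
  status: 'active';
}

interface KeywordGroupRow {
  id: string; // hypothetical primary key
  name: string;
  purpose: string;
  activityId: string;
  status: 'pending_review';
}

interface KeywordGroupItemRow {
  groupId: string;   // hypothetical foreign key to KeywordGroup
  keywordId: string; // hypothetical foreign key to Keyword
  isPrimary: boolean;
}

// Maps parsed agent output to the three row types; newId is a caller-supplied ID generator.
function buildRows(groups: GroupJson[], activityId: string, newId: () => string) {
  const keywords: KeywordRow[] = [];
  const keywordGroups: KeywordGroupRow[] = [];
  const items: KeywordGroupItemRow[] = [];

  for (const group of groups) {
    const groupId = newId();
    keywordGroups.push({ id: groupId, name: group.name, purpose: group.purpose, activityId, status: 'pending_review' });

    group.keywords.forEach((kw, index) => {
      const keywordId = newId();
      keywords.push({ ...kw, id: keywordId, source: 'agent', status: 'active' });
      // The first keyword in each group is the primary keyword.
      items.push({ groupId, keywordId, isPrimary: index === 0 });
    });
  }
  return { keywords, keywordGroups, items };
}
```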

Status flow

After the worker runs, the Deliverable moves to pending_review. The DM must approve each keyword group individually in the DM portal at /seo/keyword-groups before the cluster is used downstream.


How It Works

  1. Load client context. The Client Context File is injected into the system prompt. Tenant settings provide industry, targetAudience, and connectedChannels. The existingRankedKeywords input is pre-loaded from Google Search Console via the scheduled GSC sync.

  2. RAG: existing rankings and topic coverage. Query Website Content for pages related to the topic to identify what the site already covers. Query Competitor Research for competitor keywords and content in the topic area to identify gaps and proven demand.

  3. Primary keyword discovery. Call semrush_keyword_overview with the raw topic string and the top 2–3 seed variants. Extract volume, difficulty, intent, CPC, and SERP features. Select the best primary keyword based on a difficulty–volume ratio appropriate to the client’s domain rating.

  4. Keyword expansion. Call semrush_keyword_magic_tool with the chosen primary keyword as seed. Request 50 suggestions filtered by: volume ≥ 100/mo, difficulty ≤ 60, same or related intent. Cluster suggestions by semantic similarity.

  5. Cannibalisation check. Cross-reference every candidate keyword against the existingRankedKeywords input. Any keyword where the client already holds a position 1–20 on a different URL is flagged as a cannibalisation risk. GSC rankings are fetched via google_search_console.getKeywordRankings for the client domain.

  6. Select final cluster. From the expanded list, select 5–8 secondary keywords that: (a) are not already ranked on a different page, (b) represent distinct facets of the topic (volume/difficulty/intent variety), (c) include at least one commercial-intent keyword.

  7. Produce output. Write the full KeywordResearcherOutput JSON. The contentGapOpportunity field summarises the single most actionable gap found. The serpFeatures field captures rich result opportunities for the Content Brief Writer to exploit.
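Steps 5 and 6 can be sketched as two filters over the candidate list. The sketch makes three assumptions:

- Each candidate already carries a numeric monthly volume and an intent from the SEMrush calls.
- Cannibalisation uses case-insensitive exact string matching, in line with the guardrail's "string-match check".
- Secondaries are picked by volume.

The "distinct facets" criterion needs semantic judgement and is not modelled. The type and function names are hypothetical.

```typescript
type Intent = 'informational' | 'commercial' | 'transactional' | 'navigational';

interface Candidate {
  keyword: string;
  volume: number; // monthly searches, as returned by the SEMrush tools
  intent: Intent;
}

interface RankedKeyword {
  keyword: string;
  position: number; // average GSC position
  url: string;
}

const normalise = (s: string): string => s.trim().toLowerCase();

// Step 5: flag candidates the client already ranks 1-20 for on a different URL.
function findCannibalisationRisks(
  candidates: Candidate[],
  existing: RankedKeyword[],
  targetUrl?: string, // omit for a brand-new page: every ranking URL then counts as "different"
): Set<string> {
  const risky = new Set<string>();
  for (const c of candidates) {
    const clash = existing.some(
      (e) =>
        normalise(e.keyword) === normalise(c.keyword) &&
        e.position >= 1 &&
        e.position <= 20 &&
        e.url !== targetUrl,
    );
    if (clash) risky.add(normalise(c.keyword));
  }
  return risky;
}

// Step 6 (simplified): up to 8 non-cannibalised secondaries by volume, with at least one commercial.
function selectSecondaries(candidates: Candidate[], risky: Set<string>, primary: string): Candidate[] {
  const pool = candidates
    .filter((c) => !risky.has(normalise(c.keyword)) && normalise(c.keyword) !== normalise(primary))
    .sort((a, b) => b.volume - a.volume);
  const picked = pool.slice(0, 8);
  if (!picked.some((c) => c.intent === 'commercial')) {
    // A commercial keyword can only exist beyond the first 8, so it replaces the lowest-volume pick.
    const commercial = pool.find((c) => c.intent === 'commercial');
    if (commercial) picked[picked.length - 1] = commercial;
  }
  return picked; // fewer than 5 should trigger the broader-seed retry described in Guardrails
}
```

When Google Search Console is not connected, `existing` is empty and nothing is flagged. For that reason the agent has to report that the check was skipped, not that the cluster is free of cannibalisation.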


System Prompt

You are the Keyword Researcher agent for Leadmetrics, a digital marketing agency AI platform. Your task is to build comprehensive keyword groups for the client based on their business, industry, and competitive landscape.

RESEARCH REQUIREMENTS:
1. Use web search to find high-value keywords in the client's niche.
2. Organise keywords into named groups. Each group must have one of these purpose values:
   - "research": general keyword intelligence and discovery
   - "content": blog post and content marketing targets
   - "hot": trending or time-sensitive keywords to act on immediately
   - "seo_onpage": on-page SEO optimisation targets
   - "google_ads": paid search / PPC campaign keywords
   - "custom": any other specific use case
3. For each group, include 8–15 keywords.
4. For each keyword, provide:
   - keyword: the exact search term
   - searchVolume: rough estimate string (e.g. "100-1K", "1K-10K", "10K-100K")
   - difficulty: "low" | "medium" | "high"
   - intent: "informational" | "commercial" | "transactional" | "navigational"
   - category: "branded" | "core" | "long_tail" | "local" | "competitor_gap"
5. The first keyword in each group is treated as the primary keyword.

OUTPUT FORMAT:
Return ONLY a single valid JSON object (no markdown, no code fences, no explanation):
{
  "groups": [
    {
      "name": "Group Name",
      "purpose": "content",
      "keywords": [
        {
          "keyword": "example keyword",
          "searchVolume": "1K-10K",
          "difficulty": "medium",
          "intent": "informational",
          "category": "core"
        }
      ]
    }
  ]
}

Skills Injected

| Skill file | Purpose |
|---|---|
| client-context-file.md | Always injected: company, brand, audience, competitors |
| keyword-research-sop.md | Step-by-step keyword selection methodology, difficulty-to-volume ratio guide, intent classification rules |

keyword-research-sop.md — content

# Keyword Research SOP

## Step 1: Seed Keyword Selection
Start with the raw topic. Generate 3–5 seed variants:
- Exact phrase (e.g. "saas onboarding")
- Question format (e.g. "how to improve saas onboarding")
- Modifier variants (e.g. "saas onboarding best practices", "saas onboarding checklist")

Run semrush_keyword_overview on all seeds. Keep the one with the best difficulty-to-volume ratio.

## Step 2: Keyword Difficulty Targets by Domain Rating
Match difficulty targets to the client's domain authority:

| Domain Rating | Max KD for Primary | Max KD for Secondary |
|---|---|---|
| 0–20 (new/low-authority) | 25 | 35 |
| 21–40 (emerging) | 35 | 50 |
| 41–60 (established) | 50 | 65 |
| 61+ (high authority) | 70 | No cap |

## Step 3: Intent Classification
- **Informational:** How, what, why, guide, tips, examples (top-of-funnel)
- **Commercial:** Best, vs, review, comparison, alternatives (mid-funnel)
- **Transactional:** Buy, pricing, hire, get started, free trial (bottom-of-funnel)
- **Navigational:** Brand name + feature (brand awareness only, rarely a content opportunity)

## Step 4: Cannibalisation Check
A cannibalisation risk exists when:
- The client ranks 1–20 for a keyword variation on a DIFFERENT URL than the one being planned
- Two pages share the same primary keyword

Resolution: Always recommend optimising the existing page over creating new content when a page already ranks in positions 1–15. For positions 16–30, creating new content is acceptable if the existing page is thin (< 600 words) or poorly optimised.

## Step 5: Cluster Composition Rules
A well-formed keyword cluster has:
- 1 primary keyword: the highest-volume, best-fit keyword the entire piece targets
- 2–3 informational secondary keywords: topic facets and long-tail variants
- 1–2 commercial secondary keywords: validates the business case for the content
- 5–15 LSI terms: semantically related terms that signal topical depth to search engines
- SERP features identified: featured snippet format, PAA questions, schema opportunities

## Keyword Volume Thresholds
- Below 50/month: only include if CPC > $5 (strong commercial signal)
- 50–200/month: good long-tail targets, ideal for new/low-authority domains
- 200–1,000/month: primary targets for most clients
- 1,000+/month: high-volume targets; check difficulty carefully before committing
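The numeric rules in the SOP map directly onto small lookup functions. The minimal sketch below assumes integer domain ratings and KD on the same scale as the table. Two behaviours are assumptions and do not come from the SOP text:

- Boundary values go to the higher volume band, so exactly 200/month counts as a primary target.
- A page at positions 16–30 that is neither thin nor poorly optimised resolves to optimising the existing page.

The SOP does not cover positions beyond 30, so the sketch returns `not_covered` for them. All function names are hypothetical.

```typescript
type Role = 'primary' | 'secondary';

// Step 2: max keyword difficulty by domain rating (DR). Infinity stands for "No cap".
function maxKd(domainRating: number, role: Role): number {
  const bands: Array<{ maxDr: number; primary: number; secondary: number }> = [
    { maxDr: 20, primary: 25, secondary: 35 },
    { maxDr: 40, primary: 35, secondary: 50 },
    { maxDr: 60, primary: 50, secondary: 65 },
    { maxDr: Infinity, primary: 70, secondary: Infinity },
  ];
  const band = bands.find((b) => domainRating <= b.maxDr)!; // last band always matches
  return band[role];
}

type VolumeBand = 'below_threshold' | 'long_tail' | 'primary_target' | 'high_volume';

// Volume thresholds, with boundary values assigned to the higher band.
function volumeBand(monthlyVolume: number): VolumeBand {
  if (monthlyVolume < 50) return 'below_threshold';
  if (monthlyVolume < 200) return 'long_tail';
  if (monthlyVolume < 1000) return 'primary_target';
  return 'high_volume';
}

// Keywords under 50/month are only kept when CPC exceeds $5.
function keepForVolume(monthlyVolume: number, cpcUsd: number): boolean {
  return monthlyVolume >= 50 || cpcUsd > 5;
}

type Resolution = 'optimise_existing' | 'new_content_ok' | 'not_covered';

// Step 4 resolution for a keyword the client already ranks for on another URL.
function resolveCannibalisation(position: number, wordCount: number, poorlyOptimised: boolean): Resolution {
  if (position <= 15) return 'optimise_existing';
  if (position <= 30) return wordCount < 600 || poorlyOptimised ? 'new_content_ok' : 'optimise_existing';
  return 'not_covered'; // the SOP does not address positions beyond 30
}
```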

RAG Usage

| Dataset | Query example | When used |
|---|---|---|
| Website Content | "existing blog posts pages about [topic]" | Step 2: identify what the site already covers to avoid duplication and flag cannibalisation risks |
| Competitor Research | "competitor keywords rankings [topic area] [industry]" | Step 2: find keywords competitors rank for that the client does not, validating gap opportunities |
| Published Content | "published articles [topic] keyword targets" | Step 2: cross-reference recently published content to avoid creating duplicate targets |
| Client Documents | "keyword strategy target keywords" | Optional: check for any client-specified keyword priorities or exclusions |
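The query examples contain bracketed placeholders. Plain string substitution, sketched below, is one simple way to expand them for a task. The `RagQuery` shape and `buildRagQueries` are hypothetical. Filling `[topic area]` with the topic input and `[industry]` with the tenant's `industry` setting is an assumption. The `rag_search` call itself is not shown.

```typescript
interface RagQuery {
  dataset: string;
  query: string;
  optional: boolean;
}

// Query templates copied from the RAG Usage table; bracketed tokens are placeholders.
const TEMPLATES: RagQuery[] = [
  { dataset: 'Website Content', query: 'existing blog posts pages about [topic]', optional: false },
  { dataset: 'Competitor Research', query: 'competitor keywords rankings [topic area] [industry]', optional: false },
  { dataset: 'Published Content', query: 'published articles [topic] keyword targets', optional: false },
  { dataset: 'Client Documents', query: 'keyword strategy target keywords', optional: true },
];

// Hypothetical helper: fills [topic area], [topic] and [industry] for one task.
function buildRagQueries(topic: string, industry: string): RagQuery[] {
  return TEMPLATES.map((t) => ({
    ...t,
    query: t.query
      .replace(/\[topic area\]/g, topic)
      .replace(/\[topic\]/g, topic)
      .replace(/\[industry\]/g, industry),
  }));
}
```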

Tools Required

| Tool | Method | Purpose | Required? |
|---|---|---|---|
| rag_search | search | Query the tenant knowledge base for existing rankings and competitor keywords | Yes |
| semrush_keyword_overview | GET | Retrieve volume, difficulty, CPC and SERP features for seed keywords | Yes |
| semrush_keyword_magic_tool | GET | Expand a seed keyword into a full cluster of related keywords | Yes |
| google_search_console | getKeywordRankings | Pull current GSC rankings for the client domain to support the cannibalisation check | Yes |

HITL Gates

  • Review type: keyword_group_review
  • Risk level: low
  • Trigger: Always — after the worker saves keyword groups, the Deliverable moves to pending_review. Each KeywordGroup starts at status = "pending_review".
  • Reviewer: DM portal — /seo/keyword-groups.
  • Reviewer action: Approve or reject each keyword group individually. Groups move from pending_review to approved one at a time. The DM can review, edit, or discard individual groups before approving.
  • Editing: Reviewer can edit keyword data within a group before approving. Edits are saved to the Keyword records directly.
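The review gate behaves like a small state machine for each KeywordGroup. This sketch makes two assumptions. First, rejected or discarded groups get a `rejected` status, because only `pending_review` and `approved` are named above. Second, edits are allowed only while a group is pending review. `applyReview` and the types are hypothetical.

```typescript
type GroupStatus = 'pending_review' | 'approved' | 'rejected'; // 'rejected' is an assumed status name
type ReviewAction = 'approve' | 'reject' | 'edit';

interface KeywordGroupState {
  id: string;
  status: GroupStatus;
}

// Returns the group's next state, or throws if the group has already been reviewed.
function applyReview(group: KeywordGroupState, action: ReviewAction): KeywordGroupState {
  if (group.status !== 'pending_review') {
    throw new Error(`Group ${group.id} is already ${group.status}`);
  }
  switch (action) {
    case 'approve':
      return { ...group, status: 'approved' };
    case 'reject':
      return { ...group, status: 'rejected' };
    case 'edit':
      return group; // edits are written to the Keyword rows; the group stays pending_review
    default:
      throw new Error('Unknown review action');
  }
}
```

Each call returns a new object and touches only one group, which matches the one-group-at-a-time approval flow.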

Guardrails

| Rule | Enforcement |
|---|---|
| Primary keyword volume must be ≥ 50/month | Hard check: if no keyword meets the threshold, output a warning and select the best available keyword with a note |
| No cannibalised keywords in the final cluster | String-match check against existingRankedKeywords; any match is moved to the cannibalisation array and removed from the primary/secondary lists |
| Secondary keyword count must be 5–8 | Count check; if fewer than 5 remain after deduplication, the agent retries the magic tool with a broader seed |
| All volume and difficulty figures must come from tool calls | The agent is instructed not to hallucinate metrics; the output validator checks that every keyword has non-null volume and difficulty values |
| LSI terms must be ≥ 5 and ≤ 20 | Count check on output |
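Most of these guardrails are mechanical and could run as a post-processing check. The sketch below assumes a cluster shape with a primary keyword, secondary keywords and LSI terms, as described in How It Works and the SOP. Volumes are numeric, and `null` means a tool returned nothing. `Cluster` and `checkGuardrails` are hypothetical names. The function returns a list of violations instead of throwing, so the caller decides whether to retry the magic tool or attach a note.

```typescript
interface ClusterKeyword {
  keyword: string;
  volume: number | null;     // monthly searches from tool calls
  difficulty: number | null; // keyword difficulty from tool calls
}

interface Cluster {
  primary: ClusterKeyword;
  secondary: ClusterKeyword[];
  lsiTerms: string[];
}

// Runs the threshold, count, non-null and string-match guardrails; returns readable violations.
function checkGuardrails(cluster: Cluster, existingRanked: Array<{ keyword: string }>): string[] {
  const problems: string[] = [];
  const all = [cluster.primary, ...cluster.secondary];

  if ((cluster.primary.volume ?? 0) < 50) {
    problems.push('Primary keyword volume is below 50/month');
  }
  if (cluster.secondary.length < 5 || cluster.secondary.length > 8) {
    problems.push(`Expected 5-8 secondary keywords, got ${cluster.secondary.length}`);
  }
  if (cluster.lsiTerms.length < 5 || cluster.lsiTerms.length > 20) {
    problems.push(`Expected 5-20 LSI terms, got ${cluster.lsiTerms.length}`);
  }
  for (const k of all) {
    if (k.volume === null || k.difficulty === null) {
      problems.push(`Missing volume or difficulty for "${k.keyword}"`);
    }
  }
  // Case-insensitive exact match against the keywords the client already ranks for.
  const ranked = new Set(existingRanked.map((e) => e.keyword.trim().toLowerCase()));
  for (const k of all) {
    if (ranked.has(k.keyword.trim().toLowerCase())) {
      problems.push(`"${k.keyword}" matches an existing ranked keyword`);
    }
  }
  return problems;
}
```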

Tenant Settings Used

| Setting | How it’s used |
|---|---|
| industry | Scopes RAG queries and SEMrush category filters |
| targetAudience | Informs intent prioritisation: B2B audiences skew informational/commercial; B2C audiences skew informational/transactional |
| connectedChannels | If Google Search Console is not connected, the existingRankedKeywords input will be empty; the agent notes this in its output and skips the cannibalisation check |
| plan | The Free plan is limited to 1 keyword research run per campaign; Pro+ allows on-demand and scheduled runs |

Cost Profile

| Metric | Value |
|---|---|
| Avg input tokens | ~5,500 (system prompt + client context + RAG results + tool responses) |
| Avg output tokens | ~1,200 (keyword cluster JSON) |
| Est. cost / task | ~$0.30 |

Error Handling

| Error | Response |
|---|---|
| SEMrush keyword overview returns no data | Retry with a broader seed keyword; if still empty, flag as “niche topic — limited SEMrush data” and proceed with RAG-only insights |
| SEMrush magic tool returns < 5 results | Retry with a different seed variant; if still < 5, include all available results with a note |
| Google Search Console not connected | Skip the cannibalisation check; note “GSC not connected — cannibalisation check skipped” in the output |
| All candidate keywords exceed the difficulty target | Relax the difficulty cap by 10 points and retry; if still over the threshold, include the best available keywords with a note recommending the client build domain authority first |
| RAG returns no competitor keyword data | Proceed without competitor gap analysis; set contentGapOpportunity to “Competitor data unavailable — manual gap analysis recommended” |
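The difficulty fallback in the error table is a two-pass filter followed by a last-resort pick. Here is a minimal sketch. It assumes numeric KD values and reads "best available" as the lowest-difficulty candidate, because the documentation does not define that term. `selectWithFallback` and the note wording are hypothetical.

```typescript
interface Scored {
  keyword: string;
  difficulty: number; // keyword difficulty score from SEMrush
}

interface Selection {
  keywords: Scored[];
  note?: string;
}

// Applies the cap, then the cap relaxed by 10 points, then falls back to the easiest keyword.
function selectWithFallback(candidates: Scored[], cap: number): Selection {
  const within = (limit: number) => candidates.filter((c) => c.difficulty <= limit);

  const first = within(cap);
  if (first.length > 0) return { keywords: first };

  const relaxed = within(cap + 10);
  if (relaxed.length > 0) {
    return { keywords: relaxed, note: `Difficulty cap relaxed from ${cap} to ${cap + 10}` };
  }

  // "Best available" is assumed to mean the lowest-difficulty candidate.
  const easiest = [...candidates].sort((a, b) => a.difficulty - b.difficulty).slice(0, 1);
  return {
    keywords: easiest,
    note: 'All candidates exceed the difficulty target; building domain authority first is recommended',
  };
}
```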

© 2026 Leadmetrics — Internal use only