Keyword Researcher

[Live] · agent__keyword-researcher · Claude Sonnet 4.6

Finds keyword clusters for a given topic — primary keyword, secondary keywords, search intent, volume, difficulty, and LSI terms — to feed the Content Brief Writer and Blog Writer pipeline.

Overview


Function	Discover and cluster target keywords for a topic, avoiding cannibalisation of existing rankings
Type	Worker — SEO
Model	Claude Sonnet 4.6
Queue	`agent__keyword-researcher`
Concurrency	4
Timeout	5 min
Est. cost / task	~$0.30
Plan	Free+

Triggers

Trigger type	When	Who initiates
Activity Planner dispatch	Start of blog pipeline — enqueued when Activity Planner decomposes a content campaign that includes blog posts or landing pages	Activity Planner
Human on-demand	User clicks “Research keywords” in DM Portal for a specific topic or campaign	Tenant admin / DM reviewer
Scheduled / cron	Monthly — refreshes keyword targets for the upcoming content period, typically run on the 1st of each month	Platform scheduler

Input


interface KeywordResearcherInput {
  tenantId:                 string;
  topic:                    string;           // e.g. "B2B SaaS onboarding"
  clientDomain:             string;           // e.g. "acme.com"
  targetAudience:           string;           // e.g. "HR managers at mid-market SaaS companies"
  existingRankedKeywords:   ExistingKeyword[]; // from GSC — used to avoid cannibalisation
  contentGoal?:             'new_blog_post' | 'optimise_existing' | 'landing_page';
  campaignId?:              string;
}
 
interface ExistingKeyword {
  keyword:  string;
  position: number;   // current average position in GSC
  url:      string;   // the page already ranking for this keyword
}
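
A minimal sketch of an input payload and a pre-flight check, for illustration only. The `validateInput` helper, the `tenant_123` ID, and the example URL are hypothetical; the rule that a GSC average position is at least 1 is an assumption, not something this page specifies.

```typescript
// Types restated from the interfaces above so the sketch is self-contained.
interface ExistingKeyword {
  keyword: string;
  position: number;
  url: string;
}

interface KeywordResearcherInput {
  tenantId: string;
  topic: string;
  clientDomain: string;
  targetAudience: string;
  existingRankedKeywords: ExistingKeyword[];
  contentGoal?: "new_blog_post" | "optimise_existing" | "landing_page";
  campaignId?: string;
}

// Hypothetical helper: returns a list of problems; an empty list means the input looks usable.
function validateInput(input: KeywordResearcherInput): string[] {
  const problems: string[] = [];
  const required = ["tenantId", "topic", "clientDomain", "targetAudience"] as const;
  for (const field of required) {
    if (input[field].trim() === "") problems.push(`${field} must not be empty`);
  }
  input.existingRankedKeywords.forEach((k, i) => {
    // Assumption: average positions reported by GSC start at 1.
    if (!Number.isFinite(k.position) || k.position < 1) {
      problems.push(`existingRankedKeywords[${i}].position must be >= 1`);
    }
  });
  return problems;
}

const exampleInput: KeywordResearcherInput = {
  tenantId: "tenant_123", // hypothetical ID
  topic: "B2B SaaS onboarding",
  clientDomain: "acme.com",
  targetAudience: "HR managers at mid-market SaaS companies",
  existingRankedKeywords: [
    { keyword: "saas onboarding checklist", position: 7.4, url: "https://acme.com/blog/onboarding-checklist" },
  ],
  contentGoal: "new_blog_post",
};
```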

Output

The agent outputs strict JSON — no markdown fences, no explanation. After the worker saves outputPayload, it also parses the JSON and creates structured DB rows.

JSON format


{
  "groups": [
    {
      "name": "Core Service Keywords",
      "purpose": "content",
      "keywords": [
        {
          "keyword": "digital marketing agency",
          "searchVolume": "10K-100K",
          "difficulty": "high",
          "intent": "commercial",
          "category": "core"
        }
      ]
    }
  ]
}
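
One way the worker could parse and check this payload before creating rows is sketched below. The type names and the `parsePayload` function are hypothetical; only the field names and enum values are taken from this page. Because `JSON.parse` rejects anything that is not pure JSON, a response wrapped in markdown fences fails at the first step.

```typescript
const PURPOSES = ["research", "content", "hot", "seo_onpage", "google_ads", "custom"] as const;
const DIFFICULTIES = ["low", "medium", "high"] as const;
const INTENTS = ["informational", "commercial", "transactional", "navigational"] as const;
const CATEGORIES = ["branded", "core", "long_tail", "local", "competitor_gap"] as const;

interface KeywordEntry {
  keyword: string;
  searchVolume: string;
  difficulty: (typeof DIFFICULTIES)[number];
  intent: (typeof INTENTS)[number];
  category: (typeof CATEGORIES)[number];
}
interface KeywordGroupEntry {
  name: string;
  purpose: (typeof PURPOSES)[number];
  keywords: KeywordEntry[];
}
interface KeywordResearchPayload {
  groups: KeywordGroupEntry[];
}

// Type guard: true when value is one of the allowed string literals.
function isOneOf<T extends string>(allowed: readonly T[], value: unknown): value is T {
  return typeof value === "string" && (allowed as readonly string[]).includes(value);
}

// Hypothetical parser: throws on invalid JSON or on any field outside the documented enums.
function parsePayload(raw: string): KeywordResearchPayload {
  const data = JSON.parse(raw) as { groups?: unknown };
  if (data === null || typeof data !== "object" || !Array.isArray(data.groups)) {
    throw new Error("payload must be an object with a `groups` array");
  }
  const groups = data.groups.map((g: unknown, gi: number): KeywordGroupEntry => {
    const { name, purpose, keywords } = (g ?? {}) as Record<string, unknown>;
    if (typeof name !== "string" || !isOneOf(PURPOSES, purpose) || !Array.isArray(keywords)) {
      throw new Error(`groups[${gi}] is malformed`);
    }
    const parsedKeywords = keywords.map((k: unknown, ki: number): KeywordEntry => {
      const { keyword, searchVolume, difficulty, intent, category } = (k ?? {}) as Record<string, unknown>;
      if (
        typeof keyword !== "string" ||
        typeof searchVolume !== "string" ||
        !isOneOf(DIFFICULTIES, difficulty) ||
        !isOneOf(INTENTS, intent) ||
        !isOneOf(CATEGORIES, category)
      ) {
        throw new Error(`groups[${gi}].keywords[${ki}] is malformed`);
      }
      return { keyword, searchVolume, difficulty, intent, category };
    });
    return { name, purpose, keywords: parsedKeywords };
  });
  return { groups };
}
```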

DB rows created by the worker

Keyword — one row per keyword across all groups:

Field	Type	Notes
`keyword`	string	Exact search term
`searchVolume`	string	Range estimate e.g. `"1K-10K"`
`difficulty`	enum	`"low"` / `"medium"` / `"high"`
`intent`	enum	`"informational"` / `"commercial"` / `"transactional"` / `"navigational"`
`category`	enum	`"branded"` / `"core"` / `"long_tail"` / `"local"` / `"competitor_gap"`
`source`	string	Always `"agent"`
`status`	string	Always `"active"`

KeywordGroup — one row per group:

Field	Type	Notes
`name`	string	Group label from JSON
`purpose`	enum	`"research"` / `"content"` / `"hot"` / `"seo_onpage"` / `"google_ads"` / `"custom"`
`activityId`	string	Links to the parent Activity
`status`	string	Always `"pending_review"`

KeywordGroupItem — join table linking groups to keywords:

Field	Notes
`isPrimary`	`true` for the first keyword in each group; `false` for all others
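
The mapping from the JSON payload to the three row types can be sketched as follows. This is an illustration under assumptions, not the worker's actual code: the row interfaces, the `toRows` function, and the index-based join fields (`groupIndex`, `keywordIndex`) are hypothetical stand-ins for whatever IDs the database assigns.

```typescript
interface PayloadKeyword {
  keyword: string;
  searchVolume: string;
  difficulty: string;
  intent: string;
  category: string;
}
interface PayloadGroup {
  name: string;
  purpose: string;
  keywords: PayloadKeyword[];
}

interface KeywordRow extends PayloadKeyword {
  source: "agent";
  status: "active";
}
interface KeywordGroupRow {
  name: string;
  purpose: string;
  activityId: string;
  status: "pending_review";
}
// Hypothetical join row: array indices stand in for real foreign keys.
interface KeywordGroupItemRow {
  groupIndex: number;
  keywordIndex: number;
  isPrimary: boolean;
}

function toRows(groupsIn: PayloadGroup[], activityId: string) {
  const keywords: KeywordRow[] = [];
  const groups: KeywordGroupRow[] = [];
  const items: KeywordGroupItemRow[] = [];

  groupsIn.forEach((group, groupIndex) => {
    groups.push({ name: group.name, purpose: group.purpose, activityId, status: "pending_review" });
    group.keywords.forEach((kw, positionInGroup) => {
      // push returns the new length, so the inserted row sits at length - 1.
      const keywordIndex = keywords.push({ ...kw, source: "agent", status: "active" }) - 1;
      // The first keyword in each group is the primary keyword.
      items.push({ groupIndex, keywordIndex, isPrimary: positionInGroup === 0 });
    });
  });

  return { keywords, groups, items };
}
```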

Status flow

After the worker runs, the Deliverable moves to pending_review. The DM must approve each keyword group individually in the DM portal at /seo/keyword-groups before the cluster is used downstream.

How It Works

1. Load client context. The Client Context File is injected into the system prompt. Tenant settings provide industry, targetAudience, and connectedChannels. The existingRankedKeywords input is pre-loaded from Google Search Console via the scheduled GSC sync.
2. RAG: existing rankings and topic coverage. Query Website Content for pages related to the topic to identify what the site already covers. Query Competitor Research for competitor keywords and content in the topic area to identify gaps and proven demand.
3. Primary keyword discovery. Call semrush_keyword_overview with the raw topic string and the top 2–3 seed variants. Extract volume, difficulty, intent, CPC, and SERP features. Select the best primary keyword based on a difficulty–volume ratio appropriate to the client’s domain rating.
4. Keyword expansion. Call semrush_keyword_magic_tool with the chosen primary keyword as seed. Request 50 suggestions filtered by: volume ≥ 100/mo, difficulty ≤ 60, same or related intent. Cluster suggestions by semantic similarity.
5. Cannibalisation check. Cross-reference every candidate keyword against the existingRankedKeywords input. Any keyword for which the client already holds position 1–20 on a different URL is flagged as a cannibalisation risk. GSC rankings are fetched via google_search_console.getKeywordRankings for the client domain.
6. Select final cluster. From the expanded list, select 5–8 secondary keywords that: (a) are not already ranked on a different page, (b) represent distinct facets of the topic (volume/difficulty/intent variety), (c) include at least one commercial-intent keyword.
7. Produce output. Write the full KeywordResearcherOutput JSON. The contentGapOpportunity field summarises the single most actionable gap found. The serpFeatures field captures rich result opportunities for the Content Brief Writer to exploit.
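
The expansion filter and the cannibalisation check described above can be sketched as pure functions. This is a minimal illustration under assumptions: the `Candidate` shape, the function names, the case-insensitive exact match, and the optional `plannedUrl` parameter are hypothetical; the thresholds (volume ≥ 100, difficulty ≤ 60, positions 1–20) come from the steps above.

```typescript
interface ExistingKeyword {
  keyword: string;
  position: number;
  url: string;
}
// Hypothetical shape for a suggestion with numeric metrics from the keyword tools.
interface Candidate {
  keyword: string;
  volume: number;
  difficulty: number;
}

const normalise = (s: string): string => s.trim().toLowerCase();

// Expansion filter: volume >= 100/mo and difficulty <= 60.
function passesExpansionFilter(c: Candidate): boolean {
  return c.volume >= 100 && c.difficulty <= 60;
}

// Returns the existing ranking that makes this keyword a cannibalisation risk, if any.
// Assumption: keywords are compared case-insensitively after trimming.
function findCannibalisationRisk(
  keyword: string,
  existing: ExistingKeyword[],
  plannedUrl?: string
): ExistingKeyword | undefined {
  return existing.find(
    (e) =>
      normalise(e.keyword) === normalise(keyword) &&
      e.position >= 1 &&
      e.position <= 20 &&
      e.url !== plannedUrl // with no planned URL, every ranking page counts as "different"
  );
}

function screenCandidates(candidates: Candidate[], existing: ExistingKeyword[], plannedUrl?: string) {
  const safe: Candidate[] = [];
  const flagged: Candidate[] = [];
  for (const c of candidates.filter(passesExpansionFilter)) {
    (findCannibalisationRisk(c.keyword, existing, plannedUrl) ? flagged : safe).push(c);
  }
  return { safe, flagged };
}
```

A keyword that ranks at position 34 is not flagged by this check, because only positions 1–20 count as a risk here.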

System Prompt


You are the Keyword Researcher agent for Leadmetrics, a digital marketing agency AI platform.
Your task is to build comprehensive keyword groups for the client based on their business, industry, and competitive landscape.

RESEARCH REQUIREMENTS:
1. Use web search to find high-value keywords in the client's niche.
2. Organise keywords into named groups. Each group must have one of these purpose values:
   - "research"    — general keyword intelligence and discovery
   - "content"     — blog post and content marketing targets
   - "hot"         — trending or time-sensitive keywords to act on immediately
   - "seo_onpage"  — on-page SEO optimisation targets
   - "google_ads"  — paid search / PPC campaign keywords
   - "custom"      — any other specific use case
3. For each group, include 8–15 keywords.
4. For each keyword, provide:
   - keyword: the exact search term
   - searchVolume: rough estimate string (e.g. "100-1K", "1K-10K", "10K-100K")
   - difficulty: "low" | "medium" | "high"
   - intent: "informational" | "commercial" | "transactional" | "navigational"
   - category: "branded" | "core" | "long_tail" | "local" | "competitor_gap"
5. The first keyword in each group is treated as the primary keyword.

OUTPUT FORMAT:
Return ONLY a single valid JSON object — no markdown, no code fences, no explanation:
{
  "groups": [
    {
      "name": "Group Name",
      "purpose": "content",
      "keywords": [
        {
          "keyword": "example keyword",
          "searchVolume": "1K-10K",
          "difficulty": "medium",
          "intent": "informational",
          "category": "core"
        }
      ]
    }
  ]
}

Skills Injected

Skill file	Purpose
`client-context-file.md`	Always injected — company, brand, audience, competitors
`keyword-research-sop.md`	Step-by-step keyword selection methodology, difficulty-to-volume ratio guide, intent classification rules

`keyword-research-sop.md` — content


# Keyword Research SOP
 
## Step 1 — Seed Keyword Selection
Start with the raw topic. Generate 3–5 seed variants:
- Exact phrase (e.g. "saas onboarding")
- Question format (e.g. "how to improve saas onboarding")
- Modifier variants (e.g. "saas onboarding best practices", "saas onboarding checklist")
Run semrush_keyword_overview on all seeds. Keep the one with the best difficulty-to-volume ratio.
 
## Step 2 — Keyword Difficulty Targets by Domain Rating
Match difficulty targets to the client's domain authority:
| Domain Rating | Max KD for Primary | Max KD for Secondary |
|---|---|---|
| 0–20 (new/low-authority) | 25 | 35 |
| 21–40 (emerging) | 35 | 50 |
| 41–60 (established) | 50 | 65 |
| 61+ (high authority) | 70 | No cap |
 
## Step 3 — Intent Classification
- **Informational:** How, what, why, guide, tips, examples — top-of-funnel
- **Commercial:** Best, vs, review, comparison, alternatives — mid-funnel
- **Transactional:** Buy, pricing, hire, get started, free trial — bottom-of-funnel
- **Navigational:** Brand name + feature — brand awareness only, rarely a content opportunity
 
## Step 4 — Cannibalisation Check
A cannibalisation risk exists when:
- The client ranks 1–20 for a keyword variation on a DIFFERENT URL than the one being planned
- Two pages share the same primary keyword
Resolution: Always recommend optimising the existing page over creating new content when a page
already ranks in positions 1–15. For positions 16–30, creating new content is acceptable if the
existing page is thin (< 600 words) or poorly optimised.
 
## Step 5 — Cluster Composition Rules
A well-formed keyword cluster has:
- 1 primary keyword: the highest-volume, best-fit keyword the entire piece targets
- 2–3 informational secondary keywords: topic facets and long-tail variants
- 1–2 commercial secondary keywords: validates business case for the content
- 5–15 LSI terms: semantically related terms that signal topical depth to search engines
- SERP features identified: featured snippet format, PAA questions, schema opportunities
 
## Keyword Volume Thresholds
- Below 50/month: only include if CPC > $5 (strong commercial signal)
- 50–200/month: good long-tail targets, ideal for new/low-authority domains
- 200–1,000/month: primary targets for most clients
- 1,000+/month: high-volume targets; check difficulty carefully before committing
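
The SOP's Step 2 table and volume thresholds reduce to simple lookups, sketched below as an illustration. The function names and tier labels are hypothetical. The SOP's ranges share their boundary values (200 and 1,000 each appear in two tiers), so this sketch assumes a boundary value belongs to the higher tier.

```typescript
interface DifficultyCaps {
  primary: number;
  secondary: number; // Infinity represents "No cap"
}

// Step 2: maximum keyword difficulty by domain rating.
function difficultyCaps(domainRating: number): DifficultyCaps {
  if (domainRating <= 20) return { primary: 25, secondary: 35 };
  if (domainRating <= 40) return { primary: 35, secondary: 50 };
  if (domainRating <= 60) return { primary: 50, secondary: 65 };
  return { primary: 70, secondary: Infinity };
}

// Hypothetical labels for the four volume tiers in the SOP.
type VolumeTier = "needs_high_cpc" | "long_tail" | "primary_target" | "high_volume";

function volumeTier(monthlyVolume: number): VolumeTier {
  if (monthlyVolume < 50) return "needs_high_cpc";
  if (monthlyVolume < 200) return "long_tail"; // assumption: 200 itself counts as a primary target
  if (monthlyVolume < 1000) return "primary_target"; // assumption: 1,000 itself counts as high volume
  return "high_volume";
}

// Below 50/month a keyword is kept only when CPC exceeds $5.
function meetsVolumeThreshold(monthlyVolume: number, cpcUsd: number): boolean {
  return monthlyVolume >= 50 || cpcUsd > 5;
}
```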

RAG Usage

Dataset	Query example	When used
Website Content	`"existing blog posts pages about [topic]"`	Step 2 — identify what the site already covers to avoid duplication and flag cannibalisation risks
Competitor Research	`"competitor keywords rankings [topic area] [industry]"`	Step 2 — find keywords competitors rank for that the client does not, validating gap opportunities
Published Content	`"published articles [topic] keyword targets"`	Step 2 — cross-reference recently published content to avoid creating duplicate targets
Client Documents	`"keyword strategy target keywords"`	Optional — check for any client-specified keyword priorities or exclusions

Tools Required

Tool	Method	Purpose	Required?
`rag_search`	search	Query tenant knowledge base for existing rankings and competitor keywords	Yes
`semrush_keyword_overview`	GET	Retrieve volume, difficulty, CPC, SERP features for seed keywords	Yes
`semrush_keyword_magic_tool`	GET	Expand seed keyword into a full cluster of related keywords	Yes
`google_search_console`	`getKeywordRankings`	Pull current GSC rankings for the client domain to support cannibalisation check	Yes

HITL Gates

Review type: keyword_group_review
Risk level: low
Trigger: Always — after the worker saves keyword groups, the Deliverable moves to pending_review. Each KeywordGroup starts at status = "pending_review".
Reviewer: DM portal — /seo/keyword-groups.
Reviewer action: Approve or reject each keyword group individually. Groups move from pending_review → approved one at a time. The DM can review, edit, or discard individual groups before approving.
Editing: Reviewer can edit keyword data within a group before approving. Edits are saved to the Keyword records directly.
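
The per-group review flow can be modelled as a small state transition, sketched here for illustration. The `ReviewableGroup` shape and function names are hypothetical, and `"rejected"` is an assumed status name: this page only names `pending_review` and `approved`.

```typescript
// "rejected" is an assumed name for the outcome of a reject action.
type GroupStatus = "pending_review" | "approved" | "rejected";

interface ReviewableGroup {
  id: string;
  name: string;
  status: GroupStatus;
}

// Applies one reviewer decision to one group; only pending groups can be reviewed.
function reviewGroup(group: ReviewableGroup, decision: "approve" | "reject"): ReviewableGroup {
  if (group.status !== "pending_review") {
    throw new Error(`group ${group.id} is already ${group.status}`);
  }
  return { ...group, status: decision === "approve" ? "approved" : "rejected" };
}

// Only approved groups are eligible for downstream use.
function approvedGroups(groups: ReviewableGroup[]): ReviewableGroup[] {
  return groups.filter((g) => g.status === "approved");
}
```

Returning a new object instead of mutating the group keeps each decision independent, which matches the one-group-at-a-time approval described above.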

Guardrails

Rule	Enforcement
Primary keyword volume must be ≥ 50/month	Hard check — if no keyword meets threshold, output a warning and select best available with a note
No cannibalised keywords in the final cluster	String-match check against existingRankedKeywords; any match is moved to the `cannibalisation` array and removed from the primary/secondary lists
Secondary keyword count must be 5–8	Count check; if fewer than 5 after deduplication, agent retries the magic tool with a broader seed
All volume and difficulty figures must come from tool calls	Agent is instructed not to hallucinate metrics; output validator checks that every keyword has non-null volume and difficulty values
LSI terms must be ≥ 5 and ≤ 20	Count check on output
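
The count and null checks in this table could be expressed as a single validator, sketched below under assumptions. The `ClusterDraft` shape and `checkGuardrails` function are hypothetical, since this page does not define the exact output fields these rules run against; the numeric limits are the ones in the table.

```typescript
// Hypothetical shapes: metrics are null when a tool call returned nothing.
interface ClusterKeyword {
  keyword: string;
  volume: number | null;
  difficulty: number | null;
}
interface ClusterDraft {
  primary: ClusterKeyword;
  secondary: ClusterKeyword[];
  lsiTerms: string[];
}

// Returns human-readable violations; an empty array means every check passed.
function checkGuardrails(draft: ClusterDraft): string[] {
  const violations: string[] = [];

  for (const kw of [draft.primary, ...draft.secondary]) {
    if (kw.volume === null || kw.difficulty === null) {
      violations.push(`"${kw.keyword}" is missing volume or difficulty`);
    }
  }
  if (draft.primary.volume !== null && draft.primary.volume < 50) {
    violations.push("primary keyword volume is below 50/month");
  }
  if (draft.secondary.length < 5 || draft.secondary.length > 8) {
    violations.push(`secondary keyword count is ${draft.secondary.length}, expected 5-8`);
  }
  if (draft.lsiTerms.length < 5 || draft.lsiTerms.length > 20) {
    violations.push(`LSI term count is ${draft.lsiTerms.length}, expected 5-20`);
  }
  return violations;
}
```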

Tenant Settings Used

Setting	How it’s used
`industry`	Scopes RAG queries and SEMrush category filters
`targetAudience`	Informs intent prioritisation — B2B audiences skew informational/commercial; B2C audiences skew informational/transactional
`connectedChannels`	If Google Search Console is not connected, the `existingRankedKeywords` input will be empty; the agent notes this in the output and skips the cannibalisation check
`plan`	The Free plan is limited to 1 keyword research run per campaign; Pro+ allows on-demand and scheduled runs

Cost Profile


Avg input tokens	~5,500 (system prompt + client context + RAG results + tool responses)
Avg output tokens	~1,200 (keyword cluster JSON)
Est. cost / task	~$0.30

Error Handling

Error	Response
SEMrush keyword overview returns no data	Retry with a broader seed keyword; if still empty, flag as “niche topic — limited SEMrush data” and proceed with RAG-only insights
SEMrush magic tool returns < 5 results	Retry with a different seed variant; if still < 5, include all available results with a note
Google Search Console not connected	Skip cannibalisation check; note “GSC not connected — cannibalisation check skipped” in output
All candidate keywords exceed difficulty target	Relax the difficulty cap by 10 points and retry; if still over threshold, include best available with a note recommending the client build domain authority first
RAG returns no competitor keyword data	Proceed without competitor gap analysis; set `contentGapOpportunity` to “Competitor data unavailable — manual gap analysis recommended”
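
The "relax the difficulty cap by 10 points and retry" fallback from the table above can be sketched as follows. This is an illustration under assumptions: the `Candidate` shape and `selectWithinCap` function are hypothetical, and "best available" is interpreted here as the single lowest-difficulty candidate, which this page does not specify.

```typescript
interface Candidate {
  keyword: string;
  difficulty: number;
}
interface CapSelection {
  selected: Candidate[];
  capUsed: number;
  note?: string;
}

function selectWithinCap(candidates: Candidate[], cap: number): CapSelection {
  const within = (limit: number): Candidate[] => candidates.filter((c) => c.difficulty <= limit);

  // First pass: the normal difficulty target.
  const first = within(cap);
  if (first.length > 0) return { selected: first, capUsed: cap };

  // Second pass: relax the cap by 10 points.
  const relaxedCap = cap + 10;
  const second = within(relaxedCap);
  if (second.length > 0) {
    return { selected: second, capUsed: relaxedCap, note: "Difficulty cap relaxed by 10 points" };
  }

  // Fallback (assumed interpretation): keep the lowest-difficulty candidate and attach the recommendation.
  const best = [...candidates].sort((a, b) => a.difficulty - b.difficulty).slice(0, 1);
  return {
    selected: best,
    capUsed: relaxedCap,
    note: "All candidates exceed the difficulty target; recommend building domain authority first",
  };
}
```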