Site Auditor
[Live] · agent__site-auditor · Claude Sonnet 4.6
Runs a technical SEO audit of the client’s website, cross-references crawl data with Search Console coverage, and produces a prioritised findings report with an actionable remediation list.
Overview
| Field | Value |
|---|---|
| Function | Audit a client domain for technical SEO issues and produce a prioritised remediation report |
| Type | Worker — SEO |
| Model | Claude Sonnet 4.6 |
| Queue | agent__site-auditor |
| Concurrency | 2 |
| Timeout | 10 min |
| Est. cost / task | ~$0.80 |
| Plan | Pro+ |
Triggers
| Trigger type | When | Who initiates |
|---|---|---|
| Activity Planner dispatch | Monthly — Strategist enqueues a site audit job at the start of the monthly SEO pipeline, before any content work begins | Activity Planner |
| Human on-demand | User clicks “Run audit” in Dashboard or DM Portal — e.g. after a site migration, major content publish, or when a client reports a traffic drop | Tenant admin / DM reviewer |
| Scheduled / cron | Monthly cron on the 1st of the month — runs independently of the Activity Planner pipeline to ensure audit data is always fresh for the month’s reporting | Platform scheduler |
Input
```typescript
interface SiteAuditorInput {
  tenantId: string;
  clientDomain: string;        // e.g. "acme.com" — no protocol, no trailing slash
  previousAuditDate?: string;  // ISO date — used to surface what has changed since the last audit
  focusAreas?: FocusArea[];    // optional — narrows scope; if omitted, all areas are audited
  campaignId?: string;
}

type FocusArea =
  | 'crawlability'
  | 'on-page'
  | 'speed'
  | 'structured-data'
  | 'mobile';
```
Output
```typescript
interface SiteAuditorOutput {
  tenantId: string;
  clientDomain: string;
  auditDate: string;           // ISO timestamp
  previousAuditDate?: string;
  technicalScore: number;      // 0–100 composite score
  scoreDelta?: number;         // change vs. previous audit (positive = improved)
  summary: string;             // 3–5 sentence executive summary
  criticalIssues: AuditFinding[];
  warnings: AuditFinding[];
  opportunities: AuditFinding[];
  prioritisedActionList: ActionItem[];
  focusAreaScores: FocusAreaScore[];
}

interface AuditFinding {
  id: string;
  category: FocusArea | 'security' | 'links' | 'content';
  title: string;
  description: string;
  affectedUrls: string[];      // up to 5 example URLs
  affectedCount: number;       // total pages/items affected
  impact: 'critical' | 'high' | 'medium' | 'low';
  effort: 'low' | 'medium' | 'high';
  recommendation: string;      // specific fix instruction
  changedSinceLast?: boolean;  // true if this issue is new or worsened since the previous audit
}

interface ActionItem {
  priority: number;            // 1 = highest
  title: string;
  rationale: string;           // why this is ranked at this priority
  estimatedImpact: string;     // e.g. "Fix 42 pages with missing H1 — likely +3–5% organic CTR"
  effort: 'low' | 'medium' | 'high';
  owner: 'dev' | 'content' | 'dm-agency';
}

interface FocusAreaScore {
  area: FocusArea | 'security' | 'links' | 'content';
  score: number;               // 0–100
  delta?: number;              // change vs. previous audit
}
```
Sample output excerpt
```markdown
## Technical SEO Audit — acme.com

**Audit Date:** 2026-03-01 | **Previous Audit:** 2026-02-01
**Technical Score:** 68/100 (▲ +4 vs. last month)

---

### Executive Summary

acme.com has improved its crawlability score following the February canonical tag fixes (+8 points).
The primary outstanding issues are 34 pages returning 3xx redirects in chains of 2+ hops, which
waste crawl budget and dilute link equity. Core Web Vitals remain a concern — LCP averages 4.2s on
mobile against a Good threshold of 2.5s, driven by uncompressed hero images on product pages.
Structured data coverage is strong (87% of blog posts have Article schema), but 12 product pages
are missing Product schema entirely. This month's priority is redirect chain resolution — a dev
task estimated at 2–3 hours.

---

### Critical Issues (2)

**1. Redirect chains (2+ hops) — 34 URLs**
- Category: Crawlability | Impact: Critical | Effort: Low (dev)
- Affected examples: /old-pricing → /pricing-2024 → /pricing (3 hops)
- Recommendation: Update all internal links pointing to intermediate URLs to point directly
  to the final destination. Update the sitemap to use final URLs only.
- ⚠️ New since last audit — 34 URLs have been added to chains since 2026-02-01

**2. LCP > 4s on mobile — 18 product pages**
- Category: Speed | Impact: Critical | Effort: Medium (dev)
- Average LCP: 4.2s | Good threshold: 2.5s
- Recommendation: Compress and serve hero images in WebP format; implement lazy loading for
  below-the-fold images; consider preloading the LCP element.

---

### Prioritised Action List

1. [Dev] Fix redirect chains — 34 URLs, 2–3 hours effort, high crawlability impact
2. [Dev] Compress hero images to WebP — 18 product pages, 4–6 hours, Core Web Vitals impact
3. [Content] Add Product schema to 12 product pages — 2 hours, structured data + rich result eligibility
4. [DM Agency] Submit updated sitemap to GSC after redirect fixes — 15 mins
```
How It Works
1. **Load client context.** The Client Context File is injected. Tenant settings provide `clientDomain`, `industry`, and `plan`. If `previousAuditDate` is provided, the previous audit report is retrieved from the DB for delta calculations.
2. **RAG: cross-reference known site structure.** Query Website Content for the client's crawled pages, site structure, and any known page issues. Query Client Documents for developer roadmap notes, known technical issues the client has flagged, or planned migrations — this provides context so the audit doesn't recommend work that's already in progress.
3. **SEMrush site crawl.** Call `semrush_site_audit` to initiate or retrieve the latest crawl for `clientDomain`. Extract: crawl errors, broken links, redirect chains, duplicate content, missing meta tags, thin content pages, missing canonical tags, and Core Web Vitals estimates. This is the primary data source for the audit.
4. **Google Search Console: crawl and index data.** Call `google_search_console.getCrawlErrors` for server errors and not-found URLs. Call `google_search_console.getIndexCoverage` for indexed vs. submitted vs. excluded pages. Cross-reference with the SEMrush crawl — pages SEMrush finds but GSC excludes are flagged for investigation.
5. **Spot-check key pages.** Call `web_fetch` on the homepage, a sample product/service page, and the highest-traffic blog post (from GSC data if available). Check: title tag, meta description, H1, canonical tag, structured data presence (JSON-LD), robots meta, and page load signals. This catches issues the automated crawl may miss.
6. **Classify and prioritise findings.** Categorise every finding by area and impact. Apply the prioritisation matrix: critical issues with low effort are ranked first. Within the same impact level, issues affecting more pages rank higher. Issues that are new or worsened since the previous audit are flagged with `changedSinceLast: true`.
7. **Calculate scores.** The technical score is a weighted composite: Crawlability 30%, On-page 25%, Speed 20%, Structured Data 15%, Mobile 10%. The score delta is calculated against the previous audit if available. Focus area scores are calculated individually.
8. **Write the executive summary.** 3–5 sentences covering: overall score and trend, the two most important findings, and one key win (if the score improved). Written for a non-technical client to read — avoids jargon.
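Steps 6 and 7 can be sketched in code. This is a minimal sketch, not the production implementation: the rank maps, the one-level bump for changed issues, and the function names are assumptions based on the prioritisation matrix and scoring rules stated in this spec; the weights and deduction caps are taken from the technical-seo-standards.md skill.

```typescript
type Impact = 'critical' | 'high' | 'medium' | 'low';
type Effort = 'low' | 'medium' | 'high';

interface Finding {
  title: string;
  impact: Impact;
  effort: Effort;
  affectedCount: number;
  changedSinceLast?: boolean;
}

// Assumed rank maps: lower value = higher priority.
const impactRank: Record<Impact, number> = { critical: 0, high: 1, medium: 2, low: 3 };
const effortRank: Record<Effort, number> = { low: 0, medium: 1, high: 2 };

// Critical + low effort sorts first; new/worsened issues are bumped one level;
// ties break on affected page count (more pages = higher priority).
function priorityScore(f: Finding): number {
  const base = impactRank[f.impact] * 3 + effortRank[f.effort];
  return f.changedSinceLast ? Math.max(0, base - 1) : base;
}

function prioritise(findings: Finding[]): Finding[] {
  return [...findings].sort(
    (a, b) => priorityScore(a) - priorityScore(b) || b.affectedCount - a.affectedCount,
  );
}

// Per-area score: start at 100, subtract capped deductions
// (critical −15 each, capped at −45; warning −5/−20; opportunity −2/−10).
const deduction = {
  critical: { perFinding: 15, cap: 45 },
  warning: { perFinding: 5, cap: 20 },
  opportunity: { perFinding: 2, cap: 10 },
} as const;

function areaScore(counts: { critical: number; warning: number; opportunity: number }): number {
  let score = 100;
  for (const sev of ['critical', 'warning', 'opportunity'] as const) {
    score -= Math.min(counts[sev] * deduction[sev].perFinding, deduction[sev].cap);
  }
  return Math.max(0, score);
}

// Composite technical score, weighted by the five focus areas.
const weights = { crawlability: 0.3, onPage: 0.25, speed: 0.2, structuredData: 0.15, mobile: 0.1 };

function technicalScore(areas: Record<keyof typeof weights, number>): number {
  const raw = (Object.keys(weights) as (keyof typeof weights)[]).reduce(
    (sum, area) => sum + areas[area] * weights[area],
    0,
  );
  return Math.round(Math.min(100, Math.max(0, raw))); // clamp per the error-handling rules
}
```

With the sample report above, the 34-URL redirect chain (critical impact, low effort, new since last audit) sorts ahead of the LCP finding (critical, medium effort), matching the prioritised action list.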
System Prompt
You are a technical SEO auditor working for a digital marketing agency. Your job is to
analyse crawl data, Search Console coverage, and spot-checks to produce a prioritised
SEO audit report for a client website.
CLIENT CONTEXT:
{{CLIENT_CONTEXT}}
TENANT SETTINGS:
{{TENANT_SETTINGS}}
KNOWLEDGE BASE CONTEXT:
{{RAG_CONTEXT}}
You have been provided with:
- SEMrush site audit results (crawl errors, redirect chains, on-page issues, Core Web Vitals)
- Google Search Console crawl errors and index coverage data
- Spot-check results for key pages (homepage, a product/service page, top blog post)
- The client's known technical context from their document library
- Previous audit data (if available) for delta comparison
Your output must be a complete technical SEO audit report containing:
1. Technical score (0–100) and delta vs. previous audit
2. Executive summary (3–5 sentences, written for a non-technical client)
3. Critical issues — findings that are actively harming rankings or crawlability
4. Warnings — issues that will cause problems if left unaddressed
5. Opportunities — improvements that would increase rankings or click-through rate
6. Prioritised action list — ranked by impact × effort, with owner (dev/content/dm-agency)
7. Focus area scores (crawlability, on-page, speed, structured-data, mobile)
Prioritisation rules:
- Critical impact + Low effort = Priority 1 always
- Critical impact + High effort = Priority 2–3 (important but acknowledge the effort)
- Medium impact + Low effort = Priority 3–4 ("quick wins")
- Low impact regardless of effort = Bottom of the list
- Issues new or worsened since the previous audit = bump one priority level
Do not fabricate findings. Every issue must reference data from the tool results provided.
For each finding, name the specific fix — "improve page speed" is not a recommendation;
"compress hero images to WebP and implement lazy loading" is.
If the client's document library mentions a known issue or planned fix, note it in the
relevant finding's recommendation as "Note: Client aware — fix in progress per [doc reference]."
Output valid JSON matching the SiteAuditorOutput schema.
Skills Injected
| Skill file | Purpose |
|---|---|
| client-context-file.md | Always injected — site URL, industry, known site architecture |
| technical-seo-standards.md | Reference for what constitutes critical vs. warning vs. opportunity; scoring methodology; fix recommendations for common issues |
technical-seo-standards.md — content
# Technical SEO Standards
## Severity Classification
**Critical** — Issues actively harming crawlability, indexing, or rankings:
- Pages returning 5xx errors
- Redirect chains of 2+ hops affecting > 5 URLs
- Canonical tags pointing to non-canonical pages
- Sitemap containing noindex or 404 URLs
- Duplicate H1 tags site-wide
- Core Web Vitals in "Poor" range on mobile (LCP > 4s, CLS > 0.25, INP > 500ms)
- Pages with hreflang conflicts (multilingual sites)
**Warning** — Issues that will compound over time if unaddressed:
- Missing meta descriptions on > 10% of indexed pages
- Thin content (< 300 words) on pages meant to rank
- Broken internal links (404s in internal anchor tags)
- Missing alt text on > 20% of images
- Pages excluded from index without clear rationale (not noindex, not canonical — orphaned)
- Structured data errors (valid schema but with missing recommended fields)
- Redirect chains of exactly 1 hop (inefficient but not critical)
**Opportunity** — Improvements that would improve ranking or CTR:
- Pages missing Article, Product, FAQ, or HowTo schema where applicable
- Title tags not using the primary keyword in the first 60 characters
- Meta descriptions missing on ≤ 10% of pages
- Internal link opportunities (pages in the same cluster not linked to each other)
- Featured snippet gaps (queries the site ranks 1–5 for but doesn't hold the snippet)
## Technical Score Weighting
| Area | Weight |
|---|---|
| Crawlability | 30% |
| On-page (titles, metas, H1s, canonicals) | 25% |
| Speed (Core Web Vitals) | 20% |
| Structured Data | 15% |
| Mobile | 10% |
Score calculation per area: start at 100, subtract points per finding:
- Critical finding: −15 points each (capped at −45)
- Warning: −5 points each (capped at −20)
- Opportunity: −2 points each (capped at −10)
## Owner Classification
- **Dev:** Server-side fixes, image optimisation, redirect resolution, schema implementation
- **Content:** Meta descriptions, title tags, H1 copy, thin content expansion, alt text
- **DM Agency:** Sitemap resubmission, GSC disavow, internal linking recommendations
## Core Web Vitals Thresholds
| Metric | Good | Needs Improvement | Poor |
|---|---|---|---|
| LCP | < 2.5s | 2.5s–4.0s | > 4.0s |
| INP | < 200ms | 200ms–500ms | > 500ms |
| CLS | < 0.1 | 0.1–0.25 | > 0.25 |
RAG Usage
| Dataset | Query example | When used |
|---|---|---|
| Website Content | "site structure crawled pages URL list architecture" | Step 2 — cross-reference indexed pages vs. audit findings; identify orphaned pages |
| Client Documents | "known technical issues dev roadmap planned migrations" | Step 2 — avoid recommending work already in progress; adds context to findings |
| Published Content | "recently published pages blog posts URLs" | Step 5 — used to select the right page for spot-check |
| Competitor Research | Not typically queried | Audit is focused on the client domain only |
Tools Required
| Tool | Method | Purpose | Required? |
|---|---|---|---|
| rag_search | search | Query site structure and client documents | Yes |
| semrush_site_audit | GET | Full site crawl — errors, redirects, on-page issues, Core Web Vitals | Yes |
| google_search_console | getCrawlErrors | Server errors and 404s reported by Googlebot | Yes |
| google_search_console | getIndexCoverage | Indexed vs. submitted vs. excluded page counts | Yes |
| web_fetch | GET | Spot-check key pages for title, meta, H1, canonical, structured data | Yes |
HITL Gates
- Review type: `site_audit_review`
- Risk level: `medium`
- Trigger: Always — the audit report is presented to the DM reviewer before being shared with the client or used to dispatch remediation tasks.
- Reviewer action: Approve the report, edit findings (mark as "in progress" or "client aware"), reorder the action list, or add manual findings not captured by the automated tools. The approved report is stored as the monthly audit deliverable and shared with the client on the next reporting cycle.
- Escalation: If critical issues affect site indexing, or the score drops significantly (> 10 points), the reviewer is notified immediately via email/Slack rather than through the normal weekly review queue.
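The escalation rule can be expressed as a simple predicate. This is a hypothetical sketch: the field names and the mapping of "affects site indexing" to crawlability-category critical findings are assumptions, not the platform's actual check.

```typescript
interface EscalationInput {
  scoreDelta?: number;          // change vs. previous audit (negative = dropped)
  criticalCategories: string[]; // categories of the audit's critical findings
}

// Route to immediate email/Slack notification instead of the weekly review
// queue when the score dropped by more than 10 points or indexing is at risk.
function needsImmediateEscalation(audit: EscalationInput): boolean {
  const significantDrop = (audit.scoreDelta ?? 0) < -10;
  const indexingAtRisk = audit.criticalCategories.includes('crawlability'); // assumed proxy for "affects indexing"
  return significantDrop || indexingAtRisk;
}
```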
Guardrails
| Rule | Enforcement |
|---|---|
| Every finding must reference tool data | Agent is instructed not to fabricate findings; post-generation validator checks that each finding has an affectedCount > 0 |
| Technical score must be 0–100 | Range check; scores outside range trigger a retry |
| Prioritised action list must have ≥ 3 items | Count check; if fewer, retry with explicit instruction |
| Critical issues must not exceed 10 | If SEMrush returns > 10 critical findings, they are grouped by category and the top 10 by affected page count are surfaced individually; remainder are summarised |
| Spot-check URLs must be real fetched pages | web_fetch response status must be 200; non-200 responses are noted in the finding |
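The guardrails above imply a post-generation validator along these lines — a sketch against a trimmed-down version of the SiteAuditorOutput schema; the function name and error strings are illustrative, not the production validator.

```typescript
interface FindingLite { affectedCount: number }

interface OutputLite {
  technicalScore: number;
  criticalIssues: FindingLite[];
  warnings: FindingLite[];
  opportunities: FindingLite[];
  prioritisedActionList: unknown[];
}

// Returns a list of guardrail violations; an empty list means the report passes.
function validate(out: OutputLite): string[] {
  const errors: string[] = [];
  if (out.technicalScore < 0 || out.technicalScore > 100) {
    errors.push('technicalScore outside 0–100 — retry');
  }
  const findings = [...out.criticalIssues, ...out.warnings, ...out.opportunities];
  if (findings.some((f) => f.affectedCount <= 0)) {
    errors.push('finding with affectedCount <= 0 — possibly fabricated');
  }
  if (out.prioritisedActionList.length < 3) {
    errors.push('prioritisedActionList has fewer than 3 items — retry');
  }
  if (out.criticalIssues.length > 10) {
    errors.push('more than 10 critical issues — group by category');
  }
  return errors;
}
```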
Tenant Settings Used
| Setting | How it’s used |
|---|---|
| industry | Informs which structured data schemas are expected (e-commerce → Product; publisher → Article; local business → LocalBusiness) |
| connectedChannels | Google Search Console must be connected for crawl error and index coverage data; if not connected, the audit proceeds on SEMrush data only |
| plan | Site Auditor requires a Pro+ plan; Free plan tenants see a locked state with an upgrade prompt |
| targetAudience | Informs mobile vs. desktop prioritisation — B2C audiences are predominantly mobile; B2B audiences may be predominantly desktop |
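The industry-to-schema expectation can be sketched as a lookup. The map keys and the fallback default are illustrative assumptions based on the examples in the table above, not an exhaustive production mapping.

```typescript
// Assumed mapping from tenant industry to the structured data schemas the
// audit expects to find; unknown industries fall back to Article (assumption).
const expectedSchemas: Record<string, string[]> = {
  'e-commerce': ['Product'],
  'publisher': ['Article'],
  'local business': ['LocalBusiness'],
};

function schemasForIndustry(industry: string): string[] {
  return expectedSchemas[industry.toLowerCase()] ?? ['Article'];
}
```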
Cost Profile
| Metric | Value |
|---|---|
| Avg input tokens | ~12,000 (system prompt + client context + RAG results + SEMrush crawl data + GSC data + 3 spot-checks) |
| Avg output tokens | ~3,500 (full audit JSON with all findings and action list) |
| Est. cost / task | ~$0.80 |
Error Handling
| Error | Response |
|---|---|
| SEMrush site audit returns no data (site not yet crawled) | Initiate a new crawl; if crawl takes > 8 min, fail with “SEMrush crawl in progress — retry in 30 minutes” |
| Google Search Console not connected | Proceed without GSC data; note “GSC not connected — index coverage data unavailable” in summary; reduce technical score confidence |
| web_fetch returns non-200 for spot-check pages | Note the HTTP status in the finding; do not fail the job — proceed with remaining spot-checks |
| SEMrush returns > 500 individual issues | Aggregate issues by category and surface the top 10 by affected page count; note “Audit returned N total issues — top 10 shown” |
| Previous audit not found despite previousAuditDate being set | Proceed without delta; note "Previous audit not found — delta unavailable" in summary |
| Score calculation produces a value outside 0–100 due to weighting | Clamp to 0 or 100 and log a warning; flag for engineering review |