
Site Auditor

[Live] · agent__site-auditor · Claude Sonnet 4.6

Runs a technical SEO audit of the client’s website, cross-references crawl data with Search Console coverage, and produces a prioritised findings report with an actionable remediation list.


Overview

| Field | Value |
|---|---|
| Function | Audit a client domain for technical SEO issues and produce a prioritised remediation report |
| Type | Worker — SEO |
| Model | Claude Sonnet 4.6 |
| Queue | `agent__site-auditor` |
| Concurrency | 2 |
| Timeout | 10 min |
| Est. cost / task | ~$0.80 |
| Plan | Pro+ |

Triggers

| Trigger type | When | Who initiates |
|---|---|---|
| Activity Planner dispatch | Monthly — the Strategist enqueues a site audit job at the start of the monthly SEO pipeline, before any content work begins | Activity Planner |
| Human on-demand | User clicks “Run audit” in the Dashboard or DM Portal — e.g. after a site migration, a major content publish, or when a client reports a traffic drop | Tenant admin / DM reviewer |
| Scheduled / cron | Monthly cron on the 1st of the month — runs independently of the Activity Planner pipeline so audit data is always fresh for the month’s reporting | Platform scheduler |

Input

```typescript
interface SiteAuditorInput {
  tenantId: string;
  clientDomain: string;       // e.g. "acme.com" — no protocol, no trailing slash
  previousAuditDate?: string; // ISO date — used to surface what has changed since the last audit
  focusAreas?: FocusArea[];   // optional — narrows scope; if omitted, all areas are audited
  campaignId?: string;
}

type FocusArea =
  | 'crawlability'
  | 'on-page'
  | 'speed'
  | 'structured-data'
  | 'mobile';
```

Output

```typescript
interface SiteAuditorOutput {
  tenantId: string;
  clientDomain: string;
  auditDate: string;          // ISO timestamp
  previousAuditDate?: string;
  technicalScore: number;     // 0–100 composite score
  scoreDelta?: number;        // change vs. previous audit (positive = improved)
  summary: string;            // 3–5 sentence executive summary
  criticalIssues: AuditFinding[];
  warnings: AuditFinding[];
  opportunities: AuditFinding[];
  prioritisedActionList: ActionItem[];
  focusAreaScores: FocusAreaScore[];
}

interface AuditFinding {
  id: string;
  category: FocusArea | 'security' | 'links' | 'content';
  title: string;
  description: string;
  affectedUrls: string[];     // up to 5 example URLs
  affectedCount: number;      // total pages/items affected
  impact: 'critical' | 'high' | 'medium' | 'low';
  effort: 'low' | 'medium' | 'high';
  recommendation: string;     // specific fix instruction
  changedSinceLast?: boolean; // true if this issue is new or worsened since the previous audit
}

interface ActionItem {
  priority: number;           // 1 = highest
  title: string;
  rationale: string;          // why this is ranked at this priority
  estimatedImpact: string;    // e.g. "Fix 42 pages with missing H1 — likely +3–5% organic CTR"
  effort: 'low' | 'medium' | 'high';
  owner: 'dev' | 'content' | 'dm-agency';
}

interface FocusAreaScore {
  area: FocusArea | 'security' | 'links' | 'content';
  score: number;              // 0–100
  delta?: number;             // change vs. previous audit
}
```

Sample output excerpt

## Technical SEO Audit — acme.com

**Audit Date:** 2026-03-01 | **Previous Audit:** 2026-02-01
**Technical Score:** 68/100 (▲ +4 vs. last month)

---

### Executive Summary

acme.com has improved its crawlability score following the February canonical tag fixes (+8 points). The primary outstanding issues are 34 pages returning 3xx redirects in chains of 2+ hops, which waste crawl budget and dilute link equity. Core Web Vitals remain a concern — LCP averages 4.2s on mobile against a Good threshold of 2.5s, driven by uncompressed hero images on product pages. Structured data coverage is strong (87% of blog posts have Article schema), but 12 product pages are missing Product schema entirely. This month's priority is redirect chain resolution — a dev task estimated at 2–3 hours.

---

### Critical Issues (2)

**1. Redirect chains (2+ hops) — 34 URLs**
- Category: Crawlability | Impact: Critical | Effort: Low (dev)
- Affected examples: /old-pricing → /pricing-2024 → /pricing (3 hops)
- Recommendation: Update all internal links pointing to intermediate URLs to point directly to the final destination. Update the sitemap to use final URLs only.
- ⚠️ New since last audit — 34 URLs have been added to chains since 2026-02-01

**2. LCP > 4s on mobile — 18 product pages**
- Category: Speed | Impact: Critical | Effort: Medium (dev)
- Average LCP: 4.2s | Good threshold: 2.5s
- Recommendation: Compress and serve hero images in WebP format; implement lazy loading for below-the-fold images; consider preloading the LCP element.

---

### Prioritised Action List

1. [Dev] Fix redirect chains — 34 URLs, 2–3 hours effort, high crawlability impact
2. [Dev] Compress hero images to WebP — 18 product pages, 4–6 hours, Core Web Vitals impact
3. [Content] Add Product schema to 12 product pages — 2 hours, structured data + rich result eligibility
4. [DM Agency] Submit updated sitemap to GSC after redirect fixes — 15 mins

How It Works

  1. Load client context. The Client Context File is injected. Tenant settings provide clientDomain, industry, and plan. If previousAuditDate is provided, the previous audit report is retrieved from the DB for delta calculations.

  2. RAG: cross-reference known site structure. Query Website Content for the client’s crawled pages, site structure, and any known page issues. Query Client Documents for developer roadmap notes, known technical issues the client has flagged, or planned migrations — this provides context so the audit doesn’t recommend work that’s already in progress.

  3. SEMrush site crawl. Call semrush_site_audit to initiate or retrieve the latest crawl for clientDomain. Extract: crawl errors, broken links, redirect chains, duplicate content, missing meta tags, thin content pages, missing canonical tags, and Core Web Vitals estimates. This is the primary data source for the audit.

  4. Google Search Console: crawl and index data. Call google_search_console.getCrawlErrors for server errors and not-found URLs. Call google_search_console.getIndexCoverage for indexed vs. submitted vs. excluded pages. Cross-reference with SEMrush crawl — pages SEMrush finds but GSC excludes are flagged for investigation.

  5. Spot-check key pages. Call web_fetch on the homepage, a sample product/service page, and the highest-traffic blog post (from GSC data if available). Check: title tag, meta description, H1, canonical tag, structured data presence (JSON-LD), robots meta, and page load signals. This catches issues the automated crawl may miss.

  6. Classify and prioritise findings. Categorise every finding by area and impact. Apply the prioritisation matrix: Critical issues with Low effort are ranked first. Within the same impact level, issues affecting more pages rank higher. Issues that are new or worsened since the previous audit are flagged with changedSinceLast: true.

  7. Calculate scores. Technical score is a weighted composite: Crawlability 30%, On-page 25%, Speed 20%, Structured Data 15%, Mobile 10%. Score delta is calculated against the previous audit if available. Focus area scores are calculated individually.

  8. Write the executive summary. 3–5 sentences covering: overall score and trend, the two most important findings, and one key win (if the score improved). Written for a non-technical client to read — avoids jargon.


System Prompt

You are a technical SEO auditor working for a digital marketing agency. Your job is to analyse crawl data, Search Console coverage, and spot-checks to produce a prioritised SEO audit report for a client website.

CLIENT CONTEXT: {{CLIENT_CONTEXT}}
TENANT SETTINGS: {{TENANT_SETTINGS}}
KNOWLEDGE BASE CONTEXT: {{RAG_CONTEXT}}

You have been provided with:
- SEMrush site audit results (crawl errors, redirect chains, on-page issues, Core Web Vitals)
- Google Search Console crawl errors and index coverage data
- Spot-check results for key pages (homepage, a product/service page, top blog post)
- The client's known technical context from their document library
- Previous audit data (if available) for delta comparison

Your output must be a complete technical SEO audit report containing:
1. Technical score (0–100) and delta vs. previous audit
2. Executive summary (3–5 sentences, written for a non-technical client)
3. Critical issues — findings that are actively harming rankings or crawlability
4. Warnings — issues that will cause problems if left unaddressed
5. Opportunities — improvements that would increase rankings or click-through rate
6. Prioritised action list — ranked by impact × effort, with owner (dev/content/dm-agency)
7. Focus area scores (crawlability, on-page, speed, structured-data, mobile)

Prioritisation rules:
- Critical impact + Low effort = Priority 1 always
- Critical impact + High effort = Priority 2–3 (important but acknowledge the effort)
- Medium impact + Low effort = Priority 3–4 ("quick wins")
- Low impact regardless of effort = Bottom of the list
- Issues new or worsened since the previous audit = bump one priority level

Do not fabricate findings. Every issue must reference data from the tool results provided. For each finding, name the specific fix — "improve page speed" is not a recommendation; "compress hero images to WebP and implement lazy loading" is.

If the client's document library mentions a known issue or planned fix, note it in the relevant finding's recommendation as "Note: Client aware — fix in progress per [doc reference]."

Output valid JSON matching the SiteAuditorOutput schema.

Skills Injected

| Skill file | Purpose |
|---|---|
| `client-context-file.md` | Always injected — site URL, industry, known site architecture |
| `technical-seo-standards.md` | Reference for what constitutes critical vs. warning vs. opportunity; scoring methodology; fix recommendations for common issues |

technical-seo-standards.md — content

# Technical SEO Standards

## Severity Classification

**Critical** — Issues actively harming crawlability, indexing, or rankings:
- Pages returning 5xx errors
- Redirect chains of 2+ hops affecting > 5 URLs
- Canonical tags pointing to non-canonical pages
- Sitemap containing noindex or 404 URLs
- Duplicate H1 tags site-wide
- Core Web Vitals in "Poor" range on mobile (LCP > 4s, CLS > 0.25, INP > 500ms)
- Pages with hreflang conflicts (multilingual sites)

**Warning** — Issues that will compound over time if unaddressed:
- Missing meta descriptions on > 10% of indexed pages
- Thin content (< 300 words) on pages meant to rank
- Broken internal links (404s in internal anchor tags)
- Missing alt text on > 20% of images
- Pages excluded from index without clear rationale (not noindex, not canonical — orphaned)
- Structured data errors (valid schema but with missing recommended fields)
- Redirect chains of exactly 1 hop (inefficient but not critical)

**Opportunity** — Improvements that would improve ranking or CTR:
- Pages missing Article, Product, FAQ, or HowTo schema where applicable
- Title tags not using the primary keyword in the first 60 characters
- Meta descriptions missing on ≤ 10% of pages
- Internal link opportunities (pages in the same cluster not linked to each other)
- Featured snippet gaps (queries the site ranks 1–5 for but doesn't hold the snippet)

## Technical Score Weighting

| Area | Weight |
|---|---|
| Crawlability | 30% |
| On-page (titles, metas, H1s, canonicals) | 25% |
| Speed (Core Web Vitals) | 20% |
| Structured Data | 15% |
| Mobile | 10% |

Score calculation per area: start at 100, subtract points per finding:
- Critical finding: −15 points each (capped at −45)
- Warning: −5 points each (capped at −20)
- Opportunity: −2 points each (capped at −10)

## Owner Classification

- **Dev:** Server-side fixes, image optimisation, redirect resolution, schema implementation
- **Content:** Meta descriptions, title tags, H1 copy, thin content expansion, alt text
- **DM Agency:** Sitemap resubmission, GSC disavow, internal linking recommendations

## Core Web Vitals Thresholds

| Metric | Good | Needs Improvement | Poor |
|---|---|---|---|
| LCP | < 2.5s | 2.5s–4.0s | > 4.0s |
| INP | < 200ms | 200ms–500ms | > 500ms |
| CLS | < 0.1 | 0.1–0.25 | > 0.25 |
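The per-area score calculation described in the skill file can be expressed as a short sketch. The function name `areaScore` is illustrative; the deduction values and caps are taken directly from the standards above.

```typescript
// Per-area score: start at 100 and subtract capped deductions per severity class
// (−15 per critical capped at −45, −5 per warning capped at −20,
//  −2 per opportunity capped at −10), floored at 0.
function areaScore(criticals: number, warnings: number, opportunities: number): number {
  const deduction =
    Math.min(criticals * 15, 45) +
    Math.min(warnings * 5, 20) +
    Math.min(opportunities * 2, 10);
  return Math.max(0, 100 - deduction);
}
```

For example, an area with 2 critical findings, 3 warnings, and 6 opportunities scores `100 − (30 + 15 + 10) = 45`; the opportunity deduction (12) is capped at 10.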

RAG Usage

| Dataset | Query example | When used |
|---|---|---|
| Website Content | "site structure crawled pages URL list architecture" | Step 2 — cross-reference indexed pages vs. audit findings; identify orphaned pages |
| Client Documents | "known technical issues dev roadmap planned migrations" | Step 2 — avoid recommending work already in progress; adds context to findings |
| Published Content | "recently published pages blog posts URLs" | Step 5 — used to select the right pages for spot-checks |
| Competitor Research | Not typically queried | The audit is focused on the client domain only |

Tools Required

| Tool | Method | Purpose | Required? |
|---|---|---|---|
| `rag_search` | search | Query site structure and client documents | Yes |
| `semrush_site_audit` | GET | Full site crawl — errors, redirects, on-page issues, Core Web Vitals | Yes |
| `google_search_console` | getCrawlErrors | Server errors and 404s reported by Googlebot | Yes |
| `google_search_console` | getIndexCoverage | Indexed vs. submitted vs. excluded page counts | Yes |
| `web_fetch` | GET | Spot-check key pages for title, meta, H1, canonical, structured data | Yes |

HITL Gates

  • Review type: site_audit_review
  • Risk level: medium
  • Trigger: Always — the audit report is presented to the DM reviewer before being shared with the client or used to dispatch remediation tasks.
  • Reviewer action: Approve the report, edit findings (mark as “in progress” or “client aware”), reorder the action list, or add manual findings not captured by the automated tools. Approved report is stored as the monthly audit deliverable and shared with the client on the next reporting cycle.
  • Escalation: If critical issues affect site indexing or show a significant score drop (> 10 points), the reviewer is notified immediately via email/Slack rather than in the normal weekly review queue.
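The escalation condition above can be sketched as a small predicate. This is an assumption about how the rule might be encoded — `shouldEscalate` and the `hasIndexingCritical` flag are hypothetical names, not part of the schema; the > 10-point drop threshold comes from the text.

```typescript
// Escalate immediately (email/Slack instead of the weekly review queue) when a
// critical issue affects indexing, or the score dropped by more than 10 points.
function shouldEscalate(scoreDelta: number | undefined, hasIndexingCritical: boolean): boolean {
  return hasIndexingCritical || (scoreDelta !== undefined && scoreDelta < -10);
}
```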

Guardrails

RuleEnforcement
| Rule | Enforcement |
|---|---|
| Every finding must reference tool data | Agent is instructed not to fabricate findings; a post-generation validator checks that each finding has an affectedCount > 0 |
| Technical score must be 0–100 | Range check; scores outside the range trigger a retry |
| Prioritised action list must have ≥ 3 items | Count check; if fewer, retry with an explicit instruction |
| Critical issues must not exceed 10 | If SEMrush returns > 10 critical findings, they are grouped by category; the top 10 by affected page count are surfaced individually and the remainder summarised |
| Spot-check URLs must be real fetched pages | web_fetch response status must be 200; non-200 responses are noted in the finding |
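A post-generation validator along these lines might look like the following sketch. The `validateAudit` helper and its error strings are illustrative, not the production validator; the checks mirror the guardrail rules above.

```typescript
interface FindingLite { affectedCount: number; }

interface AuditLite {
  technicalScore: number;
  criticalIssues: FindingLite[];
  prioritisedActionList: unknown[];
}

// Returns a list of guardrail violations; an empty list means the output passes
// and can proceed to HITL review, otherwise the job is retried.
function validateAudit(audit: AuditLite): string[] {
  const errors: string[] = [];
  if (audit.technicalScore < 0 || audit.technicalScore > 100) {
    errors.push('technicalScore outside 0–100');
  }
  if (audit.prioritisedActionList.length < 3) {
    errors.push('prioritisedActionList has fewer than 3 items');
  }
  if (audit.criticalIssues.length > 10) {
    errors.push('more than 10 individually surfaced critical issues');
  }
  for (const f of audit.criticalIssues) {
    if (f.affectedCount <= 0) errors.push('finding with affectedCount <= 0');
  }
  return errors;
}
```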

Tenant Settings Used

| Setting | How it’s used |
|---|---|
| industry | Informs which structured data schemas are expected (e-commerce → Product; publisher → Article; local business → LocalBusiness) |
| connectedChannels | Google Search Console must be connected for crawl error and index coverage data; if not connected, the audit proceeds on SEMrush data only |
| plan | Site Auditor requires the Pro+ plan; Free plan tenants see a locked state with an upgrade prompt |
| targetAudience | Informs mobile vs. desktop prioritisation — B2C audiences are predominantly mobile; B2B audiences may be predominantly desktop |

Cost Profile

| Metric | Value |
|---|---|
| Avg input tokens | ~12,000 (system prompt + client context + RAG results + SEMrush crawl data + GSC data + 3 spot-checks) |
| Avg output tokens | ~3,500 (full audit JSON with all findings and action list) |
| Est. cost / task | ~$0.80 |

Error Handling

| Error | Response |
|---|---|
| SEMrush site audit returns no data (site not yet crawled) | Initiate a new crawl; if the crawl takes > 8 min, fail with “SEMrush crawl in progress — retry in 30 minutes” |
| Google Search Console not connected | Proceed without GSC data; note “GSC not connected — index coverage data unavailable” in the summary; reduce technical score confidence |
| web_fetch returns non-200 for spot-check pages | Note the HTTP status in the finding; do not fail the job — proceed with the remaining spot-checks |
| SEMrush returns > 500 individual issues | Aggregate issues by category and surface the top 10 by affected page count; note “Audit returned N total issues — top 10 shown” |
| Previous audit not found despite previousAuditDate being set | Proceed without a delta; note “Previous audit not found — delta unavailable” in the summary |
| Score calculation produces a value outside 0–100 due to weighting | Clamp to 0 or 100 and log a warning; flag for engineering review |

© 2026 Leadmetrics — Internal use only