Site Auditor
[Live] · agent__site-auditor · Claude Sonnet 4.6
Runs a technical SEO audit of the client’s website, cross-references crawl data with Search Console coverage, and produces a prioritised findings report with an actionable remediation list.
Overview
| Field | Value |
|---|---|
| Function | Audit a client domain for technical SEO issues and produce a prioritised remediation report |
| Type | Worker — SEO |
| Model | Claude Sonnet 4.6 |
| Queue | agent__site-auditor |
| Concurrency | 2 |
| Timeout | 10 min |
| Est. cost / task | ~$0.80 |
| Plan | Pro+ |
Triggers
| Trigger type | When | Who initiates |
|---|---|---|
| Activity Planner dispatch | Monthly — Strategist enqueues a site audit job at the start of the monthly SEO pipeline, before any content work begins | Activity Planner |
| Human on-demand | User clicks “Run audit” in Dashboard or DM Portal — e.g. after a site migration, major content publish, or when a client reports a traffic drop | Tenant admin / DM reviewer |
| Scheduled / cron | Monthly cron on the 1st of the month — runs independently of the Activity Planner pipeline to ensure audit data is always fresh for the month’s reporting | Platform scheduler |
Input
```typescript
interface SiteAuditorInput {
  tenantId: string;
  clientDomain: string;        // e.g. "acme.com" — no protocol, no trailing slash
  previousAuditDate?: string;  // ISO date — used to surface what has changed since the last audit
  focusAreas?: FocusArea[];    // optional — narrows scope; if omitted, all areas are audited
  campaignId?: string;
}

type FocusArea =
  | 'crawlability'
  | 'on-page'
  | 'speed'
  | 'structured-data'
  | 'mobile';
```
Output
```typescript
interface SiteAuditorOutput {
  tenantId: string;
  clientDomain: string;
  auditDate: string;           // ISO timestamp
  previousAuditDate?: string;
  technicalScore: number;      // 0–100 composite score
  scoreDelta?: number;         // change vs. previous audit (positive = improved)
  summary: string;             // 3–5 sentence executive summary
  criticalIssues: AuditFinding[];
  warnings: AuditFinding[];
  opportunities: AuditFinding[];
  prioritisedActionList: ActionItem[];
  focusAreaScores: FocusAreaScore[];
}

interface AuditFinding {
  id: string;
  category: FocusArea | 'security' | 'links' | 'content';
  title: string;
  description: string;
  affectedUrls: string[];      // up to 5 example URLs
  affectedCount: number;       // total pages/items affected
  impact: 'critical' | 'high' | 'medium' | 'low';
  effort: 'low' | 'medium' | 'high';
  recommendation: string;      // specific fix instruction
  changedSinceLast?: boolean;  // true if this issue is new or worsened since the previous audit
}

interface ActionItem {
  priority: number;            // 1 = highest
  title: string;
  rationale: string;           // why this is ranked at this priority
  estimatedImpact: string;     // e.g. "Fix 42 pages with missing H1 — likely +3–5% organic CTR"
  effort: 'low' | 'medium' | 'high';
  owner: 'dev' | 'content' | 'dm-agency';
}

interface FocusAreaScore {
  area: FocusArea | 'security' | 'links' | 'content';
  score: number;               // 0–100
  delta?: number;              // change vs. previous audit
}
```
Sample output excerpt
```markdown
## Technical SEO Audit — acme.com

**Audit Date:** 2026-03-01 | **Previous Audit:** 2026-02-01
**Technical Score:** 68/100 (▲ +4 vs. last month)

---

### Executive Summary

acme.com has improved its crawlability score following the February canonical tag fixes (+8 points).
The primary outstanding issues are 34 pages returning 3xx redirects in chains of 2+ hops, which
waste crawl budget and dilute link equity. Core Web Vitals remain a concern — LCP averages 4.2s on
mobile against a Good threshold of 2.5s, driven by uncompressed hero images on product pages.
Structured data coverage is strong (87% of blog posts have Article schema), but 12 product pages
are missing Product schema entirely. This month's priority is redirect chain resolution — a dev
task estimated at 2–3 hours.

---

### Critical Issues (2)

**1. Redirect chains (2+ hops) — 34 URLs**
- Category: Crawlability | Impact: Critical | Effort: Low (dev)
- Affected examples: /old-pricing → /pricing-2024 → /pricing (3 hops)
- Recommendation: Update all internal links pointing to intermediate URLs to point directly
  to the final destination. Update the sitemap to use final URLs only.
- ⚠️ New since last audit — 34 URLs have been added to chains since 2026-02-01

**2. LCP > 4s on mobile — 18 product pages**
- Category: Speed | Impact: Critical | Effort: Medium (dev)
- Average LCP: 4.2s | Good threshold: 2.5s
- Recommendation: Compress and serve hero images in WebP format; implement lazy loading for
  below-the-fold images; consider preloading the LCP element.

---

### Prioritised Action List

1. [Dev] Fix redirect chains — 34 URLs, 2–3 hours effort, high crawlability impact
2. [Dev] Compress hero images to WebP — 18 product pages, 4–6 hours, Core Web Vitals impact
3. [Content] Add Product schema to 12 product pages — 2 hours, structured data + rich result eligibility
4. [DM Agency] Submit updated sitemap to GSC after redirect fixes — 15 mins
```
How It Works
1. **Load client context.** The Client Context File is injected. Tenant settings provide `clientDomain`, `industry`, and `plan`. If `previousAuditDate` is provided, the previous audit report is retrieved from the DB for delta calculations.
2. **RAG: cross-reference known site structure.** Query Website Content for the client's crawled pages, site structure, and any known page issues. Query Client Documents for developer roadmap notes, known technical issues the client has flagged, or planned migrations — this provides context so the audit doesn't recommend work that's already in progress.
3. **SEMrush site crawl.** Call `semrush_site_audit` to initiate or retrieve the latest crawl for `clientDomain`. Extract: crawl errors, broken links, redirect chains, duplicate content, missing meta tags, thin content pages, missing canonical tags, and Core Web Vitals estimates. This is the primary data source for the audit.
4. **Google Search Console: crawl and index data.** Call `google_search_console.getCrawlErrors` for server errors and not-found URLs. Call `google_search_console.getIndexCoverage` for indexed vs. submitted vs. excluded pages. Cross-reference with the SEMrush crawl — pages SEMrush finds but GSC excludes are flagged for investigation.
5. **Spot-check key pages.** Call `web_fetch` on the homepage, a sample product/service page, and the highest-traffic blog post (from GSC data if available). Check: title tag, meta description, H1, canonical tag, structured data presence (JSON-LD), robots meta, and page load signals. This catches issues the automated crawl may miss.
6. **Classify and prioritise findings.** Categorise every finding by area and impact. Apply the prioritisation matrix: critical issues with low effort are ranked first. Within the same impact level, issues affecting more pages rank higher. Issues that are new or worsened since the previous audit are flagged with `changedSinceLast: true`.
7. **Calculate scores.** The technical score is a weighted composite: Crawlability 30%, On-page 25%, Speed 20%, Structured Data 15%, Mobile 10%. The score delta is calculated against the previous audit if available. Focus area scores are calculated individually.
8. **Write the executive summary.** 3–5 sentences covering: overall score and trend, the two most important findings, and one key win (if the score improved). Written for a non-technical client to read — avoids jargon.
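Steps 6 and 7 can be sketched in code. This is a minimal sketch, not the production implementation: the rank maps, the one-level bump for changed issues, and the function names are assumptions based on the prioritisation matrix and scoring rules stated in this spec; the weights and deduction caps are taken from the technical-seo-standards.md skill.

```typescript
type Impact = 'critical' | 'high' | 'medium' | 'low';
type Effort = 'low' | 'medium' | 'high';

interface Finding {
  title: string;
  impact: Impact;
  effort: Effort;
  affectedCount: number;
  changedSinceLast?: boolean;
}

// Assumed rank maps: lower value = higher priority.
const impactRank: Record<Impact, number> = { critical: 0, high: 1, medium: 2, low: 3 };
const effortRank: Record<Effort, number> = { low: 0, medium: 1, high: 2 };

// Critical + low effort sorts first; new/worsened issues are bumped one level;
// ties break on affected page count (more pages = higher priority).
function priorityScore(f: Finding): number {
  const base = impactRank[f.impact] * 3 + effortRank[f.effort];
  return f.changedSinceLast ? Math.max(0, base - 1) : base;
}

function prioritise(findings: Finding[]): Finding[] {
  return [...findings].sort(
    (a, b) => priorityScore(a) - priorityScore(b) || b.affectedCount - a.affectedCount,
  );
}

// Per-area score: start at 100, subtract capped deductions
// (critical −15 each, capped at −45; warning −5/−20; opportunity −2/−10).
const deduction = {
  critical: { perFinding: 15, cap: 45 },
  warning: { perFinding: 5, cap: 20 },
  opportunity: { perFinding: 2, cap: 10 },
} as const;

function areaScore(counts: { critical: number; warning: number; opportunity: number }): number {
  let score = 100;
  for (const sev of ['critical', 'warning', 'opportunity'] as const) {
    score -= Math.min(counts[sev] * deduction[sev].perFinding, deduction[sev].cap);
  }
  return Math.max(0, score);
}

// Composite technical score, weighted by the five focus areas.
const weights = { crawlability: 0.3, onPage: 0.25, speed: 0.2, structuredData: 0.15, mobile: 0.1 };

function technicalScore(areas: Record<keyof typeof weights, number>): number {
  const raw = (Object.keys(weights) as (keyof typeof weights)[]).reduce(
    (sum, area) => sum + areas[area] * weights[area],
    0,
  );
  return Math.round(Math.min(100, Math.max(0, raw))); // clamp per the error-handling rules
}
```

With the sample report above, the 34-URL redirect chain (critical impact, low effort, new since last audit) sorts ahead of the LCP finding (critical, medium effort), matching the prioritised action list.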
System Prompt
You are a technical SEO auditor working for a digital marketing agency. Your job is to
analyse crawl data, Search Console coverage, and spot-checks to produce a prioritised
SEO audit report for a client website.
CLIENT CONTEXT:
{{CLIENT_CONTEXT}}
TENANT SETTINGS:
{{TENANT_SETTINGS}}
KNOWLEDGE BASE CONTEXT:
{{RAG_CONTEXT}}
You have been provided with:
- SEMrush site audit results (crawl errors, redirect chains, on-page issues, Core Web Vitals)
- Google Search Console crawl errors and index coverage data
- Spot-check results for key pages (homepage, a product/service page, top blog post)
- The client's known technical context from their document library
- Previous audit data (if available) for delta comparison
Your output must be a complete technical SEO audit report containing:
1. Technical score (0–100) and delta vs. previous audit
2. Executive summary (3–5 sentences, written for a non-technical client)
3. Critical issues — findings that are actively harming rankings or crawlability
4. Warnings — issues that will cause problems if left unaddressed
5. Opportunities — improvements that would increase rankings or click-through rate
6. Prioritised action list — ranked by impact × effort, with owner (dev/content/dm-agency)
7. Focus area scores (crawlability, on-page, speed, structured-data, mobile)
Prioritisation rules:
- Critical impact + Low effort = Priority 1 always
- Critical impact + High effort = Priority 2–3 (important but acknowledge the effort)
- Medium impact + Low effort = Priority 3–4 ("quick wins")
- Low impact regardless of effort = Bottom of the list
- Issues new or worsened since the previous audit = bump one priority level
Do not fabricate findings. Every issue must reference data from the tool results provided.
For each finding, name the specific fix — "improve page speed" is not a recommendation;
"compress hero images to WebP and implement lazy loading" is.
If the client's document library mentions a known issue or planned fix, note it in the
relevant finding's recommendation as "Note: Client aware — fix in progress per [doc reference]."
Output valid JSON matching the SiteAuditorOutput schema.
Skills Injected
| Skill file | Purpose |
|---|---|
| client-context-file.md | Always injected — site URL, industry, known site architecture |
| technical-seo-standards.md | Reference for what constitutes critical vs. warning vs. opportunity; scoring methodology; fix recommendations for common issues |
technical-seo-standards.md — content
# Technical SEO Standards
## Severity Classification
**Critical** — Issues actively harming crawlability, indexing, or rankings:
- Pages returning 5xx errors
- Redirect chains of 2+ hops affecting > 5 URLs
- Canonical tags pointing to non-canonical pages
- Sitemap containing noindex or 404 URLs
- Duplicate H1 tags site-wide
- Core Web Vitals in "Poor" range on mobile (LCP > 4s, CLS > 0.25, INP > 500ms)
- Pages with hreflang conflicts (multilingual sites)
**Warning** — Issues that will compound over time if unaddressed:
- Missing meta descriptions on > 10% of indexed pages
- Thin content (< 300 words) on pages meant to rank
- Broken internal links (404s in internal anchor tags)
- Missing alt text on > 20% of images
- Pages excluded from index without clear rationale (not noindex, not canonical — orphaned)
- Structured data errors (valid schema but with missing recommended fields)
- Redirect chains of exactly 1 hop (inefficient but not critical)
**Opportunity** — Improvements that would improve ranking or CTR:
- Pages missing Article, Product, FAQ, or HowTo schema where applicable
- Title tags not using the primary keyword in the first 60 characters
- Meta descriptions missing on ≤ 10% of pages
- Internal link opportunities (pages in the same cluster not linked to each other)
- Featured snippet gaps (queries the site ranks 1–5 for but doesn't hold the snippet)
## Technical Score Weighting
| Area | Weight |
|---|---|
| Crawlability | 30% |
| On-page (titles, metas, H1s, canonicals) | 25% |
| Speed (Core Web Vitals) | 20% |
| Structured Data | 15% |
| Mobile | 10% |
Score calculation per area: start at 100, subtract points per finding:
- Critical finding: −15 points each (capped at −45)
- Warning: −5 points each (capped at −20)
- Opportunity: −2 points each (capped at −10)
## Owner Classification
- **Dev:** Server-side fixes, image optimisation, redirect resolution, schema implementation
- **Content:** Meta descriptions, title tags, H1 copy, thin content expansion, alt text
- **DM Agency:** Sitemap resubmission, GSC disavow, internal linking recommendations
## Core Web Vitals Thresholds
| Metric | Good | Needs Improvement | Poor |
|---|---|---|---|
| LCP | < 2.5s | 2.5s–4.0s | > 4.0s |
| INP | < 200ms | 200ms–500ms | > 500ms |
| CLS | < 0.1 | 0.1–0.25 | > 0.25 |
RAG Usage
| Dataset | Query example | When used |
|---|---|---|
| Website Content | "site structure crawled pages URL list architecture" | Step 2 — cross-reference indexed pages vs. audit findings; identify orphaned pages |
| Client Documents | "known technical issues dev roadmap planned migrations" | Step 2 — avoid recommending work already in progress; adds context to findings |
| Published Content | "recently published pages blog posts URLs" | Step 5 — used to select the right page for spot-check |
| Competitor Research | Not typically queried | Audit is focused on the client domain only |
Tools Required
| Tool | Method | Purpose | Required? |
|---|---|---|---|
| rag_search | search | Query site structure and client documents | Yes |
| semrush_site_audit | GET | Full site crawl — errors, redirects, on-page issues, Core Web Vitals | Yes |
| google_search_console | getCrawlErrors | Server errors and 404s reported by Googlebot | Yes |
| google_search_console | getIndexCoverage | Indexed vs. submitted vs. excluded page counts | Yes |
| web_fetch | GET | Spot-check key pages for title, meta, H1, canonical, structured data | Yes |
HITL Gates
- Review type: `site_audit_review`
- Risk level: `medium`
- Trigger: Always — the audit report is presented to the DM reviewer before being shared with the client or used to dispatch remediation tasks.
- Reviewer action: Approve the report, edit findings (mark as "in progress" or "client aware"), reorder the action list, or add manual findings not captured by the automated tools. The approved report is stored as the monthly audit deliverable and shared with the client on the next reporting cycle.
- Escalation: If critical issues affect site indexing, or the score drops significantly (> 10 points), the reviewer is notified immediately via email/Slack rather than through the normal weekly review queue.
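The escalation rule can be expressed as a simple predicate. This is a hypothetical sketch: the field names and the mapping of "affects site indexing" to crawlability-category critical findings are assumptions, not the platform's actual check.

```typescript
interface EscalationInput {
  scoreDelta?: number;          // change vs. previous audit (negative = dropped)
  criticalCategories: string[]; // categories of the audit's critical findings
}

// Route to immediate email/Slack notification instead of the weekly review
// queue when the score dropped by more than 10 points or indexing is at risk.
function needsImmediateEscalation(audit: EscalationInput): boolean {
  const significantDrop = (audit.scoreDelta ?? 0) < -10;
  const indexingAtRisk = audit.criticalCategories.includes('crawlability'); // assumed proxy for "affects indexing"
  return significantDrop || indexingAtRisk;
}
```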
Guardrails
| Rule | Enforcement |
|---|---|
| Every finding must reference tool data | Agent is instructed not to fabricate findings; post-generation validator checks that each finding has an affectedCount > 0 |
| Technical score must be 0–100 | Range check; scores outside range trigger a retry |
| Prioritised action list must have ≥ 3 items | Count check; if fewer, retry with explicit instruction |
| Critical issues must not exceed 10 | If SEMrush returns > 10 critical findings, they are grouped by category and the top 10 by affected page count are surfaced individually; remainder are summarised |
| Spot-check URLs must be real fetched pages | web_fetch response status must be 200; non-200 responses are noted in the finding |
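The guardrails above imply a post-generation validator along these lines — a sketch against a trimmed-down version of the SiteAuditorOutput schema; the function name and error strings are illustrative, not the production validator.

```typescript
interface FindingLite { affectedCount: number }

interface OutputLite {
  technicalScore: number;
  criticalIssues: FindingLite[];
  warnings: FindingLite[];
  opportunities: FindingLite[];
  prioritisedActionList: unknown[];
}

// Returns a list of guardrail violations; an empty list means the report passes.
function validate(out: OutputLite): string[] {
  const errors: string[] = [];
  if (out.technicalScore < 0 || out.technicalScore > 100) {
    errors.push('technicalScore outside 0–100 — retry');
  }
  const findings = [...out.criticalIssues, ...out.warnings, ...out.opportunities];
  if (findings.some((f) => f.affectedCount <= 0)) {
    errors.push('finding with affectedCount <= 0 — possibly fabricated');
  }
  if (out.prioritisedActionList.length < 3) {
    errors.push('prioritisedActionList has fewer than 3 items — retry');
  }
  if (out.criticalIssues.length > 10) {
    errors.push('more than 10 critical issues — group by category');
  }
  return errors;
}
```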
Tenant Settings Used
| Setting | How it’s used |
|---|---|
| industry | Informs which structured data schemas are expected (e-commerce → Product; publisher → Article; local business → LocalBusiness) |
| connectedChannels | Google Search Console must be connected for crawl error and index coverage data; if not connected, the audit proceeds on SEMrush data only |
| plan | Site Auditor requires a Pro+ plan; Free plan tenants see a locked state with an upgrade prompt |
| targetAudience | Informs mobile vs. desktop prioritisation — B2C audiences are predominantly mobile; B2B audiences may be predominantly desktop |
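The industry-to-schema expectation can be sketched as a lookup. The map keys and the fallback default are illustrative assumptions based on the examples in the table above, not an exhaustive production mapping.

```typescript
// Assumed mapping from tenant industry to the structured data schemas the
// audit expects to find; unknown industries fall back to Article (assumption).
const expectedSchemas: Record<string, string[]> = {
  'e-commerce': ['Product'],
  'publisher': ['Article'],
  'local business': ['LocalBusiness'],
};

function schemasForIndustry(industry: string): string[] {
  return expectedSchemas[industry.toLowerCase()] ?? ['Article'];
}
```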
Cost Profile
| Metric | Value |
|---|---|
| Avg input tokens | ~12,000 (system prompt + client context + RAG results + SEMrush crawl data + GSC data + 3 spot-checks) |
| Avg output tokens | ~3,500 (full audit JSON with all findings and action list) |
| Est. cost / task | ~$0.80 |
Error Handling
| Error | Response |
|---|---|
| SEMrush site audit returns no data (site not yet crawled) | Initiate a new crawl; if crawl takes > 8 min, fail with “SEMrush crawl in progress — retry in 30 minutes” |
| Google Search Console not connected | Proceed without GSC data; note “GSC not connected — index coverage data unavailable” in summary; reduce technical score confidence |
| web_fetch returns non-200 for spot-check pages | Note the HTTP status in the finding; do not fail the job — proceed with remaining spot-checks |
| SEMrush returns > 500 individual issues | Aggregate issues by category and surface the top 10 by affected page count; note “Audit returned N total issues — top 10 shown” |
| Previous audit not found despite previousAuditDate being set | Proceed without delta; note "Previous audit not found — delta unavailable" in summary |
| Score calculation produces a value outside 0–100 due to weighting | Clamp to 0 or 100 and log a warning; flag for engineering review |