Content Audit Agent
[To Build] · agent__content-auditor · Claude Sonnet 4.6
Audits published blog posts for decay signals: outdated facts, thin sections, keyword cannibalisation, missing internal links, and poor AI search structure. Produces a per-post health score with actionable recommendations and enables a one-click “Refresh this post” workflow.
Related: Blog Writer · Content Auditor Agent · RAG Integration · Content Optimizer · Performance Feedback Loop · Content Toolkit Overview
Overview
| Field | Value |
|---|---|
| Function | Audit published blog posts for content decay, gaps, and cannibalisation; generate a refresh brief |
| Type | Worker — Content Quality |
| Status | To Build |
| Priority | P2 — Differentiating |
| Queue | agent__content-auditor |
| Concurrency | 2 |
| Timeout | 8 min |
| Est. cost / task | ~$0.60 |
| Credits | 1 cr per audit |
| Plan | Pro+ |
Why This Is Needed
Published blog posts decay. Statistics become outdated, competitors publish fresher content, and internal link opportunities that did not exist at publish time accumulate as the site grows. Without audits, the published content library quietly loses value, and because nothing flags the decay, no credits are spent fixing it.
The content auditor runs on demand (or on a scheduled cycle) and surfaces exactly what needs to change and why — feeding directly into a pre-filled blog refresh brief so no work is wasted re-researching.
What the Audit Checks
1. Content Freshness
| Signal | How detected |
|---|---|
| Date references older than 18 months | Regex for explicit years and relative date phrases (e.g. “In 2022”, “As of last year”) |
| Statistics without a recent source | Sentences containing numbers that lack an inline citation link |
| Product/pricing references | Keywords from client’s own product descriptions cross-checked against current client_docs RAG content |
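A minimal sketch of the year-pattern check from the table above, assuming the post body is available as plain text. The helper name `findStaleYearReferences` and the rule that a bare year is treated as 31 December of that year are illustrative assumptions, not part of the spec. Relative phrases such as “As of last year” are not handled here.

```typescript
// Hypothetical helper: flags explicit four-digit years that are more than
// `maxAgeMonths` old relative to the audit date.
interface StaleYearReference {
  year: number;
  index: number; // character offset of the match in the body
}

function findStaleYearReferences(
  body: string,
  auditDate: Date,
  maxAgeMonths: number = 18
): StaleYearReference[] {
  // Naive pattern: also matches four-digit numbers that are not years.
  const yearPattern = /\b(19|20)\d{2}\b/g;
  const results: StaleYearReference[] = [];
  let match: RegExpExecArray | null;
  while ((match = yearPattern.exec(body)) !== null) {
    const year = parseInt(match[0], 10);
    // Assumption: a bare year refers to December (month index 11) of that year.
    const monthsOld =
      (auditDate.getFullYear() - year) * 12 + (auditDate.getMonth() - 11);
    if (monthsOld > maxAgeMonths) {
      results.push({ year, index: match.index });
    }
  }
  return results;
}
```

Each match would typically become one freshness finding, with the character offset used to derive the `location` field.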
2. Thin Sections
Sections (content between consecutive H2s) with fewer than 200 words are flagged. The agent also checks each H2 section listed in the brief outline and flags it if it contains less than 50% of its expected word allocation.
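The 200-word rule could be sketched as follows, assuming the post body is stored as markdown with `## ` marking H2 headings. The function names are hypothetical, and the 50% brief-allocation check is left out because the brief outline format is not defined here.

```typescript
interface SectionWordCount {
  heading: string;
  wordCount: number;
}

// Splits a markdown body on H2 headings and counts the words in each section.
// Text before the first H2 (the intro) is ignored.
function countWordsPerH2Section(markdown: string): SectionWordCount[] {
  const sections: SectionWordCount[] = [];
  let current: SectionWordCount | null = null;
  for (const line of markdown.split("\n")) {
    const h2 = /^##\s+(.*)$/.exec(line);
    if (h2 !== null) {
      current = { heading: h2[1].trim(), wordCount: 0 };
      sections.push(current);
    } else if (current !== null) {
      // Strip heading markers from H3+ lines so "###" is not counted as a word.
      const text = line.replace(/^#+\s*/, "").trim();
      const words = text.split(/\s+/).filter((w) => w.length > 0);
      current.wordCount += words.length;
    }
  }
  return sections;
}

// Returns the sections that fall below the minimum word count.
function findThinSections(
  markdown: string,
  minWords: number = 200
): SectionWordCount[] {
  return countWordsPerH2Section(markdown).filter((s) => s.wordCount < minWords);
}
```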
3. Keyword Cannibalisation
Cross-references all published blog posts in the published_content RAG dataset to detect when two posts compete for the same primary keyword. The audit flags:
- This post’s primary keyword appears as the primary or secondary keyword in another published post
- Suggested resolution: consolidate, differentiate, or prune
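As a rough illustration of the keyword comparison, the sketch below assumes each published post exposes a primary keyword and a list of secondary keywords. The `PostKeywords` shape and the function names are hypothetical; in practice the records would come from the published_content RAG dataset, and matching may need to be fuzzier than exact equality after normalisation.

```typescript
interface PostKeywords {
  postId: string;
  primaryKeyword: string;
  secondaryKeywords: string[];
}

interface CannibalisationConflict {
  conflictingPostId: string;
  matchedAs: "primary" | "secondary";
}

// Lowercases and collapses whitespace so "Content  Audit" equals "content audit".
function normaliseKeyword(keyword: string): string {
  return keyword.trim().toLowerCase().replace(/\s+/g, " ");
}

// Finds other posts that use the audited post's primary keyword
// as their own primary or secondary keyword.
function findCannibalisation(
  audited: PostKeywords,
  others: PostKeywords[]
): CannibalisationConflict[] {
  const target = normaliseKeyword(audited.primaryKeyword);
  const conflicts: CannibalisationConflict[] = [];
  for (const other of others) {
    if (other.postId === audited.postId) continue;
    if (normaliseKeyword(other.primaryKeyword) === target) {
      conflicts.push({ conflictingPostId: other.postId, matchedAs: "primary" });
    } else if (
      other.secondaryKeywords.map(normaliseKeyword).indexOf(target) !== -1
    ) {
      conflicts.push({ conflictingPostId: other.postId, matchedAs: "secondary" });
    }
  }
  return conflicts;
}
```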
4. Missing Internal Links
Queries the published_content RAG dataset to find published posts that are topically related to the audited post but not linked. Flags specific internal link opportunities:
- “Post X covers [related topic] — link from paragraph 3 of this post”
- “Section Y of this post is referenced by Z other posts but does not link back”
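One way to turn RAG results into link opportunities is sketched below. It assumes the RAG query returns related posts with a URL and a similarity score; the `RelatedPost` shape and the 0.75 threshold are assumptions for illustration. The link check is a plain substring match on the URL, so a production version would parse the post's links properly.

```typescript
interface RelatedPost {
  postId: string;
  title: string;
  url: string;
  similarity: number; // assumed range 0..1, higher means more related
}

// Returns related posts that the audited post body does not link to yet,
// most similar first.
function findMissingInternalLinks(
  body: string,
  related: RelatedPost[],
  minSimilarity: number = 0.75 // assumption: illustrative threshold
): RelatedPost[] {
  return related
    .filter((p) => p.similarity >= minSimilarity && body.indexOf(p.url) === -1)
    .sort((a, b) => b.similarity - a.similarity);
}
```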
5. AI Search Structure Gaps
Same signals as the Content Optimizer’s AI Search Visibility Score, but run in audit mode:
- No direct answer in the intro
- No definition block for the primary keyword
- No FAQ section
- No comparison table
- Statistics without source citations
Input Contract
interface ContentAuditorInput {
  tenantId: string;
  blogPostId: string;
  auditScope: {
    freshness: boolean;
    thinSections: boolean;
    cannibalisation: boolean;
    internalLinks: boolean;
    aiSearchStructure: boolean;
  };
}
Output Contract
interface ContentAuditResult {
  tenantId: string;
  blogPostId: string;
  auditedAt: string; // ISO timestamp
  healthScore: number; // 0–100 composite
  findings: AuditFinding[];
  // Pre-filled refresh brief context, fed into blog-writer when "Refresh" is triggered
  refreshContext: {
    summary: string; // 2-3 sentence summary of what needs to change
    priorityFindings: string[]; // Top 3 findings in plain language for the brief
    suggestedUpdates: string; // Markdown list of specific suggested updates
  };
}
interface AuditFinding {
  category: 'freshness' | 'thin_section' | 'cannibalisation' | 'internal_links' | 'ai_search';
  severity: 'critical' | 'warning' | 'suggestion';
  title: string;
  detail: string;
  location?: string; // e.g. "Section: Benefits of X" or "Paragraph 4"
  action: string; // Specific recommended action
}
Health Score Breakdown
| Category | Weight |
|---|---|
| Freshness | 30% |
| Thin sections | 20% |
| Cannibalisation | 20% |
| Internal links | 15% |
| AI search structure | 15% |
Each category scores 0–100 based on the number and severity of findings within it. The composite health score is the weighted average.
Colour coding: red (0–49 = Needs Refresh), amber (50–74 = Review Recommended), green (75–100 = Healthy).
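The weighted average and colour bands can be sketched as below. The weights and band thresholds come from this section; the per-severity penalty points are an assumption made up for illustration, since the exact mapping from findings to a category score is not specified.

```typescript
type Category =
  | "freshness"
  | "thin_section"
  | "cannibalisation"
  | "internal_links"
  | "ai_search";
type Severity = "critical" | "warning" | "suggestion";

interface Finding {
  category: Category;
  severity: Severity;
}

// Weights from the Health Score Breakdown table.
const WEIGHTS: Record<Category, number> = {
  freshness: 0.3,
  thin_section: 0.2,
  cannibalisation: 0.2,
  internal_links: 0.15,
  ai_search: 0.15,
};

// Assumption: illustrative penalty points per finding, not specified in the doc.
const PENALTY: Record<Severity, number> = {
  critical: 40,
  warning: 15,
  suggestion: 5,
};

// Category score: start at 100 and subtract penalties, floored at 0.
function categoryScore(findings: Finding[], category: Category): number {
  const penalty = findings
    .filter((f) => f.category === category)
    .reduce((sum, f) => sum + PENALTY[f.severity], 0);
  return Math.max(0, 100 - penalty);
}

// Composite score: weighted average of the category scores, rounded.
function healthScore(findings: Finding[]): number {
  const categories = Object.keys(WEIGHTS) as Category[];
  const total = categories.reduce(
    (sum, c) => sum + WEIGHTS[c] * categoryScore(findings, c),
    0
  );
  return Math.round(total);
}

// Colour bands: red 0-49, amber 50-74, green 75-100.
function healthBand(score: number): "red" | "amber" | "green" {
  if (score < 50) return "red";
  if (score < 75) return "amber";
  return "green";
}
```

Under these assumed penalties, one critical freshness finding plus two thin-section warnings gives category scores of 60 and 70, so the composite is 0.3 × 60 + 0.2 × 70 + 0.2 × 100 + 0.15 × 100 + 0.15 × 100 = 82, which falls in the green band.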
“Refresh This Post” Workflow
DM clicks "Refresh" on an audited blog post
↓
API: POST /tenant/v1/blog/:id/refresh
↓
Creates a new BlogActivity:
  - activityType: 'blog_post'
  - refreshSourceId: original BlogPost.id
  - contentBrief: { ...original brief + audit refreshContext injected }
↓
Activity appears in Activities list as "Blog Refresh: {original title}"
↓
blog-writer agent runs with the original brief + audit findings as context
  - Agent instructed to preserve and improve, not rewrite from scratch
  - Original post body passed as reference
↓
Normal review workflow: dm_review → client_review → published
↓
On publish: original BlogPost marked as superseded (status: 'superseded')
  New post takes the same slug (with optional redirect from old post URL)
Audit Dashboard
Dashboard — Content → Audit tab:
- Table of all published blog posts with columns: Title · Published date · Last audit date · Health score · Findings count · Actions
- “Run Audit” button per post (enqueues auditor job)
- “Bulk Audit” button — enqueues audit jobs for all posts not audited in the last 30 days (limited to 10 at a time; credit check first)
- Filters: All / Needs Refresh (red) / Review Recommended (amber) / Healthy (green)
- Sort by health score ascending to surface the worst-performing posts first
Blog Post Detail — Audit Panel:
- Last audit date + health score badge
- Findings list grouped by category
- “Refresh” button (visible when the audit has at least one critical finding or three or more warning findings)
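The visibility rule for the “Refresh” button is small enough to express as a single predicate. This is a sketch; the function name `shouldShowRefresh` is hypothetical, and the severity values mirror the `AuditFinding` contract.

```typescript
type FindingSeverity = "critical" | "warning" | "suggestion";

// True when there is at least one critical finding or three or more warnings.
function shouldShowRefresh(findings: { severity: FindingSeverity }[]): boolean {
  const critical = findings.filter((f) => f.severity === "critical").length;
  const warnings = findings.filter((f) => f.severity === "warning").length;
  return critical >= 1 || warnings >= 3;
}
```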
Scheduled Audits
Tenants can enable automatic monthly audits on all published posts via Settings → Content → Audit Schedule. When enabled:
- On the 1st of each month, a cron job enqueues audit tasks for all published blog posts
- Limited to 20 posts per run to cap credit consumption
- Posts audited in the last 15 days are skipped
- Total credits are deducted at run start; if credits are insufficient, the run stops and the DM is notified
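The selection and credit rules above could be planned as in the following sketch. The `PublishedPost` and `ScheduledRunPlan` shapes are hypothetical, and auditing the least recently audited posts first is an assumption, since the doc does not say which 20 posts are chosen when more are eligible.

```typescript
interface PublishedPost {
  id: string;
  lastAuditedAt: string | null; // ISO timestamp, or null if never audited
}

interface ScheduledRunPlan {
  postIds: string[];
  creditsRequired: number;
  canRun: boolean; // false means: stop the run and notify the DM
}

const DAY_MS = 24 * 60 * 60 * 1000;

function planScheduledRun(
  posts: PublishedPost[],
  creditBalance: number,
  now: Date,
  maxPosts: number = 20,
  skipWithinDays: number = 15,
  creditsPerAudit: number = 1
): ScheduledRunPlan {
  const cutoff = now.getTime() - skipWithinDays * DAY_MS;
  // Never-audited posts sort first by treating them as audited at epoch 0.
  const auditedAtMs = (p: PublishedPost): number =>
    p.lastAuditedAt === null ? 0 : new Date(p.lastAuditedAt).getTime();
  const eligible = posts
    .filter((p) => auditedAtMs(p) < cutoff) // skip posts audited recently
    .sort((a, b) => auditedAtMs(a) - auditedAtMs(b)) // assumption: oldest first
    .slice(0, maxPosts);
  const creditsRequired = eligible.length * creditsPerAudit;
  return {
    postIds: eligible.map((p) => p.id),
    creditsRequired,
    canRun: creditsRequired <= creditBalance,
  };
}
```

Computing the whole plan before enqueuing anything keeps the credit deduction a single up-front step, which matches the rule that the total is deducted at run start.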
Key Design Decisions
| Decision | Choice | Rationale |
|---|---|---|
| RAG for cannibalisation + link detection | Query published_content dataset | All published posts are already ingested into RAG at publish time; no separate index needed |
| refreshContext in audit output | Agent produces a pre-filled brief context block alongside findings | Eliminates friction between “audit found problems” and “brief is ready to fix them” |
| 1 cr per audit | Same cost as keyword research | Post-publication value; moderate inference usage with multiple RAG queries |
| Supersede on publish | Original post marked superseded when refresh is published | Preserves audit history and ensures the refresh does not overwrite the original until approved |
Implementation Phases
Phase 1 — Agent + Basic Audit
- Create docs/agents/content-auditor.md (agent doc)
- Add content-auditor to AgentRole type union
- Create packages/agents/src/workers/content-auditor.worker.ts
- Seed system prompt in packages/db/src/seed.ts
- Implement freshness + thin section checks (heuristic, no RAG needed)
- Add auditResult JSON field + auditedAt to BlogPost model (migration)
- POST /tenant/v1/blog/:id/audit route
- Blog post detail: audit panel with findings list
Phase 2 — RAG-Powered Checks
- Extend worker to query published_content RAG dataset for cannibalisation + internal link checks
- Audit dashboard tab (list view with health scores, filters)
- “Bulk Audit” action with credit pre-check
Phase 3 — Refresh Workflow
- Add refreshSourceId field to Activity model (migration)
- POST /tenant/v1/blog/:id/refresh route: creates refresh activity with audit context injected
- Extend blog-writer agent to accept refreshContext in input (update system prompt accordingly)
- Mark original post as superseded on refresh post publish
Phase 4 — Scheduled Audits
- Tenant settings: audit schedule toggle
- Monthly cron job in API scheduler
- Credit guard: check balance before bulk audit run