Content Audit Agent

[To Build] · agent__content-auditor · Claude Sonnet 4.6

Audits published blog posts for decay signals: outdated facts, thin sections, keyword cannibalisation, missing internal links, and poor AI search structure. Produces a per-post health score with actionable recommendations and enables a one-click “Refresh this post” workflow.

Related: Blog Writer · Content Auditor Agent · RAG Integration · Content Optimizer · Performance Feedback Loop · Content Toolkit Overview

Overview


Function	Audit published blog posts for content decay, gaps, and cannibalisation; generate a refresh brief
Type	Worker — Content Quality
Status	To Build
Priority	P2 — Differentiating
Queue	`agent__content-auditor`
Concurrency	2
Timeout	8 min
Est. cost / task	~$0.60
Credits	1 cr per audit
Plan	Pro+

Why This Is Needed

Published blog posts decay. Statistics go out of date, competitors publish fresher content, and internal link opportunities that did not exist at publish time accumulate as the site grows. Without audits, the published content library quietly loses value, and because nothing flags the decay, no credits are ever spent fixing it.

The content auditor runs on demand (or on a monthly schedule) and surfaces exactly what needs to change and why. Its output feeds directly into a pre-filled blog refresh brief, so the writer does not have to redo the research.

What the Audit Checks

1. Content Freshness

Signal	How detected
Date references older than 18 months	Regex for explicit years (e.g. “In 2022”) and relative date phrases (e.g. “As of last year”)
Statistics without a recent source	Sentences containing numbers that lack an inline citation link
Product/pricing references	Keywords from the client’s own product descriptions, cross-checked against current `client_docs` RAG content
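The first two freshness signals can be approximated with plain regular expressions. The sketch below assumes the post body is Markdown, treats any Markdown inline link in the same sentence as a citation, and uses a hypothetical `findFreshnessSignals` function and `FreshnessSignal` type; the phrase list and the naive sentence splitter are illustrative choices, not the worker's actual implementation. The `client_docs` product/pricing cross-check needs RAG access and is left out.

```typescript
// Hypothetical finding shape for this sketch only.
interface FreshnessSignal {
  kind: "stale_year" | "relative_date" | "uncited_statistic";
  sentence: string;
}

// Explicit four-digit years from 1900 to 2099.
const YEAR = /\b(?:19|20)\d{2}\b/g;
// Relative date phrases that age silently (assumed list, extend as needed).
const RELATIVE_DATE = /\b(?:last|this) (?:year|month)\b|\brecently\b/i;
// Any digit left after removing years is treated as a statistic.
const DIGIT = /\d/;
// A Markdown inline link such as [source](https://example.com) counts as a citation.
const MD_LINK = /\[[^\]]+\]\([^)]+\)/;

function findFreshnessSignals(body: string, now: Date = new Date()): FreshnessSignal[] {
  const cutoff = new Date(now.getTime());
  cutoff.setMonth(cutoff.getMonth() - 18); // 18 months before `now`

  // Naive sentence split: break after . ! or ? when followed by whitespace.
  const sentences = body
    .replace(/([.!?])\s+/g, "$1\n")
    .split("\n")
    .map((s) => s.trim())
    .filter((s) => s.length > 0);

  const signals: FreshnessSignal[] = [];
  for (const sentence of sentences) {
    const years = sentence.match(YEAR) ?? [];
    // A year is stale when even its last day falls before the cutoff.
    const stale = years.some((y) => new Date(Number(y), 11, 31).getTime() < cutoff.getTime());
    if (stale) {
      signals.push({ kind: "stale_year", sentence });
    } else if (RELATIVE_DATE.test(sentence)) {
      signals.push({ kind: "relative_date", sentence });
    }
    // Remove years first so "In 2022" on its own is not mistaken for a statistic.
    const withoutYears = sentence.replace(YEAR, "");
    if (DIGIT.test(withoutYears) && !MD_LINK.test(sentence)) {
      signals.push({ kind: "uncited_statistic", sentence });
    }
  }
  return signals;
}
```

An explicit year is treated as stale only when its final day already lies more than 18 months in the past, so a mention of 2024 is not flagged until an audit runs after mid-2026.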

2. Thin Sections

Sections (content between consecutive H2s) with fewer than 200 words are flagged. The agent also compares each H2 section that appears in the post's brief outline against its planned word allocation and flags it if it contains less than 50% of that allocation.
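A minimal sketch of both thin-section checks, assuming the post is stored as Markdown with `## ` H2 headings and that the brief outline can be supplied as a map from H2 heading text to planned word count. `splitH2Sections`, `findThinSections`, and the outline map are hypothetical names; matching outline entries to sections by exact heading text is a simplification.

```typescript
interface Section {
  heading: string;
  wordCount: number;
}

interface ThinSectionFinding extends Section {
  reason: "under_200_words" | "under_half_allocation";
}

const MIN_SECTION_WORDS = 200;
const MIN_ALLOCATION_RATIO = 0.5;

function countWords(text: string): number {
  // Tokens without any letter or digit (e.g. "###", "-") are not counted as words.
  return text.split(/\s+/).filter((t) => /\w/.test(t)).length;
}

// Split Markdown on H2 lines ("## Heading"). H3+ lines stay inside their parent
// section, and any intro text before the first H2 is ignored.
function splitH2Sections(markdown: string): Section[] {
  const raw: { heading: string; lines: string[] }[] = [];
  for (const line of markdown.split("\n")) {
    const h2 = /^##\s+(.+?)\s*$/.exec(line);
    if (h2) {
      raw.push({ heading: h2[1], lines: [] });
    } else if (raw.length > 0) {
      raw[raw.length - 1].lines.push(line);
    }
  }
  return raw.map((s) => ({ heading: s.heading, wordCount: countWords(s.lines.join(" ")) }));
}

function findThinSections(
  markdown: string,
  outline: Map<string, number> = new Map(), // heading -> planned word allocation (assumed input)
): ThinSectionFinding[] {
  const findings: ThinSectionFinding[] = [];
  for (const section of splitH2Sections(markdown)) {
    if (section.wordCount < MIN_SECTION_WORDS) {
      findings.push({ ...section, reason: "under_200_words" });
    }
    const expected = outline.get(section.heading);
    if (expected !== undefined && section.wordCount < expected * MIN_ALLOCATION_RATIO) {
      findings.push({ ...section, reason: "under_half_allocation" });
    }
  }
  return findings;
}
```

A section can trigger both findings; a real worker would likely merge them into a single `thin_section` `AuditFinding`.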

3. Keyword Cannibalisation

Cross-references all published blog posts in the published_content RAG dataset to detect when two posts compete for the same primary keyword. The audit flags:

This post’s primary keyword appears as the primary or secondary keyword in another published post
Suggested resolution: consolidate, differentiate, or prune
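Once the keyword metadata of other published posts has been retrieved from `published_content`, the comparison itself is simple. This sketch assumes each post record exposes a primary keyword and a list of secondary keywords (the `PostKeywords` shape is hypothetical), and it maps a shared primary keyword to "consolidate or prune" and a secondary-keyword overlap to "differentiate"; that mapping is an assumption, not part of the spec.

```typescript
// Hypothetical keyword metadata for a published post.
interface PostKeywords {
  postId: string;
  title: string;
  primaryKeyword: string;
  secondaryKeywords: string[];
}

type MatchType = "primary" | "secondary";

interface CannibalisationFinding {
  competingPostId: string;
  competingTitle: string;
  matchedAs: MatchType;
  suggestedResolution: string;
}

// Case- and whitespace-insensitive keyword comparison.
const normalise = (keyword: string): string => keyword.trim().toLowerCase().replace(/\s+/g, " ");

function findCannibalisation(audited: PostKeywords, published: PostKeywords[]): CannibalisationFinding[] {
  const target = normalise(audited.primaryKeyword);
  const findings: CannibalisationFinding[] = [];
  for (const post of published) {
    if (post.postId === audited.postId) continue; // skip the audited post itself
    let matchedAs: MatchType | null = null;
    if (normalise(post.primaryKeyword) === target) {
      matchedAs = "primary";
    } else if (post.secondaryKeywords.some((k) => normalise(k) === target)) {
      matchedAs = "secondary";
    }
    if (matchedAs !== null) {
      findings.push({
        competingPostId: post.postId,
        competingTitle: post.title,
        matchedAs,
        // Assumed mapping: a shared primary keyword is the stronger conflict.
        suggestedResolution: matchedAs === "primary" ? "consolidate or prune" : "differentiate",
      });
    }
  }
  return findings;
}
```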

4. Missing Internal Links

Queries the published_content RAG dataset to find published posts that are topically related to the audited post but not linked. Flags specific internal link opportunities:

“Post X covers [related topic] — link from paragraph 3 of this post”
“Section Y of this post is linked from Z other posts, but this post does not link back to them”
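A sketch of the link-opportunity filter, assuming the RAG query returns related posts with a URL and a 0–1 similarity score (the `RelatedPost` shape, the 0.75 threshold, and the five-result limit are assumptions). It drops posts the body already links to, comparing URL paths so that absolute and relative links to the same post match. Choosing the paragraph to link from, and the reverse "does not link back" check, would need extra passes and are not shown.

```typescript
// Hypothetical shape of a related-post hit from a published_content RAG query.
interface RelatedPost {
  postId: string;
  title: string;
  url: string;        // e.g. "/blog/crm-onboarding" or an absolute URL
  similarity: number; // assumed 0–1 relevance score from the vector search
}

// Reduce a URL to a lower-case path without host, query, fragment, or trailing slash.
function normalisePath(url: string): string {
  const path = url.replace(/^https?:\/\/[^/]+/i, "").split(/[?#]/)[0];
  return path.replace(/\/+$/, "").toLowerCase() || "/";
}

// Collect the targets of Markdown links: [text](target "optional title").
function extractLinkTargets(markdown: string): Set<string> {
  const targets = new Set<string>();
  const linkPattern = /\[[^\]]*\]\(([^)\s]+)[^)]*\)/g;
  let match: RegExpExecArray | null;
  while ((match = linkPattern.exec(markdown)) !== null) {
    targets.add(normalisePath(match[1]));
  }
  return targets;
}

function findLinkOpportunities(
  body: string,
  related: RelatedPost[],
  selfPostId: string,
  minSimilarity = 0.75, // assumed relevance threshold
  limit = 5,            // assumed cap on suggestions per post
): RelatedPost[] {
  const alreadyLinked = extractLinkTargets(body);
  return related
    .filter((p) => p.postId !== selfPostId)
    .filter((p) => p.similarity >= minSimilarity)
    .filter((p) => !alreadyLinked.has(normalisePath(p.url)))
    .sort((a, b) => b.similarity - a.similarity)
    .slice(0, limit);
}
```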

5. AI Search Structure Gaps

Same signals as the Content Optimizer’s AI Search Visibility Score, but run in audit mode:

No direct answer in the intro
No definition block for the primary keyword
No FAQ section
No comparison table
Statistics without source citations
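The first four structure gaps can be detected with Markdown heuristics; the fifth (statistics without source citations) is the same signal as the uncited-statistics freshness check and is omitted here. This is a sketch under loose assumptions: a "direct answer" is taken to be a first intro paragraph of at most 60 words that mentions the primary keyword, a definition is a "keyword is/are/refers to/means" sentence, and any Markdown table with two or more columns counts as a comparison table. `findAiSearchGaps` and the gap names are hypothetical and may not match the Content Optimizer's scoring.

```typescript
// Hypothetical gap identifiers for this sketch.
type AiSearchGap = "no_direct_answer" | "no_definition" | "no_faq" | "no_comparison_table";

const MAX_ANSWER_WORDS = 60; // assumed maximum length of a "direct answer" paragraph

// Escape regex metacharacters so the keyword can be embedded in a pattern.
function escapeRegExp(s: string): string {
  return s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

function findAiSearchGaps(markdown: string, primaryKeyword: string): AiSearchGap[] {
  const gaps: AiSearchGap[] = [];
  const kw = primaryKeyword.toLowerCase();

  // Intro = text before the first H2; take its first non-empty, non-heading paragraph.
  const intro = markdown.split(/^##\s/m)[0];
  const firstPara =
    intro
      .split(/\n\s*\n/)
      .map((p) => p.trim())
      .find((p) => p !== "" && !p.startsWith("#")) ?? "";
  const words = firstPara.split(/\s+/).filter(Boolean).length;
  if (!firstPara.toLowerCase().includes(kw) || words > MAX_ANSWER_WORDS) {
    gaps.push("no_direct_answer");
  }

  // Definition block: "<keyword> is / are / refers to / means ...".
  const definition = new RegExp(`\\b${escapeRegExp(kw)}\\s+(is|are|refers to|means)\\b`);
  if (!definition.test(markdown.toLowerCase())) gaps.push("no_definition");

  // FAQ: an H2 heading mentioning "FAQ" or "Frequently Asked Questions".
  if (!/^##\s+.*(faq|frequently asked questions)/im.test(markdown)) gaps.push("no_faq");

  // Comparison table: a Markdown separator row with at least two columns, e.g. | --- | --- |.
  if (!/^\s*\|?\s*:?-{3,}:?\s*(\|\s*:?-{3,}:?\s*)+\|?\s*$/m.test(markdown)) {
    gaps.push("no_comparison_table");
  }

  return gaps;
}
```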

Input Contract


interface ContentAuditorInput {
  tenantId:    string;
  blogPostId:  string;
 
  auditScope: {
    freshness:          boolean;
    thinSections:       boolean;
    cannibalisation:    boolean;
    internalLinks:      boolean;
    aiSearchStructure:  boolean;
  };
}

Output Contract


interface ContentAuditResult {
  tenantId:     string;
  blogPostId:   string;
  auditedAt:    string;   // ISO timestamp
 
  healthScore:  number;   // 0–100 composite
 
  findings: AuditFinding[];
 
  // Pre-filled refresh brief context — fed into blog-writer when "Refresh" is triggered
  refreshContext: {
    summary:            string;   // 2-3 sentence summary of what needs to change
    priorityFindings:   string[]; // Top 3 findings in plain language for the brief
    suggestedUpdates:   string;   // Markdown list of specific suggested updates
  };
}
 
interface AuditFinding {
  category:    'freshness' | 'thin_section' | 'cannibalisation' | 'internal_links' | 'ai_search';
  severity:    'critical' | 'warning' | 'suggestion';
  title:       string;
  detail:      string;
  location?:   string;   // e.g. "Section: Benefits of X" or "Paragraph 4"
  action:      string;   // Specific recommended action
}

Health Score Breakdown

Category	Weight
Freshness	30%
Thin sections	20%
Cannibalisation	20%
Internal links	15%
AI search structure	15%

Each category scores 0–100 based on the number and severity of findings within it. The composite health score is the weighted average.

Colour coding: red (0–49 = Needs Refresh), amber (50–74 = Review Recommended), green (75–100 = Healthy).
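The weighted average can be written directly from the weights table above. How findings translate into a 0–100 category score is not specified, so this sketch assumes a flat deduction per finding (40 for critical, 15 for warning, 5 for suggestion) clamped at zero; those values, and the `healthScore` and `healthBand` names, are placeholders.

```typescript
type Category = "freshness" | "thin_section" | "cannibalisation" | "internal_links" | "ai_search";
type Severity = "critical" | "warning" | "suggestion";

// Weights from the Health Score Breakdown table.
const WEIGHTS: Record<Category, number> = {
  freshness: 0.3,
  thin_section: 0.2,
  cannibalisation: 0.2,
  internal_links: 0.15,
  ai_search: 0.15,
};

// Hypothetical per-finding deductions; the spec does not fix these values.
const PENALTY: Record<Severity, number> = { critical: 40, warning: 15, suggestion: 5 };

function healthScore(findings: { category: Category; severity: Severity }[]): number {
  let composite = 0;
  for (const category of Object.keys(WEIGHTS) as Category[]) {
    const deduction = findings
      .filter((f) => f.category === category)
      .reduce((sum, f) => sum + PENALTY[f.severity], 0);
    const categoryScore = Math.max(0, 100 - deduction); // each category stays within 0–100
    composite += WEIGHTS[category] * categoryScore;
  }
  return Math.round(composite);
}

// Colour bands from the spec: red 0–49, amber 50–74, green 75–100.
function healthBand(score: number): "Needs Refresh" | "Review Recommended" | "Healthy" {
  if (score < 50) return "Needs Refresh";
  if (score < 75) return "Review Recommended";
  return "Healthy";
}
```

Under these assumed penalties, one critical and one warning finding in freshness plus one suggestion in AI search structure give category scores of 45 and 95, so the composite is 0.30 × 45 + 0.20 × 100 + 0.20 × 100 + 0.15 × 100 + 0.15 × 95 = 82.75, which rounds to 83 (green).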

“Refresh This Post” Workflow


DM clicks "Refresh" on an audited blog post
  ↓
API: POST /tenant/v1/blog/:id/refresh
  ↓
Creates a new BlogActivity:
  - activityType: 'blog_post'
  - refreshSourceId: original BlogPost.id
  - contentBrief: { ...original brief + audit refreshContext injected }
  ↓
Activity appears in Activities list as "Blog Refresh: {original title}"
  ↓
blog-writer agent runs with the original brief + audit findings as context
  - Agent instructed to preserve and improve, not rewrite from scratch
  - Original post body passed as reference
  ↓
Normal review workflow: dm_review → client_review → published
  ↓
On publish: original BlogPost marked as superseded (status: 'superseded')
New post takes the same slug (with optional redirect from old post URL)
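The activity-creation and supersede steps of the flow above can be sketched as two pure functions. The `ContentBrief`, `NewBlogActivity`, and `PostRecord` shapes and the `referenceBody`, `instructions`, and `displayName` fields are assumptions for illustration; only `activityType`, `refreshSourceId`, `refreshContext`, and the `superseded` status come from this spec.

```typescript
// Mirrors `refreshContext` in ContentAuditResult.
interface RefreshContext {
  summary: string;
  priorityFindings: string[];
  suggestedUpdates: string;
}

// Hypothetical brief shape; the real brief may carry more fields.
interface ContentBrief {
  title: string;
  primaryKeyword: string;
  outline: string[];
  refreshContext?: RefreshContext;
  referenceBody?: string; // assumed field for the original post body
  instructions?: string;  // assumed field for writer guidance
}

interface NewBlogActivity {
  activityType: "blog_post";
  refreshSourceId: string;
  displayName: string;
  contentBrief: ContentBrief;
}

interface PostRecord {
  id: string;
  slug: string;
  status: "draft" | "published" | "superseded";
}

function buildRefreshActivity(
  original: { id: string; title: string; body: string; brief: ContentBrief },
  refresh: RefreshContext,
): NewBlogActivity {
  return {
    activityType: "blog_post",
    refreshSourceId: original.id,
    displayName: `Blog Refresh: ${original.title}`,
    contentBrief: {
      ...original.brief,           // keep the original brief fields
      refreshContext: refresh,     // inject the audit's refresh context
      referenceBody: original.body, // original post body passed as reference
      instructions: "Preserve and improve the existing post; do not rewrite from scratch.",
    },
  };
}

// On publish: the original is marked superseded and the refresh takes over its slug.
function applyRefreshPublish(original: PostRecord, refreshed: PostRecord): [PostRecord, PostRecord] {
  return [
    { ...original, status: "superseded" },
    { ...refreshed, slug: original.slug, status: "published" },
  ];
}
```

How the old record releases its slug (for example, if slugs must be unique at the database level) is left to the implementation.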

Audit Dashboard

Dashboard — Content → Audit tab:

Table of all published blog posts with columns: Title · Published date · Last audit date · Health score · Findings count · Actions
“Run Audit” button per post (enqueues auditor job)
“Bulk Audit” button — enqueues audit jobs for all posts not audited in the last 30 days (limited to 10 at a time; credit check first)
Filters: All / Needs Refresh (red) / Review Recommended (amber) / Healthy (green)
Sort by health score ascending to surface the worst-performing posts first
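A sketch of the list filters, the worst-first sort, and the Bulk Audit selection, assuming each row carries its last audit date and health score (the `AuditRow` shape and function names are hypothetical). Two behaviours are assumptions rather than spec: never-audited posts sort after audited ones and always count as due for audit, and the credit check is all-or-nothing, so a balance that cannot cover the whole batch of up to 10 rejects it instead of auditing a subset.

```typescript
interface AuditRow {
  postId: string;
  title: string;
  lastAuditedAt: Date | null; // null = never audited
  healthScore: number | null; // null = no audit yet
}

type HealthFilter = "all" | "needs_refresh" | "review_recommended" | "healthy";

const BULK_AUDIT_LIMIT = 10;
const BULK_STALE_DAYS = 30;
const CREDITS_PER_AUDIT = 1;
const DAY_MS = 24 * 60 * 60 * 1000;
const UNAUDITED_SORT_KEY = 101; // places unaudited rows after every real score

function matchesFilter(score: number | null, filter: HealthFilter): boolean {
  if (filter === "all") return true;
  if (score === null) return false; // unaudited posts have no colour band yet
  if (filter === "needs_refresh") return score < 50;
  if (filter === "review_recommended") return score >= 50 && score < 75;
  return score >= 75;
}

// Filter by band, then sort ascending so the worst posts come first.
function dashboardRows(rows: AuditRow[], filter: HealthFilter): AuditRow[] {
  return rows
    .filter((r) => matchesFilter(r.healthScore, filter))
    .sort((a, b) => (a.healthScore ?? UNAUDITED_SORT_KEY) - (b.healthScore ?? UNAUDITED_SORT_KEY));
}

// Posts not audited in the last 30 days, oldest audit first, at most 10 per click.
function selectBulkAudit(rows: AuditRow[], creditBalance: number, now: Date = new Date()): string[] | "insufficient_credits" {
  const cutoff = now.getTime() - BULK_STALE_DAYS * DAY_MS;
  const due = rows
    .filter((r) => r.lastAuditedAt === null || r.lastAuditedAt.getTime() < cutoff)
    .sort((a, b) => (a.lastAuditedAt?.getTime() ?? 0) - (b.lastAuditedAt?.getTime() ?? 0))
    .slice(0, BULK_AUDIT_LIMIT)
    .map((r) => r.postId);
  return due.length * CREDITS_PER_AUDIT <= creditBalance ? due : "insufficient_credits";
}
```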

Blog Post Detail — Audit Panel:

Last audit date + health score badge
Findings list grouped by category
“Refresh” button (visible when there is at least one critical finding or three or more warnings)
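The visibility rule for the Refresh button reduces to a small predicate over finding severities; `showRefreshButton` is a hypothetical name, and suggestions never count towards the threshold.

```typescript
type Severity = "critical" | "warning" | "suggestion";

// Visible when there is at least one critical finding or three or more warnings.
function showRefreshButton(findings: { severity: Severity }[]): boolean {
  const critical = findings.filter((f) => f.severity === "critical").length;
  const warnings = findings.filter((f) => f.severity === "warning").length;
  return critical >= 1 || warnings >= 3;
}
```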

Scheduled Audits

Tenants can enable automatic monthly audits on all published posts via Settings → Content → Audit Schedule. When enabled:

On the 1st of each month, a cron job enqueues audit tasks for all published blog posts
Limited to 20 posts per run to cap credit consumption
Posts audited in the last 15 days are skipped
Total credit cost deducted at run start; if credits are insufficient, the run stops and the DM is notified
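A sketch of the planning step the monthly job might run before enqueueing anything, assuming the candidates are the tenant's published posts with their last audit date. The trigger itself (a standard cron expression such as `0 0 1 * *`) is left to the API scheduler. `planScheduledRun` and its result type are hypothetical, and picking the posts with the oldest audits first is an assumption, since the spec does not say which 20 posts win when more are due.

```typescript
interface ScheduledCandidate {
  postId: string;
  lastAuditedAt: Date | null; // null = never audited
}

type ScheduledRunPlan =
  | { status: "run"; postIds: string[]; creditsDeducted: number }
  | { status: "insufficient_credits"; creditsRequired: number; balance: number };

const MAX_POSTS_PER_RUN = 20;
const SKIP_IF_AUDITED_WITHIN_DAYS = 15;
const CREDITS_PER_AUDIT = 1;
const DAY_MS = 24 * 60 * 60 * 1000;

function planScheduledRun(candidates: ScheduledCandidate[], balance: number, now: Date = new Date()): ScheduledRunPlan {
  const cutoff = now.getTime() - SKIP_IF_AUDITED_WITHIN_DAYS * DAY_MS;
  const postIds = candidates
    // Skip posts audited in the last 15 days.
    .filter((c) => c.lastAuditedAt === null || c.lastAuditedAt.getTime() < cutoff)
    // Assumed priority: never-audited first, then oldest audit first.
    .sort((a, b) => (a.lastAuditedAt?.getTime() ?? 0) - (b.lastAuditedAt?.getTime() ?? 0))
    .slice(0, MAX_POSTS_PER_RUN)
    .map((c) => c.postId);

  // The whole run's cost is checked up front; a shortfall stops the run entirely.
  const creditsRequired = postIds.length * CREDITS_PER_AUDIT;
  if (creditsRequired > balance) {
    return { status: "insufficient_credits", creditsRequired, balance };
  }
  return { status: "run", postIds, creditsDeducted: creditsRequired };
}
```

The caller would deduct `creditsDeducted` before enqueueing the audit jobs, or notify the DM when the plan comes back as `insufficient_credits`.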

Key Design Decisions

Decision	Choice	Rationale
RAG for cannibalisation + link detection	Query `published_content` dataset	All published posts are already ingested into RAG at publish time; no separate index needed
`refreshContext` in audit output	Agent produces a pre-filled brief context block alongside findings	Eliminates friction between “audit found problems” and “brief is ready to fix them”
1 cr per audit	Same cost as keyword research	Post-publication value; moderate inference usage with multiple RAG queries
Supersede on publish	Original post marked `superseded` when refresh is published	Preserves audit history and ensures the refresh does not overwrite the original until approved

Implementation Phases

Phase 1 — Agent + Basic Audit

Create docs/agents/content-auditor.md (agent doc)
Add content-auditor to AgentRole type union
Create packages/agents/src/workers/content-auditor.worker.ts
Seed system prompt in packages/db/src/seed.ts
Implement freshness + thin section checks (heuristic, no RAG needed)
Add auditResult JSON field + auditedAt to BlogPost model (migration)
POST /tenant/v1/blog/:id/audit route
Blog post detail: audit panel with findings list

Phase 2 — RAG-Powered Checks

Extend worker to query published_content RAG dataset for cannibalisation + internal link checks
Audit dashboard tab (list view with health scores, filters)
“Bulk Audit” action with credit pre-check

Phase 3 — Refresh Workflow

Add refreshSourceId field to Activity model (migration)
POST /tenant/v1/blog/:id/refresh route — creates refresh activity with audit context injected
Extend blog-writer agent to accept refreshContext in input (update system prompt accordingly)
Mark original post as superseded on refresh post publish

Phase 4 — Scheduled Audits

Tenant settings: audit schedule toggle
Monthly cron job in API scheduler
Credit guard: check balance before bulk audit run