Content Auditor

[To Build] · agent__content-auditor · Claude Sonnet 4.6

Analyses a published blog post for content decay signals (outdated facts, thin sections, keyword cannibalisation, missing internal link opportunities, and AI search structure gaps) and produces a health score plus a pre-filled context block for a refresh brief.

Related: Content Audit · Blog Writer · RAG Integration · Content Optimizer

Overview


Function	Audit a published blog post for decay signals and generate the context for a refresh brief
Type	Worker — Content Quality
Model	Claude Sonnet 4.6
Queue	`agent__content-auditor`
Concurrency	2
Timeout	8 min
Est. cost / task	~$0.60
Credits	1 cr per audit
Plan	Pro+

Triggers

Trigger type	When	Who initiates
Manual	DM clicks “Run Audit” on blog post detail or audit list	Tenant admin / DM reviewer
Bulk manual	“Bulk Audit” from the audit dashboard; queues audits for all un-audited posts	Tenant admin / DM reviewer
Scheduled	Monthly cron job (opt-in setting); queues an audit for every published post not audited in the last 30 days	Scheduler (cron)
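
The selection rule behind the scheduled trigger can be sketched as a small filter. This is a minimal illustration, not the scheduler's actual code: the `AuditablePost` shape, the `status` values, and the function name are assumptions, and `auditedAt` is assumed to be an ISO 8601 string or `null`.

```typescript
// Hypothetical shape: only the fields this sketch needs.
interface AuditablePost {
  id: string;
  status: 'draft' | 'published';
  auditedAt: string | null; // assumed ISO 8601 timestamp of the last audit, null if never audited
}

const DAY_MS = 24 * 60 * 60 * 1000;

// Returns the ids of published posts with no audit in the last `maxAgeDays` days.
function selectPostsDueForAudit(
  posts: AuditablePost[],
  now: Date,
  maxAgeDays: number = 30,
): string[] {
  const cutoff = now.getTime() - maxAgeDays * DAY_MS;
  return posts
    .filter((post) => post.status === 'published')
    .filter((post) => post.auditedAt === null || Date.parse(post.auditedAt) < cutoff)
    .map((post) => post.id);
}
```

Posts that have never been audited are treated as due, which also matches the "un-audited posts" rule of the bulk manual trigger.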

Input


interface ContentAuditorInput {
  tenantId:    string;
  blogPostId:  string;
 
  auditScope: {
    freshness:          boolean;
    thinSections:       boolean;
    cannibalisation:    boolean;
    internalLinks:      boolean;
    aiSearchStructure:  boolean;
  };
}
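
As an illustration of the input shape, the sketch below builds a `ContentAuditorInput` with every check enabled unless the caller switches one off. The helper name, the all-enabled default, and the example ids are assumptions; the source does not say what the default scope is.

```typescript
interface AuditScope {
  freshness: boolean;
  thinSections: boolean;
  cannibalisation: boolean;
  internalLinks: boolean;
  aiSearchStructure: boolean;
}

interface ContentAuditorInput {
  tenantId: string;
  blogPostId: string;
  auditScope: AuditScope;
}

// Hypothetical helper: enables every check, then applies the caller's overrides.
function buildAuditInput(
  tenantId: string,
  blogPostId: string,
  scopeOverrides: Partial<AuditScope> = {},
): ContentAuditorInput {
  const fullScope: AuditScope = {
    freshness: true,
    thinSections: true,
    cannibalisation: true,
    internalLinks: true,
    aiSearchStructure: true,
  };
  return { tenantId, blogPostId, auditScope: { ...fullScope, ...scopeOverrides } };
}

// Example: audit everything except cannibalisation (ids are placeholders).
const exampleInput = buildAuditInput('tenant_123', 'post_456', { cannibalisation: false });
```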

Output


interface ContentAuditResult {
  tenantId:     string;
  blogPostId:   string;
  auditedAt:    string;
 
  healthScore:  number;   // 0–100 weighted composite
 
  findings: AuditFinding[];
 
  refreshContext: {
    summary:          string;   // 2–3 sentence summary of what needs to change
    priorityFindings: string[]; // Top 3 findings in plain language
    suggestedUpdates: string;   // Markdown list of specific suggested updates for the brief
  };
}
 
interface AuditFinding {
  category:    'freshness' | 'thin_section' | 'cannibalisation' | 'internal_links' | 'ai_search';
  severity:    'critical' | 'warning' | 'suggestion';
  title:       string;
  detail:      string;
  location?:   string;   // e.g. "Section: Why Email Marketing Works" or "Paragraph 2"
  action:      string;
}
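
If findings arrive as parsed JSON from the model, a runtime type guard is one way to reject malformed entries before they are stored. The sketch below assumes that parsing step; the guard name and the decision to reject any unknown category or severity are illustrative choices, not documented behaviour.

```typescript
const FINDING_CATEGORIES = ['freshness', 'thin_section', 'cannibalisation', 'internal_links', 'ai_search'];
const FINDING_SEVERITIES = ['critical', 'warning', 'suggestion'];

interface AuditFinding {
  category: 'freshness' | 'thin_section' | 'cannibalisation' | 'internal_links' | 'ai_search';
  severity: 'critical' | 'warning' | 'suggestion';
  title: string;
  detail: string;
  location?: string;
  action: string;
}

// Narrows an unknown value (for example, one element of a parsed JSON array)
// to AuditFinding by checking every required field and the optional location.
function isAuditFinding(value: unknown): value is AuditFinding {
  if (typeof value !== 'object' || value === null) return false;
  const v = value as Record<string, unknown>;
  const category = v['category'];
  const severity = v['severity'];
  const location = v['location'];
  return (
    typeof category === 'string' && FINDING_CATEGORIES.indexOf(category) !== -1 &&
    typeof severity === 'string' && FINDING_SEVERITIES.indexOf(severity) !== -1 &&
    typeof v['title'] === 'string' &&
    typeof v['detail'] === 'string' &&
    typeof v['action'] === 'string' &&
    (location === undefined || typeof location === 'string')
  );
}
```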

How It Works

1. Loads the blog post body (`bodyMarkdown`) from the database.
2. Runs heuristic checks first (freshness regex, thin-section word counts, meta checks), with no LLM calls.
3. Queries the `published_content` RAG dataset to detect cannibalisation (other posts with the same primary keyword) and internal link opportunities (topically related posts not yet linked).
4. Sends the full post body to Claude with audit instructions; Claude identifies AI search structure gaps and nuanced freshness issues.
5. Computes per-category scores and the weighted composite health score.
6. Generates `refreshContext`: a structured summary suitable for injecting directly into a blog-writer brief.
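
The heuristic pass that runs before any LLM call could look roughly like the sketch below. It is a sketch under stated assumptions: the 100-word threshold is invented for illustration (the source gives none), the 18-month window is borrowed from the system prompt, and a year counts as stale once more than 18 months have passed since that year ended. Function names are hypothetical.

```typescript
interface ThinSection {
  heading: string;
  wordCount: number;
}

// Assumed threshold; the source does not state one.
const ASSUMED_MIN_SECTION_WORDS = 100;

// Splits Markdown on ATX headings (#, ##, ...) and returns sections below the
// word threshold. Text before the first heading is ignored, and heading-like
// lines inside code fences are not special-cased.
function findThinSections(
  bodyMarkdown: string,
  minWords: number = ASSUMED_MIN_SECTION_WORDS,
): ThinSection[] {
  const thin: ThinSection[] = [];
  let heading: string | null = null;
  let wordCount = 0;

  const closeSection = (): void => {
    if (heading !== null && wordCount < minWords) {
      thin.push({ heading, wordCount });
    }
  };

  for (const line of bodyMarkdown.split('\n')) {
    const match = /^#{1,6}\s+(.*)$/.exec(line);
    if (match) {
      closeSection();
      heading = (match[1] ?? '').trim();
      wordCount = 0;
    } else {
      wordCount += line.split(/\s+/).filter(Boolean).length;
    }
  }
  closeSection();
  return thin;
}

// Returns the distinct four-digit years (19xx or 20xx) whose end lies more
// than `maxAgeMonths` months before `now`, in ascending order.
function findStaleYears(bodyMarkdown: string, now: Date, maxAgeMonths: number = 18): number[] {
  const matches = bodyMarkdown.match(/\b(?:19|20)\d{2}\b/g) ?? [];
  const stale = new Set<number>();
  for (const text of matches) {
    const year = Number(text);
    // Whole months between 1 January of the following year and the current month.
    const monthsSinceYearEnd = (now.getUTCFullYear() - (year + 1)) * 12 + now.getUTCMonth();
    if (monthsSinceYearEnd > maxAgeMonths) stale.add(year);
  }
  return Array.from(stale).sort((a, b) => a - b);
}
```

A regex like this will also match four-digit numbers that are not years (prices, counts), which is one reason the nuanced freshness judgement is left to the Claude step.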

System Prompt


You are a content quality auditor for a marketing agency platform. You receive a published blog post and identify issues that reduce its quality and performance.

Audit the following dimensions:

**Freshness:**
- Identify any year references that are more than 18 months old relative to today ({{CURRENT_DATE}})
- Flag statistics cited without a source link
- Flag references to products, pricing, or services that may have changed

**AI Search Structure:**
- Does the introduction directly answer the primary question within the first 150 words?
- Is the primary keyword or core concept explicitly defined?
- Is there a FAQ or Q&A section?
- Is there at least one structured comparison table?
- Are all statistics sourced with an external link or inline attribution?

For each issue found, produce a finding with:
- category (freshness | ai_search)
- severity (critical | warning | suggestion)
- title (max 10 words)
- detail (1–2 sentences, specific to the post)
- location (which section or paragraph)
- action (what to do — specific, not generic)

Then write a refreshContext block:
- summary: 2–3 sentences summarising the biggest issues
- priorityFindings: top 3 findings in plain language (as a writer would read them)
- suggestedUpdates: a Markdown list of specific updates to make in the refresh

{{CLIENT_CONTEXT}}
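
The `{{CURRENT_DATE}}` and `{{CLIENT_CONTEXT}}` placeholders in the prompt above have to be filled before the request is sent. A minimal sketch of that substitution follows; the function name, the ISO date format, and the choice to throw on a missing value are assumptions rather than documented behaviour.

```typescript
// Replaces every {{NAME}} placeholder with its value. Throws when a value is
// missing so a half-rendered prompt is never sent to the model.
function renderPrompt(template: string, values: Record<string, string>): string {
  return template.replace(/\{\{([A-Z_]+)\}\}/g, (_match: string, name: string) => {
    const value = values[name];
    if (value === undefined) {
      throw new Error(`Missing value for placeholder {{${name}}}`);
    }
    return value;
  });
}

// Example with placeholder client context (the client name is made up).
const renderedExample = renderPrompt(
  'Today is {{CURRENT_DATE}}.\n{{CLIENT_CONTEXT}}',
  { CURRENT_DATE: '2025-09-01', CLIENT_CONTEXT: 'Client: Acme' },
);
```

Using a replacer function rather than a replacement string means characters such as `$` in the client context are inserted literally.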

RAG Usage

Dataset	Query	Used for
`published_content`	`{primaryKeyword}`	Detect other posts targeting the same keyword (cannibalisation)
`published_content`	`{post title} related topics`	Find topically related posts that could be internally linked
`website_content`	`{primaryKeyword} service area`	Find existing service/product pages the post could link to
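
The three queries in the table can be derived from the post's title and primary keyword, as in the sketch below. The `RagQuery` and `RagHit` shapes, the `purpose` labels, and both function names are hypothetical; the retrieval call itself is out of scope here.

```typescript
interface RagQuery {
  dataset: 'published_content' | 'website_content';
  query: string;
  purpose: 'cannibalisation' | 'internal_links' | 'service_pages';
}

// Builds one query per row of the RAG usage table.
function buildRagQueries(post: { title: string; primaryKeyword: string }): RagQuery[] {
  return [
    { dataset: 'published_content', query: post.primaryKeyword, purpose: 'cannibalisation' },
    { dataset: 'published_content', query: `${post.title} related topics`, purpose: 'internal_links' },
    { dataset: 'website_content', query: `${post.primaryKeyword} service area`, purpose: 'service_pages' },
  ];
}

// Hypothetical hit shape. The audited post is itself in published_content, so
// it would match its own keyword and should be dropped before scoring.
interface RagHit {
  blogPostId: string;
  title: string;
}

function excludeAuditedPost(hits: RagHit[], auditedPostId: string): RagHit[] {
  return hits.filter((hit) => hit.blogPostId !== auditedPostId);
}
```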

HITL Gates

The content auditor has no approval gates — it is a read-only analysis. Findings are surfaced directly to the DM for review. The “Refresh this post” action (which creates a new blog activity) does go through the full blog post HITL workflow.

Guardrails

Rule	Enforcement
Audit cannot modify the post	Worker never writes to post content; it writes only to `BlogPost.auditResult` and `BlogPost.auditedAt`
Health score is always 0–100	Output validator clamps to range; returns error if model returns out-of-range value
`refreshContext` is always populated	If Claude returns an empty `refreshContext`, worker retries once with an explicit instruction to complete it
Audit does not fabricate findings	System prompt instructs: “Only report findings based on evidence in the post. Do not infer or assume issues not visible in the text.”
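
The score-range and `refreshContext` guardrails might be enforced along the lines of the sketch below. It takes one possible reading of the score rule (clamp the value and also report that clamping happened), and `callModel` is a hypothetical stand-in for the real Claude request. What happens when the retry also comes back empty is not specified in the source, so the sketch simply returns the second result.

```typescript
interface RefreshContext {
  summary: string;
  priorityFindings: string[];
  suggestedUpdates: string;
}

// Clamps the score to 0-100 and reports an error message when clamping was needed.
function checkHealthScore(raw: unknown): { score: number; error: string | null } {
  if (typeof raw !== 'number' || !Number.isFinite(raw)) {
    throw new Error('healthScore must be a finite number');
  }
  const score = Math.min(100, Math.max(0, raw));
  const error = score === raw ? null : `healthScore ${raw} is outside 0-100; clamped to ${score}`;
  return { score, error };
}

// True only when all three refreshContext fields carry content.
function isRefreshContextPopulated(ctx: Partial<RefreshContext> | null | undefined): boolean {
  if (!ctx) return false;
  const hasSummary = (ctx.summary ?? '').trim().length > 0;
  const hasPriorities = (ctx.priorityFindings ?? []).length > 0;
  const hasUpdates = (ctx.suggestedUpdates ?? '').trim().length > 0;
  return hasSummary && hasPriorities && hasUpdates;
}

// Calls the model once, and once more with an explicit instruction if the
// first refreshContext is empty or incomplete.
async function getRefreshContextWithRetry(
  callModel: (extraInstruction?: string) => Promise<Partial<RefreshContext> | null>,
): Promise<Partial<RefreshContext> | null> {
  const first = await callModel();
  if (isRefreshContextPopulated(first)) return first;
  return callModel(
    'Complete every field of refreshContext: summary, priorityFindings, suggestedUpdates.',
  );
}
```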

Health Score Weights

Category	Weight
Freshness	30%
Thin sections	20%
Cannibalisation	20%
Internal links	15%
AI search structure	15%

Category scores are computed from findings by deduction: critical = -20 pts, warning = -10 pts, suggestion = -5 pts. A category score cannot drop below 0.
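
The weights and deductions above combine as in the following sketch. Two points are assumptions because the source leaves them open: each category starts at 100 before deductions, and the composite is rounded to the nearest integer. A category with no findings (including one that was not in scope) scores 100 here.

```typescript
type AuditCategory = 'freshness' | 'thin_section' | 'cannibalisation' | 'internal_links' | 'ai_search';
type FindingSeverity = 'critical' | 'warning' | 'suggestion';

interface ScoredFinding {
  category: AuditCategory;
  severity: FindingSeverity;
}

// Weights in percent, taken from the table above; they sum to 100.
const CATEGORY_WEIGHTS: Record<AuditCategory, number> = {
  freshness: 30,
  thin_section: 20,
  cannibalisation: 20,
  internal_links: 15,
  ai_search: 15,
};

// Points deducted per finding.
const SEVERITY_PENALTY: Record<FindingSeverity, number> = {
  critical: 20,
  warning: 10,
  suggestion: 5,
};

// Assumed to start at 100; deductions accumulate and the result is floored at 0.
function categoryScore(findings: ScoredFinding[], category: AuditCategory): number {
  const deducted = findings
    .filter((finding) => finding.category === category)
    .reduce((sum, finding) => sum + SEVERITY_PENALTY[finding.severity], 0);
  return Math.max(0, 100 - deducted);
}

// Weighted composite of the five category scores, rounded (rounding is an assumption).
function computeHealthScore(findings: ScoredFinding[]): number {
  const categories = Object.keys(CATEGORY_WEIGHTS) as AuditCategory[];
  const weightedSum = categories.reduce(
    (sum, category) => sum + CATEGORY_WEIGHTS[category] * categoryScore(findings, category),
    0,
  );
  return Math.round(weightedSum / 100);
}
```

For example, a post with one critical and one warning freshness finding plus one AI search suggestion gets a freshness score of 70 and an AI search score of 95, so the composite is (30 × 70 + 20 × 100 + 20 × 100 + 15 × 100 + 15 × 95) / 100 = 90.25, which rounds to 90.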