Duplicate Website Insights: Two Runs per Setup Chain
Status: ✅ Fixed (2026-05-05) — guard added in tenant-website-crawler.worker.ts
Severity: Low — data correct (both runs produce identical score), but wastes API credits
Worker: packages/agents/src/workers/website-insights.worker.ts
Symptom
A fresh tenant triggers two separate website insights jobs with different IDs, both completing with overallScore: 98, pageCount: 100:
```
[09:08:20] Website insights job started websiteInsightId: cmos2w74k00jbw164bmo2zuly
[09:09:26] Website insight completed websiteInsightId: cmos2w74k00jbw164bmo2zuly score: 98
[09:25:37] Website insights job started websiteInsightId: cmos3if0k005vw1ukrhigj927
[09:26:55] Website insight completed websiteInsightId: cmos3if0k005vw1ukrhigj927 score: 98
```
The two runs are 17 minutes apart. The first fires immediately after the crawl completes (09:08). The second fires at 09:25, concurrent with context-file-writer completing and the AI visibility seeder starting.
Root Cause (Confirmed)
There is only ONE enqueue call site: tenant-website-crawler.worker.ts (line ~635). The duplicate came from a SECOND crawl job running.
Trigger chain:
- Registration → enqueueWebsiteCrawl → crawl #1 → website insights #1 (09:08)
- User opens the onboarding wizard's Brand Assets step → clicks “Refresh” because brand assets haven't populated yet → triggerWebsiteCrawl() → POST /tenant/v1/channels/:id/webcrawl/start → startWebCrawl() → crawl #2 → website insights #2 (09:25)
triggerWebsiteCrawl() is called by the “Refresh” button in apps/dashboard/src/app/(dashboard)/onboarding/OnboardingWizard.tsx. The user ends up clicking it when getBrandAssets() returns no data, because brand extraction had not yet run when the user reached that step.
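Both paths in the trigger chain converge on the single enqueue site in the crawler worker. The model below is a minimal sketch of that, assuming the crawl itself can be elided; every name in it (onCrawlCompleted, registerTenant, clickRefresh, the in-memory insights array) is hypothetical and only mirrors the call chain. It shows that, with no guard at the completion step, two crawls for the same channel produce two insights.

```typescript
// Hypothetical model of the two trigger paths; these are NOT the real worker APIs.
type Insight = { id: string; connectedChannelId: string; source: string };

const insights: Insight[] = [];
let nextId = 1;

// Stands in for the crawler worker's completion step (line ~635), which in the
// original code unconditionally creates and enqueues a website insight.
function onCrawlCompleted(connectedChannelId: string, source: string): Insight {
  const insight: Insight = { id: `insight-${nextId++}`, connectedChannelId, source };
  insights.push(insight);
  return insight;
}

// Path 1: registration -> enqueueWebsiteCrawl -> crawl #1 -> insight #1
function registerTenant(channelId: string): void {
  onCrawlCompleted(channelId, "registration");
}

// Path 2: "Refresh" -> triggerWebsiteCrawl() -> POST .../webcrawl/start
//         -> startWebCrawl() -> crawl #2 -> insight #2
function clickRefresh(channelId: string): void {
  onCrawlCompleted(channelId, "refresh");
}

registerTenant("channel-1");
clickRefresh("channel-1");
console.log(insights.map((i) => `${i.id} (${i.source})`).join(", "));
// insight-1 (registration), insight-2 (refresh)
```

Because the duplication happens at the shared completion step, guarding that one site covers both entry points without touching the dashboard.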
Fix Applied
Added a guard in tenant-website-crawler.worker.ts before creating and enqueuing the website insight record:
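```typescript
// Skip if an insight is already pending/generating for this channel
const existingInsight = await prisma.websiteInsight.findFirst({
  where: { connectedChannelId, status: { in: ["pending", "generating"] } },
  select: { id: true },
});

if (existingInsight) {
  log.info({ ... }, "Skipping website insight enqueue — one is already pending/generating");
} else {
  // create + enqueue as before
}
```

Why this works: the guard runs at the point where a finished crawl would create its insight record. If the second crawl reaches that point while the first insight is still pending or generating, the guard finds it and skips the duplicate. If no insight is in flight (e.g. the user does a fresh re-crawl a day later), the guard passes and a new insight is created normally.

Note that the guard only covers insights that are still in flight. In the logs above, insight #1 had already completed (09:09:26) before insight #2 started (09:25:37); a second crawl that finishes after the first insight completes passes the guard, exactly like the day-later re-crawl case.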
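To make the guard's behavior concrete without a database, here is a minimal in-memory sketch. It assumes insight status takes the values "pending", "generating" and "completed"; the store, findInFlight and maybeEnqueueInsight are hypothetical stand-ins for prisma.websiteInsight and the crawler worker, not the real code.

```typescript
// Hypothetical in-memory stand-in for prisma.websiteInsight.
type InsightStatus = "pending" | "generating" | "completed";

interface WebsiteInsight {
  id: string;
  connectedChannelId: string;
  status: InsightStatus;
}

const store: WebsiteInsight[] = [];
let counter = 0;

// Mirrors findFirst({ where: { connectedChannelId, status: { in: ["pending", "generating"] } } }).
function findInFlight(connectedChannelId: string): WebsiteInsight | undefined {
  return store.find(
    (i) =>
      i.connectedChannelId === connectedChannelId &&
      (i.status === "pending" || i.status === "generating"),
  );
}

// Returns the newly created insight, or null when the guard skips the enqueue.
function maybeEnqueueInsight(connectedChannelId: string): WebsiteInsight | null {
  const existing = findInFlight(connectedChannelId);
  if (existing) {
    console.log(`skip: ${existing.id} is already ${existing.status}`);
    return null;
  }
  const insight: WebsiteInsight = {
    id: `insight-${++counter}`,
    connectedChannelId,
    status: "pending",
  };
  store.push(insight);
  return insight;
}

// A second crawl finishes while insight #1 is still pending: skipped.
const first = maybeEnqueueInsight("channel-1");
const second = maybeEnqueueInsight("channel-1"); // logs "skip: insight-1 is already pending"

// A later re-crawl after insight #1 has completed: a new insight is created.
if (first) first.status = "completed";
const third = maybeEnqueueInsight("channel-1");
console.log(first?.id, second, third?.id); // insight-1 null insight-2
```

The check and the create are two separate steps, both here and in the Prisma snippet, so two crawls finishing at almost the same moment could both pass the check before either record exists. Closing that window would require an atomic check-and-create, which this sketch does not attempt.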