Duplicate Website Insights: Two Runs per Setup Chain

Status: ✅ Fixed (2026-05-05) — guard added in tenant-website-crawler.worker.ts
Severity: Low. Data is correct (both runs produce identical scores), but the duplicate run wastes API credits
Worker: packages/agents/src/workers/website-insights.worker.ts

Symptom

A fresh tenant triggers two separate website insights jobs with different IDs, both completing with overallScore: 98, pageCount: 100:

[09:08:20] Website insights job started websiteInsightId: cmos2w74k00jbw164bmo2zuly
[09:09:26] Website insight completed websiteInsightId: cmos2w74k00jbw164bmo2zuly score: 98
[09:25:37] Website insights job started websiteInsightId: cmos3if0k005vw1ukrhigj927
[09:26:55] Website insight completed websiteInsightId: cmos3if0k005vw1ukrhigj927 score: 98

The two runs are 17 minutes apart. The first fires immediately after the crawl completes (09:08). The second fires at 09:25, concurrent with context-file-writer completing and the AI visibility seeder starting.
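To confirm a duplicate like this from raw logs, the log entries can be grouped by websiteInsightId. This is a minimal sketch that assumes the exact line format shown above. `groupRuns`, `toSeconds`, and the `Run` shape are hypothetical helpers and are not part of the worker code.

```typescript
// Hypothetical diagnostic: group worker log lines into runs keyed by
// websiteInsightId. The regexes assume the exact line format shown above.
interface Run {
  id: string;
  startedAt?: number; // seconds since midnight
  completedAt?: number;
  score?: number;
}

function toSeconds(hms: string): number {
  const [h, m, s] = hms.split(":").map(Number);
  return h * 3600 + m * 60 + s;
}

const START_RE = /^\[(\d{2}:\d{2}:\d{2})\] Website insights job started websiteInsightId: (\S+)$/;
const DONE_RE = /^\[(\d{2}:\d{2}:\d{2})\] Website insight completed websiteInsightId: (\S+) score: (\d+)$/;

function groupRuns(lines: string[]): Run[] {
  const byId = new Map<string, Run>(); // Map preserves first-seen order
  for (const line of lines) {
    const started = START_RE.exec(line);
    const done = DONE_RE.exec(line);
    const match = started ?? done;
    if (!match) continue; // ignore unrelated log lines
    const run: Run = byId.get(match[2]) ?? { id: match[2] };
    if (started) run.startedAt = toSeconds(started[1]);
    if (done) {
      run.completedAt = toSeconds(done[1]);
      run.score = Number(done[3]);
    }
    byId.set(run.id, run);
  }
  return Array.from(byId.values());
}

const runs = groupRuns([
  "[09:08:20] Website insights job started websiteInsightId: cmos2w74k00jbw164bmo2zuly",
  "[09:09:26] Website insight completed websiteInsightId: cmos2w74k00jbw164bmo2zuly score: 98",
  "[09:25:37] Website insights job started websiteInsightId: cmos3if0k005vw1ukrhigj927",
  "[09:26:55] Website insight completed websiteInsightId: cmos3if0k005vw1ukrhigj927 score: 98",
]);

// Two distinct insight IDs for one tenant means two runs; the gap is between start times.
const gapSeconds = runs[1].startedAt! - runs[0].startedAt!;
console.log(runs.length, gapSeconds); // 2 1037 (17 min 17 s)
```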

Root Cause (Confirmed)

There is only ONE enqueue call site: tenant-website-crawler.worker.ts (line ~635). The duplicate came from a SECOND crawl job running.

Trigger chain:

  1. Registration → enqueueWebsiteCrawl → crawl #1 → website insights #1 (09:08)
  2. User opens the onboarding wizard's Brand Assets step → clicks “Refresh” because brand assets haven’t populated yet → triggerWebsiteCrawl() called → POST /tenant/v1/channels/:id/webcrawl/start → startWebCrawl() → crawl #2 → website insights #2 (09:25)
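Before the fix, the root cause comes down to a single unconditional call site that two crawls both reach. The sketch below is a minimal model of that pre-fix flow. `onCrawlCompleted`, `CrawlSource`, and the in-memory job list are hypothetical stand-ins for the enqueue site in tenant-website-crawler.worker.ts and for the real queue.

```typescript
// Hypothetical model of the pre-fix trigger chain: both entry points end in a
// crawl, and every crawl completion reaches the single enqueue call site.
type CrawlSource = "registration" | "onboarding-refresh";

const insightJobs: { insightId: string; source: CrawlSource }[] = [];
let insightSeq = 0;

// Stand-in for the enqueue call site before the fix: no check for an
// existing insight, so each completed crawl creates and enqueues a new one.
function onCrawlCompleted(source: CrawlSource): void {
  insightJobs.push({ insightId: `insight-${++insightSeq}`, source });
}

// Crawl #1: registration -> enqueueWebsiteCrawl -> crawl completes
onCrawlCompleted("registration");
// Crawl #2: Brand Assets "Refresh" -> triggerWebsiteCrawl() -> startWebCrawl() -> crawl completes
onCrawlCompleted("onboarding-refresh");

console.log(insightJobs.map((j) => j.source).join(", "));
// registration, onboarding-refresh
```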

triggerWebsiteCrawl() is the handler behind the “Refresh” button in apps/dashboard/src/app/(dashboard)/onboarding/OnboardingWizard.tsx. It is triggered when getBrandAssets() returns no data, which happens when brand extraction has not yet run by the time the user reaches that step.

Fix Applied

Added a guard in tenant-website-crawler.worker.ts that runs before the website insight record is created and its job is enqueued:

// Skip if one is already pending/generating for this channel
const existingInsight = await prisma.websiteInsight.findFirst({
  where: { connectedChannelId, status: { in: ["pending", "generating"] } },
  select: { id: true },
});
if (existingInsight) {
  log.info({ ... }, "Skipping website insight enqueue — one is already pending/generating");
} else {
  // create + enqueue as before
}
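Below is a runnable sketch of the same guard. A hypothetical in-memory table stands in for prisma.websiteInsight, and it models only the findFirst call shown above plus a Prisma-style `create({ data })`. `FakeWebsiteInsightTable`, `enqueueInsightIfIdle`, the `enqueue` hook, and the three-value status union are all assumptions, and the real schema may define more statuses.

```typescript
// Hypothetical in-memory stand-in for prisma.websiteInsight, modelling only
// the findFirst/create shapes the guard needs.
type InsightStatus = "pending" | "generating" | "completed";

interface WebsiteInsightRow {
  id: string;
  connectedChannelId: string;
  status: InsightStatus;
}

class FakeWebsiteInsightTable {
  rows: WebsiteInsightRow[] = [];
  private seq = 0;

  async findFirst(args: {
    where: { connectedChannelId: string; status: { in: InsightStatus[] } };
    select: { id: true };
  }): Promise<{ id: string } | null> {
    const row = this.rows.find(
      (r) =>
        r.connectedChannelId === args.where.connectedChannelId &&
        args.where.status.in.includes(r.status),
    );
    return row ? { id: row.id } : null;
  }

  async create(args: { data: { connectedChannelId: string } }): Promise<WebsiteInsightRow> {
    const row: WebsiteInsightRow = {
      id: `insight-${++this.seq}`,
      connectedChannelId: args.data.connectedChannelId,
      status: "pending", // new insights start as pending
    };
    this.rows.push(row);
    return row;
  }
}

// Guard from the fix: create + enqueue only when nothing is pending/generating
// for the channel. Returns the new insight id, or null when skipped.
async function enqueueInsightIfIdle(
  table: FakeWebsiteInsightTable,
  connectedChannelId: string,
  enqueue: (insightId: string) => void, // hypothetical queue hook
): Promise<string | null> {
  const existingInsight = await table.findFirst({
    where: { connectedChannelId, status: { in: ["pending", "generating"] } },
    select: { id: true },
  });
  if (existingInsight) {
    console.log(`Skipping website insight enqueue; ${existingInsight.id} is already in flight`);
    return null;
  }
  const insight = await table.create({ data: { connectedChannelId } });
  enqueue(insight.id);
  return insight.id;
}

// Usage: two crawls for the same channel finish while the first insight is still pending.
const table = new FakeWebsiteInsightTable();
const enqueued: string[] = [];
const demo = (async () => {
  const first = await enqueueInsightIfIdle(table, "channel-1", (id) => enqueued.push(id));
  const second = await enqueueInsightIfIdle(table, "channel-1", (id) => enqueued.push(id));
  return { first, second };
})();
```

The guard is a read-then-write check. If two crawls reached it at the same instant, both could in principle see no in-flight insight and both create one. Closing that window would need something atomic, such as a database-level uniqueness constraint or a transaction. This sketch does not attempt that.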

Why this works: The guard runs when a crawl reaches the enqueue step. If an insight for the same channel is still pending or generating at that moment, the guard finds it and skips the duplicate. If no insight is in flight (e.g. the user runs a fresh re-crawl a day later), the guard passes and a new insight is created normally.
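The guard is evaluated when a crawl reaches the enqueue step, so the deciding question is whether an earlier insight is still in flight at that instant. The sketch below is a minimal model of that timing rule. It assumes an insight is in flight from creation until completion and ignores failed or other statuses. `InsightWindow` and `guardSkipsAt` are hypothetical, and the 66 s window matches insight #1's start-to-completion time in the logs.

```typescript
// Hypothetical timeline model: an insight is in flight (pending or generating)
// from creation until completion. Times are seconds on an arbitrary clock.
interface InsightWindow {
  createdAt: number;
  completedAt: number | null; // null while still pending/generating
}

// True when the guard would skip: some existing insight is in flight at the
// moment the crawl reaches the enqueue step.
function guardSkipsAt(existing: InsightWindow[], crawlFinishedAt: number): boolean {
  return existing.some(
    (w) =>
      w.createdAt <= crawlFinishedAt &&
      (w.completedAt === null || crawlFinishedAt < w.completedAt),
  );
}

const DAY = 24 * 60 * 60;
const firstInsight: InsightWindow = { createdAt: 0, completedAt: 66 };

// Second crawl finishes 30 s after the first insight was created: skipped.
console.log(guardSkipsAt([firstInsight], 30)); // true
// Insight still generating with no completion yet: skipped.
console.log(guardSkipsAt([{ createdAt: 0, completedAt: null }], 5000)); // true
// Fresh re-crawl a day later: nothing in flight, so a new insight is created.
console.log(guardSkipsAt([firstInsight], DAY)); // false
```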

© 2026 Leadmetrics — Internal use only