
RAG Integration

Overview

The RAG (Retrieval-Augmented Generation) system gives agents on-demand access to client-specific knowledge that is too large, too detailed, or too dynamic to fit in a static skill file. It is complementary to the skills system — skills inject a curated summary when the agent starts; RAG lets the agent retrieve specific facts at the exact moment it needs them during execution.

Technical design: rag-architecture.md — Prisma schema, Qdrant collection design, BullMQ ingestion pipeline, hybrid search algorithm, tool implementation.
UI screens: screens-knowledge-base.md — KB1 through KB6.
Related: Skills System | Tool Integration Layer | Onboarding


The Two-Layer Context Strategy

Every agent run for a tenant receives context in two layers:

| Layer | Mechanism | Content | When used |
|---|---|---|---|
| Skills (static) | --add-dir / system prompt prepend | Client context file — company summary, tone, products, competitors, USPs | Always — injected at agent start for every task |
| RAG (dynamic) | Pre-loaded by worker code OR search_knowledge.js tool call | Specific documents, full website pages, past campaign copy, competitor detail | Worker injects before Claude starts (pre-loaded), or Claude calls a Bash tool (skill script) |

The skills layer answers “Who is this client?” The RAG layer answers “What exactly does the client’s pricing page say?” or “How did we write the last three blog posts for this client?”

RAG delivery mechanisms

There are two distinct ways agents receive RAG context in this codebase:

| Mechanism | How it works | Agents using it |
|---|---|---|
| Pre-loaded | Worker calls search() before building the Claude prompt. Results injected as KNOWLEDGE BASE CONTEXT section. Claude receives context passively — no tool call. | blog-writer, strategy-writer, all roles in content.worker.ts (18 roles) |
| Skill script | createSkillsDir() writes search_knowledge.js + CLAUDE.md with mandatory instruction to call the script before writing. Claude executes node search_knowledge.js "query" as a Bash tool call and reads the JSON results. | client-researcher, competitor-researcher, context-file-writer |
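For the pre-loaded mechanism, the worker renders the retrieved chunks into a prompt section before invoking Claude. A minimal TypeScript sketch of that step, assuming a simple chunk shape; the type and helper names here are illustrative, not the actual content.worker.ts code:

```typescript
interface Chunk {
  text: string;
  score: number;
  source: string;  // e.g. the originating file name
  dataset: string; // e.g. "client_docs"
}

// Hypothetical helper: format search results as the KNOWLEDGE BASE CONTEXT
// section that the worker prepends to the Claude prompt.
function buildKnowledgeContext(chunks: Chunk[]): string {
  if (chunks.length === 0) return "";
  const body = chunks
    .map((c, i) => `[${i + 1}] (${c.dataset} / ${c.source})\n${c.text}`)
    .join("\n\n");
  return `KNOWLEDGE BASE CONTEXT\n${body}`;
}
```

Returning an empty string for an empty result set lets the worker skip the section entirely when nothing relevant was retrieved.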

Why not put everything in the skill file?

| Reason | Detail |
|---|---|
| Context window limits | A client with 40 uploaded docs, a 200-page website, and 12 months of published content cannot fit in any model’s context window |
| Token cost | Injecting all content on every task wastes money — RAG retrieves only the relevant chunks |
| Freshness | Skill files are updated manually; RAG datasets update continuously as content is published, the site is re-crawled, or documents are uploaded |

Implementation Approach

The RAG system is implemented natively inside Leadmetrics — not as a separate sidecar service. New packages added to the monorepo:

| Package | Purpose |
|---|---|
| providers/provider-qdrant | Qdrant client singleton, collection helpers |
| features/feature-knowledge | Dataset and file management, ingestion queue |
| features/feature-search | Hybrid search (vector + keyword + RRF), reranking |

New routes in apps/api and Knowledge Base screens in apps/dashboard. Full details in rag-architecture.md.


Datasets

Each tenant has five standard datasets, created automatically when the tenant is provisioned.

| Dataset | Purpose | Privacy |
|---|---|---|
| client_docs | Uploaded brand/product documents (PDF, DOCX, TXT, MD) | Tenant-scoped; cloud embedding OK |
| website_content | Crawled website pages | Tenant-scoped; cloud embedding OK |
| published_content | Blog posts and social posts published through the platform | Tenant-scoped; cloud embedding OK; auto-populated |
| competitor_content | Competitor website pages and research data gathered by Content Researcher | Tenant-scoped; local embedding only — never sent to cloud |
| channel_insights | Human-accepted insight observations from channel analysis runs | Tenant-scoped; cloud embedding OK; auto-populated on acceptance |

Embedding models

| Dataset | Default | Override |
|---|---|---|
| client_docs | text-embedding-3-small (OpenAI) | Tenant may switch to local Ollama |
| website_content | text-embedding-3-small (OpenAI) | Tenant may switch to local Ollama |
| published_content | text-embedding-3-small (OpenAI) | Tenant may switch to local Ollama |
| competitor_content | nomic-embed-text (Ollama) | Always local — no override permitted |

Enterprise on-prem tenants (dataPrivacyLevel: 'local_only') use local Ollama embedding for all datasets.

The embedding model for a dataset cannot be changed after creation. Changing it would invalidate all existing vectors. To switch models, the tenant must create a new dataset and re-upload their files.
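The selection rules above can be collapsed into a small pure function, evaluated once at dataset creation and never again. This is an illustrative sketch under the stated rules, not the actual feature-knowledge code; the function name and types are assumptions:

```typescript
type PrivacyLevel = "standard" | "local_only";

// Illustrative: choose the embedding provider/model for a dataset at
// creation time. competitor_content is always local; local_only tenants
// are always local; otherwise the documented OpenAI default applies
// unless the tenant opted into local Ollama.
function resolveEmbedding(
  datasetKey: string,
  tenantPrivacy: PrivacyLevel,
  tenantPrefersLocal = false,
): { provider: "openai" | "ollama"; model: string } {
  const local = { provider: "ollama" as const, model: "nomic-embed-text" };
  if (datasetKey === "competitor_content") return local; // never cloud
  if (tenantPrivacy === "local_only" || tenantPrefersLocal) return local;
  return { provider: "openai", model: "text-embedding-3-small" };
}
```

Because the result is fixed at creation, switching models later means creating a new dataset and re-ingesting, exactly as the paragraph above describes.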


Ingestion Pipeline

Trigger sources

| Trigger | Target dataset | Timing |
|---|---|---|
| Onboarding wizard — file upload (step 3c) | client_docs | Immediately after wizard submission |
| Onboarding wizard — website URL entered (step 3a) | website_content | Background job after wizard completes |
| Settings → Knowledge Base — manual upload | client_docs | On upload |
| Settings → Knowledge Base — re-crawl | website_content | On demand + scheduled weekly |
| Blog post client-approved (POST /tenant/v1/blog/:id/client-approve) | published_content | Immediately on client approval |
| Social post client-approved (POST /tenant/v1/social/:id/client-approve) | published_content | Immediately on client approval |
| Content Researcher agent — competitor scrape | competitor_content | During agent execution |
| Admin backfill (POST /admin/v1/tenants/:tenantId/reingest-published) | published_content | On demand — skips already-indexed posts |
| Human accepts an insight item (POST /tenant/v1/insights/:insightId/accept) | channel_insights | Immediately on acceptance |

File ingestion flow

```
File received (multipart upload via API)
  │
Saved to storage (local volume / S3-compatible)
  │
rag_files record created — status: 'pending'
  │
BullMQ job enqueued → queue: 'rag:ingestion'
  { type: 'file', fileId, tenantId, datasetId }
  │
Worker picks up job
  ├── Parse text from file
  │     PDF  → pdf-parse (NODE_NATIVE) or Docling (DOCLING)
  │     DOCX → mammoth
  │     TXT / MD → read as-is
  │     CSV  → csv-parse
  ├── Chunk text
  │     NAIVE    → fixed-size (default: 512 tokens, 64 overlap)
  │     MARKDOWN → split on headings (H1/H2 boundaries)
  │     MANUAL   → split on --- delimiter
  ├── Embed each chunk
  │     OpenAI → POST /v1/embeddings
  │     Ollama → POST /api/embeddings
  │     Batched: 100 chunks per API call
  └── Upsert vectors into Qdrant
        Collection: ds_{dataset.refId}
        Payload: { fileId, tenantId, chunkIndex, content, fileName, enabled, source }
  │
Update rag_files: status → 'indexed', chunksCount = N
Update rag_datasets: totalChunks += N
  │
SSE event pushed to Dashboard: file status updated
```
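The NAIVE chunking strategy in the flow above is a sliding window with overlap. A sketch, approximating tokens with whitespace-separated words rather than a real tokenizer (the actual worker presumably counts model tokens); the defaults mirror the documented 512/64:

```typescript
// Illustrative fixed-size chunker: `size` units per chunk, with `overlap`
// units shared between consecutive chunks. "Tokens" are approximated by
// words here for the sake of a self-contained example.
function chunkNaive(text: string, size = 512, overlap = 64): string[] {
  if (overlap >= size) throw new Error("overlap must be smaller than size");
  const words = text.split(/\s+/).filter(Boolean);
  const chunks: string[] = [];
  const step = size - overlap;
  for (let start = 0; start < words.length; start += step) {
    chunks.push(words.slice(start, start + size).join(" "));
    if (start + size >= words.length) break; // last window reached the end
  }
  return chunks;
}
```

The overlap means each chunk repeats the tail of its predecessor, so a fact that straddles a chunk boundary is still retrievable from at least one chunk.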

Website crawl flow

Triggered after onboarding completes (and on-demand re-crawl from Settings → Knowledge Base):

```
BullMQ job enqueued → queue: 'rag:ingestion'
  { type: 'crawl', crawlJobId, tenantId, datasetId }
  │
Crawler worker (Playwright)
  ├── Fetch page → extract body text (strip nav, footer, cookie banners)
  ├── Extract same-domain links
  ├── Respect robots.txt
  ├── Skip: /login /admin /cart /checkout /wp-admin and URL patterns with ? params
  ├── Create rag_files record for the page (source: 'website_crawl')
  ├── Enqueue type: 'content' ingestion job for the page text
  ├── Update rag_crawl_jobs: pagesCrawled++
  └── Follow links → repeat up to maxDepth (3) and maxPages (200)
  │
Update rag_crawl_jobs: status → 'completed'
  │
SSE event pushed to Dashboard: crawl progress
```
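The skip rules in the crawl flow amount to a single URL predicate. A sketch: the path prefixes come from the flow above, while the helper name and the exact matching logic are assumptions:

```typescript
// Illustrative crawl filter: returns true if a URL should NOT be crawled.
// Skips the documented path prefixes and any URL carrying query parameters.
const SKIP_PREFIXES = ["/login", "/admin", "/cart", "/checkout", "/wp-admin"];

function shouldSkipUrl(rawUrl: string): boolean {
  const url = new URL(rawUrl);
  if (url.search !== "") return true; // URL patterns with ? params
  return SKIP_PREFIXES.some((prefix) => url.pathname.startsWith(prefix));
}
```

Filtering query-string URLs avoids crawling faceted or paginated duplicates of the same page, at the cost of skipping the occasional legitimate `?page=` listing.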

Scheduled re-crawl frequency: weekly (BullMQ cron per tenant, runs Sunday at 3 AM tenant local time).

Published content feedback loop

When a blog post or social post is client-approved through the platform, it is automatically indexed:

```
Publishing worker → WordPress API / LinkedIn API / etc.
  │ publish succeeds
BullMQ job enqueued → queue: 'rag:ingestion'
  { type: 'content', content: postMarkdown, fileId, tenantId, datasetId: 'published_content' }
  │
Worker skips parse step (text already extracted)
  │
Chunk → embed → upsert into Qdrant
```
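The enqueue step above can be sketched as a pure payload builder. The queue name and payload shape come from this page; the function itself is hypothetical, and in the real worker the result would be handed to a BullMQ Queue bound to 'rag:ingestion':

```typescript
interface ContentIngestionJob {
  type: "content";
  content: string; // post markdown, already extracted
  fileId: string;
  tenantId: string;
  datasetId: "published_content";
}

// Illustrative: build the ingestion payload enqueued when a blog or social
// post is client-approved. A real implementation would then call
// queue.add("ingest", payload) on a Redis-backed BullMQ queue.
function buildPublishedContentJob(
  postMarkdown: string,
  fileId: string,
  tenantId: string,
): ContentIngestionJob {
  return {
    type: "content",
    content: postMarkdown,
    fileId,
    tenantId,
    datasetId: "published_content",
  };
}
```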

Over time the published_content dataset accumulates the client’s complete publishing history — style, tone, topic coverage — queryable by future Copywriter, Social Media Manager, and SEO Specialist runs.


Tool definition

```yaml
name: rag_search
description: >
  Search the client's knowledge base for specific information. Use this when
  you need details not present in your injected context — specific product
  specs, pricing, past blog examples, competitor analysis, or full website
  page content. Returns the top matching text chunks.
inputs:
  query:   (string, required) — Natural language query
  dataset: (string, required) — One of: client_docs | website_content | published_content | competitor_content
  topK:    (number, optional) — Chunks to return. Default: 5. Max: 20.
```

Invocation flow

```
Agent emits tool call:
  { "tool": "rag_search", "input": { "query": "what are our pricing tiers", "dataset": "website_content" } }
  │
Tool dispatcher checks: rag_search in agent's toolNames[]?
  │ yes
Privacy guard: competitor_content only permitted for content_researcher role
  │
features/feature-search → searchService.search({ tenantId, datasetId, query, topK })
  │
Hybrid search executes:
  1. Embed query using dataset's configured embedding model
  2. Vector search in Qdrant (cosine similarity) → top topK * 2 candidates
  3. Keyword search in Qdrant (text payload index, BM25) → top topK * 2 candidates
  4. Score normalisation: score ÷ max(scores) → [0, 1] per list
  5. Reciprocal Rank Fusion (k=60): rrfScore = 1 / (60 + rank)
  6. Final score: (vectorWeight × vectorRRF) + ((1 − vectorWeight) × keywordRRF)
  7. If reranking enabled: cross-encoder re-scores top topK*2 → sort → take top topK
  │
Return to agent:
  [{ text, score, source: fileName, dataset }, ...]
  │
Log to tool_calls table: tool, query, dataset, topK, chunksReturned, latencyMs
```
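Steps 5 and 6 of the hybrid search can be sketched as pure functions. Assumptions: each candidate list is already sorted best-first, rank is 1-based, and the formula matches the one stated above; this is not the actual feature-search implementation:

```typescript
interface Candidate { id: string; }

// Reciprocal Rank Fusion: rrfScore = 1 / (k + rank), rank starting at 1.
function rrfScores(ranked: Candidate[], k = 60): Map<string, number> {
  const scores = new Map<string, number>();
  ranked.forEach((c, i) => scores.set(c.id, 1 / (k + i + 1)));
  return scores;
}

// Weighted fusion of the two RRF score lists:
// final = vectorWeight * vectorRRF + (1 - vectorWeight) * keywordRRF
function fuse(
  vectorRanked: Candidate[],
  keywordRanked: Candidate[],
  vectorWeight = 0.5,
): { id: string; score: number }[] {
  const v = rrfScores(vectorRanked);
  const kw = rrfScores(keywordRanked);
  const ids = new Set([...v.keys(), ...kw.keys()]);
  return [...ids]
    .map((id) => ({
      id,
      score:
        vectorWeight * (v.get(id) ?? 0) +
        (1 - vectorWeight) * (kw.get(id) ?? 0),
    }))
    .sort((a, b) => b.score - a.score);
}
```

A document that appears in both candidate lists accumulates score from both terms, which is why RRF reliably promotes results that vector and keyword search agree on.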

Search defaults per agent role

Datasets available to each role are controlled by allowedAgentRoles on each RagDataset, configured in packages/feature-knowledge/src/knowledge.types.ts (STANDARD_DATASETS).

| Agent role (queue name) | Datasets permitted | Mechanism |
|---|---|---|
| blog-writer | client_docs, website_content, published_content | Pre-loaded |
| strategy-writer | client_docs, website_content, published_content, competitor_content, channel_insights | Pre-loaded |
| social-post-writer | client_docs, published_content | Pre-loaded |
| social-calendar-planner | client_docs, published_content | Pre-loaded |
| keyword-researcher | client_docs, website_content | Pre-loaded |
| content-brief-writer | client_docs, website_content | Pre-loaded |
| email-writer, landing-page-writer, google-ads-writer, meta-ads-writer | client_docs | Pre-loaded |
| report-writer, ads-analyst | client_docs, published_content, channel_insights | Pre-loaded |
| site-auditor, backlink-researcher | website_content | Pre-loaded |
| client-researcher, competitor-researcher, context-file-writer | All tenant datasets (no role filter) | Skill script |
| All 8 *-insights workers | channel_insights (prior accepted learnings for the same channel type) | Pre-loaded |
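The allowedAgentRoles filter described above can be sketched as follows. The config shape is an assumption modelled on the table, not a copy of knowledge.types.ts, and only a few roles/datasets are shown:

```typescript
// Illustrative subset of a STANDARD_DATASETS-style role map. An empty
// array means "no role filter" (the researcher roles see all datasets).
const DATASET_ROLES: Record<string, string[]> = {
  client_docs: ["blog-writer", "strategy-writer", "social-post-writer"],
  website_content: ["blog-writer", "strategy-writer"],
  published_content: ["blog-writer", "strategy-writer", "social-post-writer"],
};

// Resolve which datasets a given agent role may query.
function datasetsForRole(role: string): string[] {
  return Object.entries(DATASET_ROLES)
    .filter(([, roles]) => roles.length === 0 || roles.includes(role))
    .map(([key]) => key);
}
```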

competitor_content uses local Ollama embedding (useLocalEmbedding: true) — competitor data never reaches a cloud embedding API.


Query Patterns by Agent

Illustrative examples of how each agent uses the rag_search tool:

Copywriter

  • “What tone and style have we used in past blog posts for this client?” → published_content
  • “What are the exact features and benefits of their product?” → client_docs
  • “Find the pricing page so I reference the correct tier names” → website_content
  • “Do we have past email campaigns I can use for style reference?” → published_content

SEO Specialist

  • “What topics are already covered on the client’s blog?” → published_content
  • “What does the About page say about their positioning?” → website_content
  • “Are there brand guidelines for how to write SEO metadata?” → client_docs

Social Media Manager

  • “Show me past social posts — what language does this brand use?” → published_content
  • “Find the brand voice section from the uploaded guidelines” → client_docs

Activity Planner

  • “What deliverables have we produced for this client before?” → published_content
  • “What does the client say about their target audience?” → website_content

Content Researcher (Ollama only)

  • “What blog topics is [competitor] writing about?” → competitor_content
  • “How does [competitor] describe their pricing?” → competitor_content
  • “What keywords does [competitor] emphasise on their homepage?” → competitor_content

Competitor Data — Privacy Requirements

The competitor_content dataset has strict privacy requirements because scraping and storing competitor content could raise legal or competitive concerns if that data were sent to external services.

Requirements:

  1. The Content Researcher agent must run on local Ollama only — never Claude or OpenAI.
  2. The competitor_content dataset must use a local Ollama embedding model — never OpenAI.
  3. Qdrant runs locally in Docker — vectors never leave the machine.
  4. The competitor_content dataset must not be accessible to any agent role other than Content Researcher.
  5. These constraints are enforced in feature-search and cannot be overridden by per-tenant configuration.
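Requirement 4 is the kind of check feature-search would apply before executing any query. A sketch under the assumption that the guard is a plain role comparison; the function name is hypothetical, and the role string follows the content_researcher naming used in the invocation flow above:

```typescript
// Illustrative privacy guard: competitor_content is reserved for the
// Content Researcher role; all other datasets pass through unchanged.
function assertDatasetAccess(datasetKey: string, agentRole: string): void {
  if (datasetKey === "competitor_content" && agentRole !== "content_researcher") {
    throw new Error(`Role '${agentRole}' may not query competitor_content`);
  }
}
```

Throwing rather than silently filtering makes violations visible in worker logs, which suits a constraint the document says cannot be overridden per tenant.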

Integration with Onboarding

The onboarding wizard’s Step 3 inputs — docs uploaded in step 3c and the website URL entered in step 3a — drive three parallel processes after the wizard completes:

| Purpose | Mechanism | Timing |
|---|---|---|
| Generate Client Context File | LLM summarises uploaded docs + wizard inputs into a structured Markdown skill | Synchronous — blocks the “Setup Complete” screen (~30s) |
| Index docs into RAG | Files chunked, embedded, stored in client_docs dataset | Async BullMQ job — ~2–5 min per file |
| Crawl website | Playwright crawls the URL entered in step 3a | Async BullMQ job — ~5–15 min |

Agents can start working immediately after the context file is generated. RAG results improve incrementally as indexing completes in the background.

A RAG readiness indicator is shown on the Knowledge Base overview screen while indexing is in progress:

```
Client Docs        ████████████████░░░  72%   14 of 19 chunks indexed
Website Content    ░░░░░░░░░░░░░░░░░░░   0%   Crawl in progress (page 23 / 200)
Published Content  ─                          (grows as campaigns produce output)
Competitor Data    ─                          (populated by Content Researcher during research tasks)
```

Environment Variables

```
# Qdrant vector store
QDRANT_URL=http://qdrant:6333
QDRANT_API_KEY=                # empty for local Docker

# RAG worker
RAG_WORKER_CONCURRENCY=5
RAG_UPLOAD_DIR=./uploads/rag   # relative to apps/api

# Embedding models
RAG_DEFAULT_EMBEDDING_PROVIDER=openai
RAG_DEFAULT_EMBEDDING_MODEL=text-embedding-3-small
RAG_LOCAL_EMBEDDING_MODEL=nomic-embed-text   # Ollama model — used for competitor_content and local-only tenants

# Reranking (optional — requires Ollama)
RAG_RERANKER_MODEL=BAAI/bge-reranker-v2-m3
RAG_RERANKER_URL=http://ollama:11434

# Docling high-accuracy parser (optional)
DOCLING_URL=                   # empty = disabled; set to http://docling:5001 to enable
```

Plan Availability

| Feature | Free | Pro | Agency | Enterprise |
|---|---|---|---|---|
| Client docs upload | ✅ (max 5 files) | ✅ (max 50 files) | ✅ unlimited | ✅ unlimited |
| Website crawl | ✅ (max 50 pages) | ✅ (max 500 pages) | ✅ (max 2,000 pages) | ✅ configurable |
| Published content indexing | | | | |
| Competitor research (Ollama) | | | | |
| Reranking | | | | |
| Local-only embedding (privacy mode) | | | | |
| Docling parser | | | | |

© 2026 Leadmetrics — Internal use only