RAG Integration
Overview
The RAG (Retrieval-Augmented Generation) system gives agents on-demand access to client-specific knowledge that is too large, too detailed, or too dynamic to fit in a static skill file. It is complementary to the skills system — skills inject a curated summary when the agent starts; RAG lets the agent retrieve specific facts at the exact moment it needs them during execution.
Technical design: rag-architecture.md — Prisma schema, Qdrant collection design, BullMQ ingestion pipeline, hybrid search algorithm, tool implementation. UI screens: screens-knowledge-base.md — KB1 through KB6. Related: Skills System | Tool Integration Layer | Onboarding
The Two-Layer Context Strategy
Every agent run for a tenant receives context in two layers:
| Layer | Mechanism | Content | When used |
|---|---|---|---|
| Skills (static) | --add-dir / system prompt prepend | Client context file — company summary, tone, products, competitors, USPs | Always — injected at agent start for every task |
| RAG (dynamic) | Pre-loaded by worker code OR search_knowledge.js tool call | Specific documents, full website pages, past campaign copy, competitor detail | Worker injects before Claude starts (pre-loaded), or Claude calls a Bash tool (skill script) |
The skills layer answers “Who is this client?” The RAG layer answers “What exactly does the client’s pricing page say?” or “How did we write the last three blog posts for this client?”
RAG delivery mechanisms
There are two distinct ways agents receive RAG context in this codebase:
| Mechanism | How it works | Agents using it |
|---|---|---|
| Pre-loaded | Worker calls search() before building the Claude prompt. Results injected as KNOWLEDGE BASE CONTEXT section. Claude receives context passively — no tool call. | blog-writer, strategy-writer, all roles in content.worker.ts (18 roles) |
| Skill script | createSkillsDir() writes search_knowledge.js + CLAUDE.md with mandatory instruction to call the script before writing. Claude executes node search_knowledge.js "query" as a Bash tool call and reads the JSON results. | client-researcher, competitor-researcher, context-file-writer |
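The pre-loaded path can be sketched as a pure formatting step: the worker runs the search itself and prepends the results to the prompt before Claude starts. This is an illustrative sketch, assuming hypothetical names (`SearchResult`, `buildKnowledgeContext`) rather than the real worker API:

```typescript
// Hypothetical sketch of the "pre-loaded" mechanism: the worker searches
// first, then injects the results as a KNOWLEDGE BASE CONTEXT section.
// Claude receives this passively; no tool call is involved.
interface SearchResult {
  text: string;
  score: number;
  source: string; // fileName
  dataset: string;
}

// Format retrieved chunks into the context section prepended to the prompt.
function buildKnowledgeContext(results: SearchResult[]): string {
  if (results.length === 0) return "";
  const chunks = results
    .map((r, i) => `[${i + 1}] (${r.dataset} / ${r.source})\n${r.text}`)
    .join("\n\n");
  return `KNOWLEDGE BASE CONTEXT\n\n${chunks}`;
}
```

The skill-script path differs only in who initiates the search: Claude itself runs `node search_knowledge.js "query"` as a Bash tool call and reads the same kind of result list as JSON.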
Why not put everything in the skill file?
| Reason | Detail |
|---|---|
| Context window limits | A client with 40 uploaded docs, a 200-page website, and 12 months of published content cannot fit in any model’s context window |
| Token cost | Injecting all content on every task wastes money — RAG retrieves only the relevant chunks |
| Freshness | Skill files are updated manually; RAG datasets update continuously as content is published, the site is re-crawled, or documents are uploaded |
Implementation Approach
The RAG system is implemented natively inside Leadmetrics — not as a separate sidecar service. New packages added to the monorepo:
| Package | Purpose |
|---|---|
| providers/provider-qdrant | Qdrant client singleton, collection helpers |
| features/feature-knowledge | Dataset and file management, ingestion queue |
| features/feature-search | Hybrid search (vector + keyword + RRF), reranking |
New routes in apps/api and Knowledge Base screens in apps/dashboard. Full details in rag-architecture.md.
Datasets
Each tenant has five standard datasets, created automatically when the tenant is provisioned.
| Dataset | Purpose | Privacy |
|---|---|---|
| client_docs | Uploaded brand/product documents (PDF, DOCX, TXT, MD) | Tenant-scoped; cloud embedding OK |
| website_content | Crawled website pages | Tenant-scoped; cloud embedding OK |
| published_content | Blog posts and social posts published through the platform | Tenant-scoped; cloud embedding OK; auto-populated |
| competitor_content | Competitor website pages and research data gathered by Content Researcher | Tenant-scoped; local embedding only — never sent to cloud |
| channel_insights | Human-accepted insight observations from channel analysis runs | Tenant-scoped; cloud embedding OK; auto-populated on acceptance |
Embedding models
| Dataset | Default | Override |
|---|---|---|
| client_docs | text-embedding-3-small (OpenAI) | Tenant may switch to local Ollama |
| website_content | text-embedding-3-small (OpenAI) | Tenant may switch to local Ollama |
| published_content | text-embedding-3-small (OpenAI) | Tenant may switch to local Ollama |
| competitor_content | nomic-embed-text (Ollama) | Always local — no override permitted |
Enterprise on-prem tenants (dataPrivacyLevel: 'local_only') use local Ollama embedding for all datasets.
The embedding model for a dataset cannot be changed after creation. Changing it would invalidate all existing vectors. To switch models, the tenant must create a new dataset and re-upload their files.
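The immutability rule above can be expressed as a simple update guard. This is a sketch under assumed names (`RagDataset` fields and `assertEmbeddingModelUnchanged` are illustrative, not the real `feature-knowledge` API):

```typescript
// Illustrative guard: the embedding model is fixed at dataset creation,
// because vectors produced by a different model live in an incompatible
// embedding space and could not be compared to the existing ones.
interface RagDataset {
  id: string;
  embeddingProvider: "openai" | "ollama";
  embeddingModel: string;
}

function assertEmbeddingModelUnchanged(
  existing: RagDataset,
  update: Partial<RagDataset>,
): void {
  const modelChanged =
    update.embeddingModel !== undefined &&
    update.embeddingModel !== existing.embeddingModel;
  const providerChanged =
    update.embeddingProvider !== undefined &&
    update.embeddingProvider !== existing.embeddingProvider;
  if (modelChanged || providerChanged) {
    throw new Error(
      "Embedding model is fixed at dataset creation; create a new dataset and re-ingest instead.",
    );
  }
}
```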
Ingestion Pipeline
Trigger sources
| Trigger | Target dataset | Timing |
|---|---|---|
| Onboarding wizard — file upload (step 3c) | client_docs | Immediately after wizard submission |
| Onboarding wizard — website URL entered (step 3a) | website_content | Background job after wizard completes |
| Settings → Knowledge Base — manual upload | client_docs | On upload |
| Settings → Knowledge Base — re-crawl | website_content | On demand + scheduled weekly |
| Blog post client-approved (POST /tenant/v1/blog/:id/client-approve) | published_content | Immediately on client approval |
| Social post client-approved (POST /tenant/v1/social/:id/client-approve) | published_content | Immediately on client approval |
| Content Researcher agent — competitor scrape | competitor_content | During agent execution |
| Admin backfill (POST /admin/v1/tenants/:tenantId/reingest-published) | published_content | On demand — skips already-indexed posts |
| Human accepts an insight item (POST /tenant/v1/insights/:insightId/accept) | channel_insights | Immediately on acceptance |
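All of these triggers converge on the same `rag:ingestion` queue. The payload shapes below are taken from the flow diagrams in this section; the builder function and the commented BullMQ call are an illustrative sketch, not the real wiring:

```typescript
// Job payload shapes enqueued onto the 'rag:ingestion' queue, as described
// by the ingestion flows in this document.
type IngestionJob =
  | { type: "file"; fileId: string; tenantId: string; datasetId: string }
  | { type: "crawl"; crawlJobId: string; tenantId: string; datasetId: string }
  | {
      type: "content";
      content: string;
      fileId: string;
      tenantId: string;
      datasetId: string;
    };

const RAG_INGESTION_QUEUE = "rag:ingestion";

// Example: the payload built after a manual file upload (helper name is
// hypothetical).
function fileIngestionJob(
  fileId: string,
  tenantId: string,
  datasetId: string,
): IngestionJob {
  return { type: "file", fileId, tenantId, datasetId };
}

// With BullMQ this would be enqueued roughly as:
//   await new Queue(RAG_INGESTION_QUEUE, { connection }).add("ingest", job);
```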
File ingestion flow
File received (multipart upload via API)
│
▼
Saved to storage (local volume / S3-compatible)
rag_files record created — status: 'pending'
│
▼
BullMQ job enqueued → queue: 'rag:ingestion'
{ type: 'file', fileId, tenantId, datasetId }
│
▼
Worker picks up job
├── Parse text from file
│ PDF → pdf-parse (NODE_NATIVE) or Docling (DOCLING)
│ DOCX → mammoth
│ TXT / MD → read as-is
│ CSV → csv-parse
│
├── Chunk text
│ NAIVE → fixed-size (default: 512 tokens, 64 overlap)
│ MARKDOWN → split on headings (H1/H2 boundaries)
│ MANUAL → split on --- delimiter
│
├── Embed each chunk
│ OpenAI → POST /v1/embeddings
│ Ollama → POST /api/embeddings
│ Batched: 100 chunks per API call
│
└── Upsert vectors into Qdrant
Collection: ds_{dataset.refId}
Payload: { fileId, tenantId, chunkIndex, content, fileName, enabled, source }
Update rag_files: status → 'indexed', chunksCount = N
Update rag_datasets: totalChunks += N
SSE event pushed to Dashboard: file status updated

Website crawl flow
Triggered after onboarding completes (and on-demand re-crawl from Settings → Knowledge Base):
BullMQ job enqueued → queue: 'rag:ingestion'
{ type: 'crawl', crawlJobId, tenantId, datasetId }
│
▼
Crawler worker (Playwright)
├── Fetch page → extract body text (strip nav, footer, cookie banners)
├── Extract same-domain links
├── Respect robots.txt
├── Skip: /login /admin /cart /checkout /wp-admin and URL patterns with ? params
├── Create rag_files record for the page (source: 'website_crawl')
├── Enqueue type: 'content' ingestion job for the page text
├── Update rag_crawl_jobs: pagesCrawled++
└── Follow links → repeat up to maxDepth (3) and maxPages (200)
Update rag_crawl_jobs: status → 'completed'
SSE event pushed to Dashboard: crawl progress

Scheduled re-crawl frequency: weekly (BullMQ cron per tenant, runs Sunday at 3 AM tenant local time).
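The crawler's link filter (same-domain links only, skip auth/commerce paths, skip URLs with `?` params) can be sketched as a pure predicate. The function name and exact rule set are an illustrative reading of the flow above, not the real crawler code:

```typescript
// Illustrative URL filter for the Playwright crawler: follow only
// same-domain links, skip admin/commerce paths, skip URLs with query params.
const SKIP_PREFIXES = ["/login", "/admin", "/cart", "/checkout", "/wp-admin"];

function shouldCrawl(candidate: string, startUrl: string): boolean {
  let url: URL;
  try {
    url = new URL(candidate, startUrl); // resolves relative links too
  } catch {
    return false; // malformed URL
  }
  const start = new URL(startUrl);
  if (url.hostname !== start.hostname) return false; // same-domain only
  if (url.search !== "") return false; // skip URL patterns with ? params
  return !SKIP_PREFIXES.some((p) => url.pathname.startsWith(p));
}
```

robots.txt handling and the maxDepth/maxPages budget would sit outside this predicate, in the crawl loop itself.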
Published content feedback loop
When a blog post or social post is client-approved through the platform, it is automatically indexed:
Publishing worker → WordPress API / LinkedIn API / etc.
│ publish succeeds
▼
BullMQ job enqueued → queue: 'rag:ingestion'
{ type: 'content', content: postMarkdown, fileId, tenantId, datasetId: 'published_content' }
│
▼
Worker skips parse step (text already extracted)
Chunk → embed → upsert into Qdrant

Over time the published_content dataset accumulates the client’s complete publishing history — style, tone, topic coverage — queryable by future Copywriter, Social Media Manager, and SEO Specialist runs.
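The chunking step shared by all of these ingestion paths (NAIVE strategy: 512-token windows with 64-token overlap) can be sketched as follows. For illustration this splits on whitespace "tokens"; the real pipeline would use the embedding model's tokenizer, and the function name is hypothetical:

```typescript
// Sketch of NAIVE fixed-size chunking with overlap (defaults match the
// pipeline: 512-token windows, 64-token overlap). Whitespace splitting is
// a stand-in for a real tokenizer.
function chunkText(text: string, size = 512, overlap = 64): string[] {
  const tokens = text.split(/\s+/).filter(Boolean);
  const chunks: string[] = [];
  const step = size - overlap; // each window starts `overlap` tokens early
  for (let start = 0; start < tokens.length; start += step) {
    chunks.push(tokens.slice(start, start + size).join(" "));
    if (start + size >= tokens.length) break; // last window reached the end
  }
  return chunks;
}
```

The overlap keeps a sentence that straddles a chunk boundary retrievable from either side; MARKDOWN and MANUAL strategies replace the fixed window with heading or `---` boundaries.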
Agent Tool: rag_search
Tool definition
name: rag_search
description:
Search the client's knowledge base for specific information. Use this when you need details
not present in your injected context — specific product specs, pricing, past blog examples,
competitor analysis, or full website page content. Returns the top matching text chunks.
inputs:
query (string, required) — Natural language query
dataset (string, required) — One of: client_docs | website_content | published_content | competitor_content
topK (number, optional) — Chunks to return. Default: 5. Max: 20.

Invocation flow
Agent emits tool call:
{ "tool": "rag_search", "input": { "query": "what are our pricing tiers", "dataset": "website_content" } }
│
▼
Tool dispatcher checks: rag_search in agent's toolNames[]?
│ yes
▼
Privacy guard: competitor_content only permitted for content_researcher role
│
▼
features/feature-search → searchService.search({ tenantId, datasetId, query, topK })
│
▼
Hybrid search executes:
1. Embed query using dataset's configured embedding model
2. Vector search in Qdrant (cosine similarity) → top topK * 2 candidates
3. Keyword search in Qdrant (text payload index, BM25) → top topK * 2 candidates
4. Score normalisation: score ÷ max(scores) → [0, 1] per list
5. Reciprocal Rank Fusion (k=60): rrfScore = 1 / (60 + rank)
6. Final score: (vectorWeight × vectorRRF) + ((1 − vectorWeight) × keywordRRF)
7. If reranking enabled: cross-encoder re-scores top topK*2 → sort → take top topK
│
▼
Return to agent:
[{ text, score, source: fileName, dataset }, ...]
│
▼
Log to tool_calls table: tool, query, dataset, topK, chunksReturned, latencyMs

Search defaults per agent role
Datasets available to each role are controlled by allowedAgentRoles on each RagDataset, configured in packages/feature-knowledge/src/knowledge.types.ts (STANDARD_DATASETS).
| Agent role (queue name) | Datasets permitted | Mechanism |
|---|---|---|
| blog-writer | client_docs, website_content, published_content | Pre-loaded |
| strategy-writer | client_docs, website_content, published_content, competitor_content, channel_insights | Pre-loaded |
| social-post-writer | client_docs, published_content | Pre-loaded |
| social-calendar-planner | client_docs, published_content | Pre-loaded |
| keyword-researcher | client_docs, website_content | Pre-loaded |
| content-brief-writer | client_docs, website_content | Pre-loaded |
| email-writer, landing-page-writer, google-ads-writer, meta-ads-writer | client_docs | Pre-loaded |
| report-writer, ads-analyst | client_docs, published_content, channel_insights | Pre-loaded |
| site-auditor, backlink-researcher | website_content | Pre-loaded |
| client-researcher, competitor-researcher, context-file-writer | All tenant datasets (no role filter) | Skill script |
| All 8 *-insights workers | channel_insights (prior accepted learnings for the same channel type) | Pre-loaded |
competitor_content uses local Ollama embedding (useLocalEmbedding: true) — competitor data never reaches a cloud embedding API.
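The score-fusion step of the hybrid search flow (steps 5–6: Reciprocal Rank Fusion with k=60, then a weighted vector/keyword combination) can be sketched as a pure function. Candidate IDs and the helper name are illustrative; the real implementation lives in feature-search:

```typescript
// Sketch of Reciprocal Rank Fusion over the two candidate lists, weighted
// by vectorWeight as in the hybrid search flow: rrfScore = 1 / (k + rank).
function rrfFuse(
  vectorRanked: string[], // candidate ids, best match first
  keywordRanked: string[],
  vectorWeight = 0.7,
  k = 60,
): { id: string; score: number }[] {
  const scores = new Map<string, number>();
  const add = (ids: string[], weight: number) =>
    ids.forEach((id, index) => {
      const rrf = 1 / (k + index + 1); // rank is 1-based in the formula
      scores.set(id, (scores.get(id) ?? 0) + weight * rrf);
    });
  add(vectorRanked, vectorWeight);
  add(keywordRanked, 1 - vectorWeight);
  return [...scores.entries()]
    .map(([id, score]) => ({ id, score }))
    .sort((a, b) => b.score - a.score);
}
```

Because RRF uses only ranks, a candidate found by both searches is rewarded even when the two raw score scales (cosine similarity vs. BM25) are incomparable, which is what makes step 4's normalisation safe to combine with it.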
Query Patterns by Agent
Illustrative examples of how each agent uses the rag_search tool:
Copywriter
- “What tone and style have we used in past blog posts for this client?” → published_content
- “What are the exact features and benefits of their product?” → client_docs
- “Find the pricing page so I reference the correct tier names” → website_content
- “Do we have past email campaigns I can use for style reference?” → published_content

SEO Specialist
- “What topics are already covered on the client’s blog?” → published_content
- “What does the About page say about their positioning?” → website_content
- “Are there brand guidelines for how to write SEO metadata?” → client_docs

Social Media Manager
- “Show me past social posts — what language does this brand use?” → published_content
- “Find the brand voice section from the uploaded guidelines” → client_docs

Activity Planner
- “What deliverables have we produced for this client before?” → published_content
- “What does the client say about their target audience?” → website_content

Content Researcher (Ollama only)
- “What blog topics is [competitor] writing about?” → competitor_content
- “How does [competitor] describe their pricing?” → competitor_content
- “What keywords does [competitor] emphasise on their homepage?” → competitor_content
Competitor Data — Privacy Requirements
The competitor_content dataset has strict privacy requirements because scraping and storing competitor content could raise legal or competitive concerns if that data were sent to external services.
Requirements:
- The Content Researcher agent must run on local Ollama only — never Claude or OpenAI.
- The competitor_content dataset must use a local Ollama embedding model — never OpenAI.
- Qdrant runs locally in Docker — vectors never leave the machine.
- The competitor_content dataset must not be accessible to any agent role other than Content Researcher.
- These constraints are enforced in feature-search and cannot be overridden by per-tenant configuration.
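The access constraint corresponds to the "privacy guard" step in the tool invocation flow earlier in this document. A minimal sketch, assuming the role identifier `content_researcher` and a hypothetical guard name (the real check lives in feature-search):

```typescript
// Illustrative access guard: competitor_content is queryable only by the
// Content Researcher role; every other role is rejected before any search
// runs. Not configurable per tenant.
function assertDatasetAccess(agentRole: string, datasetId: string): void {
  if (datasetId === "competitor_content" && agentRole !== "content_researcher") {
    throw new Error(
      `Role '${agentRole}' may not query competitor_content; only the Content Researcher can.`,
    );
  }
}
```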
Integration with Onboarding
The onboarding wizard’s Step 3c (“Upload Company Docs”) serves two parallel purposes after the wizard completes:
| Purpose | Mechanism | Timing |
|---|---|---|
| Generate Client Context File | LLM summarises uploaded docs + wizard inputs into a structured Markdown skill | Synchronous — blocks the “Setup Complete” screen (~30s) |
| Index docs into RAG | Files chunked, embedded, stored in client_docs dataset | Async BullMQ job — ~2–5 min per file |
| Crawl website | Playwright crawls the URL entered in step 3a | Async BullMQ job — ~5–15 min |
Agents can start working immediately after the context file is generated. RAG results improve incrementally as indexing completes in the background.
A RAG readiness indicator is shown on the Knowledge Base overview screen while indexing is in progress:
Client Docs ████████████████░░░ 72% 14 of 19 chunks indexed
Website Content ░░░░░░░░░░░░░░░░░░░ 0% Crawl in progress (page 23 / 200)
Published Content ─ (grows as campaigns produce output)
Competitor Data ─ (populated by Content Researcher during research tasks)

Environment Variables
# Qdrant vector store
QDRANT_URL=http://qdrant:6333
QDRANT_API_KEY= # empty for local Docker
# RAG worker
RAG_WORKER_CONCURRENCY=5
RAG_UPLOAD_DIR=./uploads/rag # relative to apps/api
# Embedding models
RAG_DEFAULT_EMBEDDING_PROVIDER=openai
RAG_DEFAULT_EMBEDDING_MODEL=text-embedding-3-small
RAG_LOCAL_EMBEDDING_MODEL=nomic-embed-text # Ollama model — used for competitor_content and local-only tenants
# Reranking (optional — requires Ollama)
RAG_RERANKER_MODEL=BAAI/bge-reranker-v2-m3
RAG_RERANKER_URL=http://ollama:11434
# Docling high-accuracy parser (optional)
DOCLING_URL= # empty = disabled; set to http://docling:5001 to enable

Plan Availability
| Feature | Free | Pro | Agency | Enterprise |
|---|---|---|---|---|
| Client docs upload | ✅ (max 5 files) | ✅ (max 50 files) | ✅ unlimited | ✅ unlimited |
| Website crawl | ✅ (max 50 pages) | ✅ (max 500 pages) | ✅ (max 2,000 pages) | ✅ configurable |
| Published content indexing | ✅ | ✅ | ✅ | ✅ |
| Competitor research (Ollama) | ❌ | ✅ | ✅ | ✅ |
| Reranking | ❌ | ✅ | ✅ | ✅ |
| Local-only embedding (privacy mode) | ❌ | ❌ | ❌ | ✅ |
| Docling parser | ❌ | ❌ | ✅ | ✅ |