
RAG Integration

Overview

The RAG (Retrieval-Augmented Generation) system gives agents on-demand access to client-specific knowledge that is too large, too detailed, or too dynamic to fit in a static skill file. It is complementary to the skills system — skills inject a curated summary when the agent starts; RAG lets the agent retrieve specific facts at the exact moment it needs them during execution.

Technical design: rag-architecture.md — Prisma schema, Qdrant collection design, BullMQ ingestion pipeline, hybrid search algorithm, tool implementation.
UI screens: screens-knowledge-base.md — KB1 through KB6.
Related: Skills System | Tool Integration Layer | Onboarding


The Two-Layer Context Strategy

Every agent run for a tenant receives context in two layers:

| Layer | Mechanism | Content | When used |
|---|---|---|---|
| Skills (static) | --add-dir / system prompt prepend | Client context file — company summary, tone, products, competitors, USPs | Always — injected at agent start for every task |
| RAG (dynamic) | Pre-loaded by worker code OR search_knowledge.js tool call | Specific documents, full website pages, past campaign copy, competitor detail | Worker injects before Claude starts (pre-loaded), or Claude calls a Bash tool (skill script) |

The skills layer answers “Who is this client?” The RAG layer answers “What exactly does the client’s pricing page say?” or “How did we write the last three blog posts for this client?”

RAG delivery mechanisms

There are two distinct ways agents receive RAG context in this codebase:

| Mechanism | How it works | Agents using it |
|---|---|---|
| Pre-loaded | Worker calls search() before building the Claude prompt. Results injected as KNOWLEDGE BASE CONTEXT section. Claude receives context passively — no tool call. | blog-writer, strategy-writer, all roles in content.worker.ts (18 roles) |
| Skill script | createSkillsDir() writes search_knowledge.js + CLAUDE.md with mandatory instruction to call the script before writing. Claude executes node search_knowledge.js "query" as a Bash tool call and reads the JSON results. | client-researcher, competitor-researcher, context-file-writer |
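For the pre-loaded mechanism, the worker renders the retrieved chunks into a prompt section before invoking Claude. A minimal TypeScript sketch of that step, assuming a simple chunk shape; the type and helper names here are illustrative, not the actual content.worker.ts code:

```typescript
interface Chunk {
  text: string;
  score: number;
  source: string;  // e.g. the originating file name
  dataset: string; // e.g. "client_docs"
}

// Hypothetical helper: format search results as the KNOWLEDGE BASE CONTEXT
// section that the worker prepends to the Claude prompt.
function buildKnowledgeContext(chunks: Chunk[]): string {
  if (chunks.length === 0) return "";
  const body = chunks
    .map((c, i) => `[${i + 1}] (${c.dataset} / ${c.source})\n${c.text}`)
    .join("\n\n");
  return `KNOWLEDGE BASE CONTEXT\n${body}`;
}
```

Returning an empty string for an empty result set lets the worker skip the section entirely when nothing relevant was retrieved.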

Why not put everything in the skill file?

| Reason | Detail |
|---|---|
| Context window limits | A client with 40 uploaded docs, a 200-page website, and 12 months of published content cannot fit in any model’s context window |
| Token cost | Injecting all content on every task wastes money — RAG retrieves only the relevant chunks |
| Freshness | Skill files are updated manually; RAG datasets update continuously as content is published, the site is re-crawled, or documents are uploaded |

Implementation Approach

The RAG system is implemented natively inside Leadmetrics — not as a separate sidecar service. New packages added to the monorepo:

| Package | Purpose |
|---|---|
| providers/provider-qdrant | Qdrant client singleton, collection helpers |
| features/feature-knowledge | Dataset and file management, ingestion queue |
| features/feature-search | Hybrid search (vector + keyword + RRF), reranking |

New routes in apps/api and Knowledge Base screens in apps/dashboard. Full details in rag-architecture.md.


Datasets

Each tenant has five standard datasets, created automatically when the tenant is provisioned.

| Dataset | Purpose | Privacy |
|---|---|---|
| client_docs | Uploaded brand/product documents (PDF, DOCX, TXT, MD) | Tenant-scoped; cloud embedding OK |
| website_content | Crawled website pages | Tenant-scoped; cloud embedding OK |
| published_content | Blog posts and social posts published through the platform | Tenant-scoped; cloud embedding OK; auto-populated |
| competitor_content | Competitor website pages and research data gathered by Content Researcher | Tenant-scoped; local embedding only — never sent to cloud |
| channel_insights | Human-accepted insight observations from channel analysis runs | Tenant-scoped; cloud embedding OK; auto-populated on acceptance |

Embedding models

| Dataset | Default | Override |
|---|---|---|
| client_docs | text-embedding-3-small (OpenAI) | Tenant may switch to local Ollama |
| website_content | text-embedding-3-small (OpenAI) | Tenant may switch to local Ollama |
| published_content | text-embedding-3-small (OpenAI) | Tenant may switch to local Ollama |
| competitor_content | nomic-embed-text (Ollama) | Always local — no override permitted |

Enterprise on-prem tenants (dataPrivacyLevel: 'local_only') use local Ollama embedding for all datasets.

The embedding model for a dataset cannot be changed after creation. Changing it would invalidate all existing vectors. To switch models, the tenant must create a new dataset and re-upload their files.
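The selection rules above can be collapsed into a small pure function, evaluated once at dataset creation and never again. This is an illustrative sketch under the stated rules, not the actual feature-knowledge code; the function name and types are assumptions:

```typescript
type PrivacyLevel = "standard" | "local_only";

// Illustrative: choose the embedding provider/model for a dataset at
// creation time. competitor_content is always local; local_only tenants
// are always local; otherwise the documented OpenAI default applies
// unless the tenant opted into local Ollama.
function resolveEmbedding(
  datasetKey: string,
  tenantPrivacy: PrivacyLevel,
  tenantPrefersLocal = false,
): { provider: "openai" | "ollama"; model: string } {
  const local = { provider: "ollama" as const, model: "nomic-embed-text" };
  if (datasetKey === "competitor_content") return local; // never cloud
  if (tenantPrivacy === "local_only" || tenantPrefersLocal) return local;
  return { provider: "openai", model: "text-embedding-3-small" };
}
```

Because the result is fixed at creation, switching models later means creating a new dataset and re-ingesting, exactly as the paragraph above describes.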


Ingestion Pipeline

Trigger sources

| Trigger | Target dataset | Timing |
|---|---|---|
| Onboarding wizard — file upload (step 3c) | client_docs | Immediately after wizard submission |
| Onboarding wizard — website URL entered (step 3a) | website_content | Background job after wizard completes |
| Settings → Knowledge Base — manual upload | client_docs | On upload |
| Settings → Knowledge Base — re-crawl | website_content | On demand + scheduled weekly |
| Blog post client-approved (POST /tenant/v1/blog/:id/client-approve) | published_content | Immediately on client approval |
| Social post client-approved (POST /tenant/v1/social/:id/client-approve) | published_content | Immediately on client approval |
| Content Researcher agent — competitor scrape | competitor_content | During agent execution |
| Admin backfill (POST /admin/v1/tenants/:tenantId/reingest-published) | published_content | On demand — skips already-indexed posts |
| Human accepts an insight item (POST /tenant/v1/insights/:insightId/accept) | channel_insights | Immediately on acceptance |

File ingestion flow

```
File received (multipart upload via API)
  │
Saved to storage (local volume / S3-compatible)
  │
rag_files record created — status: 'pending'
  │
BullMQ job enqueued → queue: 'rag:ingestion'
  { type: 'file', fileId, tenantId, datasetId }
  │
Worker picks up job
  ├── Parse text from file
  │     PDF  → pdf-parse (NODE_NATIVE) or Docling (DOCLING)
  │     DOCX → mammoth
  │     TXT / MD → read as-is
  │     CSV  → csv-parse
  ├── Chunk text
  │     NAIVE    → fixed-size (default: 512 tokens, 64 overlap)
  │     MARKDOWN → split on headings (H1/H2 boundaries)
  │     MANUAL   → split on --- delimiter
  ├── Embed each chunk
  │     OpenAI → POST /v1/embeddings
  │     Ollama → POST /api/embeddings
  │     Batched: 100 chunks per API call
  └── Upsert vectors into Qdrant
        Collection: ds_{dataset.refId}
        Payload: { fileId, tenantId, chunkIndex, content, fileName, enabled, source }
  │
Update rag_files: status → 'indexed', chunksCount = N
Update rag_datasets: totalChunks += N
  │
SSE event pushed to Dashboard: file status updated
```
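The NAIVE chunking strategy in the flow above is a sliding window with overlap. A sketch, approximating tokens with whitespace-separated words rather than a real tokenizer (the actual worker presumably counts model tokens); the defaults mirror the documented 512/64:

```typescript
// Illustrative fixed-size chunker: `size` units per chunk, with `overlap`
// units shared between consecutive chunks. "Tokens" are approximated by
// words here for the sake of a self-contained example.
function chunkNaive(text: string, size = 512, overlap = 64): string[] {
  if (overlap >= size) throw new Error("overlap must be smaller than size");
  const words = text.split(/\s+/).filter(Boolean);
  const chunks: string[] = [];
  const step = size - overlap;
  for (let start = 0; start < words.length; start += step) {
    chunks.push(words.slice(start, start + size).join(" "));
    if (start + size >= words.length) break; // last window reached the end
  }
  return chunks;
}
```

The overlap means each chunk repeats the tail of its predecessor, so a fact that straddles a chunk boundary is still retrievable from at least one chunk.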

Website crawl flow

Triggered after onboarding completes (and on-demand re-crawl from Settings → Knowledge Base):

```
BullMQ job enqueued → queue: 'rag:ingestion'
  { type: 'crawl', crawlJobId, tenantId, datasetId }
  │
Crawler worker (Playwright)
  ├── Fetch page → extract body text (strip nav, footer, cookie banners)
  ├── Extract same-domain links
  ├── Respect robots.txt
  ├── Skip: /login /admin /cart /checkout /wp-admin and URL patterns with ? params
  ├── Create rag_files record for the page (source: 'website_crawl')
  ├── Enqueue type: 'content' ingestion job for the page text
  ├── Update rag_crawl_jobs: pagesCrawled++
  └── Follow links → repeat up to maxDepth (3) and maxPages (200)
  │
Update rag_crawl_jobs: status → 'completed'
  │
SSE event pushed to Dashboard: crawl progress
```
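The skip rules in the crawl flow amount to a single URL predicate. A sketch: the path prefixes come from the flow above, while the helper name and the exact matching logic are assumptions:

```typescript
// Illustrative crawl filter: returns true if a URL should NOT be crawled.
// Skips the documented path prefixes and any URL carrying query parameters.
const SKIP_PREFIXES = ["/login", "/admin", "/cart", "/checkout", "/wp-admin"];

function shouldSkipUrl(rawUrl: string): boolean {
  const url = new URL(rawUrl);
  if (url.search !== "") return true; // URL patterns with ? params
  return SKIP_PREFIXES.some((prefix) => url.pathname.startsWith(prefix));
}
```

Filtering query-string URLs avoids crawling faceted or paginated duplicates of the same page, at the cost of skipping the occasional legitimate `?page=` listing.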

Scheduled re-crawl frequency: weekly (BullMQ cron per tenant, runs Sunday at 3 AM tenant local time).

Published content feedback loop

When a blog post or social post is client-approved through the platform, it is automatically indexed:

```
Publishing worker → WordPress API / LinkedIn API / etc.
  │ publish succeeds
BullMQ job enqueued → queue: 'rag:ingestion'
  { type: 'content', content: postMarkdown, fileId, tenantId, datasetId: 'published_content' }
  │
Worker skips parse step (text already extracted)
  │
Chunk → embed → upsert into Qdrant
```
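The enqueue step above can be sketched as a pure payload builder. The queue name and payload shape come from this page; the function itself is hypothetical, and in the real worker the result would be handed to a BullMQ Queue bound to 'rag:ingestion':

```typescript
interface ContentIngestionJob {
  type: "content";
  content: string; // post markdown, already extracted
  fileId: string;
  tenantId: string;
  datasetId: "published_content";
}

// Illustrative: build the ingestion payload enqueued when a blog or social
// post is client-approved. A real implementation would then call
// queue.add("ingest", payload) on a Redis-backed BullMQ queue.
function buildPublishedContentJob(
  postMarkdown: string,
  fileId: string,
  tenantId: string,
): ContentIngestionJob {
  return {
    type: "content",
    content: postMarkdown,
    fileId,
    tenantId,
    datasetId: "published_content",
  };
}
```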

Over time the published_content dataset accumulates the client’s complete publishing history — style, tone, topic coverage — queryable by future Copywriter, Social Media Manager, and SEO Specialist runs.


Tool definition

```yaml
name: rag_search
description: >
  Search the client's knowledge base for specific information. Use this when
  you need details not present in your injected context — specific product
  specs, pricing, past blog examples, competitor analysis, or full website
  page content. Returns the top matching text chunks.
inputs:
  query:   (string, required) — Natural language query
  dataset: (string, required) — One of: client_docs | website_content | published_content | competitor_content
  topK:    (number, optional) — Chunks to return. Default: 5. Max: 20.
```

Invocation flow

```
Agent emits tool call:
  { "tool": "rag_search", "input": { "query": "what are our pricing tiers", "dataset": "website_content" } }
  │
Tool dispatcher checks: rag_search in agent's toolNames[]?
  │ yes
Privacy guard: competitor_content only permitted for content_researcher role
  │
features/feature-search → searchService.search({ tenantId, datasetId, query, topK })
  │
Hybrid search executes:
  1. Embed query using dataset's configured embedding model
  2. Vector search in Qdrant (cosine similarity) → top topK * 2 candidates
  3. Keyword search in Qdrant (text payload index, BM25) → top topK * 2 candidates
  4. Score normalisation: score ÷ max(scores) → [0, 1] per list
  5. Reciprocal Rank Fusion (k=60): rrfScore = 1 / (60 + rank)
  6. Final score: (vectorWeight × vectorRRF) + ((1 − vectorWeight) × keywordRRF)
  7. If reranking enabled: cross-encoder re-scores top topK*2 → sort → take top topK
  │
Return to agent:
  [{ text, score, source: fileName, dataset }, ...]
  │
Log to tool_calls table: tool, query, dataset, topK, chunksReturned, latencyMs
```
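Steps 5 and 6 of the hybrid search can be sketched as pure functions. Assumptions: each candidate list is already sorted best-first, rank is 1-based, and the formula matches the one stated above; this is not the actual feature-search implementation:

```typescript
interface Candidate { id: string; }

// Reciprocal Rank Fusion: rrfScore = 1 / (k + rank), rank starting at 1.
function rrfScores(ranked: Candidate[], k = 60): Map<string, number> {
  const scores = new Map<string, number>();
  ranked.forEach((c, i) => scores.set(c.id, 1 / (k + i + 1)));
  return scores;
}

// Weighted fusion of the two RRF score lists:
// final = vectorWeight * vectorRRF + (1 - vectorWeight) * keywordRRF
function fuse(
  vectorRanked: Candidate[],
  keywordRanked: Candidate[],
  vectorWeight = 0.5,
): { id: string; score: number }[] {
  const v = rrfScores(vectorRanked);
  const kw = rrfScores(keywordRanked);
  const ids = new Set([...v.keys(), ...kw.keys()]);
  return [...ids]
    .map((id) => ({
      id,
      score:
        vectorWeight * (v.get(id) ?? 0) +
        (1 - vectorWeight) * (kw.get(id) ?? 0),
    }))
    .sort((a, b) => b.score - a.score);
}
```

A document that appears in both candidate lists accumulates score from both terms, which is why RRF reliably promotes results that vector and keyword search agree on.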

Search defaults per agent role

Datasets available to each role are controlled by allowedAgentRoles on each RagDataset, configured in packages/feature-knowledge/src/knowledge.types.ts (STANDARD_DATASETS).

| Agent role (queue name) | Datasets permitted | Mechanism |
|---|---|---|
| blog-writer | client_docs, website_content, published_content | Pre-loaded |
| strategy-writer | client_docs, website_content, published_content, competitor_content, channel_insights | Pre-loaded |
| social-post-writer | client_docs, published_content | Pre-loaded |
| social-calendar-planner | client_docs, published_content | Pre-loaded |
| keyword-researcher | client_docs, website_content | Pre-loaded |
| content-brief-writer | client_docs, website_content | Pre-loaded |
| email-writer, landing-page-writer, google-ads-writer, meta-ads-writer | client_docs | Pre-loaded |
| report-writer, ads-analyst | client_docs, published_content, channel_insights | Pre-loaded |
| site-auditor, backlink-researcher | website_content | Pre-loaded |
| client-researcher, competitor-researcher, context-file-writer | All tenant datasets (no role filter) | Skill script |
| All 8 *-insights workers | channel_insights (prior accepted learnings for the same channel type) | Pre-loaded |
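The allowedAgentRoles filter described above can be sketched as follows. The config shape is an assumption modelled on the table, not a copy of knowledge.types.ts, and only a few roles/datasets are shown:

```typescript
// Illustrative subset of a STANDARD_DATASETS-style role map. An empty
// array means "no role filter" (the researcher roles see all datasets).
const DATASET_ROLES: Record<string, string[]> = {
  client_docs: ["blog-writer", "strategy-writer", "social-post-writer"],
  website_content: ["blog-writer", "strategy-writer"],
  published_content: ["blog-writer", "strategy-writer", "social-post-writer"],
};

// Resolve which datasets a given agent role may query.
function datasetsForRole(role: string): string[] {
  return Object.entries(DATASET_ROLES)
    .filter(([, roles]) => roles.length === 0 || roles.includes(role))
    .map(([key]) => key);
}
```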

competitor_content uses local Ollama embedding (useLocalEmbedding: true) — competitor data never reaches a cloud embedding API.


Query Patterns by Agent

Illustrative examples of how each agent uses the rag_search tool:

Copywriter

  • “What tone and style have we used in past blog posts for this client?” → published_content
  • “What are the exact features and benefits of their product?” → client_docs
  • “Find the pricing page so I reference the correct tier names” → website_content
  • “Do we have past email campaigns I can use for style reference?” → published_content

SEO Specialist

  • “What topics are already covered on the client’s blog?” → published_content
  • “What does the About page say about their positioning?” → website_content
  • “Are there brand guidelines for how to write SEO metadata?” → client_docs

Social Media Manager

  • “Show me past social posts — what language does this brand use?” → published_content
  • “Find the brand voice section from the uploaded guidelines” → client_docs

Activity Planner

  • “What deliverables have we produced for this client before?” → published_content
  • “What does the client say about their target audience?” → website_content

Content Researcher (Ollama only)

  • “What blog topics is [competitor] writing about?” → competitor_content
  • “How does [competitor] describe their pricing?” → competitor_content
  • “What keywords does [competitor] emphasise on their homepage?” → competitor_content

Competitor Data — Privacy Requirements

The competitor_content dataset has strict privacy requirements because scraping and storing competitor content could raise legal or competitive concerns if that data were sent to external services.

Requirements:

  1. The Content Researcher agent must run on local Ollama only — never Claude or OpenAI.
  2. The competitor_content dataset must use a local Ollama embedding model — never OpenAI.
  3. Qdrant runs locally in Docker — vectors never leave the machine.
  4. The competitor_content dataset must not be accessible to any agent role other than Content Researcher.
  5. These constraints are enforced in feature-search and cannot be overridden by per-tenant configuration.
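Requirement 4 is the kind of check feature-search would apply before executing any query. A sketch under the assumption that the guard is a plain role comparison; the function name is hypothetical, and the role string follows the content_researcher naming used in the invocation flow above:

```typescript
// Illustrative privacy guard: competitor_content is reserved for the
// Content Researcher role; all other datasets pass through unchanged.
function assertDatasetAccess(datasetKey: string, agentRole: string): void {
  if (datasetKey === "competitor_content" && agentRole !== "content_researcher") {
    throw new Error(`Role '${agentRole}' may not query competitor_content`);
  }
}
```

Throwing rather than silently filtering makes violations visible in worker logs, which suits a constraint the document says cannot be overridden per tenant.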

Integration with Onboarding

The onboarding wizard’s Step 3 inputs — docs uploaded in step 3c and the website URL entered in step 3a — drive three parallel processes after the wizard completes:

| Purpose | Mechanism | Timing |
|---|---|---|
| Generate Client Context File | LLM summarises uploaded docs + wizard inputs into a structured Markdown skill | Synchronous — blocks the “Setup Complete” screen (~30s) |
| Index docs into RAG | Files chunked, embedded, stored in client_docs dataset | Async BullMQ job — ~2–5 min per file |
| Crawl website | Playwright crawls the URL entered in step 3a | Async BullMQ job — ~5–15 min |

Agents can start working immediately after the context file is generated. RAG results improve incrementally as indexing completes in the background.

A RAG readiness indicator is shown on the Knowledge Base overview screen while indexing is in progress:

```
Client Docs        ████████████████░░░  72%   14 of 19 chunks indexed
Website Content    ░░░░░░░░░░░░░░░░░░░   0%   Crawl in progress (page 23 / 200)
Published Content  ─                          (grows as campaigns produce output)
Competitor Data    ─                          (populated by Content Researcher during research tasks)
```

Environment Variables

```
# Qdrant vector store
QDRANT_URL=http://qdrant:6333
QDRANT_API_KEY=                # empty for local Docker

# RAG worker
RAG_WORKER_CONCURRENCY=5
RAG_UPLOAD_DIR=./uploads/rag   # relative to apps/api

# Embedding models
RAG_DEFAULT_EMBEDDING_PROVIDER=openai
RAG_DEFAULT_EMBEDDING_MODEL=text-embedding-3-small
RAG_LOCAL_EMBEDDING_MODEL=nomic-embed-text   # Ollama model — used for competitor_content and local-only tenants

# Reranking (optional — requires Ollama)
RAG_RERANKER_MODEL=BAAI/bge-reranker-v2-m3
RAG_RERANKER_URL=http://ollama:11434

# Docling high-accuracy parser (optional)
DOCLING_URL=                   # empty = disabled; set to http://docling:5001 to enable
```

Plan Availability

| Feature | Free | Pro | Agency | Enterprise |
|---|---|---|---|---|
| Client docs upload | ✅ (max 5 files) | ✅ (max 50 files) | ✅ unlimited | ✅ unlimited |
| Website crawl | ✅ (max 50 pages) | ✅ (max 500 pages) | ✅ (max 2,000 pages) | ✅ configurable |
| Published content indexing | | | | |
| Competitor research (Ollama) | | | | |
| Reranking | | | | |
| Local-only embedding (privacy mode) | | | | |
| Docling parser | | | | |

© 2026 Leadmetrics — Internal use only