Screens — Knowledge Base (RAG Management)

Audience: Tenant admins (Dashboard app) + DM Portal reviewers
Purpose: Manage the RAG knowledge base: upload documents, monitor ingestion, configure chunking, test retrieval, and trigger website crawls.
Platform: Web only (all screens)

Status: [To Build] — The Knowledge Base / RAG system is a new layer being added as part of the RAG integration. None of these screens exist in the current live app. All screens (KB1–KB6) are specified and ready to build.

Screen	Status
KB1 — Knowledge Base Overview	[To Build]
KB2 — Dataset File Management	[To Build]
KB3 — Upload File Modal	[To Build]
KB4 — Retrieval Sandbox	[To Build]
KB5 — Dataset Configuration	[To Build]
KB6 — Website Crawl Settings modal	[To Build]
Dashboard Settings sub-nav (Knowledge Base entry)	[To Build]
Activity tab “Retrieved context” section	[To Build]
DM Portal Activity Detail “Retrieved context”	[To Build]
Manage Tenant Detail Knowledge Base tab	[To Build] — see screens-manage.md M3

Related: RAG Integration | RAG Architecture

Where These Screens Live

App	Route	Who uses it
Dashboard	`/settings/knowledge-base`	Tenant admin — manage their own knowledge base
Dashboard	`/settings/knowledge-base/[datasetId]`	Tenant admin — per-dataset file management
Dashboard	`/settings/knowledge-base/[datasetId]/sandbox`	Tenant admin + DM reviewer — test retrieval
Dashboard	`/settings/knowledge-base/[datasetId]/configure`	Tenant admin — chunking + parser settings
DM Portal	`/activities/[id]` (existing)	DM reviewer — see RAG chunks used in an agent run
Manage	`/tenants/[id]` → Knowledge Base tab (existing)	Super admin — view tenant RAG stats

The Knowledge Base is a section of Settings, not a top-level nav item. It lives under the Settings sidebar section with its own sub-nav.

Screen KB1 — Knowledge Base Overview (`/settings/knowledge-base`)

Purpose: See all four standard datasets for this tenant, file counts, indexing status, and quick actions.


┌─────────────────────────────────────────────────────────────┐
│  Knowledge Base                                             │
│  Manage documents, website content, and research data       │
│  that agents use to improve their responses.                │
├─────────────────────────────────────────────────────────────┤
│                                                             │
│  ┌─────────────────────────────────────────────────────┐    │
│  │  📄 Client Documents                                │    │
│  │  Brand guides, product sheets, tone-of-voice docs   │    │
│  │                                                     │    │
│  │  3 files · 29 chunks indexed                        │    │
│  │  ████████████████████░░  87%                        │    │
│  │  brand-guidelines.pdf  ✅                           │    │
│  │  tone-of-voice.docx    ✅                           │    │
│  │  q1-results.pdf        ⏳ indexing…                 │    │
│  │                                                     │    │
│  │  [+ Upload Docs]  [View All Files →]                │    │
│  └─────────────────────────────────────────────────────┘    │
│                                                             │
│  ┌─────────────────────────────────────────────────────┐    │
│  │  🌐 Website Content                                 │    │
│  │  Crawled pages from your website                    │    │
│  │                                                     │    │
│  │  142 pages · 1,840 chunks indexed                   │    │
│  │  Last crawl: Apr 1, 2026  ·  Next: Apr 8, 2026      │    │
│  │                                                     │    │
│  │  [Re-crawl Now]  [View Pages →]                     │    │
│  └─────────────────────────────────────────────────────┘    │
│                                                             │
│  ┌─────────────────────────────────────────────────────┐    │
│  │  📝 Published Content                               │    │
│  │  Blog posts and social posts published via platform │    │
│  │                                                     │    │
│  │  24 items · 312 chunks  ·  auto-updated             │    │
│  │  12 blog posts  ·  12 social posts                  │    │
│  │                                                     │    │
│  │  [View Content →]                                   │    │
│  └─────────────────────────────────────────────────────┘    │
│                                                             │
│  ┌─────────────────────────────────────────────────────┐    │
│  │  🔍 Competitor Research                             │    │
│  │  Competitor data gathered by Content Researcher     │    │
│  │  🔒 Privacy: local only — never sent to cloud       │    │
│  │                                                     │    │
│  │  3 competitors  ·  18 pages  ·  210 chunks          │    │
│  │  Zapier · Make.com · n8n                            │    │
│  │                                                     │    │
│  │  [View Data →]  [Clear]                             │    │
│  └─────────────────────────────────────────────────────┘    │
│                                                             │
│  [Test Retrieval (Sandbox) →]                               │
└─────────────────────────────────────────────────────────────┘

Details:

Each dataset card shows: name, description, item count, chunk count, and status.
Ingestion progress bar appears on any dataset with files currently being indexed. Auto-updates via SSE without page refresh.
Published Content card has no upload button — it’s auto-populated. Shows counts and a “View Content” link.
Competitor Research card has a privacy badge (“local only”). The DM team can clear competitor data if needed.
“Test Retrieval” link at the bottom navigates to the sandbox.
Responsive: On mobile, dataset cards stack vertically. Progress bars and file names are truncated.
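
The live progress behaviour described above could be modelled as a pure state update applied to each SSE message. This is a minimal sketch only: the event shape (`datasetId`, `fileName`, `percent`) and the card state are hypothetical, since the spec does not define the SSE payload.

```typescript
// Hypothetical shapes: the spec does not define the SSE payload or card state.
type CardFileStatus = "indexing" | "indexed";

interface IngestionEvent {
  datasetId: string;
  fileName: string;
  percent: number; // 0-100 progress for a single file
}

interface DatasetCard {
  datasetId: string;
  files: Record<string, { status: CardFileStatus; percent: number }>;
}

// Apply one progress event to a card and return a new card (no mutation).
function applyIngestionEvent(card: DatasetCard, ev: IngestionEvent): DatasetCard {
  if (ev.datasetId !== card.datasetId) return card; // event is for another dataset
  const percent = Math.min(100, Math.max(0, ev.percent));
  const status: CardFileStatus = percent >= 100 ? "indexed" : "indexing";
  return {
    ...card,
    files: { ...card.files, [ev.fileName]: { status, percent } },
  };
}

// The card shows a progress bar only while at least one file is still indexing.
function showProgressBar(card: DatasetCard): boolean {
  return Object.keys(card.files).some((name) => card.files[name].status === "indexing");
}
```

In the browser, each message from an `EventSource` would be parsed with `JSON.parse` and passed to `applyIngestionEvent`; the endpoint URL is left open here because the spec does not name one.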

Screen KB2 — Dataset File Management (`/settings/knowledge-base/[datasetId]`)

Purpose: View, upload, and manage all files in a specific dataset. Monitor ingestion status per file.

The screen has three tabs: Files, Sandbox, Configure.

Files tab (default)


┌─────────────────────────────────────────────────────────────┐
│  ← Knowledge Base  /  Client Documents                      │
│  [Files]  [Sandbox]  [Configure]                            │
├─────────────────────────────────────────────────────────────┤
│                                      [+ Upload Files]        │
│                                                             │
│  Name                   Size   Chunks   Status   Created    │
│  ─────────────────────────────────────────────────────────  │
│  brand-guidelines.pdf   2.1MB  14       ✅ Indexed  Apr 1  │
│  ▸ 14 chunks active  [●] Enable  [Sandbox]  [Delete]        │
│                                                             │
│  tone-of-voice.docx     0.4MB  6        ✅ Indexed  Apr 1  │
│  ▸ 6 chunks active   [●] Enable  [Sandbox]  [Delete]        │
│                                                             │
│  q1-results.pdf         3.8MB  —        ⏳ Indexing  Apr 3  │
│  ████████████░░░░░░░░░  55%  Embedding chunks…              │
│                                                             │
│  ┌─────────────────────────────────────────────────────────┐│
│  │  📎 Drop files here or click to upload                 ││
│  │  PDF, DOCX, TXT, MD · max 10MB each · up to 32 files   ││
│  └─────────────────────────────────────────────────────────┘│
└─────────────────────────────────────────────────────────────┘

File row detail:

Element	Description
Status icon	✅ Indexed, ⏳ Indexing (with progress bar), ❌ Error, ⚫ Disabled
Chunk count	How many chunks are indexed in Qdrant for this file
Enable/Disable toggle	Flips `enabled` field — disabled chunks are excluded from all searches without being deleted
Sandbox shortcut	Opens the sandbox pre-filtered to this file
Delete	Deletes file + all its Qdrant vectors; requires confirmation
Progress bar	Shows during parsing + embedding (via SSE — updates live)
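
The status icons and the Enable/Disable behaviour in the table above can be sketched as follows. The `Chunk` shape is an assumption for illustration; only the `enabled` field name comes from the spec, and a real implementation would apply the filter inside the Qdrant query rather than in memory.

```typescript
// Status values and icons as listed in the file row table.
type FileRowStatus = "indexed" | "indexing" | "error" | "disabled";

const STATUS_ICON: Record<FileRowStatus, string> = {
  indexed: "✅",
  indexing: "⏳",
  error: "❌",
  disabled: "⚫",
};

// Hypothetical chunk record; `enabled` mirrors the field named in the spec.
interface Chunk {
  fileId: string;
  text: string;
  enabled: boolean;
}

// Flip `enabled` for every chunk of one file. Nothing is deleted.
function setFileEnabled(chunks: Chunk[], fileId: string, enabled: boolean): Chunk[] {
  return chunks.map((c) => (c.fileId === fileId ? { ...c, enabled } : c));
}

// Only enabled chunks are eligible for search.
function searchableChunks(chunks: Chunk[]): Chunk[] {
  return chunks.filter((c) => c.enabled);
}
```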

Upload Files flow:

1. Drag-drop or click to select files.
2. Upload File modal appears (Screen KB3).
3. Files upload → `rag_files` records created with `status: 'pending'`.
4. Ingestion jobs enqueued to BullMQ.
5. File rows appear immediately with ⏳ Indexing and a live progress bar.
6. Progress bar updates via SSE as the worker processes each chunk batch.
7. Status becomes ✅ Indexed when complete.
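
The flow above can be sketched with in-memory stand-ins. This is not the real pipeline: the `Map` stands in for the `rag_files` table, the array stands in for the BullMQ queue, and the `onProgress` callback stands in for an SSE push. The class name, method names, and the `parseOnUpload` parameter are hypothetical.

```typescript
type RagFileStatus = "pending" | "indexing" | "indexed";

interface RagFile {
  id: string;
  name: string;
  status: RagFileStatus;
  progress: number; // 0-100
}

// In-memory sketch; a real system would use a database and BullMQ.
class IngestionPipeline {
  readonly files = new Map<string, RagFile>();
  readonly queue: string[] = []; // ids of files awaiting ingestion
  private nextId = 1;

  // Create one 'pending' record per file, then enqueue unless parsing is deferred.
  upload(names: string[], parseOnUpload = true): RagFile[] {
    return names.map((name) => {
      const file: RagFile = { id: `file-${this.nextId++}`, name, status: "pending", progress: 0 };
      this.files.set(file.id, file);
      if (parseOnUpload) this.queue.push(file.id);
      return file;
    });
  }

  // Process the next queued file in `batches` chunk batches,
  // reporting a snapshot after each batch.
  processNext(batches: number, onProgress: (snapshot: RagFile) => void): boolean {
    const id = this.queue.shift();
    const file = id === undefined ? undefined : this.files.get(id);
    if (!file) return false;
    const total = Math.max(1, Math.floor(batches));
    file.status = "indexing";
    for (let done = 1; done <= total; done++) {
      file.progress = Math.round((done / total) * 100);
      if (done === total) file.status = "indexed";
      onProgress({ ...file });
    }
    return true;
  }
}
```

Passing `parseOnUpload = false` leaves the record at `pending` with nothing queued, which matches the "Parse on upload" toggle on the upload modal.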

Screen KB3 — Upload File Modal

Triggered by: “+ Upload Files” button on the Files tab.


┌──────────────────────────────────────────────────┐
│  Upload Files to Client Documents          [✕]   │
├──────────────────────────────────────────────────┤
│                                                  │
│  ┌────────────────────────────────────────────┐  │
│  │  📎 Drop files here or click to browse    │  │
│  │                                            │  │
│  │  brand-guidelines.pdf  ✅  2.1 MB          │  │
│  │  tone-of-voice.docx    ✅  0.4 MB          │  │
│  │  large-report.pdf      ❌  14.2MB too large│  │
│  └────────────────────────────────────────────┘  │
│                                                  │
│  Supported: PDF, DOCX, TXT, MD                   │
│  Max 10 MB per file · Up to 32 files at once     │
│                                                  │
│  ─────────────────────────────────────────────   │
│  Parse on upload                          [ON]   │
│  Start indexing immediately after upload         │
│                                                  │
│  Parser engine                                   │
│  ● Built-in (fast, general purpose)              │
│  ○ Docling  (layout-aware, best for PDFs)        │
│     ⚠ Requires Docling service to be configured  │
│                                                  │
│             [Cancel]  [Upload 2 Files →]         │
└──────────────────────────────────────────────────┘

Fields:

Field	Description
Drop zone	Multi-file. Shows file names + sizes. Red error for oversized / unsupported files.
Parse on upload	ON by default. If OFF, files upload but ingestion does not start — tenant can trigger later.
Parser engine	Built-in (Node.js pdf-parse / mammoth / fs.readFile). Docling only shown if `DOCLING_URL` is configured.
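
The drop-zone rules (PDF, DOCX, TXT, MD; max 10 MB each; up to 32 files) could be checked client-side along these lines. This is a sketch under assumptions: the function names are hypothetical, 1 MB is taken as 1024 × 1024 bytes, and files beyond the 32nd are rejected rather than blocking the whole batch, none of which the spec states.

```typescript
const MAX_FILE_BYTES = 10 * 1024 * 1024; // assumes 1 MB = 1024 * 1024 bytes
const MAX_FILES = 32;
const ALLOWED_EXTENSIONS = ["pdf", "docx", "txt", "md"];

interface UploadCandidate {
  name: string;
  sizeBytes: number;
}

interface UploadCheck {
  name: string;
  ok: boolean;
  reason?: string; // shown as the red error text in the drop zone
}

// Validate each selected file against type, size, and batch-count limits.
function validateUpload(files: UploadCandidate[]): UploadCheck[] {
  let accepted = 0;
  return files.map((f) => {
    const dot = f.name.lastIndexOf(".");
    const ext = dot >= 0 ? f.name.slice(dot + 1).toLowerCase() : "";
    if (ALLOWED_EXTENSIONS.indexOf(ext) === -1) {
      return { name: f.name, ok: false, reason: "unsupported type" };
    }
    if (f.sizeBytes > MAX_FILE_BYTES) {
      return { name: f.name, ok: false, reason: "too large" };
    }
    if (accepted >= MAX_FILES) {
      return { name: f.name, ok: false, reason: "too many files" };
    }
    accepted++;
    return { name: f.name, ok: true };
  });
}

// Label for the primary button, counting only valid files.
function uploadButtonLabel(checks: UploadCheck[]): string {
  const n = checks.filter((c) => c.ok).length;
  return `Upload ${n} File${n === 1 ? "" : "s"} →`;
}
```

With the three files from the wireframe, two pass and `large-report.pdf` is rejected, which gives the "Upload 2 Files →" label shown on the modal.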

Screen KB4 — Retrieval Sandbox (`/settings/knowledge-base/[datasetId]/sandbox`)

Also accessible from the overview as a global sandbox (queries all datasets).

Purpose: Test what the agent will retrieve before running campaigns. Adjust search parameters to optimise relevance.


┌─────────────────────────────────────────────────────────────┐
│  ← Client Documents  /  Sandbox                             │
│  [Files]  [Sandbox]  [Configure]                            │
├────────────────────────┬────────────────────────────────────┤
│                        │                                    │
│  Settings              │  Results                           │
│  ──────────────────    │  ──────────────────────────────    │
│  Dataset               │  Query:                            │
│  [Client Documents ▾]  │  What are our key product USPs?    │
│                        │  [Search]                          │
│  Search scope          │                                    │
│  ● This dataset only   │  4 results · 0.31s                 │
│  ○ All datasets        │  ─────────────────────────────     │
│                        │  ▸ Score 0.94 · brand-guidelines   │
│  TopK                  │    "Our four USPs: no-code setup,  │
│  [────●────] 5         │     AI-powered suggestions, 200+   │
│                        │     integrations, SOC2 certified…" │
│  Vector weight         │                                    │
│  Keyword ●────── Vector│  ▸ Score 0.87 · pricing.html       │
│  [0.4]        [0.6]    │    "Why choose us: Setup in 30     │
│                        │     minutes, no IT team needed…"   │
│  Similarity threshold  │                                    │
│  [────●────] 0.1       │  ▸ Score 0.81 · website_content    │
│                        │    "Compare: Acme vs Zapier — we   │
│  Reranker              │     offer 40% lower cost at scale" │
│  [None ▾]              │                                    │
│                        │  ▸ Score 0.71 · tone-of-voice.docx │
│                        │    "Always lead with outcomes,     │
│  ────────────────────  │     not features. Speak to the     │
│  ⚙ Advanced            │     decision maker's time…"        │
│  Filter by source:     │                                    │
│  ☑ upload              │                                    │
│  ☑ website_crawl       │                                    │
│  ☑ published_content   │                                    │
│                        │                                    │
└────────────────────────┴────────────────────────────────────┘

Controls:

Control	Description
Dataset	Switch between individual datasets or search all
Search scope	This dataset vs all tenant datasets
TopK slider	1–20 results
Vector weight slider	0.0 (keyword only) → 1.0 (vector only). Default 0.6
Similarity threshold	Minimum score to include a result. Default 0.1
Reranker	Optional cross-encoder model. None by default.
Source filter	Filter results by file source type
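
To show how the sliders might interact, the sketch below blends a keyword score and a vector score with the vector weight, drops results under the similarity threshold, and keeps the top K. The linear blend is an assumption: the spec only says that 0.0 means keyword only and 1.0 means vector only. The type and function names are hypothetical, and the default TopK of 5 is taken from the wireframe.

```typescript
interface ScoredCandidate {
  id: string;
  vectorScore: number; // 0-1 similarity from the vector index
  keywordScore: number; // 0-1 normalised keyword score
}

interface SandboxSettings {
  topK: number; // 1-20
  vectorWeight: number; // 0.0 keyword only, 1.0 vector only
  similarityThreshold: number; // minimum blended score
}

const DEFAULT_SETTINGS: SandboxSettings = { topK: 5, vectorWeight: 0.6, similarityThreshold: 0.1 };

function clamp(value: number, min: number, max: number): number {
  return Math.min(max, Math.max(min, value));
}

// Assumed blend: score = w * vector + (1 - w) * keyword.
function rankResults(
  candidates: ScoredCandidate[],
  settings: SandboxSettings = DEFAULT_SETTINGS
): { id: string; score: number }[] {
  const w = clamp(settings.vectorWeight, 0, 1);
  const k = clamp(Math.round(settings.topK), 1, 20);
  return candidates
    .map((c) => ({ id: c.id, score: w * c.vectorScore + (1 - w) * c.keywordScore }))
    .filter((r) => r.score >= settings.similarityThreshold)
    .sort((a, b) => b.score - a.score)
    .slice(0, k);
}
```

An optional reranker would run after this step on the surviving results; it is omitted here because the spec does not name a model.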

Result card:

Each result shows:

Relevance score (0–1)
Source file name + dataset
Full chunk text (expandable if > 300 chars)
Chunk metadata (page number, section heading if available)

Responsive: On tablet/mobile, settings panel collapses to a “Settings” button that opens a sheet drawer. Results are shown full-width.

Screen KB5 — Dataset Configuration (`/settings/knowledge-base/[datasetId]/configure`)

Purpose: Adjust chunking strategy and parser settings for a dataset. These settings apply to all future uploads and re-ingestion.


┌─────────────────────────────────────────────────────────────┐
│  ← Client Documents  /  Configure                          │
│  [Files]  [Sandbox]  [Configure]                            │
├─────────────────────────────────────────────────────────────┤
│                                                             │
│  Basic Details                                              │
│  ─────────────────────────────────────────────────────────  │
│  Name                                                       │
│  [Client Documents                                       ]  │
│                                                             │
│  Description                                                │
│  [Brand guides, product sheets, tone-of-voice docs       ]  │
│                                                             │
│  ─────────────────────────────────────────────────────────  │
│  Embedding Model                                            │
│  text-embedding-3-small (OpenAI)                            │
│  ⚠ Cannot be changed after dataset creation.               │
│    Changing the model invalidates all existing vectors.     │
│    Create a new dataset to use a different model.           │
│                                                             │
│  ─────────────────────────────────────────────────────────  │
│  Chunking                                                   │
│                                                             │
│  Parse type                                                 │
│  ● NAIVE — Fixed-size character chunking (recommended)      │
│  ○ MARKDOWN — Split on headings (best for .md / blogs)      │
│  ○ MANUAL — Split on --- delimiter                          │
│                                                             │
│  Chunk size (tokens)                                        │
│  [──────────────●─────] 512                                 │
│                                                             │
│  Chunk overlap (tokens)                                     │
│  [──────●───────────── ] 64                                 │
│                                                             │
│  ─────────────────────────────────────────────────────────  │
│  Parser Engine                                              │
│  ● Built-in  (fast, works for most documents)               │
│  ○ Docling   (layout-aware PDF parsing — requires sidecar)  │
│                                                             │
│  ─────────────────────────────────────────────────────────  │
│  Danger Zone                                                │
│  [Re-index all files]  — Re-runs ingestion with new settings│
│  [Delete dataset]      — Deletes dataset + all vectors      │
└─────────────────────────────────────────────────────────────┘

Notes:

Embedding model is read-only — cannot be changed after creation. A warning banner explains why.
Re-index all files re-runs ingestion for all indexed files with the new chunk settings. Existing Qdrant vectors are deleted and regenerated. A confirmation dialog shows estimated time.
Delete dataset requires typing the dataset name to confirm.
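
The chunk size and overlap settings can be illustrated with a small sliding-window chunker. This is a sketch only: whitespace-separated words stand in for tokens (a real implementation would use the embedding model's tokenizer), and the function names are hypothetical. `chunkManual` shows the MANUAL parse type, splitting on lines that contain only `---`.

```typescript
// NAIVE: fixed-size windows of `chunkSize` tokens, each sharing `overlap`
// tokens with the previous window. Whitespace words stand in for tokens.
function chunkNaive(text: string, chunkSize = 512, overlap = 64): string[] {
  if (chunkSize <= 0 || overlap < 0 || overlap >= chunkSize) {
    throw new Error("overlap must be non-negative and smaller than chunk size");
  }
  const tokens = text.split(/\s+/).filter((t) => t.length > 0);
  const step = chunkSize - overlap;
  const chunks: string[] = [];
  for (let start = 0; start < tokens.length; start += step) {
    chunks.push(tokens.slice(start, start + chunkSize).join(" "));
    if (start + chunkSize >= tokens.length) break; // last window reached the end
  }
  return chunks;
}

// MANUAL: split on lines consisting only of '---'.
function chunkManual(text: string): string[] {
  return text
    .split(/^---[ \t]*$/m)
    .map((part) => part.trim())
    .filter((part) => part.length > 0);
}
```

With a chunk size of 4 and an overlap of 1, ten tokens produce three chunks, and each chunk begins with the last token of the previous one. Changing either setting changes the chunk boundaries, so existing vectors only reflect the new settings after a re-index.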

Screen KB6 — Website Crawl Settings (modal)

Triggered by: “Re-crawl Now” or settings gear on the Website Content card.


┌──────────────────────────────────────────────────┐
│  Website Crawl Settings                    [✕]   │
├──────────────────────────────────────────────────┤
│                                                  │
│  Start URL                                       │
│  [https://acmecorp.com                        ]  │
│                                                  │
│  Crawl scope (URL path prefix)                   │
│  [Leave empty to crawl entire site            ]  │
│  e.g. /blog to crawl only the blog section       │
│                                                  │
│  Max pages                                       │
│  [──────────────●─────────] 200                  │
│                                                  │
│  Max depth                                       │
│  [──────────●──────────── ] 3                    │
│                                                  │
│  Schedule                                        │
│  ● Weekly (every Monday at 3am)                  │
│  ○ Monthly                                       │
│  ○ Manual only                                   │
│                                                  │
│  Previously crawled: 142 pages (Apr 1, 2026)     │
│  This crawl will replace all existing pages.     │
│                                                  │
│         [Cancel]  [Start Crawl →]                │
└──────────────────────────────────────────────────┘

While a crawl is running, the Website Content card shows a live progress bar:


🌐 Website Content — Crawling…
████████████░░░░░░░░░░  57%   114 / 200 pages
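
The crawl settings suggest a simple admission check for each discovered link, plus the percentage shown on the progress bar. The sketch below is an assumption about how those fields combine, not the crawler's actual logic; `CrawlSettings`, `inCrawlScope`, and `crawlProgressPercent` are hypothetical names, and restricting the crawl to the start URL's origin is assumed rather than specified.

```typescript
interface CrawlSettings {
  startUrl: string;
  pathPrefix: string; // empty string means the entire site
  maxPages: number;
  maxDepth: number;
}

// Decide whether a discovered link should be crawled.
function inCrawlScope(link: string, depth: number, pagesCrawled: number, s: CrawlSettings): boolean {
  const start = new URL(s.startUrl);
  let url: URL;
  try {
    url = new URL(link, start); // resolves relative links against the start URL
  } catch (_err) {
    return false; // not a parseable URL
  }
  if (url.origin !== start.origin) return false; // assumed: stay on the same site
  if (s.pathPrefix !== "" && !url.pathname.startsWith(s.pathPrefix)) return false;
  return depth <= s.maxDepth && pagesCrawled < s.maxPages;
}

// Percentage for the progress bar, measured against the page limit.
function crawlProgressPercent(pagesCrawled: number, maxPages: number): number {
  if (maxPages <= 0) return 0;
  return Math.min(100, Math.round((pagesCrawled / maxPages) * 100));
}
```

A plain prefix match means `/blog` also admits `/blogger`; matching on `/blog/` would avoid that if it matters. The progress figure in the example above follows from 114 of 200 pages, which rounds to 57%.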

RAG Usage in Existing Screens

Campaign Detail — Activity tab (Dashboard D4)

When an agent activity used `rag_search`, the activity card shows a collapsible “Retrieved context” section:


🤖 Copywriter — Writing blog post: "Why Local SEO Matters"
Status: ✅ Completed · Cost: $0.012 · 3m 40s

▸ Retrieved context (3 queries)

  Query 1: "local SEO for small businesses client services"
  Dataset: client_docs · 3 chunks · Score: 0.91, 0.87, 0.82
  ↳ brand-guidelines.pdf — "Our primary audience is local businesses…"

  Query 2: "past blog posts about SEO"
  Dataset: published_content · 3 chunks · Score: 0.89, 0.85, 0.78
  ↳ "How to Rank on Google in 2026" — "When writing about SEO topics…"

  Query 3: "product features for SME clients"
  Dataset: website_content · 2 chunks · Score: 0.93, 0.76
  ↳ pricing.html — "Starter plan: ideal for businesses under 50 employees…"

This gives the DM reviewer full visibility into the context the agent used, so they can flag poor RAG results (which would indicate that a re-crawl or re-index is needed).
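
One way to represent the data behind this section is a list of queries, each with its retrieved chunks. The sketch below is illustrative only: the interfaces, `summaryLine`, and the `looksPoor` heuristic with its 0.5 cut-off are hypothetical and not part of the spec.

```typescript
interface RetrievedChunk {
  source: string; // file name or post title
  score: number; // relevance score, 0-1
  excerpt: string;
}

interface RagQueryLog {
  query: string;
  dataset: string;
  chunks: RetrievedChunk[];
}

// Build the one-line summary shown under each query.
function summaryLine(q: RagQueryLog): string {
  const scores = q.chunks.map((c) => c.score.toFixed(2)).join(", ");
  return `Dataset: ${q.dataset} · ${q.chunks.length} chunks · Score: ${scores}`;
}

// Hypothetical heuristic: flag a query when nothing was retrieved
// or the best score falls below a chosen cut-off.
function looksPoor(q: RagQueryLog, minTopScore = 0.5): boolean {
  if (q.chunks.length === 0) return true;
  const best = Math.max(...q.chunks.map((c) => c.score));
  return best < minTopScore;
}
```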

Activity Detail — DM Portal (P3)

Same “Retrieved context” section shown in the right metadata panel alongside cost, tokens, and model info.

Manage App — Tenant Knowledge Base Tab (M3 update)

The Tenant Detail screen (`/tenants/[id]`) adds a Knowledge Base tab alongside Overview, Config, Users, Agents, Billing:


Knowledge Base
──────────────────────────────────────────────────────────────
Dataset              Files   Chunks   Embedding Model   Status
──────────────────────────────────────────────────────────────
Client Documents     3       29       text-embed-3-small  ✅
Website Content      142p    1,840    text-embed-3-small  ✅ (weekly crawl)
Published Content    24      312      text-embed-3-small  ✅ (auto)
Competitor Research  18      210      nomic-embed-text    ✅ (local)
──────────────────────────────────────────────────────────────
Total vectors in Qdrant: 2,391   Qdrant collection size: 4.2 MB

Super admins can trigger a re-index or clear a dataset on behalf of a tenant (with audit log entry).
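
The "Total vectors" figure in the table above is consistent with summing the per-dataset chunk counts, assuming one vector per chunk. A small sketch of that arithmetic, with hypothetical names:

```typescript
interface DatasetStats {
  name: string;
  chunks: number;
}

// Assumes one Qdrant vector per indexed chunk.
function totalVectors(rows: DatasetStats[]): number {
  return rows.reduce((sum, row) => sum + row.chunks, 0);
}

// Figures from the example table: 29 + 1,840 + 312 + 210.
const exampleRows: DatasetStats[] = [
  { name: "Client Documents", chunks: 29 },
  { name: "Website Content", chunks: 1840 },
  { name: "Published Content", chunks: 312 },
  { name: "Competitor Research", chunks: 210 },
];
```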

Add Knowledge Base to the Dashboard Settings sub-nav:


Settings
├── General              /settings
├── Knowledge Base       /settings/knowledge-base   ← NEW
├── Channels             /settings/channels  (was /channels)
├── Integrations         /settings/integrations
├── Skills               /settings/skills
├── Recurring Tasks      /settings/recurring-tasks
├── Audit Log            /settings/audit-log
└── Billing              /settings/billing
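
The sub-nav could be expressed as a small config with a helper that picks the active entry. This is a sketch: the `NavItem` shape and both helpers are hypothetical, and treating the old `/channels` path as a redirect is an assumption based on the "(was /channels)" note.

```typescript
interface NavItem {
  label: string;
  path: string;
}

const SETTINGS_NAV: NavItem[] = [
  { label: "General", path: "/settings" },
  { label: "Knowledge Base", path: "/settings/knowledge-base" },
  { label: "Channels", path: "/settings/channels" },
  { label: "Integrations", path: "/settings/integrations" },
  { label: "Skills", path: "/settings/skills" },
  { label: "Recurring Tasks", path: "/settings/recurring-tasks" },
  { label: "Audit Log", path: "/settings/audit-log" },
  { label: "Billing", path: "/settings/billing" },
];

// Assumed: the old top-level route forwards to its new Settings location.
const LEGACY_REDIRECTS: Record<string, string> = { "/channels": "/settings/channels" };

function resolvePath(path: string): string {
  return Object.prototype.hasOwnProperty.call(LEGACY_REDIRECTS, path) ? LEGACY_REDIRECTS[path] : path;
}

// Pick the entry whose path is the longest prefix of the current location,
// so nested Knowledge Base routes highlight "Knowledge Base", not "General".
function activeNavItem(pathname: string): NavItem | undefined {
  const current = resolvePath(pathname);
  let best: NavItem | undefined;
  for (const item of SETTINGS_NAV) {
    const matches = current === item.path || current.startsWith(item.path + "/");
    if (matches && (!best || item.path.length > best.path.length)) best = item;
  }
  return best;
}
```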

Screens Reference Index

Screen	Route	App
KB1 — Overview	`/settings/knowledge-base`	Dashboard
KB2 — Files tab	`/settings/knowledge-base/[datasetId]`	Dashboard
KB3 — Upload modal	(modal on KB2)	Dashboard
KB4 — Sandbox tab	`/settings/knowledge-base/[datasetId]/sandbox`	Dashboard
KB5 — Configure tab	`/settings/knowledge-base/[datasetId]/configure`	Dashboard
KB6 — Crawl settings	(modal on KB1 / KB2)	Dashboard
