Screens — Knowledge Base (RAG Management)
Audience: Tenant admins (Dashboard app) + DM Portal reviewers
Purpose: Manage the RAG knowledge base: upload documents, monitor ingestion, configure chunking, test retrieval, and trigger website crawls.
Platform: Web only (all screens)
Status: [To Build] — The Knowledge Base / RAG system is a new layer being added as part of the RAG integration. None of these screens exist in the current live app. All screens (KB1–KB6) are specified and ready to build.
| Screen | Status |
|---|---|
| KB1 — Knowledge Base Overview | [To Build] |
| KB2 — Dataset File Management | [To Build] |
| KB3 — Upload File Modal | [To Build] |
| KB4 — Retrieval Sandbox | [To Build] |
| KB5 — Dataset Configuration | [To Build] |
| KB6 — Website Crawl Settings modal | [To Build] |
| Dashboard Settings sub-nav (Knowledge Base entry) | [To Build] |
| Activity tab “Retrieved context” section | [To Build] |
| DM Portal Activity Detail “Retrieved context” | [To Build] |
| Manage Tenant Detail Knowledge Base tab | [To Build] — see screens-manage.md M3 |
Related: RAG Integration | RAG Architecture
Where These Screens Live
| App | Route | Who uses it |
|---|---|---|
| Dashboard | /settings/knowledge-base | Tenant admin — manage their own knowledge base |
| Dashboard | /settings/knowledge-base/[datasetId] | Tenant admin — per-dataset file management |
| Dashboard | /settings/knowledge-base/[datasetId]/sandbox | Tenant admin + DM reviewer — test retrieval |
| Dashboard | /settings/knowledge-base/[datasetId]/configure | Tenant admin — chunking + parser settings |
| DM Portal | /activities/[id] (existing) | DM reviewer — see RAG chunks used in an agent run |
| Manage | /tenants/[id] → Knowledge Base tab (existing) | Super admin — view tenant RAG stats |
The Knowledge Base is a section of Settings, not a top-level nav item. It lives under the Settings sidebar section with its own sub-nav.
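The Dashboard routes above follow one pattern per dataset. As a sketch only, a typed helper could build them. `kbRoute` and `DatasetTab` are hypothetical names, and dataset IDs are assumed to be slug-like strings such as `client_docs`.

```typescript
// Hypothetical helper (not from the spec): builds the Dashboard KB routes listed above.
type DatasetTab = "files" | "sandbox" | "configure";

const KB_BASE = "/settings/knowledge-base";

function kbRoute(datasetId?: string, tab: DatasetTab = "files"): string {
  // No dataset: the KB1 overview.
  if (datasetId === undefined) return KB_BASE;
  // Encode the ID so unusual characters cannot break the path.
  const base = `${KB_BASE}/${encodeURIComponent(datasetId)}`;
  // Files is the default tab (KB2) and lives at the dataset root.
  return tab === "files" ? base : `${base}/${tab}`;
}

// Example: the sandbox for a dataset with the hypothetical ID "client_docs".
const sandboxPath = kbRoute("client_docs", "sandbox");
// → "/settings/knowledge-base/client_docs/sandbox"
```

Keeping Files at the dataset root mirrors the table, where KB2 has no tab suffix.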
Screen KB1 — Knowledge Base Overview (/settings/knowledge-base)
Purpose: See all four standard datasets for this tenant, file counts, indexing status, and quick actions.
┌─────────────────────────────────────────────────────────────┐
│ Knowledge Base │
│ Manage documents, website content, and research data │
│ that agents use to improve their responses. │
├─────────────────────────────────────────────────────────────┤
│ │
│ ┌─────────────────────────────────────────────────────┐ │
│ │ 📄 Client Documents │ │
│ │ Brand guides, product sheets, tone-of-voice docs │ │
│ │ │ │
│ │ 3 files · 29 chunks indexed │ │
│ │ ████████████████████░░ 87% │ │
│ │ brand-guidelines.pdf ✅ │ │
│ │ tone-of-voice.docx ✅ │ │
│ │ q1-results.pdf ⏳ indexing… │ │
│ │ │ │
│ │ [+ Upload Docs] [View All Files →] │ │
│ └─────────────────────────────────────────────────────┘ │
│ │
│ ┌─────────────────────────────────────────────────────┐ │
│ │ 🌐 Website Content │ │
│ │ Crawled pages from your website │ │
│ │ │ │
│ │ 142 pages · 1,840 chunks indexed │ │
│ │ Last crawl: Apr 1, 2026 · Next: Apr 6, 2026 │ │
│ │ │ │
│ │ [Re-crawl Now] [View Pages →] │ │
│ └─────────────────────────────────────────────────────┘ │
│ │
│ ┌─────────────────────────────────────────────────────┐ │
│ │ 📝 Published Content │ │
│ │ Blog posts and social posts published via platform │ │
│ │ │ │
│ │ 24 items · 312 chunks · auto-updated │ │
│ │ 12 blog posts · 12 social posts │ │
│ │ │ │
│ │ [View Content →] │ │
│ └─────────────────────────────────────────────────────┘ │
│ │
│ ┌─────────────────────────────────────────────────────┐ │
│ │ 🔍 Competitor Research │ │
│ │ Competitor data gathered by Content Researcher │ │
│ │ 🔒 Privacy: local only — never sent to cloud │ │
│ │ │ │
│ │ 3 competitors · 18 pages · 210 chunks │ │
│ │ Zapier · Make.com · n8n │ │
│ │ │ │
│ │ [View Data →] [Clear] │ │
│ └─────────────────────────────────────────────────────┘ │
│ │
│ [Test Retrieval (Sandbox) →] │
└─────────────────────────────────────────────────────────────┘
Details:
- Each dataset card shows: name, description, item count, chunk count, and status.
- Ingestion progress bar appears on any dataset with files currently being indexed. Auto-updates via SSE without page refresh.
- Published Content card has no upload button — it’s auto-populated. Shows counts and a “View Content” link.
- Competitor Research card has a privacy badge (“local only”). The DM team can clear competitor data if needed.
- “Test Retrieval” link at the bottom navigates to the sandbox.
- Responsive: On mobile, dataset cards stack vertically. Progress bars and file names are truncated.
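The live progress bar depends on SSE. Browsers parse the stream with `EventSource`, but the wire format is simple enough to show directly, which helps when testing the endpoint. This sketch assumes a hypothetical `progress` event whose JSON payload carries `fileId`, `processedChunks`, and `totalChunks`; the spec does not define the payload.

```typescript
// Hypothetical payload: the spec only says progress arrives via SSE.
interface IngestionProgress {
  fileId: string;
  processedChunks: number;
  totalChunks: number;
}

interface SseMessage {
  event: string;
  data: string;
}

// Parses one SSE frame (the text between blank lines) using the EventSource
// field rules: "field: value", one optional space after the colon is dropped,
// repeated data lines are joined with "\n", lines starting with ":" are
// comments, and the event name defaults to "message".
function parseSseFrame(frame: string): SseMessage {
  let event = "message";
  const data: string[] = [];
  for (const line of frame.split(/\r?\n/)) {
    if (line === "" || line.startsWith(":")) continue;
    const colon = line.indexOf(":");
    const field = colon === -1 ? line : line.slice(0, colon);
    let value = colon === -1 ? "" : line.slice(colon + 1);
    if (value.startsWith(" ")) value = value.slice(1);
    if (field === "event") event = value;
    else if (field === "data") data.push(value);
  }
  return { event, data: data.join("\n") };
}

// Percentage for the progress bar, capped at 100.
function progressPercent(p: IngestionProgress): number {
  if (p.totalChunks <= 0) return 0;
  return Math.min(100, Math.round((p.processedChunks / p.totalChunks) * 100));
}
```

In the Dashboard itself the `EventSource` API does the frame parsing; only the JSON payload and the percentage calculation would be application code.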
Screen KB2 — Dataset File Management (/settings/knowledge-base/[datasetId])
Purpose: View, upload, and manage all files in a specific dataset. Monitor ingestion status per file.
The screen has three tabs: Files, Sandbox, Configure.
Files tab (default)
┌─────────────────────────────────────────────────────────────┐
│ ← Knowledge Base / Client Documents │
│ [Files] [Sandbox] [Configure] │
├─────────────────────────────────────────────────────────────┤
│ [+ Upload Files] │
│ │
│ Name Size Chunks Status Created │
│ ───────────────────────────────────────────────────────── │
│ brand-guidelines.pdf 2.1MB 14 ✅ Indexed Apr 1 │
│ ▸ 14 chunks active [●] Enable [Sandbox] [Delete] │
│ │
│ tone-of-voice.docx 0.4MB 6 ✅ Indexed Apr 1 │
│ ▸ 6 chunks active [●] Enable [Sandbox] [Delete] │
│ │
│ q1-results.pdf 3.8MB — ⏳ Indexing Apr 3 │
│ ████████████░░░░░░░░░ 55% Embedding chunks… │
│ │
│ ┌─────────────────────────────────────────────────────────┐│
│ │ 📎 Drop files here or click to upload ││
│ │ PDF, DOCX, TXT, MD · max 10MB each · up to 32 files ││
│ └─────────────────────────────────────────────────────────┘│
└─────────────────────────────────────────────────────────────┘
File row detail:
| Element | Description |
|---|---|
| Status icon | ✅ Indexed, ⏳ Indexing (with progress bar), ❌ Error, ⚫ Disabled |
| Chunk count | How many chunks are indexed in Qdrant for this file |
| Enable/Disable toggle | Flips the file's `enabled` field: disabled chunks are excluded from all searches without being deleted |
| Sandbox shortcut | Opens the sandbox pre-filtered to this file |
| Delete | Deletes file + all its Qdrant vectors; requires confirmation |
| Progress bar | Shows during parsing + embedding (via SSE — updates live) |
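The Enable/Disable toggle and the Delete action map naturally onto Qdrant payload filters. The filter shape in this sketch (`must` conditions using `match.value` and `match.any`) follows Qdrant's filtering syntax, and Qdrant's set-payload and delete-points operations both accept such a filter in place of explicit point IDs. Apart from `enabled`, the payload keys (`tenant_id`, `dataset_id`, `file_id`) are assumptions about how chunks are stored.

```typescript
// Qdrant payload filter shapes: `value` for an exact keyword/bool match,
// `any` for "matches one of these values".
type Match = { value: string | boolean } | { any: string[] };
interface FieldCondition { key: string; match: Match; }
interface QdrantFilter { must: FieldCondition[]; }

// Search filter: tenant isolation plus the enabled flag.
// Payload keys other than `enabled` are hypothetical.
function searchFilter(tenantId: string, datasetIds: string[] = []): QdrantFilter {
  const must: FieldCondition[] = [
    { key: "tenant_id", match: { value: tenantId } },
    // Disabled files keep their vectors but never match a search.
    { key: "enabled", match: { value: true } },
  ];
  // An empty list means all of the tenant's datasets.
  if (datasetIds.length > 0) {
    must.push({ key: "dataset_id", match: { any: datasetIds } });
  }
  return { must };
}

// Selects every chunk belonging to one file.
function fileFilter(tenantId: string, fileId: string): QdrantFilter {
  return {
    must: [
      { key: "tenant_id", match: { value: tenantId } },
      { key: "file_id", match: { value: fileId } },
    ],
  };
}

// Request body for set-payload: flips `enabled` on all of a file's chunks.
function toggleBody(tenantId: string, fileId: string, enabled: boolean) {
  return { payload: { enabled }, filter: fileFilter(tenantId, fileId) };
}

// Request body for delete-points: removes all of a file's chunks.
function deleteBody(tenantId: string, fileId: string) {
  return { filter: fileFilter(tenantId, fileId) };
}
```

Filtering on a payload flag is what lets the toggle be instant and reversible: no vectors are re-embedded when a file is re-enabled.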
Upload Files flow:
- Drag-drop or click to select files.
- Upload File modal appears (Screen KB3).
- Files upload → `rag_files` records are created with `status: 'pending'`.
- Ingestion jobs are enqueued to BullMQ.
- File rows appear immediately with ⏳ Indexing and a live progress bar.
- Progress bar updates via SSE as the worker processes each chunk batch.
- Status becomes ✅ Indexed when complete.
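The flow above implies a small status lifecycle for each `rag_files` row. Only `'pending'` is named in the spec; the other state names and the allowed transitions in this sketch are assumptions inferred from the status icons, written as a guard a worker could call before updating a row.

```typescript
// Hypothetical values for rag_files.status; only 'pending' comes from the spec.
type FileStatus = "pending" | "indexing" | "indexed" | "error";

const TRANSITIONS: Record<FileStatus, readonly FileStatus[]> = {
  pending: ["indexing"],          // worker picks up the BullMQ job
  indexing: ["indexed", "error"], // ingestion finished or failed
  indexed: ["pending"],           // re-index requested (KB5)
  error: ["pending"],             // retry
};

// Returns the new status, or throws if the change is not allowed.
function transition(from: FileStatus, to: FileStatus): FileStatus {
  if (TRANSITIONS[from].indexOf(to) === -1) {
    throw new Error(`Illegal status change: ${from} -> ${to}`);
  }
  return to;
}

// Label for the Files table. "Disabled" is a separate enabled flag, so it
// overrides the ingestion status for display only. Pending rows show
// "Indexing" because rows appear with that label as soon as they are created.
function statusLabel(status: FileStatus, enabled: boolean): string {
  if (!enabled) return "⚫ Disabled";
  switch (status) {
    case "pending":
    case "indexing":
      return "⏳ Indexing";
    case "indexed":
      return "✅ Indexed";
    case "error":
      return "❌ Error";
  }
}
```

Keeping "disabled" out of the status enum means a file can be re-enabled without losing its indexing state.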
Screen KB3 — Upload File Modal
Triggered by: the “+ Upload Files” button on the Files tab.
┌──────────────────────────────────────────────────┐
│ Upload Files to Client Documents [✕] │
├──────────────────────────────────────────────────┤
│ │
│ ┌────────────────────────────────────────────┐ │
│ │ 📎 Drop files here or click to browse │ │
│ │ │ │
│ │ brand-guidelines.pdf ✅ 2.1 MB │ │
│ │ tone-of-voice.docx ✅ 0.4 MB │ │
│ │ large-report.pdf ❌ 14.2 MB · too large │ │
│ └────────────────────────────────────────────┘ │
│ │
│ Supported: PDF, DOCX, TXT, MD │
│ Max 10 MB per file · Up to 32 files at once │
│ │
│ ───────────────────────────────────────────── │
│ Parse on upload [ON] │
│ Start indexing immediately after upload │
│ │
│ Parser engine │
│ ● Built-in (fast, general purpose) │
│ ○ Docling (layout-aware, best for PDFs) │
│ ⚠ Requires Docling service to be configured │
│ │
│ [Cancel] [Upload 2 Files →] │
└──────────────────────────────────────────────────┘
Fields:
| Field | Description |
|---|---|
| Drop zone | Multi-file. Shows file names + sizes. Red error for oversized / unsupported files. |
| Parse on upload | ON by default. If OFF, files upload but ingestion does not start — tenant can trigger later. |
| Parser engine | Built-in (Node.js pdf-parse / mammoth / `fs.readFile`). Docling is shown only if `DOCLING_URL` is configured. |
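The drop zone's limits (PDF, DOCX, TXT, MD; 10 MB per file; up to 32 files) and the `DOCLING_URL` rule can be checked client-side before anything is sent. This is a minimal sketch: `validateUploads` and `parserEngines` are hypothetical names, and reading "10 MB" as 10 × 1024 × 1024 bytes is an assumption.

```typescript
// Limits from the KB3 modal.
const ALLOWED_EXTENSIONS = new Set(["pdf", "docx", "txt", "md"]);
const MAX_FILE_BYTES = 10 * 1024 * 1024; // assumes "MB" means MiB
const MAX_FILES = 32;

interface PendingUpload { name: string; sizeBytes: number; }
interface UploadCheck { name: string; ok: boolean; reason?: string; }

// One result per dropped file; rejected files are shown in red in the drop zone.
function validateUploads(files: PendingUpload[]): UploadCheck[] {
  return files.map((f, i) => {
    const dot = f.name.lastIndexOf(".");
    const ext = dot === -1 ? "" : f.name.slice(dot + 1).toLowerCase();
    if (i >= MAX_FILES) return { name: f.name, ok: false, reason: `over ${MAX_FILES}-file limit` };
    if (!ALLOWED_EXTENSIONS.has(ext)) return { name: f.name, ok: false, reason: "unsupported type" };
    if (f.sizeBytes > MAX_FILE_BYTES) return { name: f.name, ok: false, reason: "too large" };
    return { name: f.name, ok: true };
  });
}

// Docling is offered only when DOCLING_URL is configured.
function parserEngines(doclingUrl: string | undefined): string[] {
  return doclingUrl ? ["built-in", "docling"] : ["built-in"];
}
```

With the three files from the wireframe, two pass and `large-report.pdf` is rejected as too large, which matches the “Upload 2 Files” button.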
Screen KB4 — Retrieval Sandbox (/settings/knowledge-base/[datasetId]/sandbox)
Also accessible from the overview as a global sandbox (queries all datasets).
Purpose: Test what the agent will retrieve before running campaigns. Adjust search parameters to optimise relevance.
┌─────────────────────────────────────────────────────────────┐
│ ← Client Documents / Sandbox │
│ [Files] [Sandbox] [Configure] │
├────────────────────────┬────────────────────────────────────┤
│ │ │
│ Settings │ Results │
│ ────────────────── │ ────────────────────────────── │
│ Dataset │ Query: │
│ [Client Documents ▾] │ What are our key product USPs? │
│ │ [Search] │
│ Search scope │ │
│ ● This dataset only │ 4 results · 0.31s │
│ ○ All datasets │ ───────────────────────────── │
│ │ ▸ Score 0.94 · brand-guidelines │
│ TopK │ "Our four USPs: no-code setup, │
│ [────●────] 5 │ AI-powered suggestions, 200+ │
│ │ integrations, SOC2 certified…" │
│ Vector weight │ │
│ Keyword ●────── Vector│ ▸ Score 0.87 · pricing.html │
│ [0.4] [0.6] │ "Why choose us: Setup in 30 │
│ │ minutes, no IT team needed…" │
│ Similarity threshold │ │
│ [────●────] 0.1 │ ▸ Score 0.81 · website_content │
│ │ "Compare: Acme vs Zapier — we │
│ Reranker │ offer 40% lower cost at scale" │
│ [None ▾] │ │
│ │ ▸ Score 0.71 · tone-of-voice.docx │
│ │ "Always lead with outcomes, │
│ ──────────────────── │ not features. Speak to the │
│ ⚙ Advanced │ decision maker's time…" │
│ Filter by source: │ │
│ ☑ upload │ │
│ ☑ website_crawl │ │
│ ☑ published_content │ │
│ │ │
└────────────────────────┴────────────────────────────────────┘
Controls:
| Control | Description |
|---|---|
| Dataset | Switch between individual datasets or search all |
| Search scope | This dataset vs all tenant datasets |
| TopK slider | 1–20 results |
| Vector weight slider | 0.0 (keyword only) → 1.0 (vector only). Default 0.6 |
| Similarity threshold | Minimum score to include a result. Default 0.1 |
| Reranker | Optional cross-encoder model. None by default. |
| Source filter | Filter results by file source type |
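The spec does not say how the keyword and vector scores are combined. A common approach, assumed here, is a linear blend weighted by the Vector weight slider, followed by the similarity threshold and the TopK cut-off. Both input scores are assumed to be normalised to 0–1, and the TopK default of 5 is taken from the wireframe; `rank` and `Candidate` are hypothetical names.

```typescript
// Hypothetical hybrid ranking for the sandbox controls.
interface Candidate { chunkId: string; keywordScore: number; vectorScore: number; }
interface SearchParams { topK: number; vectorWeight: number; threshold: number; }

// Defaults: vector weight 0.6 and threshold 0.1 from the spec, TopK 5 from the wireframe.
const DEFAULTS: SearchParams = { topK: 5, vectorWeight: 0.6, threshold: 0.1 };

function rank(candidates: Candidate[], params: SearchParams = DEFAULTS) {
  const topK = Math.min(20, Math.max(1, Math.floor(params.topK))); // slider range 1–20
  const w = Math.min(1, Math.max(0, params.vectorWeight));         // 0 = keyword only, 1 = vector only
  return candidates
    .map((c) => ({ chunkId: c.chunkId, score: (1 - w) * c.keywordScore + w * c.vectorScore }))
    .filter((r) => r.score >= params.threshold) // drop weak matches
    .sort((a, b) => b.score - a.score)          // best first
    .slice(0, topK);
}
```

Applying the threshold before TopK means the sandbox can return fewer than TopK results, which is useful feedback when tuning.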
Result card:
Each result shows:
- Relevance score (0–1)
- Source file name + dataset
- Full chunk text (expandable if > 300 chars)
- Chunk metadata (page number, section heading if available)
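One way to implement the 300-character rule for result cards is to show a word-boundary preview with an expand control. A sketch with a hypothetical `preview` helper:

```typescript
const PREVIEW_CHARS = 300; // from the spec: expandable if > 300 chars

interface ChunkPreview { shown: string; expandable: boolean; }

// Short chunks are shown whole. Longer ones are cut at the last space at or
// before the limit (so words stay intact) and marked expandable.
function preview(text: string): ChunkPreview {
  if (text.length <= PREVIEW_CHARS) return { shown: text, expandable: false };
  const cut = text.lastIndexOf(" ", PREVIEW_CHARS);
  const end = cut > 0 ? cut : PREVIEW_CHARS; // no space found: hard cut
  return { shown: text.slice(0, end) + "…", expandable: true };
}
```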
Responsive: On tablet/mobile, settings panel collapses to a “Settings” button that opens a sheet drawer. Results are shown full-width.
Screen KB5 — Dataset Configuration (/settings/knowledge-base/[datasetId]/configure)
Purpose: Adjust chunking strategy and parser settings for a dataset. These settings apply to all future uploads and re-ingestion.
┌─────────────────────────────────────────────────────────────┐
│ ← Client Documents / Configure │
│ [Files] [Sandbox] [Configure] │
├─────────────────────────────────────────────────────────────┤
│ │
│ Basic Details │
│ ───────────────────────────────────────────────────────── │
│ Name │
│ [Client Documents ] │
│ │
│ Description │
│ [Brand guides, product sheets, tone-of-voice docs ] │
│ │
│ ───────────────────────────────────────────────────────── │
│ Embedding Model │
│ text-embedding-3-small (OpenAI) │
│ ⚠ Cannot be changed after dataset creation. │
│ Changing the model invalidates all existing vectors. │
│ Create a new dataset to use a different model. │
│ │
│ ───────────────────────────────────────────────────────── │
│ Chunking │
│ │
│ Parse type │
│ ● NAIVE — Fixed-size character chunking (recommended) │
│ ○ MARKDOWN — Split on headings (best for .md / blogs) │
│ ○ MANUAL — Split on --- delimiter │
│ │
│ Chunk size (tokens) │
│ [──────────────●─────] 512 │
│ │
│ Chunk overlap (tokens) │
│ [──────●───────────── ] 64 │
│ │
│ ───────────────────────────────────────────────────────── │
│ Parser Engine │
│ ● Built-in (fast, works for most documents) │
│ ○ Docling (layout-aware PDF parsing — requires sidecar) │
│ │
│ ───────────────────────────────────────────────────────── │
│ Danger Zone │
│ [Re-index all files] — Re-runs ingestion with new settings│
│ [Delete dataset] — Deletes dataset + all vectors │
└─────────────────────────────────────────────────────────────┘
Notes:
- Embedding model is read-only — cannot be changed after creation. A warning banner explains why.
- Re-index all files re-runs ingestion for all indexed files with the new chunk settings. Existing Qdrant vectors are deleted and regenerated. A confirmation dialog shows estimated time.
- Delete dataset requires typing the dataset name to confirm.
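The three parse types can be sketched as follows. This illustrates the settings under stated assumptions rather than describing the ingestion worker: tokens are approximated by whitespace-separated words (a real pipeline would count tokens with the embedding model's tokenizer), MARKDOWN splits before ATX headings (`#` to `######`), and MANUAL splits on lines containing only `---`. The 512/64 defaults are the slider values in the Configure wireframe.

```typescript
type ParseType = "NAIVE" | "MARKDOWN" | "MANUAL";

// Fixed-size windows of `size` tokens, each sharing `overlap` tokens with the previous one.
function naiveChunks(text: string, size: number, overlap: number): string[] {
  if (size <= 0 || overlap < 0 || overlap >= size) {
    throw new Error("require size > 0 and 0 <= overlap < size");
  }
  const tokens = text.split(/\s+/).filter(Boolean); // word approximation of tokens
  const chunks: string[] = [];
  const step = size - overlap; // each window starts `step` tokens after the last
  for (let start = 0; start < tokens.length; start += step) {
    chunks.push(tokens.slice(start, start + size).join(" "));
    if (start + size >= tokens.length) break; // this window reached the end
  }
  return chunks;
}

// A new chunk starts at every heading line (#, ##, ... ######).
function markdownChunks(text: string): string[] {
  return text.split(/\n(?=#{1,6}\s)/).map((s) => s.trim()).filter(Boolean);
}

// Chunks are separated by lines consisting only of "---".
function manualChunks(text: string): string[] {
  return text.split(/^---[ \t]*$/m).map((s) => s.trim()).filter(Boolean);
}

function chunk(text: string, type: ParseType, size = 512, overlap = 64): string[] {
  switch (type) {
    case "NAIVE":
      return naiveChunks(text, size, overlap);
    case "MARKDOWN":
      return markdownChunks(text);
    case "MANUAL":
      return manualChunks(text);
  }
}
```

Because the overlap repeats the tail of each window at the start of the next, a sentence cut at a boundary still appears whole in at least one chunk when it is shorter than the overlap.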
Screen KB6 — Website Crawl Settings (modal)
Triggered by: “Re-crawl Now” or settings gear on the Website Content card.
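The crawl scope, max pages, and max depth fields bound the crawl. One way they could combine, assumed here, is a breadth-first traversal that follows only in-scope links. The sketch uses a hypothetical in-memory link graph (`links`) in place of real page fetching and link extraction; `planCrawl` and `CrawlSettings` are also hypothetical names.

```typescript
// Hypothetical crawl planner. `links` maps a site path to the same-site paths
// it links to; a real crawler would fetch each page and extract these links.
interface CrawlSettings {
  pathPrefix: string; // "" = entire site; "/blog" = blog section only
  maxPages: number;
  maxDepth: number;   // link hops from the start page
}

function planCrawl(startPath: string, links: Record<string, string[]>, s: CrawlSettings): string[] {
  const prefix = s.pathPrefix.replace(/\/+$/, ""); // "/blog/" -> "/blog", "/" -> ""
  // "/blog" matches "/blog" and "/blog/...", but not "/blogroll".
  const inScope = (p: string) => prefix === "" || p === prefix || p.startsWith(prefix + "/");
  const seen = new Set<string>([startPath]);
  const queue: Array<[string, number]> = [[startPath, 0]];
  const pages: string[] = [];
  while (queue.length > 0 && pages.length < s.maxPages) {
    const [path, depth] = queue.shift()!;
    // The start page is always visited to discover links, but only in-scope
    // pages are kept for indexing.
    if (inScope(path)) pages.push(path);
    if (depth >= s.maxDepth) continue; // do not follow links past max depth
    for (const next of links[path] ?? []) {
      if (!seen.has(next) && inScope(next)) {
        seen.add(next);
        queue.push([next, depth + 1]);
      }
    }
  }
  return pages;
}
```

Breadth-first order means that when Max pages is reached, the pages kept are the ones closest to the start URL.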
┌──────────────────────────────────────────────────┐
│ Website Crawl Settings [✕] │
├──────────────────────────────────────────────────┤
│ │
│ Start URL │
│ [https://acmecorp.com ] │
│ │
│ Crawl scope (URL path prefix) │
│ [Leave empty to crawl entire site ] │
│ e.g. /blog to crawl only the blog section │
│ │
│ Max pages │
│ [──────────────●─────────] 200 │
│ │
│ Max depth │
│ [──────────●──────────── ] 3 │
│ │
│ Schedule │
│ ● Weekly (every Monday at 3am) │
│ ○ Monthly │
│ ○ Manual only │
│ │
│ Previously crawled: 142 pages (Apr 1, 2026) │
│ This crawl will replace all existing pages. │
│ │
│ [Cancel] [Start Crawl →] │
└──────────────────────────────────────────────────┘
While a crawl is running, the Website Content card shows a live progress bar:
🌐 Website Content — Crawling…
████████████░░░░░░░░░░ 57% 114 / 200 pages
RAG Usage in Existing Screens
Campaign Detail — Activity tab (Dashboard D4)
When an agent activity used `rag_search`, the activity card shows a collapsible “Retrieved context” section:
🤖 Copywriter — Writing blog post: "Why Local SEO Matters"
Status: ✅ Completed · Cost: $0.012 · 3m 40s
▸ Retrieved context (3 queries)
Query 1: "local SEO for small businesses client services"
Dataset: client_docs · 3 chunks · Score: 0.91, 0.87, 0.82
↳ brand-guidelines.pdf — "Our primary audience is local businesses…"
Query 2: "past blog posts about SEO"
Dataset: published_content · 3 chunks · Score: 0.89, 0.85, 0.78
↳ "How to Rank on Google in 2026" — "When writing about SEO topics…"
Query 3: "product features for SME clients"
Dataset: website_content · 2 chunks · Score: 0.93, 0.76
↳ pricing.html — "Starter plan: ideal for businesses under 50 employees…"
This gives the DM reviewer full visibility into the context the agent used, so they can flag poor RAG results, which would indicate that a re-crawl or re-index is needed.
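To render this section, each `rag_search` call needs to be stored with the activity. This is a sketch of one possible record shape and the summary lines it would produce; `RagQueryLog`, `RetrievedChunk`, and the helper names are hypothetical.

```typescript
// Hypothetical record of one rag_search call stored with an agent activity.
interface RetrievedChunk { source: string; score: number; excerpt: string; }
interface RagQueryLog { query: string; dataset: string; chunks: RetrievedChunk[]; }

// Collapsible header, e.g. "Retrieved context (3 queries)".
function contextHeader(logs: RagQueryLog[]): string {
  const n = logs.length;
  return `Retrieved context (${n} ${n === 1 ? "query" : "queries"})`;
}

// Per-query summary line with scores rounded to two decimals.
function summaryLine(q: RagQueryLog): string {
  const n = q.chunks.length;
  const scores = q.chunks.map((c) => c.score.toFixed(2)).join(", ");
  return `Dataset: ${q.dataset} · ${n} chunk${n === 1 ? "" : "s"} · Score: ${scores}`;
}
```

Storing the scores with each query lets the Activity tab (D4) and DM Portal (P3) render the same section without re-running retrieval.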
Activity Detail — DM Portal (P3)
Same “Retrieved context” section shown in the right metadata panel alongside cost, tokens, and model info.
Manage App — Tenant Knowledge Base Tab (M3 update)
The Tenant Detail screen (/tenants/[id]) adds a Knowledge Base tab alongside the Overview, Config, Users, Agents, and Billing tabs:
Knowledge Base
──────────────────────────────────────────────────────────────
Dataset Files Chunks Embedding Model Status
──────────────────────────────────────────────────────────────
Client Documents 3 29 text-embed-3-small ✅
Website Content 142p 1,840 text-embed-3-small ✅ (weekly crawl)
Published Content 24 312 text-embed-3-small ✅ (auto)
Competitor Research 18 210 nomic-embed-text ✅ (local)
──────────────────────────────────────────────────────────────
Total vectors in Qdrant: 2,391    Qdrant collection size: 4.2 MB
Super admins can trigger a re-index or clear a dataset on behalf of a tenant (with an audit log entry).
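The footer total is the sum of the per-dataset chunk counts (29 + 1,840 + 312 + 210 = 2,391), assuming one Qdrant point per indexed chunk. Because Competitor Research uses a different embedding model, a per-model subtotal may also be useful. A sketch with hypothetical names:

```typescript
interface DatasetRow { name: string; chunks: number; embeddingModel: string; }

// Assumes one Qdrant point (vector) per indexed chunk.
function totalVectors(rows: DatasetRow[]): number {
  return rows.reduce((sum, r) => sum + r.chunks, 0);
}

// Subtotals keyed by embedding model name.
function vectorsByModel(rows: DatasetRow[]): Record<string, number> {
  const out: Record<string, number> = {};
  for (const r of rows) out[r.embeddingModel] = (out[r.embeddingModel] ?? 0) + r.chunks;
  return out;
}
```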
Navigation Update
Add Knowledge Base to the Dashboard Settings sub-nav:
Settings
├── General /settings
├── Knowledge Base /settings/knowledge-base ← NEW
├── Channels /settings/channels (was /channels)
├── Integrations /settings/integrations
├── Skills /settings/skills
├── Recurring Tasks /settings/recurring-tasks
├── Audit Log /settings/audit-log
└── Billing /settings/billing
Screens Reference Index
| Screen | Route | App |
|---|---|---|
| KB1 — Overview | /settings/knowledge-base | Dashboard |
| KB2 — Files tab | /settings/knowledge-base/[datasetId] | Dashboard |
| KB3 — Upload modal | (modal on KB2) | Dashboard |
| KB4 — Sandbox tab | /settings/knowledge-base/[datasetId]/sandbox | Dashboard |
| KB5 — Configure tab | /settings/knowledge-base/[datasetId]/configure | Dashboard |
| KB6 — Crawl settings | (modal on KB1 / KB2) | Dashboard |