RAG Engine: All Ingestion Jobs Fail — Missing OPENAI_API_KEY
Status: Fixed
Severity: Critical — entire tenant knowledge base empty; all agents fall back to context-file-only mode
Service: apps/servers/ragengine (rag-ingestion-worker)
First observed: 2026-05-05 23:35
Symptom
Immediately after a website crawl completes and enqueues 100+ pages for RAG ingestion, the ragengine log floods with:
ERROR [rag-ingestion-worker] RAG job failed
"type": "AuthenticationError",
Error: 401 You didn't provide an API key.
at Function.generate (openai/src/error.ts:76:14)
at OpenAI.makeStatusError (openai/src/core.ts:462:21)

All RAG jobs fail immediately, so the Qdrant website_content collection stays empty. The strategy writer and every agent that uses RAG context see no results.
Root Cause
apps/servers/ragengine/.env is missing OPENAI_API_KEY. The ragengine uses openai@4.104.0 to generate text embeddings before indexing them into Qdrant. Without the key, every embeddings request is rejected and the OpenAI client throws a 401 AuthenticationError, so every job fails.
Current keys present in apps/servers/ragengine/.env:
NODE_ENV
DATABASE_URL
REDIS_URL
RAG_WORKER_CONCURRENCY
DO_SPACES_ENDPOINT
DO_SPACES_KEY
DO_SPACES_SECRET
DO_SPACES_BUCKET
DO_SPACES_REGION
DO_SPACES_CDN_ENDPOINT
AZURE_OPENAI_API_VERSION <- Azure version present but not used by the embeddings path

Missing:
OPENAI_API_KEY <- required for openai.embeddings.create()

The AZURE_OPENAI_API_VERSION key is present (presumably for a future Azure embeddings migration), but the current ragengine code calls openai.embeddings.create() through the standard OpenAI client, not the Azure endpoint, so it requires the regular OPENAI_API_KEY.
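A missing key only shows up once jobs start failing. A minimal sketch of a startup guard that would have surfaced it immediately, assuming the ragengine can run a check before it creates the worker. The names `missingEnvKeys`, `assertRequiredEnv` and the contents of `REQUIRED_KEYS` are hypothetical, built from the key list above rather than from the actual code.

```typescript
// Hypothetical startup guard for the ragengine (not taken from the actual code).
// REQUIRED_KEYS is an assumption based on this report; extend it to match what
// the service really reads.
type Env = Record<string, string | undefined>;

const REQUIRED_KEYS: readonly string[] = [
  "DATABASE_URL",
  "REDIS_URL",
  "OPENAI_API_KEY", // needed by openai.embeddings.create()
];

// Returns the required keys that are unset or blank after trimming.
function missingEnvKeys(env: Env, required: readonly string[] = REQUIRED_KEYS): string[] {
  return required.filter((key) => !env[key]?.trim());
}

// Throws once at startup instead of letting every ingestion job fail with a 401.
function assertRequiredEnv(env: Env, required: readonly string[] = REQUIRED_KEYS): void {
  const missing = missingEnvKeys(env, required);
  if (missing.length > 0) {
    throw new Error(`ragengine: missing required env vars: ${missing.join(", ")}`);
  }
}
```

Called as `assertRequiredEnv(process.env)` with the .env listed above, this would stop the process with an error naming OPENAI_API_KEY rather than letting the worker start and fail 100+ jobs. The presence of AZURE_OPENAI_API_VERSION does not satisfy the check, which matches the diagnosis.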
Impact
- 100% of RAG ingestion jobs fail with authentication error.
- The website_content Qdrant collection remains empty after the crawl.
- The strategy writer, context file writer, and all agents that call the /knowledge/search RAG endpoint receive no context vectors.
- Agents degrade silently to context-file-only mode (no error is surfaced to the user).
- Failed jobs exhaust their BullMQ retry budget and enter the failed state. They will not auto-retry once the key is added, so a manual re-trigger is needed after the fix.
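A missing key fails the same way on every attempt, so the retries only add log noise before each job lands in the failed state. A minimal sketch of how the worker might tell permanent failures from transient ones, assuming its processor can inspect the error thrown by the embeddings call; `isRetryable` is a hypothetical helper. It relies on openai-node's API errors carrying an HTTP `status` (AuthenticationError carries 401). Inside the processor, a non-retryable error could be rethrown as BullMQ's `UnrecoverableError`, which sends the job straight to failed without spending its remaining attempts; check that the installed BullMQ version exports it.

```typescript
// Sketch: classify errors from the embeddings call. The `status` field mirrors
// the HTTP status that openai-node attaches to its API errors; anything without
// a numeric status is treated as transient.
interface MaybeHttpError {
  status?: unknown;
}

function isRetryable(err: unknown): boolean {
  if (typeof err !== "object" || err === null) return true;
  const status = (err as MaybeHttpError).status;
  if (typeof status !== "number") return true; // e.g. network errors, timeouts
  if (status === 429) return true; // rate limited: retry with backoff
  return status >= 500; // 5xx is transient; other 4xx (401, 403, ...) will not fix itself
}
```

This does not remove the need for a manual re-trigger after the fix; it only stops a configuration error from burning the full retry budget for every job.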
Fix
- Add OPENAI_API_KEY=sk-... to apps/servers/ragengine/.env.
- Restart the ragengine server:
    # Kill the existing ragengine process, then:
    cd apps/servers/ragengine && pnpm dev
- Re-enqueue the failed RAG jobs. The simplest approach is to re-trigger the website crawl from the Brand Assets page; the crawler will re-enqueue all pages. Alternatively, flush the failed BullMQ jobs via the Redis CLI:
    redis-cli DEL bull:rag__ingestion:failed
  then re-trigger ingestion from the UI.
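Deleting the failed key by hand bypasses BullMQ's own bookkeeping. A possibly cleaner alternative, sketched under assumptions: a BullMQ `Queue` exposes `getFailedCount()` and, in recent versions, `retryJobs({ state: "failed" })`, which moves failed jobs back to waiting with their original data and should make a crawl re-trigger unnecessary. It only works if the failed set has not already been deleted. The helper `requeueFailedJobs` is hypothetical, and the queue name `rag__ingestion` is inferred from the Redis key above and may not match the code.

```typescript
// Sketch (assumption): only the two bullmq Queue methods used here are modelled,
// so a real `Queue` instance fits this interface structurally.
interface FailedJobQueue {
  getFailedCount(): Promise<number>;
  retryJobs(opts?: { state?: "completed" | "failed" }): Promise<void>;
}

// Asks BullMQ to move failed jobs back to waiting; returns how many were failed beforehand.
async function requeueFailedJobs(queue: FailedJobQueue): Promise<number> {
  const failed = await queue.getFailedCount();
  if (failed > 0) {
    await queue.retryJobs({ state: "failed" });
  }
  return failed;
}
```

For example, `await requeueFailedJobs(new Queue("rag__ingestion", { connection }))`, where `connection` matches the Redis settings from REDIS_URL. Run it only after the ragengine has been restarted with the key, or the retried jobs will fail again.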
Verification
After adding the key and restarting, the ragengine log should show:
INFO [rag-ingestion-worker] RAG job completed fileId: cma... chunks: 12

instead of the 401 error. The Qdrant website_content collection will start populating.
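Besides the log line, the collection itself can be checked. A minimal sketch using Qdrant's REST endpoint `GET /collections/{collection_name}`, whose response includes `result.points_count`. The helper name is hypothetical, the base URL is an assumption (6333 is Qdrant's default REST port), and a Qdrant instance protected by an API key would also need an `api-key` header, which is omitted here.

```typescript
// Sketch (assumption): read the point count of a Qdrant collection over its REST API.
// GET /collections/{name} answers with { result: { points_count, ... } }.
// `fetchFn` is injected so the check works with Node 18+'s global fetch or a stub.
type FetchLike = (
  url: string,
) => Promise<{ ok: boolean; status: number; json(): Promise<unknown> }>;

async function qdrantPointCount(
  fetchFn: FetchLike,
  baseUrl: string, // hypothetical, e.g. Qdrant's default REST address http://localhost:6333
  collection = "website_content",
): Promise<number> {
  const res = await fetchFn(`${baseUrl}/collections/${encodeURIComponent(collection)}`);
  if (!res.ok) {
    throw new Error(`Qdrant returned HTTP ${res.status} for collection ${collection}`);
  }
  const body = (await res.json()) as { result?: { points_count?: number | null } };
  // A missing or null count is reported as 0 (nothing indexed yet).
  return body.result?.points_count ?? 0;
}
```

For example, `await qdrantPointCount(fetch, "http://localhost:6333")`. Running it before and after re-enqueueing gives a simple signal: a count that stays at 0 suggests ingestion is still failing.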