
Gap 2: RAG Retrieval is Relevance-Only

Problem

The RAG pipeline scores retrieved chunks purely by cosine similarity (relevance). There is no weighting for how recent the source is or how important it is relative to other content in the knowledge base.

From the Generative Agents paper (Park et al., 2023), a three-factor retrieval score produces significantly better context selection:

score = α·relevance + β·recency + γ·importance
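As a minimal sketch of how the three factors combine (each factor is assumed to be normalised to [0, 1]; the weights here match the values this document settles on below):

```typescript
// Three-factor retrieval score (Park et al., 2023). Weights are the
// ones proposed later in this doc: α = 0.5, β = 0.3, γ = 0.2.
function retrievalScore(
  relevance: number,  // cosine similarity, 0–1
  recency: number,    // decayed age, 0–1
  importance: number  // index-time heuristic, 0–1
): number {
  const alpha = 0.5, beta = 0.3, gamma = 0.2;
  return alpha * relevance + beta * recency + gamma * importance;
}

// Two chunks with identical relevance and importance: the fresh one wins.
const fresh = retrievalScore(0.8, 1.0, 0.7);
const stale = retrievalScore(0.8, 0.1, 0.7);
```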

Current behaviour

A stale competitor analysis from six months ago scores identically to one written last week. A core brand positioning document scores the same as an incidental blog post. A RAG chunk that was previously cited in a high-scoring output carries no advantage over one that has never been used.

The result is that prompts can be populated with outdated or low-value context despite better alternatives existing in the vector store.

What to Build

1. Add metadata fields to indexed documents

In the search-indexer worker, when documents are synced to Typesense, include:

```typescript
{
  // existing fields
  text: string,
  tenantId: string,
  datasetName: string,
  source: string,
  // new fields
  createdAt: number,      // unix timestamp — enables recency scoring
  importanceScore: number // 0.0–1.0 — set at index time based on document type
}
```
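A sketch of how the indexer might attach these fields before upserting into Typesense. The `toSearchDocument` helper and its input shape are assumptions for illustration, not existing code:

```typescript
interface SearchDocument {
  text: string;
  tenantId: string;
  datasetName: string;
  source: string;
  createdAt: number;       // unix timestamp (ms) — enables recency scoring
  importanceScore: number; // 0.0–1.0 — set at index time by document type
}

// Hypothetical mapper used by a collection fetcher in the
// search-indexer worker before the Typesense upsert.
function toSearchDocument(
  src: { text: string; tenantId: string; datasetName: string; source: string; createdAt: Date },
  importanceScore: number
): SearchDocument {
  return {
    text: src.text,
    tenantId: src.tenantId,
    datasetName: src.datasetName,
    source: src.source,
    createdAt: src.createdAt.getTime(),
    importanceScore,
  };
}
```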

Importance scores at index time (heuristic, no LLM call needed):

| Document type | Importance |
| --- | --- |
| Brand voice document | 1.0 |
| Client context file | 0.9 |
| Published landing page | 0.8 |
| Competitor analysis | 0.7 |
| Blog post (published) | 0.6 |
| Generic knowledge doc | 0.5 |
| Web crawl page | 0.4 |
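The table above can be implemented as a plain lookup. The `DocumentType` union and the 0.5 fallback for unknown types are assumptions; the values mirror the table:

```typescript
type DocumentType =
  | "brand_voice"
  | "client_context"
  | "landing_page"
  | "competitor_analysis"
  | "blog_post"
  | "generic"
  | "web_crawl";

// Index-time importance heuristic — no LLM call needed.
const IMPORTANCE_BY_TYPE: Record<DocumentType, number> = {
  brand_voice: 1.0,
  client_context: 0.9,
  landing_page: 0.8,
  competitor_analysis: 0.7,
  blog_post: 0.6,
  generic: 0.5,
  web_crawl: 0.4,
};

function importanceForType(type: DocumentType): number {
  return IMPORTANCE_BY_TYPE[type] ?? 0.5; // default for unmapped types
}
```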

2. Apply the weighted scoring formula

In packages/feature-search/src/search.ts, apply the formula after retrieving Typesense results:

```typescript
function weightedScore(
  relevance: number,       // from Typesense text_match score, normalised 0–1
  createdAt: number,       // unix timestamp
  importanceScore: number,
  now = Date.now()
): number {
  const ageHours = (now - createdAt) / (1000 * 60 * 60);
  // exponential decay with a 30-day half-life (recency = 0.5 at 30 days);
  // without the ln 2 factor the code would decay to 1/e, not 1/2, at 30 days
  const recency = Math.exp(-Math.LN2 * ageHours / (30 * 24));
  const α = 0.5;
  const β = 0.3;
  const γ = 0.2;
  return α * relevance + β * recency + γ * importanceScore;
}
```

Re-rank results by weightedScore before returning topK.
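A re-ranking sketch, assuming the `Hit` shape below (the field names are illustrative) and a `weightedScore` matching the formula in this document with a true 30-day half-life:

```typescript
interface Hit {
  relevance: number;       // normalised Typesense text_match score, 0–1
  createdAt: number;       // unix timestamp (ms)
  importanceScore: number; // 0.0–1.0
  text: string;
}

function weightedScore(
  relevance: number, createdAt: number, importanceScore: number, now = Date.now()
): number {
  const ageHours = (now - createdAt) / (1000 * 60 * 60);
  const recency = Math.exp(-Math.LN2 * ageHours / (30 * 24)); // 30-day half-life
  return 0.5 * relevance + 0.3 * recency + 0.2 * importanceScore;
}

// Score every hit, sort descending, and keep topK.
function rerank(hits: Hit[], topK: number, now = Date.now()) {
  return hits
    .map(h => ({ ...h, weightedScore: weightedScore(h.relevance, h.createdAt, h.importanceScore, now) }))
    .sort((a, b) => b.weightedScore - a.weightedScore)
    .slice(0, topK);
}
```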

3. Add a quality threshold filter

Currently all topK results are returned regardless of score. Add a minimum threshold:

```typescript
const MIN_RELEVANCE_THRESHOLD = 0.3;

results = results.filter(r => r.weightedScore >= MIN_RELEVANCE_THRESHOLD);
```

This prevents low-quality chunks from consuming prompt token budget.

4. Track citation rate per chunk (long-term)

After a run completes, if the output text contains a substring that closely matches a retrieved chunk, mark that chunk as cited: true in a RagCitation log table. Over time, chunks with high citation rates get a boosted importance score on re-index.
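A sketch of the "closely matches" check. The normalisation rules and 60-character window size are assumptions, not a spec; `wasCited` is a hypothetical helper, and writing the `RagCitation` row is out of scope here:

```typescript
// Collapse whitespace and case so minor formatting differences
// between the chunk and the generated output don't block a match.
function normalise(s: string): string {
  return s.toLowerCase().replace(/\s+/g, " ").trim();
}

// A chunk counts as cited if a fixed-size window of its normalised
// text appears verbatim in the normalised output.
function wasCited(output: string, chunk: string, window = 60): boolean {
  const out = normalise(output);
  const c = normalise(chunk);
  if (c.length <= window) return out.includes(c);
  for (let i = 0; i + window <= c.length; i += window) {
    if (out.includes(c.slice(i, i + window))) return true;
  }
  return false;
}
```

Exact-substring matching is deliberately conservative: it misses paraphrases, but it never produces a false citation signal, which matters if citation rates later feed back into importance scores.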

This is the compound flywheel: better retrieval → better output → better citation signal → better retrieval.

Files to Change

  • apps/servers/search-indexer/src/workers/ — add createdAt and importanceScore to all 13 collection fetchers
  • packages/feature-search/src/search.ts — implement weighted scoring and threshold filter
  • packages/db/prisma/schema.prisma — add optional RagCitation model for citation tracking (phase 2)
Related Gaps

  • Gap 9: Episodic memory (both improve what context agents receive)
  • Gap 8: Context window management (filtering low-score chunks reduces prompt size)

© 2026 Leadmetrics — Internal use only