Gap 2: RAG Retrieval is Relevance-Only
Problem
The RAG pipeline scores retrieved chunks purely by cosine similarity (relevance). There is no weighting for how recent the source is or how important it is relative to other content in the knowledge base.
From the Generative Agents paper (Park et al., 2023), a three-factor retrieval score produces significantly better context selection:
```
score = α·relevance + β·recency + γ·importance
```

Current behaviour
A stale competitor analysis from six months ago scores identically to one written last week. A core brand positioning document scores the same as an incidental blog post. A RAG chunk that was previously cited in a high-scoring output carries no advantage over one that has never been used.
The result is that prompts can be populated with outdated or low-value context despite better alternatives existing in the vector store.
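To make the effect concrete, here is a hypothetical comparison using the weights proposed below (α=0.5, β=0.3, γ=0.2) and a 30-day recency half-life; the numbers are illustrative, not taken from the codebase:

```typescript
// Worked example of the three-factor score (illustrative numbers).
// Assumed weights: α=0.5 relevance, β=0.3 recency, γ=0.2 importance.
function score(relevance: number, ageDays: number, importance: number): number {
  const recency = Math.exp((-Math.LN2 * ageDays) / 30); // halves every 30 days
  return 0.5 * relevance + 0.3 * recency + 0.2 * importance;
}

// Six-month-old competitor analysis, slightly more relevant to the query:
const stale = score(0.85, 180, 0.7); // ≈ 0.57
// Week-old competitor analysis:
const fresh = score(0.80, 7, 0.7);   // ≈ 0.80
```

Under relevance-only scoring the stale document wins (0.85 vs 0.80); with recency and importance factored in, the fresher document outranks it.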
What to Build
1. Add metadata fields to indexed documents
In the search-indexer worker, when documents are synced to Typesense, include:
```ts
{
  // existing fields
  text: string,
  tenantId: string,
  datasetName: string,
  source: string,
  // new fields
  createdAt: number,       // unix timestamp — enables recency scoring
  importanceScore: number  // 0.0–1.0 — set at index time based on document type
}
```

Importance scores at index time (heuristic, no LLM call needed):
| Document type | Importance |
|---|---|
| Brand voice document | 1.0 |
| Client context file | 0.9 |
| Published landing page | 0.8 |
| Competitor analysis | 0.7 |
| Blog post (published) | 0.6 |
| Generic knowledge doc | 0.5 |
| Web crawl page | 0.4 |
2. Implement weighted scoring in search()
In `packages/feature-search/src/search.ts`, apply the formula after retrieving Typesense results:
```ts
function weightedScore(
  relevance: number,       // from Typesense text_match score, normalised 0–1
  createdAt: number,       // unix timestamp
  importanceScore: number, // 0.0–1.0, set at index time
  now = Date.now()
): number {
  const ageHours = (now - createdAt) / (1000 * 60 * 60);
  // exponential decay with a 30-day half-life
  // (the ln 2 factor makes the score actually halve every 30 days)
  const recency = Math.exp((-Math.LN2 * ageHours) / (30 * 24));
  const α = 0.5;
  const β = 0.3;
  const γ = 0.2;
  return α * relevance + β * recency + γ * importanceScore;
}
```

Re-rank results by `weightedScore` before returning `topK`.
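The re-rank itself is a sort-and-slice. A sketch, assuming each hit has already been annotated with its weighted score (the `ScoredHit` shape here is a simplification of the real result type):

```typescript
// Hypothetical re-rank step: order hits by weighted score, keep the top K.
interface ScoredHit {
  text: string;
  weightedScore: number;
}

function rerank(hits: ScoredHit[], topK: number): ScoredHit[] {
  return [...hits] // copy so the caller's array is not mutated
    .sort((a, b) => b.weightedScore - a.weightedScore)
    .slice(0, topK);
}
```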
3. Add a quality threshold filter
Currently all topK results are returned regardless of score. Add a minimum threshold:
```ts
const MIN_RELEVANCE_THRESHOLD = 0.3;

results = results.filter(r => r.weightedScore >= MIN_RELEVANCE_THRESHOLD);
```

This prevents low-quality chunks from consuming prompt token budget.
4. Track citation rate per chunk (long-term)
After a run completes, if the output text contains a substring that closely matches a retrieved chunk, mark that chunk as `cited: true` in a `RagCitation` log table. Over time, chunks with high citation rates get a boosted importance score on re-index.
This is the compound flywheel: better retrieval → better output → better citation signal → better retrieval.
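A naive version of the citation check could be a normalised-prefix match; this is a sketch only, and a production matcher would more likely use n-gram or fuzzy overlap:

```typescript
// Naive citation detector (assumption: a chunk counts as cited when a
// whitespace-normalised prefix of it appears verbatim in the output).
function normalise(s: string): string {
  return s.toLowerCase().replace(/\s+/g, " ").trim();
}

function wasCited(output: string, chunk: string, probeLen = 40): boolean {
  const probe = normalise(chunk).slice(0, probeLen);
  // Skip chunks too short to give a meaningful probe.
  return probe.length >= probeLen && normalise(output).includes(probe);
}
```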
Files to Change
- `apps/servers/search-indexer/src/workers/` — add `createdAt` and `importanceScore` to all 13 collection fetchers
- `packages/feature-search/src/search.ts` — implement weighted scoring and threshold filter
- `packages/db/prisma/schema.prisma` — add optional `RagCitation` model for citation tracking (phase 2)
Related
- Gap 9: Episodic memory (both improve what context agents receive)
- Gap 8: Context window management (filtering low-score chunks reduces prompt size)