The following are notes from studying and building retrieval systems for RAG applications.
Vector search excels at capturing semantic similarity, making it great for understanding the general meaning and context of queries. However, it has limitations, particularly when dealing with specific keyword-based queries or named entities.
For instance, it often fails to return exact matches when users search for names, acronyms, or IDs (e.g., claude-3.5-sonnet).
This is where a hybrid approach shines: you can create a more robust and accurate retrieval system by combining keyword search for exact matches with vector search for semantic similarity.
The decision to use hybrid search depends on your use case. For technical documentation where users need to find specific API endpoints or error codes, keyword search excels at exact matches.
For a research paper database where users explore related concepts and methodologies, semantic search might better capture topical relevance.
For a knowledge base where users might search for specific terms but also need contextually related information, a hybrid approach combines precise matching with semantic understanding.
One straightforward approach to implementing hybrid search is to combine traditional keyword matching with semantic search through score fusion.
In this method, we first run both searches in parallel: a keyword search for exact matches and a vector search for semantic similarity. Each returns a scored list of results. These scores are then normalized and combined using a weighted sum:
final_score = α * keyword_score + (1 - α) * semantic_score
where α is a tunable parameter between 0 and 1 that controls the relative weight of keyword and semantic scores, letting you favor exact matches or semantic relevance depending on the query workload.
The result is a final list that integrates the strengths of both keyword and semantic search methods.
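As a concrete sketch, here is what score fusion might look like in TypeScript. The ScoredResult shape and min-max normalization are assumptions; any scheme that puts both score distributions on a comparable scale will do:

interface ScoredResult {
  id: string;
  score: number;
}

// Min-max normalize scores to [0, 1] so the two lists are comparable
function normalize(results: ScoredResult[]): Map<string, number> {
  const scores = results.map((r) => r.score);
  const min = Math.min(...scores);
  const range = Math.max(...scores) - min || 1; // avoid division by zero
  return new Map(
    results.map((r): [string, number] => [r.id, (r.score - min) / range])
  );
}

function fuseScores(
  keywordResults: ScoredResult[],
  semanticResults: ScoredResult[],
  alpha = 0.5
): ScoredResult[] {
  const keyword = normalize(keywordResults);
  const semantic = normalize(semanticResults);
  const ids = new Set([...keyword.keys(), ...semantic.keys()]);

  return [...ids]
    .map((id) => ({
      id,
      // Documents missing from one list contribute 0 from that side
      score:
        alpha * (keyword.get(id) ?? 0) + (1 - alpha) * (semantic.get(id) ?? 0),
    }))
    .sort((a, b) => b.score - a.score);
}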
Another common approach to combine results is Reciprocal Rank Fusion (RRF). RRF assigns higher weights to items that appear near the top of multiple result lists.
For each document d, the RRF score is calculated as:
RRF(d) = Σ_{r ∈ R} 1 / (k + r(d))
Where:
- d is a document
- R is the set of rankers (retrievers)
- k is a smoothing constant that prevents disproportionate influence of top-ranked items (typically 60)
- r(d) is the rank of document d in ranker r
For example, if a document ranks 3rd in keyword search and 9th in semantic search with k = 60:
RRF = 1/(60 + 3) + 1/(60 + 9) ≈ 0.0304
Documents appearing in only one list get zero contribution from the other list. The final results are sorted by descending RRF score.
This method effectively balances results that perform well across multiple search strategies while dampening the impact of outlier rankings.
Reference implementation in TypeScript:
interface Document {
  id: string;
  [key: string]: unknown;
}

interface RRFOptions {
  k?: number;
  maxResults?: number;
  scoreThreshold?: number;
}

interface DocumentWithRanks extends Document {
  rrf_score: number;
  ranks: { [retrieverId: number]: number };
}

function reciprocalRankFusion(
  allResults: Document[][],
  options: RRFOptions = {}
): DocumentWithRanks[] {
  const { k = 60, maxResults = 10, scoreThreshold = 0 } = options;

  const rrfScores = new Map<string, number>();
  const docDetails = new Map<
    string,
    Document & { ranks: Map<number, number> }
  >();

  // Process each result list
  allResults.forEach((resultList, retrieverId) => {
    resultList.forEach((doc, rank) => {
      // Ranks are 1-based in the RRF formula, hence rank + 1
      const rrf = 1 / (k + (rank + 1));
      const currentScore = rrfScores.get(doc.id) || 0;
      rrfScores.set(doc.id, currentScore + rrf);

      // Store document details from first occurrence
      if (!docDetails.has(doc.id)) {
        docDetails.set(doc.id, { ...doc, ranks: new Map() });
      }

      // Track rank in each retriever
      docDetails.get(doc.id)?.ranks.set(retrieverId, rank + 1);
    });
  });

  // Build final results, sorted by descending RRF score
  const fusedResults: DocumentWithRanks[] = Array.from(rrfScores.entries())
    .filter(([_, score]) => score >= scoreThreshold)
    .map(([id, score]) => {
      const document = docDetails.get(id)!;
      return {
        ...document,
        rrf_score: score,
        ranks: Object.fromEntries(document.ranks) as {
          [retrieverId: number]: number;
        },
      };
    })
    .sort((a, b) => b.rrf_score - a.rrf_score)
    .slice(0, maxResults);

  return fusedResults;
}
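For example, fusing a keyword result list with a semantic result list:

const keywordResults = [{ id: "doc-1" }, { id: "doc-2" }];
const semanticResults = [{ id: "doc-2" }, { id: "doc-3" }];

const fused = reciprocalRankFusion([keywordResults, semanticResults]);
// doc-2 ranks first: it appears in both lists (2nd and 1st),
// accumulating 1/(60 + 2) + 1/(60 + 1) ≈ 0.0325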
Raw user queries often lack context or contain ambiguous references. Rather than taking these queries at face value, query reformulation acts as a preprocessing step that transforms them into more complete, self-contained expressions.
The goal is to add implicit context that users typically omit, resolve ambiguous references, and expand queries with relevant terms.
For instance, a query like "How did we do this quarter?" might be reformulated to "What is the revenue for Q1 2025?", adding both the specific metric (revenue) and the temporal context (Q1 2025) that were implicit in the original query.
One way to implement this is to maintain a context window containing recent conversation history, currently visible information, and relevant system state.
An LLM can then be used to reformulate queries by identifying contextual references, resolving pronouns, and expanding abbreviated expressions. For example, "Find similar docs" might become "Find documentation similar to the current deployment guide" when reformulated with appropriate context.
The key is ensuring that reformulated queries are self-contained while preserving the original intent. This makes them more effective for downstream search operations, whether using keyword, semantic, or hybrid approaches.
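A minimal sketch of this step, assuming the Anthropic SDK; the model alias, prompt, and flat string context are illustrative choices, not the only way to structure this:

import Anthropic from "@anthropic-ai/sdk";

const anthropic = new Anthropic(); // reads ANTHROPIC_API_KEY from the environment

async function reformulateQuery(
  query: string,
  context: string // e.g. recent conversation turns and visible system state
): Promise<string> {
  const response = await anthropic.messages.create({
    model: "claude-3-5-sonnet-latest",
    max_tokens: 256,
    system:
      "Rewrite the user's search query so it is self-contained. " +
      "Resolve pronouns and ambiguous references using the provided context, " +
      "and preserve the original intent. Reply with only the rewritten query.",
    messages: [
      { role: "user", content: `Context:\n${context}\n\nQuery: ${query}` },
    ],
  });

  const block = response.content[0];
  return block.type === "text" ? block.text : query; // fall back to the raw query
}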
Many search queries contain specific attributes that are better handled through structured queries. By extracting these features first, we can leverage traditional search infrastructure more effectively.
We can use LLMs to extract various features from the reformulated query. The extractors are independent of one another, so they can run in parallel, with each one focused on a specific attribute.
For example, given the query "Show me failed deployments from last week for the auth service", feature extraction might yield:
{
"deployed_at": {
"range": "2024-12-28/2025-01-03",
"confidence": 0.95
},
"status": {
"value": "failed",
"confidence": 0.98
},
"service": {
"value": "auth",
"confidence": 0.99
}
}
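One way to wire this up, as a sketch reusing the Anthropic SDK from above; the attribute names, prompt, and JSON schema are illustrative:

import Anthropic from "@anthropic-ai/sdk";

const anthropic = new Anthropic();

interface ExtractedFeature {
  value?: string;
  range?: string;
  confidence: number;
}

// Asks the model for a single attribute; returns null when the
// attribute is absent from the query
async function extractFeature(
  attribute: string,
  query: string
): Promise<ExtractedFeature | null> {
  const response = await anthropic.messages.create({
    model: "claude-3-5-sonnet-latest",
    max_tokens: 256,
    system:
      `Extract the "${attribute}" attribute from the search query. ` +
      `Reply with JSON like {"value": "...", "confidence": 0.9} ` +
      `(use "range" instead of "value" for date ranges), ` +
      `or the literal null if the attribute is not present.`,
    messages: [{ role: "user", content: query }],
  });
  const block = response.content[0];
  return block.type === "text" ? JSON.parse(block.text) : null;
}

// Extractors are independent, so run them concurrently
async function extractFeatures(query: string) {
  const attributes = ["deployed_at", "status", "service"];
  const results = await Promise.all(
    attributes.map(
      async (attr) => [attr, await extractFeature(attr, query)] as const
    )
  );
  return Object.fromEntries(results.filter(([, f]) => f !== null));
}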
Effective re-ranking helps surface the most relevant documents. Here are two approaches: heuristic re-ranking and cross-encoder re-ranking.
These strategies are helpful independently but most powerful when combined. The result should be a more robust ordering that considers both semantic relevance and metadata compatibility.
Heuristic re-ranking employs efficient algorithms that leverage readily available signals. By applying simple, fast heuristics to metadata signals, we can quickly improve ranking quality (sketched below):
Recency Bias: For queries requiring current information ("current status", "ongoing issues"), apply time-decay functions to favor newer documents.
Entity Matching: Boost documents that reference entities (people, organizations, locations) mentioned in the query. If someone searches for "budget proposals from marketing team", documents containing marketing team members get prioritized.
Category Alignment: Leverage document categories or tags when queries suggest specific content types ("technical documentation", "financial reports").
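As a sketch, the first two heuristics might look like this in TypeScript; the weights and half-life are illustrative starting points, not tuned values:

interface RankedDoc {
  id: string;
  baseScore: number; // score from the initial retrieval stage
  updatedAt: Date;
  entities: string[]; // entities tagged on the document at index time
}

// Exponential time decay: a document loses half of its recency boost
// every halfLifeDays days
function recencyBoost(updatedAt: Date, halfLifeDays = 30): number {
  const ageDays = (Date.now() - updatedAt.getTime()) / 86_400_000;
  return Math.pow(0.5, ageDays / halfLifeDays);
}

function heuristicRerank(
  docs: RankedDoc[],
  queryEntities: string[],
  weights = { recency: 0.2, entity: 0.3 }
): RankedDoc[] {
  return docs
    .map((doc) => {
      // Fraction of query entities that the document mentions
      const matched = queryEntities.filter((e) =>
        doc.entities.includes(e)
      ).length;
      const entityBoost =
        queryEntities.length > 0 ? matched / queryEntities.length : 0;
      return {
        ...doc,
        baseScore:
          doc.baseScore +
          weights.recency * recencyBoost(doc.updatedAt) +
          weights.entity * entityBoost,
      };
    })
    .sort((a, b) => b.baseScore - a.baseScore);
}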
This kind of heuristic re-ranking is fast, but it can miss subtle nuances.
Cross-encoders look at queries and documents together, unlike traditional embedding models that process them separately. This joint analysis leads to more nuanced relevance judgments: they can better understand context, catch subtle relationships, and make more accurate relevance decisions.
For model selection, a general-purpose model like ms-marco-MiniLM will be good enough in most cases, but consider fine-tuning if your use case truly warrants it.
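As a sketch, cross-encoder re-ranking can run locally with Transformers.js, assuming the Xenova/ms-marco-MiniLM-L-6-v2 ONNX port of the model; the scoring details are simplified:

import {
  AutoModelForSequenceClassification,
  AutoTokenizer,
} from "@xenova/transformers";

// ONNX port of the ms-marco-MiniLM cross-encoder
const modelId = "Xenova/ms-marco-MiniLM-L-6-v2";
const tokenizer = await AutoTokenizer.from_pretrained(modelId);
const model = await AutoModelForSequenceClassification.from_pretrained(modelId);

async function crossEncoderRerank(query: string, docs: string[]) {
  // The cross-encoder scores each (query, document) pair jointly
  const inputs = tokenizer(new Array(docs.length).fill(query), {
    text_pair: docs,
    padding: true,
    truncation: true,
  });
  const { logits } = await model(inputs);

  // One relevance logit per pair; higher means more relevant
  const scores = Array.from(logits.data as Float32Array);
  return docs
    .map((doc, i) => ({ doc, score: scores[i] }))
    .sort((a, b) => b.score - a.score);
}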
The most robust retrieval solutions often combine multiple approaches. The goal is to build systems that leverage both the semantic understanding of embeddings and the precision of traditional search methods.
The key is choosing the right combination of techniques based on your specific use case, and gradually adding complexity only where it demonstrably improves results.