Implement Hybrid Search for More Robust RAG

Combine keyword-based sparse search (like BM25) with semantic dense vector search. This Hybrid Search approach improves retrieval accuracy by capturing both exact terms and contextual meaning, making your RAG systems more effective.

Intermediate · 30 min · 4 steps

The play

Install Dependencies
You need a vector database client that supports sparse-dense vectors, a library for creating dense embeddings, and a tool for building sparse vectors. We'll use the Pinecone client, Sentence-Transformers, and scikit-learn's TfidfVectorizer. Install all three via pip.
Configure a Hybrid Index
Initialize your client and create a new index. To enable Hybrid Search, the index must use the 'dotproduct' metric, which is the metric Pinecone supports for sparse-dense vectors. Use either a serverless index or, on pod-based indexes, a pod type that accepts sparse values (e.g., p1 or s1).
Generate and Upsert Hybrid Vectors
For each document, generate a dense vector (embedding) for semantic meaning and a sparse vector for keyword matching. Upsert both representations together into the index using a unique ID.
Execute a Hybrid Search Query
To run a Hybrid Search, generate both dense and sparse vectors for your query and pass them to the query method. The query method itself takes no weighting argument, so apply an 'alpha' weight on the client before querying: multiply the dense values by alpha and the sparse values by (1 - alpha). alpha=1 is pure dense (semantic), alpha=0 is pure sparse (keyword).
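The alpha weighting in the last step can be sketched as a small helper. This is a minimal sketch, assuming the weighting is applied on the client by scaling the query vectors before they are sent; `weight_hybrid_query` is a hypothetical name, and the sparse vector is assumed to be a dict with 'indices' and 'values' keys, as in the starter code.

```python
from typing import Dict, List, Tuple


def weight_hybrid_query(
    dense: List[float], sparse: Dict[str, list], alpha: float
) -> Tuple[List[float], Dict[str, list]]:
    """Scale a dense/sparse query pair by alpha (hypothetical helper).

    alpha=1.0 keeps only the dense (semantic) signal,
    alpha=0.0 keeps only the sparse (keyword) signal.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be between 0 and 1")
    # Dense values carry the semantic score, so weight them by alpha.
    weighted_dense = [v * alpha for v in dense]
    # Sparse values carry the keyword score, so weight them by (1 - alpha).
    weighted_sparse = {
        "indices": list(sparse["indices"]),
        "values": [v * (1.0 - alpha) for v in sparse["values"]],
    }
    return weighted_dense, weighted_sparse


# Example: lean 75% on keywords, 25% on semantics.
dense_q, sparse_q = weight_hybrid_query(
    [0.5, 1.0], {"indices": [3, 7], "values": [2.0, 4.0]}, alpha=0.25
)
```

Because the index uses the dot product, scaling the query this way makes the final score behave like alpha times the dense score plus (1 - alpha) times the sparse score. The stored document vectors stay untouched, so alpha can be tuned per query.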

Starter code

import os
from pinecone import Pinecone, ServerlessSpec
from sentence_transformers import SentenceTransformer
from sklearn.feature_extraction.text import TfidfVectorizer

# --- 1. Configuration ---
# Get your free API key from https://app.pinecone.io/
# Set as an environment variable for security: export PINECONE_API_KEY="YOUR_KEY"
api_key = os.getenv("PINECONE_API_KEY")
if not api_key:
    raise ValueError("PINECONE_API_KEY environment variable not set!")

INDEX_NAME = "hybrid-search-starter"

# --- 2. Initialize Connections ---
pc = Pinecone(api_key=api_key)
model = SentenceTransformer('all-MiniLM-L6-v2') # Fast, decent quality model

# --- 3. Create a Hybrid Index ---
if INDEX_NAME not in pc.list_indexes().names():
    print(f"Creating new index: {INDEX_NAME}")
    pc.create_index(
        name=INDEX_NAME,
        dimension=model.get_sentence_embedding_dimension(),
        metric="dotproduct", # Required for sparse-dense
        spec=ServerlessSpec(cloud="aws", region="us-west-2")
    )
else:
    print(f"Connecting to existing index: {INDEX_NAME}")

index = pc.Index(INDEX_NAME)
print(index.describe_index_stats())

# --- 4. Prepare Data and Vectors ---
docs = [
    "The C programming language is a general-purpose, procedural computer programming language.",
    "Machine learning is a field of study in artificial intelligence concerned with the development and study of statistical algorithms.",
    "A vector database is a database that stores information as high-dimensional vectors.",
    "Pinecone is a vector database company that provides cloud-native vector search for machine learning applications."
]

# Use TF-IDF for sparse vectors
print("Fitting TF-IDF vectorizer...")
tfidf = TfidfVectorizer()
tfidf.fit(docs)

# Helper to create sparse vectors in Pinecone's format
def build_sparse_vector(text):
    tfidf_vec = tfidf.transform([text])
    indices = tfidf_vec.indices.tolist()
    values = tfidf_vec.data.tolist()
    return {'indices': indices, 'values': values}

# --- 5. Upsert Data ---
print("Generating vectors and upserting data...")
vectors_to_upsert = []
for i, doc in enumerate(docs):
    dense_vec = model.encode(doc).tolist()
    sparse_vec = build_sparse_vector(doc)
    vectors_to_upsert.append({
        'id': f'doc-{i}',
        'values': dense_vec,
        'sparse_values': sparse_vec,
        'metadata': {'text': doc}
    })

index.upsert(vectors=vectors_to_upsert)
print(f"Upserted {len(vectors_to_upsert)} documents.")

# Upserts are not instantly queryable; pause briefly so the new vectors are searchable
import time
time.sleep(5)

# --- 6. Run Hybrid Search Queries ---
queries = {
    "semantic_query": ("AI model algorithms", 1.0), # Pure semantic search
    "keyword_query": ("C language", 0.0), # Pure keyword search
    "hybrid_query": ("vector database for ML", 0.5) # Balanced hybrid search
}

for name, (query_text, alpha) in queries.items():
    print(f"\n--- Running {name} (alpha={alpha}) ---")
    print(f"Query: {query_text}")

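    # Create vectors for the query
    query_dense = model.encode(query_text).tolist()
    query_sparse = build_sparse_vector(query_text)

    # index.query() takes no alpha argument, so apply the weighting client-side:
    # scale dense values by alpha and sparse values by (1 - alpha).
    weighted_dense = [v * alpha for v in query_dense]
    weighted_sparse = {
        'indices': query_sparse['indices'],
        'values': [v * (1 - alpha) for v in query_sparse['values']]
    }

    # Execute search
    results = index.query(
        vector=weighted_dense,
        sparse_vector=weighted_sparse,
        top_k=2,
        include_metadata=True
    )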

    for match in results['matches']:
        print(f"  - Score: {match['score']:.4f}, Text: {match['metadata']['text']}")

# --- 7. Cleanup ---
# print(f"\nDeleting index '{INDEX_NAME}'...")
# pc.delete_index(INDEX_NAME)
# print("Done.")
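
The fixed five-second sleep after the upsert is a guess at how long the index needs. A minimal sketch of an alternative is to poll until the expected number of vectors is visible. `wait_for_count` is a hypothetical helper, not part of the Pinecone client; it takes any zero-argument callable that returns the current vector count, so it can be tried without a live index.

```python
import time
from typing import Callable


def wait_for_count(
    get_count: Callable[[], int],
    expected: int,
    timeout: float = 30.0,
    interval: float = 1.0,
) -> bool:
    """Poll get_count() until it reaches `expected` or `timeout` seconds pass.

    Returns True if the count was reached, False on timeout.
    """
    deadline = time.monotonic() + timeout
    while True:
        if get_count() >= expected:
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)


# Offline demo with a fake counter that "fills up" over three polls.
fake_counts = iter([0, 2, 4])
ready = wait_for_count(lambda: next(fake_counts), expected=4, interval=0.0)
```

With the starter code, the callable could be something like `lambda: index.describe_index_stats().total_vector_count`, assuming the installed client exposes the count under that attribute. If the index already holds vectors from an earlier run, the check passes immediately.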