Deploy a Production RAG Pipeline with a Setup Script

Use the RAG Pipeline Setup script to automate the deployment of a complete retrieval-augmented generation (RAG) system. This guide walks you through configuring your environment, provisioning infrastructure, ingesting documents, and testing the final retrieval endpoint.

Intermediate · 30 min · 4 steps

The play

Configure Your Environment
Before running the RAG Pipeline Setup script, create a `.env` file to store your API keys. The script needs these to connect to your LLM provider (e.g., OpenAI) and your vector database (e.g., Pinecone).
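As a rough illustration of this step, the sketch below parses a `.env` file and reports which keys are missing before anything else runs. The variable names `OPENAI_API_KEY` and `PINECONE_API_KEY` are assumptions based on the providers mentioned above, and `parse_env` and `missing_keys` are hypothetical helpers, not part of the setup script; check the script's own documentation for the names it actually reads.

```python
import os
from pathlib import Path

# Assumed variable names; the setup script may expect different ones.
REQUIRED_KEYS = ("OPENAI_API_KEY", "PINECONE_API_KEY")


def parse_env(text: str) -> dict[str, str]:
    """Parse KEY=VALUE lines, skipping blank lines and # comments."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop surrounding whitespace and optional quotes around the value.
        values[key.strip()] = value.strip().strip("'\"")
    return values


def missing_keys(values: dict[str, str], required=REQUIRED_KEYS) -> list[str]:
    """Return the required keys that are absent or empty."""
    return [key for key in required if not values.get(key)]


if __name__ == "__main__":
    env_path = Path(".env")
    values = parse_env(env_path.read_text()) if env_path.exists() else {}
    missing = missing_keys(values)
    if missing:
        print("Missing in .env:", ", ".join(missing))
    else:
        os.environ.update(values)
        print("Environment loaded.")
```

Failing early on a missing key is cheaper than discovering it halfway through provisioning.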
Provision the Infrastructure
Execute the main RAG Pipeline Setup script. This command reads your configuration, provisions a new vector database index, and prepares the environment for document ingestion. Monitor the output for any errors.
Ingest and Embed Documents
Run the ingestion component of the RAG Pipeline Setup script, pointing it to your local document directory. The script will chunk the files, generate embeddings, and upload them to the newly provisioned vector store.
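To make the chunking part of this step concrete, here is a minimal sketch of fixed-size chunking with overlap. It is a simplified stand-in, not the setup script's actual splitter (which is not shown in this guide), and `chunk_text` is a hypothetical helper; the default sizes mirror the values used in the starter code.

```python
def chunk_text(text: str, chunk_size: int = 500, overlap: int = 50) -> list[str]:
    """Split text into character windows that share `overlap` characters."""
    if chunk_size <= 0 or not 0 <= overlap < chunk_size:
        raise ValueError("need chunk_size > 0 and 0 <= overlap < chunk_size")
    step = chunk_size - overlap  # how far each new window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once a window has reached the end of the text.
        if start + chunk_size >= len(text):
            break
    return chunks


if __name__ == "__main__":
    print(chunk_text("abcdefghij", chunk_size=4, overlap=1))
    # prints ['abcd', 'defg', 'ghij']
```

The overlap keeps a sentence that straddles a chunk boundary retrievable from at least one chunk, at the cost of embedding some text twice.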
Query the Retrieval Endpoint
The script deploys a basic API endpoint for retrieval. Use a client script or a tool like cURL to send a query and confirm that the RAG pipeline returns relevant document chunks based on semantic similarity.
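Ranking by semantic similarity can be sketched in a few lines. The example below assumes cosine similarity as the metric (a common choice, though the deployed index may use another) and uses tiny hand-written vectors in place of real embeddings; `cosine_similarity` and `top_k` are hypothetical helpers for illustration only.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query_vec: list[float], chunks: list[tuple[str, list[float]]], k: int = 2) -> list[str]:
    """Return the text of the k chunks most similar to the query vector."""
    scored = [(cosine_similarity(query_vec, vec), text) for text, vec in chunks]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [text for _, text in scored[:k]]


if __name__ == "__main__":
    # Toy 3-dimensional "embeddings"; real ones have hundreds of dimensions.
    chunks = [
        ("economy is strong", [0.9, 0.1, 0.0]),
        ("infrastructure projects", [0.1, 0.9, 0.2]),
        ("weather report", [0.0, 0.1, 0.9]),
    ]
    print(top_k([0.2, 0.8, 0.1], chunks, k=2))
    # prints ['infrastructure projects', 'economy is strong']
```

When you test the real endpoint, the same check applies: the chunks returned first should be the ones closest in meaning to the query, not merely those sharing keywords.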

Starter code

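import os
# Import paths assume the split-package layout of langchain 0.2/0.3.
from langchain_openai import OpenAIEmbeddings, ChatOpenAI
from langchain_community.vectorstores import FAISS
from langchain_community.document_loaders import TextLoader
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain.chains import RetrievalQA

# --- 1. Setup Environment ---
# Make sure to `pip install langchain langchain-openai langchain-community langchain-text-splitters faiss-cpu`
# Prefer exporting OPENAI_API_KEY in your shell or .env file. The placeholder
# below is used only when the variable is not already set, so a real key is never overwritten.
os.environ.setdefault("OPENAI_API_KEY", "sk-YOUR_API_KEY_HERE")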

# --- 2. Create a dummy document ---
with open("state_of_the_union.txt", "w") as f:
    f.write("The President said the economy is strong. He also mentioned infrastructure projects.")

# --- 3. Load and Chunk Document ---
loader = TextLoader('./state_of_the_union.txt')
documents = loader.load()
text_splitter = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=50)
texts = text_splitter.split_documents(documents)

# --- 4. Embed and Store in Vector DB ---
print("Creating embeddings and storing in FAISS...")
embeddings = OpenAIEmbeddings()
db = FAISS.from_documents(texts, embeddings)
retriever = db.as_retriever()

# --- 5. Setup QA Chain ---
qa_chain = RetrievalQA.from_chain_type(
    llm=ChatOpenAI(model_name="gpt-3.5-turbo", temperature=0),
    chain_type="stuff",
    retriever=retriever
)

# --- 6. Query the RAG system ---
query = "What did the president say about infrastructure?"
print(f"\nQuery: {query}")
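# `Chain.run` is deprecated; `invoke` returns a dict whose "result" key holds the answer.
result = qa_chain.invoke({"query": query})["result"]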
print(f"Answer: {result}")

# Cleanup the dummy file
os.remove("state_of_the_union.txt")