Tags: neo4j-graphrag, knowledge-graph, rag, graph-database, llm, langchain, python, cypher
Build a Knowledge Graph RAG with Neo4j GraphRAG
Build a retrieval-augmented generation (RAG) system on top of a Neo4j knowledge graph. GraphRAG grounds LLM answers in the graph's structured relationships, which can be combined with vector search, for more accurate, context-aware answers. In this play you'll ingest data, create a graph, and query it in natural language through LLM-generated Cypher.
intermediate · 1 hour · 5 steps
The play
- Set Up Neo4j AuraDB & Environment: Create a free Neo4j AuraDB instance to get a cloud-hosted graph database, and note its URI, username, and password. Then set up a Python environment and install the libraries needed to talk to Neo4j and to run the RAG pipeline with LangChain.
- Connect and Ingest Data: Connect to your database using the Python driver. Write a simple function that executes a Cypher query to create sample nodes (`Movie`, `Person`) and relationships (`ACTED_IN`, `DIRECTED`). This populates your knowledge graph with structured data to query.
- Define Graph Schema for the LLM: Instantiate the `Neo4jGraph` object from LangChain. This object is the interface to your database. It inspects the database schema (node labels, relationship types, properties) and passes it to the LLM as context, so the model can generate Cypher that matches your data.
- Create the GraphRAG QA Chain: Combine the graph, an LLM (such as one of OpenAI's), and a LangChain chain to form the core of your Neo4j GraphRAG system. The `GraphCypherQAChain` translates a natural language question into a Cypher query, executes it, and synthesizes an answer from the results.
- Query Your Graph with Natural Language: Invoke the chain with a question in plain English. The system converts the question into a Cypher query, runs it against the knowledge graph, and returns an answer grounded in your structured data. Because the answer is built from query results rather than from the model's memory alone, this grounding helps reduce LLM hallucinations.
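The "Connect and Ingest Data" step uses a single hard-coded Cypher statement, which is fine for a demo. When the records come from a file or an API, a parameterized query is usually the safer route. The following is a minimal sketch under stated assumptions: `MOVIES`, `build_rows`, and `ingest_movies` are hypothetical names chosen for illustration, and the target database is assumed to be called `neo4j`. Only `driver.execute_query` (with extra keyword arguments passed as query parameters, and `database_` selecting the database) comes from the Neo4j Python driver. Relationship types cannot be passed as plain Cypher parameters, so the sketch uses one query per type.

```python
# Hypothetical input shape for this sketch; not part of any library.
MOVIES = [
    {
        "title": "The Matrix",
        "released": 1999,
        "cast": [("Keanu Reeves", "Neo"), ("Carrie-Anne Moss", "Trinity")],
        "directors": ["Lana Wachowski"],
    },
]

# MERGE makes the ingestion repeatable: re-running it does not duplicate nodes.
ACTED_IN_QUERY = """
UNWIND $rows AS row
MERGE (m:Movie {title: row.title})
SET m.released = row.released
MERGE (p:Person {name: row.person})
MERGE (p)-[r:ACTED_IN]->(m)
SET r.role = row.role
"""

DIRECTED_QUERY = """
UNWIND $rows AS row
MERGE (m:Movie {title: row.title})
SET m.released = row.released
MERGE (p:Person {name: row.person})
MERGE (p)-[:DIRECTED]->(m)
"""


def build_rows(movies):
    """Flatten nested movie records into one parameter row per relationship."""
    acted, directed = [], []
    for movie in movies:
        base = {"title": movie["title"], "released": movie["released"]}
        for person, role in movie.get("cast", []):
            acted.append({**base, "person": person, "role": role})
        for person in movie.get("directors", []):
            directed.append({**base, "person": person})
    return acted, directed


def ingest_movies(driver, movies, database="neo4j"):
    """Send the rows to Neo4j as query parameters; returns the row counts."""
    acted, directed = build_rows(movies)
    # Extra keyword arguments (rows=...) are passed to Cypher as $rows.
    driver.execute_query(ACTED_IN_QUERY, rows=acted, database_=database)
    driver.execute_query(DIRECTED_QUERY, rows=directed, database_=database)
    return len(acted), len(directed)
```

With a real connection this would be called as `ingest_movies(driver, MOVIES)` inside a `with GraphDatabase.driver(URI, auth=AUTH) as driver:` block. Keeping the values out of the query text avoids quoting problems and lets the same statement be reused for any number of rows.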
Starter code
import os
from neo4j import GraphDatabase
from langchain_community.graphs import Neo4jGraph
from langchain.chains import GraphCypherQAChain
from langchain_openai import ChatOpenAI
# --- 1. SET CREDENTIALS ---
# Get these from your Neo4j Aura console and OpenAI account.
# Outside of a quick test, export them in your shell instead of hard-coding them;
# setdefault() keeps any value that is already set in the environment.
os.environ.setdefault("NEO4J_URI", "neo4j+s://<YOUR_AURA_DB_ID>.databases.neo4j.io")
os.environ.setdefault("NEO4J_USERNAME", "neo4j")
os.environ.setdefault("NEO4J_PASSWORD", "<YOUR_PASSWORD>")
os.environ.setdefault("OPENAI_API_KEY", "<YOUR_OPENAI_API_KEY>")
URI = os.getenv("NEO4J_URI")
AUTH = (os.getenv("NEO4J_USERNAME"), os.getenv("NEO4J_PASSWORD"))
# --- 2. INGEST SAMPLE DATA ---
def ingest_data(driver):
    # Clear database for a clean start.
    # WARNING: this deletes ALL nodes and relationships in the database.
    driver.execute_query("MATCH (n) DETACH DELETE n")
    # Create sample movie data
    driver.execute_query("""
        CREATE (m:Movie {title:'The Matrix', released:1999}),
               (p1:Person {name:'Keanu Reeves'}),
               (p2:Person {name:'Lana Wachowski'}),
               (p3:Person {name:'Carrie-Anne Moss'}),
               (p1)-[:ACTED_IN {role: 'Neo'}]->(m),
               (p3)-[:ACTED_IN {role: 'Trinity'}]->(m),
               (p2)-[:DIRECTED]->(m)
    """)
    print("Sample data ingested into Neo4j.")
# Connect and run ingestion
with GraphDatabase.driver(URI, auth=AUTH) as driver:
    ingest_data(driver)
# --- 3. SETUP GRAPH RAG CHAIN ---
# Initialize the LangChain Neo4jGraph component.
# With no arguments it reads NEO4J_URI, NEO4J_USERNAME, and NEO4J_PASSWORD from the environment.
graph = Neo4jGraph()
# Initialize the LLM
llm = ChatOpenAI(model="gpt-4o-mini", temperature=0)
# Create the Question-Answering chain
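chain = GraphCypherQAChain.from_llm(
    llm,
    graph=graph,
    verbose=True,
    # Recent langchain-community releases refuse to build this chain unless you
    # acknowledge that LLM-generated Cypher can modify or delete data.
    # Use database credentials with narrowly scoped permissions.
    allow_dangerous_requests=True,
)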
# --- 4. ASK A QUESTION ---
print("\n--- Querying the Knowledge Graph ---")
question = "Who acted in The Matrix?"
try:
    result = chain.invoke({"query": question})
    print(f"\nQ: {question}")
    print(f"A: {result['result']}")
except Exception as e:
    print(f"An error occurred: {e}")
    print("Please ensure your credentials in the script are correct.")
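When an answer looks wrong, it helps to see the Cypher the LLM generated and the rows the database returned. `GraphCypherQAChain` accepts `return_intermediate_steps=True`, which adds an `intermediate_steps` list to the result. The sketch below assumes the layout used by the langchain-community releases I am aware of (one dict with a `"query"` key holding the generated Cypher, then one with a `"context"` key holding the returned rows); check this against your installed version. The helper names `extract_generated_cypher`, `extract_context`, and `print_trace` are hypothetical and are written defensively so they return empty values if the layout differs.

```python
def _find_step(result, key):
    """Return the first intermediate step value stored under `key`, or None."""
    for step in result.get("intermediate_steps") or []:
        if isinstance(step, dict) and key in step:
            return step[key]
    return None


def extract_generated_cypher(result):
    """Cypher text the LLM produced, or None if it was not recorded."""
    return _find_step(result, "query")


def extract_context(result):
    """Rows returned by the database, or an empty list if not recorded."""
    context = _find_step(result, "context")
    return context if context is not None else []


def print_trace(result):
    """Print the generated query, the raw rows, and the final answer."""
    print("Cypher:", extract_generated_cypher(result))
    print("Rows:  ", extract_context(result))
    print("Answer:", result.get("result"))


# Usage, assuming the `llm` and `graph` objects from the starter code:
#
# chain = GraphCypherQAChain.from_llm(
#     llm,
#     graph=graph,
#     return_intermediate_steps=True,
#     allow_dangerous_requests=True,  # required by recent langchain-community releases
# )
# print_trace(chain.invoke({"query": "Who acted in The Matrix?"}))
```

Comparing the generated Cypher with the schema is often the quickest way to tell whether a bad answer comes from query generation or from the data itself.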