Article
nvidia-merlin, recommendation-engine, two-tower-model, deep-learning, model-serving, triton-inference-server, nvtabular, personalization
Build a Two-Stage Recommender with NVIDIA Merlin
Use NVIDIA Merlin to build a production-ready two-stage recommendation engine. You'll create a two-tower model for fast candidate retrieval and a ranking model that re-orders the retrieved candidates, then deploy both stages as a single service.
Intermediate · 1 hour · 5 steps
The play
- Set Up the Environment and Prepare Data: Install the core Merlin libraries. This guide assumes a CUDA-enabled GPU environment. NVTabular preprocesses the MovieLens-100k data (the starter code generates synthetic data with the same schema), encoding categorical features as integer IDs suitable for model training.
- Train the Two-Tower Retrieval Model: The first stage is a retrieval model that quickly finds hundreds of potentially relevant items for a user. It uses a two-tower architecture, which learns separate embeddings for users and items, so candidates can be generated by comparing one user embedding against the item embeddings.
- Train the DLRM Re-Ranking Model: The second stage is a ranking model that scores the candidates from the retrieval step. It uses a DLRM (Deep Learning Recommendation Model), which models interactions between features and costs more per item than the retrieval model, so it runs only on the retrieved candidates to produce the final ordered list of recommendations.
- Build the Ensemble Inference Graph: Using Merlin Systems, chain the retrieval and ranking models into a single deployable graph. This graph defines the full two-stage logic: receive a user ID, retrieve candidates, and re-rank them for the final output.
- Deploy with Triton Inference Server: Export the ensemble workflow and serve it using NVIDIA's Triton Inference Server. Running the provided Docker command launches a server that exposes your two-stage recommender as an API endpoint.
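The retrieve-then-re-rank flow described in the steps above can be pictured with a small, library-free sketch. It does not use Merlin: the embeddings, the feature dictionaries, and the helper names (`retrieve`, `rerank`, `toy_ranker`) are all hypothetical stand-ins, and `toy_ranker` only plays the role a trained DLRM would play.

```python
import heapq
import math


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def retrieve(user_vec, item_vecs, k):
    """Stage 1: cheap dot-product scoring over the whole catalog, keep the top k."""
    scored = ((dot(user_vec, vec), item_id) for item_id, vec in item_vecs.items())
    return [item_id for _, item_id in heapq.nlargest(k, scored)]


def rerank(user_feats, candidates, item_feats, score_fn):
    """Stage 2: apply a costlier scorer to the short candidate list only."""
    return sorted(
        candidates,
        key=lambda item_id: score_fn(user_feats, item_feats[item_id]),
        reverse=True,
    )


def toy_ranker(user_feats, item_feats):
    """Hypothetical stand-in for a trained ranking model (not a real DLRM)."""
    return 1.0 / (1.0 + math.exp(-user_feats["affinity"] * item_feats["popularity"]))


# Toy embeddings, as the two towers might produce them (made-up values).
item_vecs = {10: [1.0, 0.0], 11: [0.9, 0.1], 12: [0.0, 1.0], 13: [0.5, 0.5]}
user_vec = [1.0, 0.2]

# Toy features for the ranking stage (made-up values).
item_feats = {
    10: {"popularity": 0.1},
    11: {"popularity": 0.9},
    12: {"popularity": 0.7},
    13: {"popularity": 0.5},
}
user_feats = {"affinity": 2.0}

candidates = retrieve(user_vec, item_vecs, k=3)
ranked = rerank(user_feats, candidates, item_feats, toy_ranker)
print("retrieved:", candidates)
print("re-ranked:", ranked)
```

The point of the split is cost: stage 1 touches every item but does only a dot product per item, while stage 2 runs the expensive scorer on just `k` items.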
Starter code
import os
import shutil
import merlin.models.tf as mm
import nvtabular as nvt
from merlin.datasets.synthetic import generate_data
from merlin.io import Dataset
from merlin.schema import Schema, tags
# 1. Generate Synthetic Data and Preprocess with NVTabular
DATA_DIR = './merlin_data/'
if os.path.exists(DATA_DIR):
    shutil.rmtree(DATA_DIR)
# Use Merlin's synthetic data generator for a quick example
train, valid = generate_data("movielens-100k", num_rows=100_000, set_sizes=(0.8, 0.2))
user_features = ["user_id", "user_age", "user_gender", "user_occupation"] >> nvt.Categorify()
item_features = ["item_id"] >> nvt.Categorify()
workflow = nvt.Workflow(user_features + item_features)
# Workflow.fit and Workflow.transform take merlin.io.Dataset objects, not Dask DataFrames
workflow.fit(train)
workflow.transform(train).to_parquet(output_path=os.path.join(DATA_DIR, "train"))
workflow.transform(valid).to_parquet(output_path=os.path.join(DATA_DIR, "valid"))
# 2. Create Merlin Datasets
train_ds = Dataset(os.path.join(DATA_DIR, "train", "*.parquet"), part_size="500MB")
valid_ds = Dataset(os.path.join(DATA_DIR, "valid", "*.parquet"), part_size="500MB")
schema = train_ds.schema
# 3. Define and Train a Two-Tower Retrieval Model
retrieval_model = mm.TwoTowerModel(
    schema,
    query_tower=mm.MLPBlock([128, 64], no_activation_last_layer=True, activation='relu'),
    item_tower=mm.MLPBlock([128, 64], no_activation_last_layer=True, activation='relu'),
)
retrieval_model.compile(optimizer="adam", run_eagerly=False)
print("\nTraining the Two-Tower model...\n")
retrieval_model.fit(
    train_ds,
    validation_data=valid_ds,
    batch_size=1024,
    epochs=1,
)
# 4. Get Top-K Candidates for a User
print("\nFetching top 5 candidates for a user...\n")
# Assumption: this Merlin Models release provides RetrievalModel.to_top_k_recommender,
# which indexes the unique items found in the dataset; newer releases may expose a
# different top-k API, so check the version you have installed.
recommender = retrieval_model.to_top_k_recommender(train_ds, k=5)
# Get one user from the validation set
user_batch = mm.sample_batch(valid_ds, batch_size=1, include_targets=False)
# Score the user against the indexed items and keep the 5 best
_, top_ids = recommender(user_batch)
print("Top 5 recommended item IDs:")
print(top_ids.numpy())
# Clean up
shutil.rmtree(DATA_DIR)
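For intuition about the `Categorify` step at the top of the starter code, here is a minimal pure-Python sketch of categorical encoding: each distinct value seen during fitting gets a contiguous integer ID, and values not seen during fitting fall back to a reserved ID. The helper names `fit_vocab` and `encode` are hypothetical, and the ID assignment here (first-appearance order, 0 reserved for unknowns) is an assumption chosen for illustration; NVTabular's actual ordering and reserved IDs differ between versions.

```python
def fit_vocab(values, start_index=1):
    """Assign each distinct value a contiguous integer id, in order of first appearance."""
    vocab = {}
    for value in values:
        if value not in vocab:
            vocab[value] = len(vocab) + start_index
    return vocab


def encode(values, vocab, unknown_id=0):
    """Map values to their ids; anything missing from the vocabulary gets unknown_id."""
    return [vocab.get(value, unknown_id) for value in values]


# Fit on training data only, then reuse the same vocabulary for validation data.
train_occupations = ["engineer", "artist", "engineer", "doctor"]
vocab = fit_vocab(train_occupations)

train_ids = encode(train_occupations, vocab)
valid_ids = encode(["doctor", "pilot"], vocab)  # "pilot" was never seen in training

print(vocab)
print(train_ids)
print(valid_ids)
```

Fitting on the training split and reusing the vocabulary for validation mirrors the starter code, where the workflow is fitted once on `train` and then applied to both splits, so the same raw value always maps to the same embedding row.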