nvidia-merlin, recommendation-engine, two-tower-model, deep-learning, model-serving, triton-inference-server, nvtabular, personalization

Build a Two-Stage Recommender with NVIDIA Merlin

Use NVIDIA Merlin to build a production-ready two-stage recommendation engine. You'll create a two-tower model for fast candidate retrieval and a ranking model to refine the results, then deploy it as a single service.

intermediate · 1 hour · 5 steps

The play

Set Up Environment and Prepare Data
Install the core Merlin libraries. This guide assumes you have a CUDA-enabled GPU environment. We will use NVTabular to preprocess MovieLens-100k-style data (the starter code generates a synthetic dataset with the MovieLens-100k schema), encoding categorical features as integer IDs suitable for embedding lookups during model training.
Train the Two-Tower Retrieval Model
The first stage is a retrieval model that quickly finds hundreds of potentially relevant items for a user. We use a Two-Tower architecture, which encodes users and items with separate networks into a shared embedding space. Because item embeddings can be precomputed, candidate generation reduces to a nearest-neighbor lookup against the user embedding.
Train the DLRM Re-Ranking Model
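The second stage is a ranking model that scores only the candidates returned by the retrieval step. We'll use a DLRM (Deep Learning Recommendation Model), which models interactions between user and item features explicitly. It costs more per scored item than the two-tower model but is typically more accurate, so it produces the final ordered list of recommendations.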
Build the Ensemble Inference Graph
Using Merlin Systems, we chain the retrieval and ranking models into a single deployable graph. This graph defines the full two-stage logic: receive a user ID, retrieve candidates, and re-rank them for the final output.
Deploy with Triton Inference Server
Export the ensemble workflow and serve it using NVIDIA's Triton Inference Server. Running the provided Docker command launches a server that exposes your two-stage recommender as a high-performance API endpoint.
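
The retrieve-then-rank flow described in the steps above can be sketched without any Merlin dependency. This is a minimal illustration, not Merlin's implementation: the embeddings are random, `toy_ranker` is a hypothetical stand-in for a trained DLRM, and retrieval is a brute-force dot-product scan where a production system would more likely use an approximate nearest-neighbor index.

```python
import random


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def retrieve(user_vec, item_vecs, k):
    """Stage 1: score every item by dot product and keep the k best item IDs."""
    ranked = sorted(item_vecs, key=lambda i: dot(user_vec, item_vecs[i]), reverse=True)
    return ranked[:k]


def rerank(user_vec, candidates, item_vecs, ranker):
    """Stage 2: re-score only the retrieved candidates with a costlier model."""
    return sorted(candidates, key=lambda i: ranker(user_vec, item_vecs[i]), reverse=True)


def toy_ranker(user_vec, item_vec):
    # Hypothetical stand-in for a trained ranking model such as DLRM:
    # the dot product plus an extra feature-interaction term.
    return dot(user_vec, item_vec) + 0.1 * sum(u * v * v for u, v in zip(user_vec, item_vec))


rng = random.Random(0)  # fixed seed so runs are repeatable
DIM = 8
# Random vectors stand in for the outputs of trained item and user towers.
item_vecs = {item_id: [rng.gauss(0, 1) for _ in range(DIM)] for item_id in range(1000)}
user_vec = [rng.gauss(0, 1) for _ in range(DIM)]

candidates = retrieve(user_vec, item_vecs, k=100)   # 1000 items -> 100 candidates
final = rerank(user_vec, candidates, item_vecs, toy_ranker)[:5]  # 100 -> top 5
print("Final recommendations:", final)
```

The point of the split is cost: the cheap scorer touches all 1000 items, while the expensive scorer only ever sees the 100 that survive retrieval.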

Starter code

import os
import shutil
import merlin.models.tf as mm
import nvtabular as nvt
from merlin.datasets.synthetic import generate_data
from merlin.io import Dataset
from merlin.schema import Schema, tags

# 1. Generate Synthetic Data and Preprocess with NVTabular
DATA_DIR = './merlin_data/'
if os.path.exists(DATA_DIR):
    shutil.rmtree(DATA_DIR)

# Use Merlin's synthetic data generator for a quick example
train, valid = generate_data("movielens-100k", num_rows=100_000, set_sizes=(0.8, 0.2))

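# Categorify lives in the nvtabular.ops namespace; it maps each distinct value to an integer ID
user_features = ["user_id", "user_age", "user_gender", "user_occupation"] >> nvt.ops.Categorify()
item_features = ["item_id"] >> nvt.ops.Categorify()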

workflow = nvt.Workflow(user_features + item_features)

# generate_data returns merlin.io.Dataset objects, which Workflow.fit/transform accept directly
workflow.fit(train)
workflow.transform(train).to_parquet(output_path=os.path.join(DATA_DIR, "train"))
workflow.transform(valid).to_parquet(output_path=os.path.join(DATA_DIR, "valid"))

# 2. Create Merlin Datasets
train_ds = Dataset(os.path.join(DATA_DIR, "train", "*.parquet"), part_size="500MB")
valid_ds = Dataset(os.path.join(DATA_DIR, "valid", "*.parquet"), part_size="500MB")
schema = train_ds.schema

# 3. Define and Train a Two-Tower Retrieval Model
retrieval_model = mm.TwoTowerModel(
    schema,
    query_tower=mm.MLPBlock([128, 64], no_activation_last_layer=True, activation='relu'),
    item_tower=mm.MLPBlock([128, 64], no_activation_last_layer=True, activation='relu'),
)

retrieval_model.compile(optimizer="adam", run_eagerly=False)
print("\nTraining the Two-Tower model...\n")
retrieval_model.fit(
    train_ds,
    validation_data=valid_ds,
    batch_size=1024,
    epochs=1
)

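# 4. Get Top-K Candidates for a User
from merlin.models.utils.dataset import unique_rows_by_features

print("\nFetching top 5 candidates for a user...\n")

# Build the candidate set: one row per unique item, carrying its item features
candidate_features = unique_rows_by_features(train_ds, tags.Tags.ITEM, tags.Tags.ITEM_ID)

# Wrap the trained towers in a top-k model that scores a user against all candidates.
# NOTE: to_top_k_model is the name in recent merlin-models releases; older releases
# expose to_top_k_recommender instead, so check the version you have installed.
topk_model = retrieval_model.to_top_k_model(candidate_features, k=5)
topk_model.compile(run_eagerly=False)

# Get a batch of users from the validation set
user_batch = mm.sample_batch(valid_ds, batch_size=1, include_targets=False)

# Get recommendations; in recent releases the output carries scores and item identifiers
top_k = topk_model(user_batch)

print("Top 5 recommended item IDs:")
print(top_k.identifiers.numpy())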

# Clean up
shutil.rmtree(DATA_DIR)
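
To make the preprocessing in section 1 of the starter code more concrete, here is a rough pure-Python picture of what categorical encoding does: learn a value-to-integer mapping on the training split, then reuse it on the validation split. The helper names `fit_categories` and `encode` are hypothetical, and reserving ID 0 for unseen values is an assumption of this sketch. NVTabular's Categorify operator runs on GPU dataframes, handles nulls and frequency thresholds, and its exact ID assignment depends on the release.

```python
def fit_categories(values):
    """Learn a value -> integer ID mapping; ID 0 is reserved for unseen values."""
    mapping = {}
    for value in values:
        if value not in mapping:
            mapping[value] = len(mapping) + 1  # IDs start at 1
    return mapping


def encode(values, mapping):
    """Replace each value with its ID, falling back to 0 for unseen values."""
    return [mapping.get(value, 0) for value in values]


train_occupations = ["engineer", "artist", "engineer", "doctor"]
valid_occupations = ["doctor", "pilot", "artist"]

# Fit on training data only, then apply the same mapping everywhere.
mapping = fit_categories(train_occupations)
print(encode(train_occupations, mapping))  # [1, 2, 1, 3]
print(encode(valid_occupations, mapping))  # [3, 0, 2] ("pilot" was never seen)
```

Fitting on the training split alone mirrors the `workflow.fit` / `workflow.transform` separation in the starter code: the validation data is encoded with a mapping it had no part in building.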