
Build a Hybrid Recommendation System with TensorFlow Recommenders

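Implement a two-tower model, a common retrieval architecture in hybrid recommendation systems. The model encodes users and items with two separate neural networks (towers) that map both into the same embedding space. Because item embeddings can be computed ahead of time and compared with a user's embedding using a simple dot product, the approach scales to recommending from a large catalog.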
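A rough sketch of why this is efficient, in plain Python with made-up 3-dimensional embeddings (the titles and vectors are illustrative, not real model outputs): the item tower's outputs are precomputed once, so serving a user costs one pass through the user tower plus one dot product per item, followed by a top-k selection. The `BruteForce` index in the starter code does this exhaustively; TFRS also provides approximate indexes such as `tfrs.layers.factorized_top_k.ScaNN` for larger catalogs.

```python
def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def top_k(user_vec, item_vecs, k):
    """Return the k item names with the highest dot-product score, best first."""
    scores = {name: dot(user_vec, vec) for name, vec in item_vecs.items()}
    return sorted(scores, key=scores.get, reverse=True)[:k]


# Hypothetical item-tower outputs, computed once and stored ahead of serving time.
item_embeddings = {
    "Toy Story (1995)": [0.9, 0.1, 0.0],
    "Heat (1995)": [0.0, 0.8, 0.3],
    "Fargo (1996)": [0.1, 0.7, 0.6],
}

# Hypothetical user-tower output for one user; only this is computed per request.
user_embedding = [0.2, 0.9, 0.4]

print(top_k(user_embedding, item_embeddings, k=2))  # ['Fargo (1996)', 'Heat (1995)']
```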

Intermediate · 30 min · 4 steps
The play
  1. Prepare Your Data
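    Load the MovieLens 100K ratings and movies datasets and build vocabularies (`StringLookup` layers) that map user IDs and movie titles to integer indices. A hybrid recommender draws on user-item interactions as well as user and item features; this starter keeps only the interactions, using each rating's string user ID for the user and the movie title as the item ID.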
  2. Define the Two Towers
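    Create two separate neural networks: a user tower and an item tower. Each tower maps a high-cardinality categorical value (a user ID or a movie title) to a dense, low-dimensional embedding vector, 32 dimensions here. Both towers must produce vectors of the same size so that a dot product can score any user against any movie. With ID embeddings alone this is pure collaborative filtering; the model becomes hybrid once content features, such as the words in a movie's title, are added to a tower.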
  3. Build the Hybrid Recommender Model
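    Combine the towers into a single model by subclassing `tfrs.Model`, a Keras model base class that supplies the training and evaluation steps so you only implement `compute_loss`. We'll use a `tfrs.tasks.Retrieval` task, whose goal is to find the best candidate items for a user from the entire catalog. Its loss scores each user against every movie in the same training batch, treating the movie that user actually rated as the positive and the rest as negatives; the `FactorizedTopK` metric measures how often the rated movie appears among the user's top K candidates from the full catalog.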
  4. Train and Generate Recommendations
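    Compile and train the model. Once it is trained, build a `BruteForce` index holding the embedding of every movie in the catalog and query it with a user ID: the index runs the ID through the user tower and returns the movies whose embeddings have the highest dot product with that user's embedding.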
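Steps 1 and 2 hinge on turning string IDs into row indices of an embedding table. The plain-Python stand-in below mimics what `StringLookup(mask_token=None)` does with its default single out-of-vocabulary (OOV) slot; the `build_lookup` and `lookup` helpers are hypothetical, not Keras APIs, and the alphabetical tie-breaking is this sketch's own choice. Index 0 is reserved for unseen IDs, so the vocabulary size is one larger than the number of distinct IDs, which is why the starter code can pass `vocabulary_size()` straight to `Embedding` as its input dimension.

```python
from collections import Counter

OOV_INDEX = 0  # single out-of-vocabulary slot, mirroring StringLookup(mask_token=None)


def build_lookup(ids):
    """Hypothetical helper: map each distinct ID to an index starting at 1, most frequent first.

    Ties are broken alphabetically in this sketch; Keras' own ordering rules may differ.
    """
    counts = Counter(ids)
    ordered = sorted(counts, key=lambda token: (-counts[token], token))
    return {token: index for index, token in enumerate(ordered, start=1)}


def lookup(table, token):
    """Hypothetical helper: return the token's index, or OOV_INDEX for unseen tokens."""
    return table.get(token, OOV_INDEX)


user_ids = ["42", "7", "42", "13", "7", "42"]
table = build_lookup(user_ids)
vocabulary_size = len(table) + 1  # distinct IDs + 1 OOV slot, i.e. the Embedding input_dim

print(table)                 # {'42': 1, '7': 2, '13': 3}
print(lookup(table, "999"))  # 0
print(vocabulary_size)       # 4
```

An ID-only tower then simply selects row `lookup(table, user_id)` of a `vocabulary_size` × 32 weight matrix, and training adjusts those rows.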
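The retrieval loss from step 3 can be sketched in plain Python. It follows the in-batch softmax idea behind `tfrs.tasks.Retrieval`: for a batch of B (user, movie) pairs, score every user against every movie in the batch and apply softmax cross-entropy with the matching movie as the label. This is a simplification rather than TFRS's implementation: it ignores sample weights, candidate-sampling correction, and other task options, and summing the per-user losses over the batch is an assumption about the reduction.

```python
import math


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def in_batch_softmax_loss(user_vecs, item_vecs):
    """Softmax cross-entropy over in-batch candidates, summed over the batch.

    Row i pairs user_vecs[i] with its positive item_vecs[i]; every other item
    in the batch serves as a negative for that user.
    """
    total = 0.0
    for i, u in enumerate(user_vecs):
        logits = [dot(u, v) for v in item_vecs]  # row i of the B x B score matrix
        m = max(logits)                          # shift by the max for numerical stability
        log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
        total += log_sum_exp - logits[i]         # -log(softmax(logits)[i])
    return total


users = [[1.0, 0.0], [0.0, 1.0]]
items = [[2.0, 0.0], [0.0, 2.0]]  # each user's positive points in the same direction

print(round(in_batch_softmax_loss(users, items), 4))        # 0.2539
print(round(in_batch_softmax_loss(users, items[::-1]), 4))  # 4.2539 (positives swapped)
```

Because the negatives come from the batch itself, larger batches (8192 in the starter code) give each user more movies to be contrasted against per step.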
Starter code
# Requires: pip install tensorflow tensorflow-datasets tensorflow-recommenders
import tensorflow as tf
import tensorflow_datasets as tfds
import tensorflow_recommenders as tfrs

# 1. Prepare Data
ratings = tfds.load('movielens/100k-ratings', split="train")
movies = tfds.load('movielens/100k-movies', split="train")

ratings = ratings.map(lambda x: {
    "movie_title": x["movie_title"],
    "user_id": x["user_id"],
})
movies = movies.map(lambda x: x["movie_title"])

# Vocabularies mapping string IDs to integer indices (index 0 is reserved for unseen IDs)
user_ids_vocabulary = tf.keras.layers.StringLookup(mask_token=None)
user_ids_vocabulary.adapt(ratings.map(lambda x: x["user_id"]))

movie_titles_vocabulary = tf.keras.layers.StringLookup(mask_token=None)
movie_titles_vocabulary.adapt(movies)

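# 2. Define Towers
# Each tower maps a string ID to an index, then to an embedding; both towers must use
# the same embedding_dimension so their outputs can be compared with a dot product.
embedding_dimension = 32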
user_model = tf.keras.Sequential([
    user_ids_vocabulary,
    tf.keras.layers.Embedding(user_ids_vocabulary.vocabulary_size(), embedding_dimension)
])
movie_model = tf.keras.Sequential([
    movie_titles_vocabulary,
    tf.keras.layers.Embedding(movie_titles_vocabulary.vocabulary_size(), embedding_dimension)
])

# 3. Build the two-tower retrieval model (ID embeddings only; add content features to make it hybrid)
class MovieLensModel(tfrs.Model):
    def __init__(self, user_model, movie_model, movies_dataset):
        super().__init__()
        self.movie_model = movie_model
        self.user_model = user_model
        # Retrieval task: in-batch softmax loss, plus top-K metrics over the full movie catalog
        self.task = tfrs.tasks.Retrieval(
            metrics=tfrs.metrics.FactorizedTopK(
                candidates=movies_dataset.batch(128).map(self.movie_model)
            )
        )

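    def compute_loss(self, data, training=False):
        # Embed each user and the movie they rated (the positive candidate)
        user_embeddings = self.user_model(data["user_id"])
        positive_movie_embeddings = self.movie_model(data["movie_title"])
        # Skip the costly full-catalog top-K metrics while training; compute them during evaluation
        return self.task(user_embeddings, positive_movie_embeddings, compute_metrics=not training)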

# 4. Train and Predict
model = MovieLensModel(user_model, movie_model, movies)
model.compile(optimizer=tf.keras.optimizers.Adagrad(learning_rate=0.1))

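# Cache the raw examples once, then reshuffle and batch them on every epoch
# (caching after shuffle would freeze the first epoch's order for all later epochs).
cached_ratings = ratings.cache().shuffle(100_000).batch(8192)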
model.fit(cached_ratings, epochs=3)

# Create a BruteForce index to serve recommendations
index = tfrs.layers.factorized_top_k.BruteForce(model.user_model)
index.index_from_dataset(
    tf.data.Dataset.zip((movies.batch(100), movies.batch(100).map(model.movie_model)))
)

# Query the index for user '42' (BruteForce returns the top 10 by default) and print the first 3
_, titles = index(tf.constant(["42"]))
print("\n---- Recommendations for user 42 ----")
for title in titles[0, :3].numpy():
    print(title.decode('utf-8'))
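As written, both towers embed only IDs, so the model learns purely from co-occurrence in the ratings (collaborative filtering). One common way to make the item tower hybrid is to concatenate the ID embedding with a content embedding built from the movie's own features; TFRS's featurization and deep-retrieval tutorials use a similar pattern for titles, combining `TextVectorization`, an `Embedding`, and `GlobalAveragePooling1D`. The plain-Python sketch below only shows the shape of that idea: random vectors stand in for learned weights, and `movie_tower`, `tokenize`, `DIM`, and the zero-vector fallback for unseen titles and words are illustrative assumptions.

```python
import random
import re

random.seed(0)
DIM = 4  # hypothetical size of each part of the item embedding


def random_table(keys, dim):
    """Stand-in for a learned embedding table: one random vector per key."""
    return {k: [random.uniform(-1.0, 1.0) for _ in range(dim)] for k in keys}


def tokenize(title):
    """Lowercase and keep alphanumeric runs, roughly like default text standardization."""
    return re.findall(r"[a-z0-9]+", title.lower())


catalog = ["Toy Story (1995)", "Toy Soldiers (1991)", "Heat (1995)"]
id_table = random_table(catalog, DIM)  # collaborative signal: one vector per movie ID
word_table = random_table(sorted({w for t in catalog for w in tokenize(t)}), DIM)  # content signal


def movie_tower(title):
    """Hybrid item embedding: ID embedding concatenated with the mean title-word embedding.

    Unseen titles and words fall back to zero vectors, a simple OOV convention for this sketch.
    """
    id_vec = id_table.get(title, [0.0] * DIM)
    known = [word_table[w] for w in tokenize(title) if w in word_table]
    text_vec = [sum(v[d] for v in known) / len(known) if known else 0.0 for d in range(DIM)]
    return id_vec + text_vec  # list concatenation -> length 2 * DIM


# A movie missing from the ID vocabulary still gets a content embedding from "toy" and "story".
new_movie = movie_tower("Toy Story 2 (1999)")
print(len(new_movie))   # 8
print(new_movie[:DIM])  # [0.0, 0.0, 0.0, 0.0]
```

Because the item vector doubles in length, the user tower must also output 2 × `DIM` values (for example, by widening its embedding) so the dot product still lines up. The payoff is cold start: a newly added movie with no ratings yet still receives an embedding from its content.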