Tags: recommendation-system, hybrid-recommender, tensorflow-recommenders, two-tower-model, collaborative-filtering, machine-learning, python

Build a Hybrid Recommendation System with TensorFlow Recommenders

Implement a two-tower model, a common pattern for hybrid recommendation systems. The approach encodes users and items with separate neural networks (towers) that map both into a shared embedding space, so recommendations can be retrieved efficiently from a large catalog.

Intermediate · 30 min · 4 steps

The play

Prepare Your Data
Load the MovieLens 100K dataset and build vocabularies that map user IDs and movie titles to integer indices. Hybrid recommendation systems need structured data describing user-item interactions and features. Here, users are identified by unique string IDs and items by movie titles.
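To make the ID mapping concrete, here is a minimal, framework-free sketch of what a string-to-index vocabulary does. It is an illustration of the idea, not how `StringLookup` is implemented: `build_vocabulary` and `lookup` are hypothetical helper names, the sample IDs are made up, and reserving index 0 for unseen values is an assumption that mirrors a single out-of-vocabulary slot.

```python
from collections import Counter


def build_vocabulary(values):
    """Map each distinct string to an integer index; 0 is reserved for unknowns."""
    counts = Counter(values)
    # Most frequent first, ties broken alphabetically so the result is deterministic.
    ordered = sorted(counts, key=lambda value: (-counts[value], value))
    return {value: index for index, value in enumerate(ordered, start=1)}


def lookup(vocabulary, value):
    """Return the index for a known value, or 0 for anything unseen."""
    return vocabulary.get(value, 0)


# Made-up interaction log: one user ID per rating (illustrative data only).
user_ids = ["42", "7", "42", "13", "7", "42"]
user_vocab = build_vocabulary(user_ids)

known_index = lookup(user_vocab, "42")
unknown_index = lookup(user_vocab, "999")
# An embedding table needs one row per known ID plus one for the unknown slot.
table_rows = len(user_vocab) + 1
```

The row count computed at the end is the reason an embedding layer is sized from the vocabulary size rather than from the number of raw ratings.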
Define the Two Towers
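Create two separate neural networks: a user tower and an item tower. Each tower converts a high-cardinality categorical feature (such as a user ID or movie title) into a low-dimensional embedding vector. Both towers must output vectors of the same dimension so that user and item embeddings can be compared directly. This is the foundation of the two-tower architecture.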
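Conceptually, the simplest tower is a table lookup: each vocabulary index selects one row of a trainable matrix. The sketch below shows that idea in plain Python, without training. The helper names (`make_embedding_table`, `tower`), the table sizes, and the random initialization range are assumptions chosen for illustration.

```python
import random


def make_embedding_table(num_rows, dimension, seed=0):
    """Create a table of small random vectors, one row per vocabulary index."""
    rng = random.Random(seed)
    return [[rng.uniform(-0.05, 0.05) for _ in range(dimension)]
            for _ in range(num_rows)]


def tower(table, index):
    """A minimal 'tower': look up the embedding row for an integer index."""
    return table[index]


embedding_dimension = 4  # kept tiny here; the starter code uses 32
user_table = make_embedding_table(num_rows=4, dimension=embedding_dimension, seed=1)
movie_table = make_embedding_table(num_rows=6, dimension=embedding_dimension, seed=2)

user_vector = tower(user_table, 1)    # user at vocabulary index 1
movie_vector = tower(movie_table, 3)  # movie at vocabulary index 3

# Both towers emit vectors of the same length, so a dot product gives an affinity score.
affinity = sum(u * m for u, m in zip(user_vector, movie_vector))
```

Before training, the affinity is close to zero and carries no meaning; training adjusts the table rows so that users score highly against the movies they interacted with.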
Build the Hybrid Recommender Model
Combine the towers into a single model by subclassing `tfrs.Model`. The base class supplies the training and evaluation loops, so the subclass only needs to implement `compute_loss`. We'll use a retrieval task (`tfrs.tasks.Retrieval`), which learns to find the best item candidates for a given user from the entire catalog.
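As a rough intuition for what a retrieval objective optimizes, the sketch below assumes an in-batch softmax loss: for each user in a batch, the paired movie is the positive and every other movie in the batch acts as a negative. This is a simplified, framework-free illustration under that assumption, not the library's code; `in_batch_softmax_loss` and the toy vectors are hypothetical.

```python
import math


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def in_batch_softmax_loss(user_embeddings, movie_embeddings):
    """Cross-entropy where row i's positive is movie i and the rest are negatives."""
    total = 0.0
    for i, user in enumerate(user_embeddings):
        scores = [dot(user, movie) for movie in movie_embeddings]
        # Numerically stable log-sum-exp over all movies in the batch.
        peak = max(scores)
        log_normalizer = peak + math.log(sum(math.exp(s - peak) for s in scores))
        total += log_normalizer - scores[i]
    return total


# Toy batch: each user vector is paired with the movie vector at the same position.
users = [[1.0, 0.0], [0.0, 1.0]]
aligned_movies = [[2.0, 0.0], [0.0, 2.0]]   # positives point the same way as users
swapped_movies = [[0.0, 2.0], [2.0, 0.0]]   # positives point the wrong way

aligned_loss = in_batch_softmax_loss(users, aligned_movies)
swapped_loss = in_batch_softmax_loss(users, swapped_movies)
```

The aligned batch yields the lower loss, which is the behavior training pushes toward: each user embedding should score its own movie above the other movies in the batch.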
Train and Generate Recommendations
Compile and train the model. Once it is trained, generate recommendations by building an index of every item (movie) embedding and querying it with a user ID, which the user tower converts into an embedding before the index returns the highest-scoring items.
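The serving step amounts to scoring one user vector against every item vector and keeping the best few. A minimal sketch of that brute-force search follows; the function name `brute_force_top_k`, the movie titles, and the hand-picked 2-dimensional embeddings are all hypothetical and stand in for trained tower outputs.

```python
def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def brute_force_top_k(user_vector, catalog, k):
    """Score every (title, embedding) pair and return the k best titles."""
    scored = [(dot(user_vector, vector), title) for title, vector in catalog]
    # Highest score first; sorting the whole catalog is fine for small catalogs.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [title for _, title in scored[:k]]


# Hypothetical titles and embeddings, for illustration only.
catalog = [
    ("Movie A", [0.9, 0.1]),
    ("Movie B", [0.1, 0.9]),
    ("Movie C", [0.7, 0.7]),
    ("Movie D", [-0.5, 0.2]),
]
user_vector = [1.0, 0.5]

top_titles = brute_force_top_k(user_vector, catalog, k=3)
print(top_titles)  # ['Movie C', 'Movie A', 'Movie B']
```

Exhaustive scoring is exact but scales linearly with catalog size, so it suits small catalogs such as MovieLens 100K; larger catalogs typically call for an approximate nearest-neighbor index instead.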

Starter code

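# Note: TensorFlow Recommenders is built on Keras 2. On TensorFlow 2.16+ you may
# need the tf_keras package and TF_USE_LEGACY_KERAS=1 set before this import.
import tensorflow as tf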
import tensorflow_datasets as tfds
import tensorflow_recommenders as tfrs

# 1. Prepare Data
# Load the rating interactions and the movie catalog.
ratings = tfds.load('movielens/100k-ratings', split="train")
movies = tfds.load('movielens/100k-movies', split="train")

# Keep only the features the towers use.
ratings = ratings.map(lambda x: {
    "movie_title": x["movie_title"],
    "user_id": x["user_id"],
})
movies = movies.map(lambda x: x["movie_title"])

# Vocabularies map raw strings to integer indices; index 0 is reserved for unknown values.
user_ids_vocabulary = tf.keras.layers.StringLookup(mask_token=None)
user_ids_vocabulary.adapt(ratings.map(lambda x: x["user_id"]))

movie_titles_vocabulary = tf.keras.layers.StringLookup(mask_token=None)
movie_titles_vocabulary.adapt(movies)

# 2. Define Towers
embedding_dimension = 32
user_model = tf.keras.Sequential([
    user_ids_vocabulary,
    tf.keras.layers.Embedding(user_ids_vocabulary.vocabulary_size(), embedding_dimension)
])
movie_model = tf.keras.Sequential([
    movie_titles_vocabulary,
    tf.keras.layers.Embedding(movie_titles_vocabulary.vocabulary_size(), embedding_dimension)
])

# 3. Build the Hybrid Model
class MovieLensModel(tfrs.Model):
    def __init__(self, user_model, movie_model, movies_dataset):
        super().__init__()
        self.movie_model = movie_model
        self.user_model = user_model
        self.task = tfrs.tasks.Retrieval(
            metrics=tfrs.metrics.FactorizedTopK(
                candidates=movies_dataset.batch(128).map(self.movie_model)
            )
        )

    def compute_loss(self, data, training=False):
        user_embeddings = self.user_model(data["user_id"])
        positive_movie_embeddings = self.movie_model(data["movie_title"])
        # Skip the expensive top-K metrics during training; compute them only in evaluation.
        return self.task(user_embeddings, positive_movie_embeddings, compute_metrics=not training)

# 4. Train and Predict
model = MovieLensModel(user_model, movie_model, movies)
model.compile(optimizer=tf.keras.optimizers.Adagrad(learning_rate=0.1))

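# Shuffle, batch, and cache the interactions. Because the cache sits after the
# shuffle, the batch order is fixed after the first epoch.
cached_ratings = ratings.shuffle(100_000).batch(8192).cache()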
model.fit(cached_ratings, epochs=3)

# Create a BruteForce index to serve recommendations. The user tower encodes
# queries, and each candidate is indexed as a (title, embedding) pair.
index = tfrs.layers.factorized_top_k.BruteForce(model.user_model)
index.index_from_dataset(
    tf.data.Dataset.zip((movies.batch(100), movies.batch(100).map(model.movie_model)))
)

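# Query with a raw user ID; the index returns (scores, titles) for the top candidates.
# Keep the first 3 titles as the recommendations for user '42'.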
_, titles = index(tf.constant(["42"]))
print("\n---- Recommendations for user 42 ----")
for title in titles[0, :3].numpy():
    print(title.decode('utf-8'))