Tags: recommendation-system, hybrid-recommender, tensorflow-recommenders, two-tower-model, collaborative-filtering, machine-learning, python
Build a Hybrid Recommendation System with TensorFlow Recommenders
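Implement a two-tower model, a common backbone for hybrid recommendation systems. The approach encodes users and items in two separate neural networks (towers) that map into a shared embedding space. Recommendations can then be retrieved efficiently from a large catalog by comparing one user vector against precomputed item vectors. The starter code uses ID embeddings only, which amounts to collaborative filtering; the model becomes hybrid once the towers also consume content features such as title text or genres.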
Level: intermediate | Time: 30 min | Steps: 4
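The scoring rule behind the two-tower model described above can be sketched in a few lines of NumPy. In this sketch two random embedding tables stand in for trained towers (the sizes and the `recommend` helper are hypothetical). An item's score for a user is the dot product of their vectors, and the k highest-scoring items are returned. Scoring every item this way is the brute-force strategy that the `BruteForce` index in the starter code also uses.

```python
import numpy as np

# Hypothetical sizes; random tables stand in for trained towers.
NUM_USERS, NUM_ITEMS, DIM = 5, 8, 4
rng = np.random.default_rng(0)
user_vectors = rng.normal(size=(NUM_USERS, DIM))  # row u = embedding of user u
item_vectors = rng.normal(size=(NUM_ITEMS, DIM))  # row i = embedding of item i


def recommend(user_index: int, k: int = 3) -> np.ndarray:
    """Score every item by dot product with the user's vector; return top-k item indices."""
    scores = item_vectors @ user_vectors[user_index]  # shape (NUM_ITEMS,)
    return np.argsort(-scores)[:k]                    # highest scores first


print(recommend(user_index=2))
```

Because item vectors do not depend on the user, they can be computed once and reused for every query; only the user vector has to be computed per request.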
The play
- Prepare Your Data: Load the MovieLens 100K ratings and movies, and build vocabularies that map user IDs and movie titles to integer indices. A retrieval model learns from user-item interaction records; here, users are identified by their string IDs and items by their movie titles.
- Define the Two Towers: Create two separate networks, a user tower and an item (movie) tower. Each tower maps a high-cardinality categorical feature (a user ID or a movie title) to a dense, low-dimensional embedding vector. In the starter code, each tower is a `StringLookup` layer followed by an `Embedding` layer. This is the foundation of the two-tower architecture.
- Build the Hybrid Recommender Model: Combine the towers into a single model by subclassing `tfrs.Model`. You implement `compute_loss`, and the base class plugs it into the standard Keras training loop. The model uses a `tfrs.tasks.Retrieval` task, which learns to rank each user's true movie above other candidates; its `FactorizedTopK` metric measures how often the true movie appears among the top-k results retrieved from the whole catalog.
- Train and Generate Recommendations: Compile and train the model. Once it is trained, build a `BruteForce` index that stores the embedding of every movie, then query it with a user ID. The index passes the ID through the user tower and returns the highest-scoring movie titles.
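The retrieval task's training signal can be illustrated with a simplified in-batch softmax loss. Within a batch of (user, movie) pairs, each user's own movie is the positive and the other movies in the batch serve as negatives, so training never has to score the full catalog. This NumPy sketch captures the idea behind the default loss of `tfrs.tasks.Retrieval`; it is not the library's code, the function name is hypothetical, and it averages over the batch, which may not match the library's own loss reduction.

```python
import numpy as np


def in_batch_softmax_loss(user_emb: np.ndarray, item_emb: np.ndarray) -> float:
    """Mean softmax cross-entropy where user i's positive is item i and the
    other items in the same batch act as negatives."""
    logits = user_emb @ item_emb.T                       # (B, B) score matrix
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    # Diagonal entries are the log-probabilities of each user's true item.
    return float(-np.mean(np.diag(log_probs)))


# All-zero embeddings: every candidate is equally likely, so the loss is log(B).
print(in_batch_softmax_loss(np.zeros((4, 8)), np.zeros((4, 8))))
# Each user aligned with its own item and orthogonal to the rest: loss near 0.
print(in_batch_softmax_loss(np.eye(4) * 10, np.eye(4) * 10))
```

Larger batches supply more negatives per positive, which is one reason retrieval models are commonly trained with large batch sizes.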
Starter code
import tensorflow as tf
import tensorflow_datasets as tfds
import tensorflow_recommenders as tfrs
# 1. Prepare Data
ratings = tfds.load('movielens/100k-ratings', split="train")
movies = tfds.load('movielens/100k-movies', split="train")
ratings = ratings.map(lambda x: {
    "movie_title": x["movie_title"],
    "user_id": x["user_id"],
})
movies = movies.map(lambda x: x["movie_title"])
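# Map string IDs and titles to integer indices. With mask_token=None, index 0
# is the default out-of-vocabulary (OOV) slot.
user_ids_vocabulary = tf.keras.layers.StringLookup(mask_token=None)
user_ids_vocabulary.adapt(ratings.map(lambda x: x["user_id"]))
movie_titles_vocabulary = tf.keras.layers.StringLookup(mask_token=None)
movie_titles_vocabulary.adapt(movies)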
# 2. Define Towers
embedding_dimension = 32
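# User tower: string user ID -> integer index -> 32-dim embedding.
# vocabulary_size() includes the OOV slot, so every index has an embedding row.
user_model = tf.keras.Sequential([
    user_ids_vocabulary,
    tf.keras.layers.Embedding(user_ids_vocabulary.vocabulary_size(), embedding_dimension)
])
# Item tower: movie title -> integer index -> 32-dim embedding.
movie_model = tf.keras.Sequential([
    movie_titles_vocabulary,
    tf.keras.layers.Embedding(movie_titles_vocabulary.vocabulary_size(), embedding_dimension)
])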
# 3. Build the Hybrid Model
class MovieLensModel(tfrs.Model):
    def __init__(self, user_model, movie_model, movies_dataset):
        super().__init__()
        self.movie_model = movie_model
        self.user_model = user_model
        # Retrieval task: in-batch softmax loss for training, plus top-k
        # accuracy measured against the embeddings of every movie.
        self.task = tfrs.tasks.Retrieval(
            metrics=tfrs.metrics.FactorizedTopK(
                candidates=movies_dataset.batch(128).map(self.movie_model)
            )
        )

    def compute_loss(self, data, training=False):
        user_embeddings = self.user_model(data["user_id"])
        positive_movie_embeddings = self.movie_model(data["movie_title"])
        # Skip the expensive top-k metrics while training; they are computed in evaluate().
        return self.task(user_embeddings, positive_movie_embeddings, compute_metrics=not training)
# 4. Train and Predict
model = MovieLensModel(user_model, movie_model, movies)
model.compile(optimizer=tf.keras.optimizers.Adagrad(learning_rate=0.1))
# cache() after shuffle() stores the first shuffled order, so later epochs replay it.
cached_ratings = ratings.shuffle(100_000).batch(8192).cache()
model.fit(cached_ratings, epochs=3)
# Create a BruteForce index to serve recommendations
index = tfrs.layers.factorized_top_k.BruteForce(model.user_model)
# Pair each movie title (the identifier the index returns) with its item-tower embedding.
index.index_from_dataset(
    tf.data.Dataset.zip((movies.batch(100), movies.batch(100).map(model.movie_model)))
)
# Get top 3 recommendations for user '42'
# The index returns (scores, titles) for the top k=10 candidates by default.
_, titles = index(tf.constant(["42"]))
print("\n---- Recommendations for user 42 ----")
for title in titles[0, :3].numpy():
    print(title.decode('utf-8'))
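To make the item tower genuinely hybrid, it can combine the collaborative signal (a per-title ID embedding) with a content signal derived from the title text. The NumPy sketch below is illustrative only: the toy catalog, whitespace tokenizer, table sizes, and the `item_tower` function are hypothetical, and the embedding tables are random rather than trained. It concatenates the ID embedding with the mean of the title's word embeddings, so a movie missing from the ID vocabulary still receives a partly informative vector from words it shares with known titles.

```python
import numpy as np

# Hypothetical toy catalog standing in for MovieLens titles.
titles = ["Toy Story (1995)", "Toy Story 2 (1999)", "Heat (1995)"]
DIM_ID, DIM_TEXT = 4, 4
rng = np.random.default_rng(0)

# Collaborative part: one embedding row per known title; row 0 is reserved
# for unknown titles (similar in spirit to StringLookup's OOV index).
id_vocab = {t: i + 1 for i, t in enumerate(titles)}
id_table = rng.normal(size=(len(id_vocab) + 1, DIM_ID))

# Content part: one embedding row per word seen in the titles (crude
# lowercase/whitespace tokenizer); row 0 is reserved for unknown words.
words = sorted({w for t in titles for w in t.lower().split()})
word_vocab = {w: i + 1 for i, w in enumerate(words)}
word_table = rng.normal(size=(len(word_vocab) + 1, DIM_TEXT))


def item_tower(title: str) -> np.ndarray:
    """Return [ID embedding | mean title-word embedding] for one title."""
    id_vec = id_table[id_vocab.get(title, 0)]
    token_ids = [word_vocab.get(w, 0) for w in title.lower().split()] or [0]
    text_vec = word_table[token_ids].mean(axis=0)
    return np.concatenate([id_vec, text_vec])


# A movie absent from the catalog: its ID part falls back to the unknown row,
# but the known words "toy" and "story" still contribute their word vectors.
new_movie_vec = item_tower("Toy Story 4 (2019)")
print(new_movie_vec.shape)
```

Because the item vector now has `DIM_ID + DIM_TEXT` dimensions, the user tower's output must have the same size for the dot product to work. In the Keras starter code, a `TextVectorization` layer followed by an `Embedding` and `GlobalAveragePooling1D` could play the role of the tokenizer and averaging step, with `tf.keras.layers.Concatenate` joining the two parts.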