ID and Graph View Contrastive Learning with Multi-View Attention Fusion for Sequential Recommendation

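This Action Pack outlines a sequential recommendation approach that combines ID and Graph View Contrastive Learning with Multi-View Attention Fusion. It learns two complementary item representations from users' historical interactions, one from item IDs and one from the user-item interaction graph, trains them with a contrastive objective, and fuses them with attention to predict the next item. The goal is more accurate and more personalized next-item recommendations than a single-view model would give.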

Advanced | 2-4 weeks | 6 steps

The play

Prepare Sequential Interaction Data
Collect and preprocess user-item interaction sequences. Each sequence represents a user's chronological history of engaged items. Standardize item and user IDs for consistent processing across different views.
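
A minimal sketch of this preparation step in plain Python, assuming the raw log is a list of `(user, item, timestamp)` tuples. The function names `build_sequences` and `leave_one_out`, the `min_len` threshold, and the choice to reserve item index 0 for padding are illustrative assumptions, not part of the source.

```python
from collections import defaultdict

def build_sequences(interactions, min_len=3):
    """Group (user, item, timestamp) tuples into per-user chronological item sequences."""
    user_map, item_map = {}, {}
    by_user = defaultdict(list)
    for user, item, ts in interactions:
        # Remap raw IDs to contiguous integers; item index 0 is reserved for padding.
        u = user_map.setdefault(user, len(user_map))
        i = item_map.setdefault(item, len(item_map) + 1)
        by_user[u].append((ts, i))
    sequences = {}
    for u, events in by_user.items():
        events.sort(key=lambda e: e[0])  # chronological order
        seq = [i for _, i in events]
        if len(seq) >= min_len:  # drop users with too little history
            sequences[u] = seq
    return sequences, user_map, item_map

def leave_one_out(sequences):
    """Hold out the last item for testing and the second-to-last for validation."""
    return {
        u: {"train": s[:-2], "valid": s[-2], "test": s[-1]}
        for u, s in sequences.items()
    }

interactions = [
    ("u1", "a", 3), ("u1", "b", 1), ("u1", "c", 2),
    ("u2", "a", 1), ("u2", "c", 2),
]
sequences, user_map, item_map = build_sequences(interactions)
splits = leave_one_out(sequences)
```

Because both views index items through the same `item_map`, the ID embeddings and the graph nodes refer to the same integer IDs.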
Generate ID-based Item Embeddings
Initialize and train standard item ID embeddings. These embeddings capture the intrinsic characteristics of individual items based on their unique identifiers, forming the 'ID View' representation.
Construct User-Item Interaction Graph
Build a dynamic user-item interaction graph from the sequential data. Nodes represent users and items, and edges denote interactions. Use Graph Neural Networks (GNNs) or similar techniques to generate 'Graph View' item embeddings that capture relational context.
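
To make the graph step concrete, the sketch below builds a bipartite user-item graph from per-user sequences and runs one round of neighbor averaging. The averaging is a parameter-free stand-in for a GNN layer, chosen here only for illustration; the source does not specify the GNN architecture, and the names `build_bipartite_graph`, `mean_vec`, and `propagate` are hypothetical.

```python
from collections import defaultdict

def build_bipartite_graph(sequences):
    """Return user->items and item->users adjacency sets from per-user item sequences."""
    user_items, item_users = defaultdict(set), defaultdict(set)
    for user, seq in sequences.items():
        for item in seq:
            user_items[user].add(item)
            item_users[item].add(user)
    return user_items, item_users

def mean_vec(vectors):
    """Element-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]

def propagate(item_vecs, user_items, item_users):
    """One item -> user -> item round of mean aggregation."""
    # Each user is represented by the mean of the items they interacted with.
    user_vecs = {u: mean_vec([item_vecs[i] for i in items])
                 for u, items in user_items.items()}
    # Each item is then represented by the mean of the users who interacted with it.
    return {i: mean_vec([user_vecs[u] for u in users])
            for i, users in item_users.items()}

toy_sequences = {0: [1, 2], 1: [2, 3]}
toy_item_vecs = {1: [1.0, 0.0], 2: [0.0, 1.0], 3: [1.0, 1.0]}
user_items, item_users = build_bipartite_graph(toy_sequences)
graph_view = propagate(toy_item_vecs, user_items, item_users)
```

Item 2 is shared by both users, so its graph-view vector mixes information from items 1 and 3, which is the relational context the ID view alone does not carry.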
Apply Contrastive Learning
Implement a contrastive learning objective. For each target item, create positive pairs (e.g., augmented versions of the same item/context) and negative pairs (randomly sampled or hard-mined items). Train the model to maximize agreement within positive pairs and minimize agreement within negative pairs, in both the ID and Graph views.
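
The objective can be illustrated with an InfoNCE-style loss for a single anchor, written in plain Python. This is a sketch under assumptions: the source does not state the exact loss, and the helper names `cosine` and `info_nce` as well as the default temperature are illustrative. The loss is the negative log-probability of the positive under a softmax over temperature-scaled cosine similarities.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm > 0 else 0.0

def info_nce(anchor, positive, negatives, temperature=0.07):
    """Negative log-softmax of the positive among [positive] + negatives."""
    logits = [cosine(anchor, positive) / temperature]
    logits += [cosine(anchor, n) / temperature for n in negatives]
    m = max(logits)  # subtract the max for numerical stability
    log_sum = m + math.log(sum(math.exp(l - m) for l in logits))
    return log_sum - logits[0]

aligned = info_nce([1.0, 0.0], [1.0, 0.0], [[0.0, 1.0], [-1.0, 0.0]])
misaligned = info_nce([1.0, 0.0], [0.0, 1.0], [[1.0, 0.0]])
tied = info_nce([1.0, 0.0], [1.0, 0.0], [[1.0, 0.0]])
```

The loss is near zero when the positive is the only vector close to the anchor, grows when a negative is closer than the positive, and equals log 2 when one negative is indistinguishable from the positive.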
Fuse Multi-View Embeddings with Attention
Integrate an attention mechanism to dynamically combine the ID-based and Graph-based item embeddings. The attention layer learns to weigh the importance of each view's representation for predicting the next item, forming a 'Multi-View Attention Fusion' output.
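
As a simplified illustration of the fusion step, the sketch below scores each view against a query vector with a scaled dot product, turns the scores into weights with a softmax, and returns the weighted sum. It is a single-head simplification, not the multi-head attention used in the starter code; `fuse_views` and the `query` vector (which would be learned in practice) are assumptions made for the example.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def fuse_views(id_vec, graph_vec, query):
    """Weight the two views by scaled dot-product scores against `query`, then sum them."""
    scale = math.sqrt(len(query))
    views = [id_vec, graph_vec]
    scores = [sum(q * v for q, v in zip(query, view)) / scale for view in views]
    weights = softmax(scores)
    fused = [sum(w * view[d] for w, view in zip(weights, views))
             for d in range(len(query))]
    return fused, weights

# A zero query scores both views equally, so the fusion is a plain average.
fused_even, weights_even = fuse_views([1.0, 0.0], [0.0, 1.0], [0.0, 0.0])
# A query aligned with the ID view shifts weight toward it.
fused_id, weights_id = fuse_views([1.0, 0.0], [0.0, 1.0], [2.0, 0.0])
```

The returned weights make the model's per-view preference inspectable, which helps when checking whether the graph view contributes anything beyond the ID view.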
Train and Evaluate the Recommendation Model
Combine the contrastive learning loss with a standard sequential recommendation loss (e.g., cross-entropy for next-item prediction), for example as a weighted sum. Train the entire model end-to-end and evaluate it on a held-out test set with ranking metrics such as Recall@K and NDCG@K.
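
The two metrics can be sketched in plain Python for the common next-item setting, assuming exactly one held-out target item per user. With a single relevant item the ideal DCG is 1, so NDCG@K reduces to 1 / log2(rank + 2) for a 0-based rank. The function names are illustrative.

```python
import math

def recall_at_k(ranked_items, target, k):
    """1.0 if the held-out target appears in the top-k ranking, else 0.0."""
    return 1.0 if target in ranked_items[:k] else 0.0

def ndcg_at_k(ranked_items, target, k):
    """NDCG@k with a single relevant item: 1 / log2(rank + 2) for a 0-based rank."""
    top_k = ranked_items[:k]
    if target not in top_k:
        return 0.0
    rank = top_k.index(target)
    return 1.0 / math.log2(rank + 2)

def evaluate(rankings, targets, k=10):
    """Average Recall@k and NDCG@k over users (one ranking and one target per user)."""
    n = len(targets)
    recall = sum(recall_at_k(r, t, k) for r, t in zip(rankings, targets)) / n
    ndcg = sum(ndcg_at_k(r, t, k) for r, t in zip(rankings, targets)) / n
    return {"recall": recall, "ndcg": ndcg}

# Two users, both with target item 3; only the first ranking has it in the top 2.
metrics = evaluate([[5, 3, 9], [1, 2, 3]], [3, 3], k=2)
```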

Starter code

import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiViewRecModel(nn.Module):
    def __init__(self, num_items, embedding_dim, graph_embedding_dim, num_heads=4):
        # embedding_dim must be divisible by num_heads (nn.MultiheadAttention requirement)
        super().__init__()
        self.item_embedding_id = nn.Embedding(num_items, embedding_dim)
        # Placeholder for Graph Neural Network component
        # In a real implementation, this would be a GNN layer (e.g., GCN, GAT)
        self.graph_embedding_proj = nn.Linear(graph_embedding_dim, embedding_dim)  # Project GNN output to same dim

        self.attention = nn.MultiheadAttention(embed_dim=embedding_dim, num_heads=num_heads, batch_first=True)
        self.output_layer = nn.Linear(embedding_dim, num_items)

    def forward(self, item_ids, graph_features):
        # item_ids: (batch_size,) long tensor; graph_features: (batch_size, graph_embedding_dim)
        # ID View Embeddings
        id_embeds = self.item_embedding_id(item_ids)

        # Graph View Embeddings (conceptual - requires actual GNN output)
        # For this starter, we'll assume graph_features are pre-computed GNN outputs
        graph_embeds = self.graph_embedding_proj(graph_features)

        # Combine views for attention by stacking them along a sequence dimension
        # query, key, value all come from the combined embeddings
        # For simplicity, treat ID and Graph as two 'tokens' for self-attention
        combined_embeds = torch.stack([id_embeds, graph_embeds], dim=1)  # Shape: (batch_size, 2, embedding_dim)

        attn_output, _ = self.attention(combined_embeds, combined_embeds, combined_embeds)
        fused_embeds = attn_output.mean(dim=1)  # Average the two attended view tokens

        logits = self.output_layer(fused_embeds)  # Shape: (batch_size, num_items)
        return logits

    def contrastive_loss(self, anchor_embeds, positive_embeds, negative_embeds, temperature=0.07):
        # Simplified InfoNCE / NT-Xent-style loss for demonstration
        # anchor_embeds, positive_embeds: (batch_size, dim); negative_embeds: (batch_size, num_negatives, dim)
        pos_sim = F.cosine_similarity(anchor_embeds, positive_embeds, dim=-1)  # (batch_size,)
        neg_sim = F.cosine_similarity(anchor_embeds.unsqueeze(1), negative_embeds, dim=-1)  # (batch_size, num_negatives)

        # The positive sits in column 0, so the target class is 0 for every row
        logits = torch.cat([pos_sim.unsqueeze(1), neg_sim], dim=1) / temperature
        labels = torch.zeros(logits.shape[0], dtype=torch.long, device=anchor_embeds.device)
        return F.cross_entropy(logits, labels)

# This is a conceptual blueprint. Actual implementation requires:
# 1. A real GNN architecture for graph_features.
# 2. Data loading and batching for sequential recommendation.
# 3. Proper negative sampling strategies for contrastive learning.
# 4. Training loop with optimizers and full loss function.

Source

Paper: arxiv.org