MoToRec: Sparse-Regularized Multimodal Tokenization for Cold-Start Recommendation

MoToRec is a novel framework that addresses the item cold-start problem in recommender systems by transforming multimodal content into discrete semantic tokens using a sparsely-regularized Residual Quantized Variational Autoencoder (RQ-VAE). The approach outperforms state-of-the-art baselines on three large-scale real-world datasets by creating disentangled, interpretable representations that filter noise and prioritize cold-start items through adaptive rarity amplification.

Researchers have introduced a groundbreaking framework, MoToRec, that tackles the persistent item cold-start problem in AI-powered recommender systems by transforming multimodal content into discrete, interpretable semantic tokens. The approach, detailed in a new paper on arXiv, leverages a sparsely-regularized Residual Quantized Variational Autoencoder (RQ-VAE) to generate compositional codes, untangling noisy, entangled modality signals to build superior representations for new items with little to no interaction history.

While Graph Neural Networks (GNNs) have become the backbone of modern recommendation engines by modeling complex user-item graphs, their performance degrades significantly under data sparsity. New items, lacking historical interaction data, are particularly challenging. Although integrating multimodal content—like text, images, and audio—offers a potential solution, existing methods often produce suboptimal item embeddings due to noise and poor disentanglement in sparse datasets.

From Continuous Vectors to Discrete Tokens: The Core Innovation

The core innovation of MoToRec is its paradigm shift from learning continuous vector embeddings to performing discrete semantic tokenization. The framework's centerpiece is a specially designed sparsely-regularized RQ-VAE. This model compresses multimodal item content into a sequence of discrete tokens from a learned codebook, inherently promoting a disentangled and interpretable latent representation. This process filters out irrelevant noise and isolates semantic concepts, creating a cleaner, more robust foundation for recommendation.
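The paper's exact quantizer configuration is not reproduced here, but the general mechanics of residual quantization can be sketched in a few lines. In the toy example below, the codebook size, depth, and the `residual_quantize` helper are illustrative assumptions, not the authors' implementation: each level picks the codebook entry nearest the remaining residual, so an item embedding becomes a short sequence of discrete token IDs.

```python
import numpy as np

def residual_quantize(z, codebooks):
    """Turn a continuous embedding z into a sequence of discrete tokens,
    one per codebook level, by repeatedly quantizing the residual left
    over from the previous level (the core idea behind an RQ-VAE's
    tokenizer)."""
    residual = z.copy()
    tokens = []
    for codebook in codebooks:  # codebook: (K, d) array of code vectors
        # pick the code closest to the current residual
        dists = np.linalg.norm(codebook - residual, axis=1)
        idx = int(np.argmin(dists))
        tokens.append(idx)
        residual = residual - codebook[idx]  # quantize what remains next
    return tokens, residual

# Toy setup: 8-dim embeddings, 3 levels of 16 codes each (assumed sizes).
rng = np.random.default_rng(0)
d, K, depth = 8, 16, 3
codebooks = [rng.normal(size=(K, d)) for _ in range(depth)]
z = rng.normal(size=d)
tokens, residual = residual_quantize(z, codebooks)
```

In a trained RQ-VAE the codebooks are learned jointly with the encoder, and (per the paper) a sparsity regularizer further pushes the levels toward disentangled, interpretable concepts; the random codebooks above only illustrate the token-generation loop.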

Synergistic Architectural Components for Cold-Start Superiority

The power of MoToRec comes from three synergistic components. First, the sparsely-regularized RQ-VAE enforces a bottleneck that encourages the model to learn only the most essential, disentangled features. Second, a novel adaptive rarity amplification mechanism dynamically prioritizes learning signals from cold-start items during training, ensuring the model does not become biased toward popular items. Finally, a hierarchical multi-source graph encoder fuses these high-quality discrete item representations with traditional collaborative signals from the user-item interaction graph, creating a unified and robust profile for each user and item.
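The paper does not spell out the rarity-amplification formula in this summary, but the idea of up-weighting cold items can be illustrated with a simple inverse-frequency scheme. The function below is an assumed stand-in (the name `rarity_weights` and the `alpha`/`eps` parameters are hypothetical): items with few interactions receive larger loss weights, while the average weight stays at 1 so the overall loss scale is unchanged.

```python
import numpy as np

def rarity_weights(interaction_counts, alpha=0.5, eps=1.0):
    """Illustrative rarity amplification: give items with few recorded
    interactions larger per-item loss weights. `alpha` controls how
    aggressively rarity is amplified; `eps` keeps zero-count (fully
    cold) items finite. Normalizing by the mean keeps the average
    weight at 1.0 so the total loss magnitude is preserved."""
    counts = np.asarray(interaction_counts, dtype=float)
    w = (1.0 / (counts + eps)) ** alpha
    return w / w.mean()

# A fully cold item vs. increasingly popular items.
counts = [0, 2, 50, 1000]
weights = rarity_weights(counts)
```

An adaptive variant, as the paper's name suggests, would adjust these weights over the course of training rather than fixing them up front; this static sketch only shows the popularity-bias correction at its core.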

Validated Performance and Future Implications

The researchers validated MoToRec through extensive experiments on three large-scale real-world datasets. The results demonstrated its clear superiority over state-of-the-art baselines, not only in overall recommendation accuracy but—critically—in cold-start scenarios. This work provides strong empirical evidence that discrete tokenization, a technique gaining traction in generative AI, offers a scalable and effective alternative for mitigating one of the most long-standing challenges in recommender systems.

Why This Matters: Key Takeaways

  • Solves a Core Industry Problem: MoToRec directly addresses the item cold-start problem, a major hurdle for deploying effective recommender systems in dynamic environments like e-commerce and streaming platforms.
  • Novel Paradigm Shift: It pioneers the application of discrete semantic tokenization for recommendation, moving beyond continuous embeddings to create more interpretable and robust item representations.
  • Enhanced with Strategic Mechanisms: The framework is not a single model but a sophisticated system integrating sparse regularization, adaptive learning for rare items, and hierarchical graph fusion to achieve its state-of-the-art results.
  • Proven Scalability: Successful experiments on large-scale datasets indicate that this approach is viable for real-world, industrial-scale applications, offering a new path forward for AI-driven personalization.
