Awesome Multimodal Representation Learning Papers and Source Codes

ULIP-2: Scalable Multimodal 3D Understanding Without Manual Annotations 547

Imagine building a system that can understand 3D objects as intuitively as humans do—recognizing a chair from its point cloud,…

01/13/20263D Classification, Multimodal Representation Learning, Zero-shot Learning

FlowTok: Unified Text-to-Image and Image-to-Text Generation with Compact 1D Tokens 1082

FlowTok reimagines cross-modal generation by collapsing the traditionally complex boundary between text and images into a streamlined, efficient process. Unlike…

12/19/2025Image-to-text Generation, Multimodal Representation Learning, Text-to-Image Generation