Zero-shot Transfer Learning

LanguageBind: Unify Video, Audio, Depth, Thermal & Text in One Language-Aligned Multimodal Space

Imagine building an AI system that understands not just images and text—but also video, audio, infrared (thermal), and depth data—all…

01/13/2026 | Cross-Modal Retrieval, Multimodal Representation Learning, Zero-shot Transfer Learning

ONE-PEACE: A Single Model for Vision, Audio, and Language with Zero Pretraining Dependencies

In today’s AI landscape, most multimodal systems are built by stitching together specialized models—separate vision encoders, audio processors, and language…

12/26/2025 | Cross-Modal Retrieval, Multimodal Representation Learning, Zero-shot Transfer Learning

PaperCodex