PaperCodex

Cross-Modal Retrieval

Chinese CLIP: Enable Zero-Shot Chinese Vision-Language AI Without Custom Training

Multimodal AI models like OpenAI’s CLIP have transformed how developers build systems that understand both images and text. But there’s…

12/27/2025 · Cross-Modal Retrieval, Vision-language Pretraining, Zero-shot Image Classification
ONE-PEACE: A Single Model for Vision, Audio, and Language with Zero Pretraining Dependencies

In today’s AI landscape, most multimodal systems are built by stitching together specialized models—separate vision encoders, audio processors, and language…

12/26/2025 · Cross-Modal Retrieval, Multimodal Representation Learning, Zero-shot Transfer Learning
MIEB: Benchmark 130 Image & Image-Text Tasks Across 38 Languages for Reliable Model Evaluation

Evaluating image embedding models has long been a fragmented and inconsistent process. Researchers and engineers often test models on narrow,…

12/19/2025 · Cross-Modal Retrieval, Image Embedding Evaluation, Visual Representation Learning
Copyright © 2026 PaperCodex.