PaperCodex

Zero-shot Image Classification

Chinese CLIP: Enable Zero-Shot Chinese Vision-Language AI Without Custom Training

Multimodal AI models like OpenAI’s CLIP have transformed how developers build systems that understand both images and text. But there’s…

12/27/2025 · Cross-Modal Retrieval, Vision-language Pretraining, Zero-shot Image Classification
MetaCLIP: Superior Vision-Language Models Through Transparent, High-Quality Data Curation

If you’ve worked with OpenAI’s CLIP, you know its power—but also its opacity. CLIP revolutionized zero-shot vision-language understanding, yet it…

12/27/2025 · Contrastive Learning, Multilingual Vision-language Modeling, Zero-shot Image Classification
Perception Encoder: One Vision Model to Rule Image, Video, and Language Tasks – Without Task-Specific Training

Perception Encoder (PE) redefines what’s possible with a single vision encoder. Unlike legacy approaches that demand different pretraining strategies for…

12/19/2025 · Dense Visual Prediction, Multimodal Visual Question Answering, Zero-shot Image Classification
Copyright © 2026 PaperCodex.