Awesome Visual Reasoning Papers and Source Codes

Chat-UniVi: One Unified Model for Image and Video Understanding—No More Separate Systems Needed 939

In today’s AI landscape, multimodal systems that understand both images and videos are increasingly essential—but most solutions force you to…

01/13/2026 | Multimodal Understanding, Video Question Answering, Visual Reasoning

Seg-Zero: Interpretable, Zero-Shot Image Segmentation with Reasoning Chains and Reinforcement Learning 527

Image segmentation has long been a cornerstone of computer vision—yet traditional approaches often behave like black boxes, especially when faced…

01/09/2026 | Interpretable Vision Models, Visual Reasoning, Zero-shot Segmentation

DeepEyes: Enable Vision-Language Models to “Think with Images” and Solve Complex Visual Reasoning Tasks 858

Most modern Vision-Language Models (VLMs) treat images as static inputs—processed once, then reasoned about using purely text-based logic. But humans…

01/09/2026 | Multimodal Reinforcement Learning, Vision-Language Modeling, Visual Reasoning