PaperCodex

Visual Reasoning

Chat-UniVi: One Unified Model for Image and Video Understanding—No More Separate Systems Needed

In today’s AI landscape, multimodal systems that understand both images and videos are increasingly essential—but most solutions force you to…

01/13/2026 | Multimodal Understanding, Video Question Answering, Visual Reasoning

Seg-Zero: Interpretable, Zero-Shot Image Segmentation with Reasoning Chains and Reinforcement Learning

Image segmentation has long been a cornerstone of computer vision—yet traditional approaches often behave like black boxes, especially when faced…

01/09/2026 | Interpretable Vision Models, Visual Reasoning, Zero-shot Segmentation

DeepEyes: Enable Vision-Language Models to “Think with Images” and Solve Complex Visual Reasoning Tasks

Most modern Vision-Language Models (VLMs) treat images as static inputs—processed once, then reasoned about using purely text-based logic. But humans…

01/09/2026 | Multimodal Reinforcement Learning, Vision-Language Modeling, Visual Reasoning

Copyright © 2026 PaperCodex.