PaperCodex

Multimodal Reinforcement Learning

Video-R1: Boost Video Reasoning in MLLMs with Efficient RL—Outperforming GPT-4o on Spatial Tasks

Video understanding has long been a bottleneck for multimodal large language models (MLLMs). While models can recognize objects or scenes…

01/09/2026 Multimodal Reinforcement Learning, Temporal Modeling, Video Reasoning

DeepEyes: Enable Vision-Language Models to “Think with Images” and Solve Complex Visual Reasoning Tasks

Most modern Vision-Language Models (VLMs) treat images as static inputs—processed once, then reasoned about using purely text-based logic. But humans…

01/09/2026 Multimodal Reinforcement Learning, vision-language modeling, Visual Reasoning

SoundMind: Boost Audio-Language Models with Reinforcement-Learned Logical Reasoning

Most large language models (LLMs) today excel at reasoning over text—but what happens when the input includes sounds? Can an…

12/19/2025 Audio-language Reasoning, Logical Reasoning In AI, Multimodal Reinforcement Learning

Copyright © 2026 PaperCodex.