PaperCodex

Seg-Zero: Interpretable, Zero-Shot Image Segmentation with Reasoning Chains and Reinforcement Learning

Image segmentation has long been a cornerstone of computer vision—yet traditional approaches often behave like black boxes, especially when faced…

01/09/2026 | Interpretable Vision Models, Visual Reasoning, Zero-shot Segmentation
Vision-R1: Boost Multimodal Reasoning in Visual Math and Complex Problem Solving Without Human Annotations

If you’re evaluating multimodal AI systems for tasks that demand deep reasoning—such as solving visual math problems, interpreting charts, or…

01/09/2026 | Interleaving Reasoning, Multimodal Reasoning, Visual Math Problem Solving
Fin-R1: A 7B Financial Reasoning LLM That Outperforms Larger Models on Complex Finance Tasks

Fin-R1 is a purpose-built reasoning large language model (LLM) designed specifically for the financial domain. Despite having only 7 billion…

01/09/2026 | Financial Reasoning, Quantitative Finance, Regulatory Compliance
MM-Eureka: High-Accuracy Multimodal Reasoning for STEM Education and Technical QA

In the rapidly evolving field of multimodal AI, most models still struggle to combine visual understanding with precise, step-by-step logical…

01/09/2026 | Multimodal Reasoning, Rule-based Reinforcement Learning, STEM Question Answering
LBM: One-Step, Multi-Task Image Translation with State-of-the-Art Speed and Simplicity

Image-to-image translation is a foundational capability in computer vision, enabling applications from photo editing to 3D scene understanding. Yet many…

01/09/2026 | Depth Estimation, Image-to-image Translation, Object Relighting
SpatialTrackerV2: Real-Time 3D Point Tracking from Monocular Video—Fast, Accurate, and End-to-End

If you’ve ever tried to track 3D points in a monocular video—say, for robotics perception, AR/VR content creation, or sports…

01/09/2026 | 3D Point Tracking, Dynamic Scene Reconstruction, Monocular Depth Estimation
Lumina-Image 2.0: High-Quality, Efficient Text-to-Image Generation with Unified Architecture and Strong Open-Source Support

Lumina-Image 2.0 is a state-of-the-art open-source text-to-image (T2I) generation framework that delivers exceptional visual fidelity and prompt adherence while maintaining…

01/09/2026 | Controllable Image Synthesis, Multimodal Generative Modeling, Text-to-Image Generation
Video-R1: Boost Video Reasoning in MLLMs with Efficient RL—Outperforming GPT-4o on Spatial Tasks

Video understanding has long been a bottleneck for multimodal large language models (MLLMs). While models can recognize objects or scenes…

01/09/2026 | Multimodal Reinforcement Learning, Temporal Modeling, Video Reasoning
PharMolixFM: High-Accuracy, All-Atom Molecular Modeling for Real-World Drug Discovery and Structural Biology

PharMolixFM is an all-atom foundation model purpose-built for molecular modeling and generation, jointly developed by PharMolix Inc. and the Institute…

01/09/2026 | Molecular Conformation Generation, Molecular Docking, Structure-based Drug Design
ActionStudio: Unify, Train, and Deploy Large Action Models 9x Faster for Autonomous Agents

As autonomous AI agents become central to real-world applications—from customer service bots to robotic process automation—the demand for Large Action…

01/09/2026 | Autonomous Agents, Function Calling, Multi-turn Reasoning

Copyright © 2026 PaperCodex.