Image segmentation has long been a cornerstone of computer vision—yet traditional approaches often behave like black boxes, especially when faced…
Vision-R1: Boost Multimodal Reasoning in Visual Math and Complex Problem Solving Without Human Annotations
If you’re evaluating multimodal AI systems for tasks that demand deep reasoning—such as solving visual math problems, interpreting charts, or…
Fin-R1: A 7B Financial Reasoning LLM That Outperforms Larger Models on Complex Finance Tasks
Fin-R1 is a purpose-built reasoning large language model (LLM) designed specifically for the financial domain. Despite having only 7 billion…
MM-Eureka: High-Accuracy Multimodal Reasoning for STEM Education and Technical QA
In the rapidly evolving field of multimodal AI, most models still struggle to combine visual understanding with precise, step-by-step logical…
LBM: One-Step, Multi-Task Image Translation with State-of-the-Art Speed and Simplicity
Image-to-image translation is a foundational capability in computer vision, enabling applications from photo editing to 3D scene understanding. Yet many…
SpatialTrackerV2: Real-Time 3D Point Tracking from Monocular Video—Fast, Accurate, and End-to-End
If you’ve ever tried to track 3D points in a monocular video—say, for robotics perception, AR/VR content creation, or sports…
Lumina-Image 2.0: High-Quality, Efficient Text-to-Image Generation with Unified Architecture and Strong Open-Source Support
Lumina-Image 2.0 is a state-of-the-art open-source text-to-image (T2I) generation framework that delivers exceptional visual fidelity and prompt adherence while maintaining…
Video-R1: Boost Video Reasoning in MLLMs with Efficient RL—Outperforming GPT-4o on Spatial Tasks
Video understanding has long been a bottleneck for multimodal large language models (MLLMs). While models can recognize objects or scenes…
PharMolixFM: High-Accuracy, All-Atom Molecular Modeling for Real-World Drug Discovery and Structural Biology
PharMolixFM is an all-atom foundation model purpose-built for molecular modeling and generation, jointly developed by PharMolix Inc. and the Institute…
ActionStudio: Unify, Train, and Deploy Large Action Models 9x Faster for Autonomous Agents
As autonomous AI agents become central to real-world applications—from customer service bots to robotic process automation—the demand for Large Action…