Large reasoning models (LRMs)—such as OpenAI’s o1—excel at multi-step logical reasoning, especially in science, math, and code-related tasks. But they…
ReCamMaster: Reshoot Any Video with New Camera Movements—No 3D Assets or Multi-Camera Setup Needed
Imagine being able to take a single, static video shot on your phone and instantly transform it into a cinematic…
AgentCPM-GUI: On-Device AI Agent for Bilingual Mobile Automation with Reinforcement Fine-Tuning
AgentCPM-GUI is an open-source, on-device large language model (LLM) agent designed to understand smartphone screenshots and autonomously perform user-specified tasks…
UQLM: Detect LLM Hallucinations with Uncertainty Quantification—Confidence Scoring Made Practical
Large Language Models (LLMs) are transforming how we build intelligent applications—from customer service bots to clinical decision support tools. Yet…
Chronos: The First AI Built for Debugging—Not Code Generation
Despite massive advances in large language models (LLMs) for coding, a silent crisis persists: debugging remains largely unsolved. Top models…
Dolphin: Lightweight, Accurate Document Image Parsing for Real-World Mixed-Content Pages
Parsing complex document images—those containing intertwined text paragraphs, tables, mathematical formulas, figures, and code—is a persistent challenge in applied AI.…
VLM-R1: Boost Visual Reasoning and Generalization with R1-Style Reinforcement Learning for Vision-Language Models
If you’re working on vision-language tasks that require precise reasoning—like identifying objects based on natural language descriptions, detecting UI defects…
LiteCUA: Bridge the Gap Between LLMs and Real Computers with Lightweight, Context-Aware Automation
Imagine an AI agent that doesn’t just talk about using a computer—it actually uses one. That’s the promise of LiteCUA,…
RSL-RL: A Lightweight, Robotics-Optimized RL Library for Fast Sim-to-Real Transfer
Reinforcement learning (RL) has become a cornerstone of modern robotics research, yet many general-purpose RL libraries fall short when it…
SmolVLA: High-Performance Vision-Language-Action Robotics on a Single GPU
SmolVLA is a compact yet capable Vision-Language-Action (VLA) model designed to bring state-of-the-art robot control within reach of researchers, educators,…