PaperCodex | Page 45 of 53 | Find Awesome Papers and Source Codes

Search-o1: Boost Large Reasoning Models with On-Demand Knowledge Retrieval for Complex Problem Solving

Large reasoning models (LRMs)—such as OpenAI’s o1—excel at multi-step logical reasoning, especially in science, math, and code-related tasks. But they…

12/18/2025 | Agentic Search, Complex Reasoning, Retrieval-Augmented Generation

ReCamMaster: Reshoot Any Video with New Camera Movements—No 3D Assets or Multi-Camera Setup Needed

Imagine being able to take a single, static video shot on your phone and instantly transform it into a cinematic…

12/18/2025 | Camera-controlled Video Generation, Generative Video Editing, Video Re-rendering

AgentCPM-GUI: On-Device AI Agent for Bilingual Mobile Automation with Reinforcement Fine-Tuning

AgentCPM-GUI is an open-source, on-device large language model (LLM) agent designed to understand smartphone screenshots and autonomously perform user-specified tasks…

12/18/2025 | GUI Agent, Mobile Automation, On-device Inference

UQLM: Detect LLM Hallucinations with Uncertainty Quantification—Confidence Scoring Made Practical

Large Language Models (LLMs) are transforming how we build intelligent applications—from customer service bots to clinical decision support tools. Yet…

12/18/2025 | Hallucination Detection, LLM Reliability, Uncertainty Quantification

Chronos: The First AI Built for Debugging—Not Code Generation

Despite massive advances in large language models (LLMs) for coding, a silent crisis persists: debugging remains largely unsolved. Top models…

12/18/2025 | AI-powered Software Maintenance, Autonomous Debugging, Repository-scale Code Understanding

Dolphin: Lightweight, Accurate Document Image Parsing for Real-World Mixed-Content Pages

Parsing complex document images—those containing intertwined text paragraphs, tables, mathematical formulas, figures, and code—is a persistent challenge in applied AI.…

12/18/2025 | Document Image Parsing, Layout Analysis, Multimodal Understanding

VLM-R1: Boost Visual Reasoning and Generalization with R1-Style Reinforcement Learning for Vision-Language Models

If you’re working on vision-language tasks that require precise reasoning—like identifying objects based on natural language descriptions, detecting UI defects…

12/18/2025 | Multimodal Reasoning, Open-Vocabulary Detection, Referring Expression Comprehension

LiteCUA: Bridge the Gap Between LLMs and Real Computers with Lightweight, Context-Aware Automation

Imagine an AI agent that doesn’t just talk about using a computer—it actually uses one. That’s the promise of LiteCUA,…

12/18/2025 | Computer Use Agent, Contextualized Agent Environment, OS-level Automation

RSL-RL: A Lightweight, Robotics-Optimized RL Library for Fast Sim-to-Real Transfer

Reinforcement learning (RL) has become a cornerstone of modern robotics research, yet many general-purpose RL libraries fall short when it…

12/18/2025 | Reinforcement Learning For Robotics, Robotic Control, Sim-to-real Transfer

SmolVLA: High-Performance Vision-Language-Action Robotics on a Single GPU

SmolVLA is a compact yet capable Vision-Language-Action (VLA) model designed to bring state-of-the-art robot control within reach of researchers, educators,…

12/18/2025 | Imitation Learning, Robotic Manipulation, Vision-Language-Action Modeling