Despite massive advances in large language models (LLMs) for coding, a silent crisis persists: debugging remains largely unsolved. Top models…
Dolphin: Lightweight, Accurate Document Image Parsing for Real-World Mixed-Content Pages
Parsing complex document images—those containing intertwined text paragraphs, tables, mathematical formulas, figures, and code—is a persistent challenge in applied AI.…
VLM-R1: Boost Visual Reasoning and Generalization with R1-Style Reinforcement Learning for Vision-Language Models
If you’re working on vision-language tasks that require precise reasoning—like identifying objects based on natural language descriptions, detecting UI defects…
LiteCUA: Bridge the Gap Between LLMs and Real Computers with Lightweight, Context-Aware Automation
Imagine an AI agent that doesn’t just talk about using a computer—it actually uses one. That’s the promise of LiteCUA,…
RSL-RL: A Lightweight, Robotics-Optimized RL Library for Fast Sim-to-Real Transfer
Reinforcement learning (RL) has become a cornerstone of modern robotics research, yet many general-purpose RL libraries fall short when it…
SmolVLA: High-Performance Vision-Language-Action Robotics on a Single GPU
SmolVLA is a compact yet capable Vision-Language-Action (VLA) model designed to bring state-of-the-art robot control within reach of researchers, educators,…
StableVideo: Text-Driven Video Editing with Frame-to-Frame Consistency
Editing objects in existing videos while preserving their appearance across time has long been a challenge for diffusion-based models. While…
ElizaOS: The Web3-Friendly AI Agent Framework That Just Works
In today’s fast-evolving landscape of artificial intelligence and decentralized systems, developers increasingly need tools that bridge the gap between large…
ComfyUI-R1: Automate Complex AI Art Workflows with Reasoning-Powered Generation and Debugging
Building visual AI workflows in ComfyUI offers immense creative flexibility—but mastering its node-based interface demands significant expertise. Users often struggle…
Paper2Video: Automatically Turn Scientific Papers into Ready-to-Use Presentation Videos
Creating high-quality academic presentation videos is notoriously time-consuming. Researchers often spend hours designing slides, recording voiceovers, editing footage, and syncing…