PaperCodex

SoundMind: Boost Audio-Language Models with Reinforcement-Learned Logical Reasoning

Most large language models (LLMs) today excel at reasoning over text—but what happens when the input includes sounds? Can an…

12/19/2025 · Audio-language Reasoning, Logical Reasoning In AI, Multimodal Reinforcement Learning

AReaL: Accelerate Language Reasoning Training with Fully Asynchronous Reinforcement Learning

If you’re building or fine-tuning large language models (LLMs) for reasoning—whether in math, coding, search, or agentic workflows—you’ve likely hit…

12/19/2025 · Agentic AI Training, Asynchronous RL, Reinforcement Learning For Reasoning

VSA: Accelerate Video Diffusion Models by 2.5× with Trainable Sparse Attention—No Quality Tradeoff

Video generation using diffusion transformers (DiTs) is rapidly advancing—but at a steep computational cost. Full 3D attention in these models…

12/19/2025 · Diffusion Models, Sparse Attention, Video Generation

RAGEN: Train LLM Agents That Reason and Act Across Multi-Turn, Stochastic Environments

Building autonomous agents that can reason, act, and adapt over multiple interaction steps remains one of the toughest challenges in…

12/19/2025 · LLM Agent Training, Multi-turn Reinforcement Learning, Trajectory-level Policy Optimization

BrowseComp: A Focused Benchmark for Evaluating Web-Browsing Capabilities in AI Agents

Evaluating whether an AI agent can truly browse the web—navigating across pages, persisting through dead ends, and extracting entangled facts—is…

12/19/2025 · Information Retrieval, Tool-augmented Reasoning, Web Browsing Agents

Perception Encoder: One Vision Model to Rule Image, Video, and Language Tasks – Without Task-Specific Training

Perception Encoder (PE) redefines what’s possible with a single vision encoder. Unlike legacy approaches that demand different pretraining strategies for…

12/19/2025 · Dense Visual Prediction, Multimodal Visual Question Answering, Zero-shot Image Classification

Paper2Poster: Automate Scientific Poster Creation from PDFs—Editable, Accurate, and Under $0.01

Creating professional academic posters from dense, multi-page scientific papers is a universal pain point for researchers, PhD students, and lab…

12/19/2025 · Automated Academic Communication, Multimodal Document Understanding, Scientific Poster Generation

RoboTwin 2.0: Scalable Simulation Platform for Robust Bimanual Robotic Manipulation with Strong Domain Randomization

Robotic manipulation—especially with two arms working in coordination—is essential for complex real-world tasks like assembling electronics, handling kitchenware, or performing…

12/19/2025 · Bimanual Robotic Manipulation, Sim-to-real Transfer, Vision-language-action Learning

AIBrix: Scalable, Cost-Effective LLM Inference Infrastructure for Enterprise-Grade GenAI Deployment

Deploying large language models (LLMs) at scale in production environments remains a significant challenge for engineering teams. High inference costs,…

12/19/2025 · GenAI Deployment, LLM Inference, Model Serving