Most large language models (LLMs) today excel at reasoning over text—but what happens when the input includes sounds? Can an…
AReaL: Accelerate Language Reasoning Training with Fully Asynchronous Reinforcement Learning
If you’re building or fine-tuning large language models (LLMs) for reasoning—whether in math, coding, search, or agentic workflows—you’ve likely hit…
VSA: Accelerate Video Diffusion Models by 2.5× with Trainable Sparse Attention—No Quality Tradeoff
Video generation using diffusion transformers (DiTs) is rapidly advancing—but at a steep computational cost. Full 3D attention in these models…
RAGEN: Train LLM Agents That Reason and Act Across Multi-Turn, Stochastic Environments
Building autonomous agents that can reason, act, and adapt over multiple interaction steps remains one of the toughest challenges in…
BrowseComp: A Focused Benchmark for Evaluating Web-Browsing Capabilities in AI Agents
Evaluating whether an AI agent can truly browse the web—navigating across pages, persisting through dead ends, and extracting entangled facts—is…
Perception Encoder: One Vision Model to Rule Image, Video, and Language Tasks – Without Task-Specific Training
Perception Encoder (PE) redefines what’s possible with a single vision encoder. Unlike legacy approaches that demand different pretraining strategies for…
Paper2Poster: Automate Scientific Poster Creation from PDFs—Editable, Accurate, and Under $0.01
Creating professional academic posters from dense, multi-page scientific papers is a universal pain point for researchers, PhD students, and lab…
RoboTwin 2.0: Scalable Simulation Platform for Robust Bimanual Robotic Manipulation with Strong Domain Randomization
Robotic manipulation—especially with two arms working in coordination—is essential for complex real-world tasks like assembling electronics, handling kitchenware, or performing…
MIEB: Benchmark 130 Image & Image-Text Tasks Across 38 Languages for Reliable Model Evaluation
Evaluating image embedding models has long been a fragmented and inconsistent process. Researchers and engineers often test models on narrow,…
AIBrix: Scalable, Cost-Effective LLM Inference Infrastructure for Enterprise-Grade GenAI Deployment
Deploying large language models (LLMs) at scale in production environments remains a significant challenge for engineering teams. High inference costs,…