PaperCodex

SoundMind: Boost Audio-Language Models with Reinforcement-Learned Logical Reasoning

Most large language models (LLMs) today excel at reasoning over text—but what happens when the input includes sounds? Can an…

12/19/2025 | Audio-language Reasoning, Logical Reasoning In AI, Multimodal Reinforcement Learning

AReaL: Accelerate Language Reasoning Training with Fully Asynchronous Reinforcement Learning

If you’re building or fine-tuning large language models (LLMs) for reasoning—whether in math, coding, search, or agentic workflows—you’ve likely hit…

12/19/2025 | Agentic AI Training, Asynchronous RL, Reinforcement Learning For Reasoning

VSA: Accelerate Video Diffusion Models by 2.5× with Trainable Sparse Attention—No Quality Tradeoff

Video generation using diffusion transformers (DiTs) is rapidly advancing—but at a steep computational cost. Full 3D attention in these models…

12/19/2025 | Diffusion Models, Sparse Attention, Video Generation

RAGEN: Train LLM Agents That Reason and Act Across Multi-Turn, Stochastic Environments

Building autonomous agents that can reason, act, and adapt over multiple interaction steps remains one of the toughest challenges in…

12/19/2025 | LLM Agent Training, Multi-turn Reinforcement Learning, Trajectory-level Policy Optimization

BrowseComp: A Focused Benchmark for Evaluating Web-Browsing Capabilities in AI Agents

Evaluating whether an AI agent can truly browse the web—navigating across pages, persisting through dead ends, and extracting entangled facts—is…

12/19/2025 | Information Retrieval, Tool-augmented Reasoning, Web Browsing Agents

Perception Encoder: One Vision Model to Rule Image, Video, and Language Tasks – Without Task-Specific Training

Perception Encoder (PE) redefines what’s possible with a single vision encoder. Unlike legacy approaches that demand different pretraining strategies for…

12/19/2025 | Dense Visual Prediction, Multimodal Visual Question Answering, Zero-shot Image Classification

Paper2Poster: Automate Scientific Poster Creation from PDFs—Editable, Accurate, and Under $0.01

Creating professional academic posters from dense, multi-page scientific papers is a universal pain point for researchers, PhD students, and lab…

12/19/2025 | Automated Academic Communication, Multimodal Document Understanding, Scientific Poster Generation

RoboTwin 2.0: Scalable Simulation Platform for Robust Bimanual Robotic Manipulation with Strong Domain Randomization

Robotic manipulation—especially with two arms working in coordination—is essential for complex real-world tasks like assembling electronics, handling kitchenware, or performing…

12/19/2025 | Bimanual Robotic Manipulation, Sim-to-real Transfer, Vision-language-action Learning

MIEB: Benchmark 130 Image & Image-Text Tasks Across 38 Languages for Reliable Model Evaluation

Evaluating image embedding models has long been a fragmented and inconsistent process. Researchers and engineers often test models on narrow,…

12/19/2025 | Cross-Modal Retrieval, Image Embedding Evaluation, Visual Representation Learning

AIBrix: Scalable, Cost-Effective LLM Inference Infrastructure for Enterprise-Grade GenAI Deployment

Deploying large language models (LLMs) at scale in production environments remains a significant challenge for engineering teams. High inference costs,…

12/19/2025 | GenAI Deployment, LLM Inference, Model Serving

Copyright © 2026 PaperCodex.