PaperCodex

Reinforcement Learning

AgentGym: Build Generalist LLM Agents That Evolve Across Real-World Environments Without Constant Human Supervision

Building AI agents that can handle diverse, real-world tasks—and improve over time without hand-holding—is one of the biggest challenges in…

01/13/2026 Agent Generalization, LLM-based Agents, Reinforcement Learning

RL4CO: Accelerate Reinforcement Learning for Combinatorial Optimization with a Unified, Reproducible Benchmark

Combinatorial optimization (CO) lies at the heart of countless real-world challenges—from vehicle routing and job scheduling to chip design and…

01/13/2026 Combinatorial Optimization, Operations Research, Reinforcement Learning

RLinf: Accelerate Large-Scale Reinforcement Learning for Agentic AI and Embodied Intelligence

Reinforcement learning (RL) is rapidly becoming the engine behind next-generation agentic AI—powering everything from math-reasoning language models to vision-guided robotic…

01/09/2026 Embodied Intelligence, Reasoning Agents, Reinforcement Learning

TTRL: Boost LLM Reasoning Without Labels Using Test-Time Reinforcement Learning

Imagine being able to improve a large language model’s (LLM) reasoning capabilities after deployment, using only unlabeled test data—no ground-truth…

01/05/2026 Reasoning, Reinforcement Learning, Test-time Scaling

SimpleVLA-RL: Boost Robotic Task Performance with Minimal Data Using Reinforcement Learning

Building capable robotic systems that understand vision, language, and action—commonly referred to as Vision-Language-Action (VLA) models—has become a central goal…

01/05/2026 Reinforcement Learning, Robotic Manipulation, Vision-Language-Action Modeling

MimicKit: Train Physics-Based Character Controllers with Motion Imitation and Reinforcement Learning

Imagine needing realistic, physics-compliant character movement for a game, simulation, or robotics project—but without the months of trial, error, and…

01/04/2026 Character Animation, Motion Imitation, Reinforcement Learning

PRIME: Boost LLM Reasoning with Token-Level Rewards—No Step-by-Step Labels Needed

If you’re working to improve large language models (LLMs) on hard reasoning tasks—like math problem solving or competitive programming—you’ve likely…

12/27/2025 Code Generation, Mathematical Reasoning, Reinforcement Learning

ELF: Train Real-Time Strategy AI Bots 10x Faster with a Lightweight, Flexible RL Platform

Reinforcement learning (RL) for real-time strategy (RTS) games has long been bottlenecked by slow simulation, rigid environment interfaces, and high…

12/22/2025 Multi-Agent Training, Real-Time Strategy Game AI, Reinforcement Learning

Reasoning Gym: Train and Evaluate Reasoning Models with Infinite, Verifiable Reinforcement Learning Environments

If you’re building or evaluating reasoning-capable AI systems—especially large language models (LLMs)—you’ve likely hit a wall with static benchmarks. Traditional…

12/19/2025 Procedural Task Generation, Reasoning, Reinforcement Learning

Gymnasium: A Standardized, Reproducible Interface for Reinforcement Learning Environments

Reinforcement learning (RL) holds immense promise for solving complex decision-making problems—from robotics and game playing to resource optimization and autonomous…

12/18/2025 Algorithm Benchmarking, Environment Simulation, Reinforcement Learning

Copyright © 2026 PaperCodex.