PaperCodex

Large Language Model Alignment

REINFORCE++: A Critic-Free RLHF Algorithm for Faster, More Robust LLM Alignment

Aligning large language models (LLMs) with human preferences is essential for building safe, helpful, and reliable AI systems. Reinforcement Learning…

12/22/2025 · Large Language Model Alignment, Reasoning With Chain-of-Thought, Reinforcement Learning From Human Feedback (RLHF)
Verl: A Flexible, High-Performance RLHF Framework for Aligning Large Language Models at Scale

Verl (short for Volcano Engine Reinforcement Learning) is an open-source, production-ready framework designed specifically for Reinforcement Learning from Human Feedback…

12/12/2025 · Large Language Model Alignment, Multi-modal Reinforcement Learning, Reinforcement Learning From Human Feedback (RLHF)