PaperCodex

Mathematical Reasoning

PRIME: Boost LLM Reasoning with Token-Level Rewards—No Step-by-Step Labels Needed

If you’re working to improve large language models (LLMs) on hard reasoning tasks—like math problem solving or competitive programming—you’ve likely…

12/27/2025 | Code Generation, Mathematical Reasoning, Reinforcement Learning

DeepSeek-V3: A High-Performance, Cost-Efficient MoE Language Model That Delivers Closed-Source Power with Open-Source Flexibility

For technical decision-makers evaluating large language models (LLMs) for real-world applications, balancing raw capability, inference cost, training efficiency, and deployment…

12/26/2025 | Code Generation, Mathematical Reasoning, Multilingual Language Modeling

S1: Boost Reasoning Performance with Just 1,000 Examples and Smart Test-Time Scaling

In the rapidly evolving landscape of large language models (LLMs), achieving strong reasoning capabilities often comes at the cost of…

12/18/2025 | Mathematical Reasoning, Structured Reasoning, Test-time Scaling

MiMo: High-Performance Reasoning in a 7B Model—Outperforming 32B Models and Matching o1-mini

MiMo is a 7-billion-parameter language model purpose-built for reasoning-intensive tasks—spanning mathematics, code generation, and STEM problem solving—without the computational overhead…

12/17/2025 | Code Generation, Mathematical Reasoning, Reinforcement Learning For Reasoning

rStar2-Agent: A 14B Math Reasoning Model That Outsmarts 671B Models with Smarter, Tool-Aware Agentic Reasoning

In the rapidly evolving landscape of large language models (LLMs), bigger isn’t always better—smarter is. Enter rStar2-Agent, a 14-billion-parameter reasoning…

12/17/2025 | Agentic Tool Use, Mathematical Reasoning, Reinforcement Learning For Reasoning

Copyright © 2026 PaperCodex.