PaperCodex

Complex Reasoning

NeedleBench: Rigorously Evaluate LLM Retrieval and Reasoning in Long-Context Scenarios

Evaluating how well large language models (LLMs) retrieve critical facts and perform reasoning over long documents remains a major challenge…

12/19/2025 | Complex Reasoning, Long-context Retrieval, Synthetic Benchmarking

Search-o1: Boost Large Reasoning Models with On-Demand Knowledge Retrieval for Complex Problem Solving

Large reasoning models (LRMs)—such as OpenAI’s o1—excel at multi-step logical reasoning, especially in science, math, and code-related tasks. But they…

12/18/2025 | Agentic Search, Complex Reasoning, Retrieval-Augmented Generation

Copyright © 2026 PaperCodex.