PaperCodex

Hallucination Detection

HaluEval: Detect and Benchmark LLM Hallucinations Across QA, Dialogue, and Summarization

Large language models (LLMs) like ChatGPT are transforming how we interact with AI—but they often “make things up.” These fabricated,…

01/13/2026 · Hallucination Detection, Knowledge-grounded Dialogue, Question Answering

FacTool: Automatically Detect Factual Errors in LLM Outputs Across Code, Math, QA, and Scientific Writing

Large language models (LLMs) like ChatGPT and GPT-4 have transformed how we generate text, write code, solve math problems, and…

01/13/2026 · Factuality Detection, Hallucination Detection, LLM Evaluation

Bi’an: Detect RAG Hallucinations Accurately with a Bilingual Benchmark and Lightweight Judge Models

Retrieval-Augmented Generation (RAG) has become a go-to strategy for grounding large language model (LLM) responses in real-world knowledge. By pulling…

12/19/2025 · Factuality Evaluation, Hallucination Detection, Retrieval-Augmented Generation

UQLM: Detect LLM Hallucinations with Uncertainty Quantification—Confidence Scoring Made Practical

Large Language Models (LLMs) are transforming how we build intelligent applications—from customer service bots to clinical decision support tools. Yet…

12/18/2025 · Hallucination Detection, LLM Reliability, Uncertainty Quantification

Copyright © 2026 PaperCodex.