Evaluating large language models (LLMs) on new tasks traditionally requires fine-tuning—a process that’s time-consuming, resource-intensive, and often impractical when labeled…
FacTool: Automatically Detect Factual Errors in LLM Outputs Across Code, Math, QA, and Scientific Writing
Large language models (LLMs) like ChatGPT and GPT-4 have transformed how we generate text, write code, solve math problems, and…