PaperCodex

LLM Safety Evaluation

HarmBench: A Standardized Framework to Evaluate LLM Safety Against Malicious Prompts

Large language models (LLMs) are increasingly deployed in high-stakes applications, from customer support chatbots to enterprise decision aids, but they remain vulnerable…

01/13/2026 · Automated Red Teaming, LLM Safety Evaluation, Robust Refusal Testing
Safety-Prompts: Benchmark and Improve Chinese LLM Safety with 100K Realistic Test Cases

As large language models (LLMs) become increasingly embedded in real-world applications, especially in Chinese-speaking regions, ensuring their safety has never been more…

12/26/2025 · Adversarial Prompt Testing, Chinese-language Model Alignment, LLM Safety Evaluation
Copyright © 2026 PaperCodex.