PaperCodex

Robust Refusal Testing

HarmBench: A Standardized Framework to Evaluate LLM Safety Against Malicious Prompts

Large language models (LLMs) are increasingly deployed in high-stakes applications—from customer support chatbots to enterprise decision aids—but they remain vulnerable…

01/13/2026 · Automated Red Teaming, LLM Safety Evaluation, Robust Refusal Testing
