PaperCodex

LLM Safety Evaluation

HarmBench: A Standardized Framework to Evaluate LLM Safety Against Malicious Prompts

Large language models (LLMs) are increasingly deployed in high-stakes applications, from customer support chatbots to enterprise decision aids, but they remain vulnerable…

01/13/2026 · Automated Red Teaming, LLM Safety Evaluation, Robust Refusal Testing
Safety-Prompts: Benchmark and Improve Chinese LLM Safety with 100K Realistic Test Cases

As large language models (LLMs) become increasingly embedded in real-world applications, especially in Chinese-speaking regions, ensuring their safety has never been more…

12/26/2025 · Adversarial Prompt Testing, Chinese-language Model Alignment, LLM Safety Evaluation
Copyright © 2026 PaperCodex.