Awesome Software Engineering Evaluation Papers and Source Codes

SWE-Lancer: Benchmark Real-World Freelance Coding Tasks to Measure LLMs’ True Engineering Value

Evaluating large language models (LLMs) on synthetic coding benchmarks often fails to reflect their real-world utility. Enter SWE-Lancer, a rigorously constructed…

12/22/2025 | Code Generation, Software Engineering Evaluation, Technical Decision-making