If you’re building or maintaining AI-powered coding assistants, you’ve likely faced a frustrating trade-off: fine-tune a model for one specific…
Code Generation
ICPC-Eval: Stress-Test LLM Reasoning with Real-World Competitive Programming Challenges 739
Evaluating the true reasoning capabilities of large language models (LLMs) in coding has long been hampered by benchmarks that are…
DiffuCoder: Generate Better Code with Iterative, Non-Autoregressive Diffusion Models 745
If you’re evaluating next-generation code generation tools, you’ve likely worked with autoregressive (AR) large language models—systems that build code one…
CodeGeeX: Open-Source Multilingual Code Generation That Boosts Developer Productivity Across 23 Languages 8713
For software teams working across multiple programming languages—or developers tired of vendor lock-in with proprietary AI coding tools—CodeGeeX offers a…
PRIME: Boost LLM Reasoning with Token-Level Rewards—No Step-by-Step Labels Needed 1783
If you’re working to improve large language models (LLMs) on hard reasoning tasks—like math problem solving or competitive programming—you’ve likely…
DeepCode: Turn Research Papers and Text into Production-Ready Code—Faster Than Human Experts 12706
Imagine being able to feed a research paper, a technical specification, or even a rough product description into a system—and…
aiXcoder-7B: High-Accuracy Code Completion in a Lightweight 7B Model for Real-Time Developer Workflows 2274
aiXcoder-7B is a 7-billion-parameter open-source large language model (LLM) purpose-built for code processing. Unlike larger models that trade inference speed…
DeepSeek-V3: A High-Performance, Cost-Efficient MoE Language Model That Delivers Closed-Source Power with Open-Source Flexibility 100738
For technical decision-makers evaluating large language models (LLMs) for real-world applications, balancing raw capability, inference cost, training efficiency, and deployment…
WizardCoder: Open-Source Code LLM That Outperforms ChatGPT and Gemini in Code Generation 9472
WizardCoder is a state-of-the-art open-source Code Large Language Model (Code LLM) that delivers exceptional performance on code generation tasks—often surpassing…
SWE-Lancer: Benchmark Real-World Freelance Coding Tasks to Measure LLMs’ True Engineering Value 1438
Evaluating large language models (LLMs) on synthetic coding benchmarks often fails to reflect their real-world utility. Enter SWE-Lancer—a rigorously constructed…