Awesome Model Evaluation Papers and Source Codes

TinyLVLM-eHub: Fast, Lightweight Evaluation for Large Vision-Language Models Without Heavy Compute

As Large Vision-Language Models (LVLMs) grow increasingly capable—and increasingly complex—evaluating their multimodal reasoning, perception, and reliability has become a significant…

01/09/2026 | Model Evaluation, Multimodal Reasoning, Visual Question Answering

ICPC-Eval: Stress-Test LLM Reasoning with Real-World Competitive Programming Challenges

Evaluating the true reasoning capabilities of large language models (LLMs) in coding has long been hampered by benchmarks that are…

01/09/2026 | Algorithmic Reasoning, Code Generation, Model Evaluation