Evaluating large language models (LLMs) has become increasingly challenging. Traditional benchmarks—like MMLU, GSM8K, or Big-Bench Hard—are static, fixed in complexity,…
Caption Anything: Interactive, Multimodal Image Captioning Controlled by You
Traditional image captioning systems produce static, one-size-fits-all descriptions—often generic, inflexible, and disconnected from actual user intent. What if you could…
OmniParser V2: One Unified Model for Text Spotting, Table Recognition, and Document Understanding
In today’s data-driven world, businesses and researchers routinely process documents—scanned invoices, forms, tables, and receipts—to extract structured information. Traditionally, this…
ManimML: Animate Machine Learning Architectures Directly from Code—No Design Skills Needed
As machine learning models grow increasingly complex—from deep convolutional networks to attention-based architectures—the ability to clearly communicate how they work…
Code-Optimise: Boost Code Correctness and Runtime Efficiency Without Trade-offs
Modern code language models (CLMs) excel at generating functionally correct programs—but often at the cost of runtime efficiency. Conversely, efforts…
FederatedScope-LLM: Collaboratively Fine-Tune Large Language Models Without Sharing Private Data
In today’s data-sensitive world, organizations increasingly want to harness the power of large language models (LLMs) while complying with strict…
HippoRAG: Neurobiologically Inspired Long-Term Memory for LLMs That Solves Multi-Hop Reasoning and Continual Knowledge Integration
Retrieval-Augmented Generation (RAG) has become a go-to architecture for grounding large language models (LLMs) in external knowledge. Yet, even the…
DiffBIR: Unified Blind Image Restoration with Realistic Detail Recovery Across Super-Resolution, Face Enhancement, and Denoising
Blind image restoration—recovering high-quality images from degraded inputs without knowing the exact type or severity of degradation—is a longstanding challenge…
Bi’an: Detect RAG Hallucinations Accurately with a Bilingual Benchmark and Lightweight Judge Models
Retrieval-Augmented Generation (RAG) has become a go-to strategy for grounding large language model (LLM) responses in real-world knowledge. By pulling…
MiniCPM-V 4.5: GPT-4o-Level Vision Intelligence in an 8B Open-Source Model for Real-World Multimodal Tasks
Multimodal Large Language Models (MLLMs) promise to transform how machines understand images, videos, and text—but most top-performing models come with…