Retrieval-Augmented Generation (RAG) has become a go-to architecture for grounding large language models (LLMs) in external knowledge. Yet, even the…
DiffBIR: Unified Blind Image Restoration with Realistic Detail Recovery Across Super-Resolution, Face Enhancement, and Denoising 3971
Blind image restoration—recovering high-quality images from degraded inputs without knowing the exact type or severity of degradation—is a longstanding challenge…
Bi’an: Detect RAG Hallucinations Accurately with a Bilingual Benchmark and Lightweight Judge Models 8343
Retrieval-Augmented Generation (RAG) has become a go-to strategy for grounding large language model (LLM) responses in real-world knowledge. By pulling…
MiniCPM-V 4.5: GPT-4o-Level Vision Intelligence in an 8B Open-Source Model for Real-World Multimodal Tasks 22368
Multimodal Large Language Models (MLLMs) promise to transform how machines understand images, videos, and text—but most top-performing models come with…
VGGT: One Model to Reconstruct 3D Scenes Instantly—No Post-Processing Required 11917
Reconstructing accurate 3D geometry from 2D images has long been a fragmented, multi-step process—requiring separate models for camera pose estimation,…
VBench: The Definitive Benchmark Suite for Evaluating Realism and Faithfulness in AI-Generated Videos 1364
As AI-generated videos grow increasingly convincing—featuring smooth motion, vivid aesthetics, and coherent narratives—a critical question emerges: How do we reliably…
Visual-RFT: Boost Vision-Language Model Performance with Minimal Data Using Reinforcement Fine-Tuning 2276
When labeled visual data is scarce—think dozens or hundreds of examples per category—traditional supervised fine-tuning (SFT) often falls short. Enter…
EAGLE-3: Accelerate LLM Inference Up to 6.5× Without Sacrificing Output Quality 2049
For teams deploying large language models (LLMs) in production—whether for chatbots, reasoning APIs, or batch processing—latency and inference cost are…
AI-Researcher: Automate End-to-End AI Research from Idea to Publication 3753
Scientific research in artificial intelligence is increasingly complex, time-consuming, and resource-intensive. From synthesizing hundreds of papers to prototyping novel algorithms…
Less-to-More Generalization: Unlock Controllable, Consistent Multi-Subject Image Generation with UNO 1337
Subject-driven image generation—where users provide one or more reference images of specific objects to guide the creation of new scenes—is…