PaperCodex

HippoRAG: Neurobiologically Inspired Long-Term Memory for LLMs That Solves Multi-Hop Reasoning and Continual Knowledge Integration 3056

Retrieval-Augmented Generation (RAG) has become a go-to architecture for grounding large language models (LLMs) in external knowledge. Yet, even the…

12/19/2025 Continual Knowledge Integration, Multi-hop Question Answering, Retrieval-Augmented Generation

DiffBIR: Unified Blind Image Restoration with Realistic Detail Recovery Across Super-Resolution, Face Enhancement, and Denoising 3971

Blind image restoration—recovering high-quality images from degraded inputs without knowing the exact type or severity of degradation—is a longstanding challenge…

12/19/2025 Blind Image Restoration, Face Restoration, Image Super-resolution

Bi’an: Detect RAG Hallucinations Accurately with a Bilingual Benchmark and Lightweight Judge Models 8343

Retrieval-Augmented Generation (RAG) has become a go-to strategy for grounding large language model (LLM) responses in real-world knowledge. By pulling…

12/19/2025 Factuality Evaluation, Hallucination Detection, Retrieval-Augmented Generation

MiniCPM-V 4.5: GPT-4o-Level Vision Intelligence in an 8B Open-Source Model for Real-World Multimodal Tasks 22368

Multimodal Large Language Models (MLLMs) promise to transform how machines understand images, videos, and text—but most top-performing models come with…

12/19/2025 Efficient MLLM Deployment, Multimodal Reasoning, Vision-language Understanding

VGGT: One Model to Reconstruct 3D Scenes Instantly—No Post-Processing Required 11917

Reconstructing accurate 3D geometry from 2D images has long been a fragmented, multi-step process—requiring separate models for camera pose estimation,…

12/19/2025 3D Reconstruction, Camera Pose Estimation, Multi-view Geometry

VBench: The Definitive Benchmark Suite for Evaluating Realism and Faithfulness in AI-Generated Videos 1364

As AI-generated videos grow increasingly convincing—featuring smooth motion, vivid aesthetics, and coherent narratives—a critical question emerges: How do we reliably…

12/19/2025 Intrinsic Faithfulness Benchmarking, Multimodal Model Assessment, Video Generation Evaluation

Visual-RFT: Boost Vision-Language Model Performance with Minimal Data Using Reinforcement Fine-Tuning 2276

When labeled visual data is scarce—think dozens or hundreds of examples per category—traditional supervised fine-tuning (SFT) often falls short. Enter…

12/19/2025 Few-shot Object Detection, Fine-grained Image Classification, Visual Reasoning Grounding

EAGLE-3: Accelerate LLM Inference Up to 6.5× Without Sacrificing Output Quality 2049

For teams deploying large language models (LLMs) in production—whether for chatbots, reasoning APIs, or batch processing—latency and inference cost are…

12/19/2025 Efficient Language Model Serving, LLM Inference Acceleration, Speculative Decoding

AI-Researcher: Automate End-to-End AI Research from Idea to Publication 3753

Scientific research in artificial intelligence is increasingly complex, time-consuming, and resource-intensive. From synthesizing hundreds of papers to prototyping novel algorithms…

12/19/2025 Algorithm Prototyping, Autonomous AI Research, Scientific Manuscript Generation

Less-to-More Generalization: Unlock Controllable, Consistent Multi-Subject Image Generation with UNO 1337

Subject-driven image generation—where users provide one or more reference images of specific objects to guide the creation of new scenes—is…

12/19/2025 Controllable Diffusion Models, Multi-subject Image Synthesis, Subject-driven Image Generation