PaperCodex

HippoRAG: Neurobiologically Inspired Long-Term Memory for LLMs That Solves Multi-Hop Reasoning and Continual Knowledge Integration

Retrieval-Augmented Generation (RAG) has become a go-to architecture for grounding large language models (LLMs) in external knowledge. Yet, even the…

12/19/2025 | Continual Knowledge Integration, Multi-hop Question Answering, Retrieval-Augmented Generation
DiffBIR: Unified Blind Image Restoration with Realistic Detail Recovery Across Super-Resolution, Face Enhancement, and Denoising

Blind image restoration—recovering high-quality images from degraded inputs without knowing the exact type or severity of degradation—is a longstanding challenge…

12/19/2025 | Blind Image Restoration, Face Restoration, Image Super-resolution
Bi’an: Detect RAG Hallucinations Accurately with a Bilingual Benchmark and Lightweight Judge Models

Retrieval-Augmented Generation (RAG) has become a go-to strategy for grounding large language model (LLM) responses in real-world knowledge. By pulling…

12/19/2025 | Factuality Evaluation, Hallucination Detection, Retrieval-Augmented Generation
MiniCPM-V 4.5: GPT-4o-Level Vision Intelligence in an 8B Open-Source Model for Real-World Multimodal Tasks

Multimodal Large Language Models (MLLMs) promise to transform how machines understand images, videos, and text—but most top-performing models come with…

12/19/2025 | Efficient MLLM Deployment, Multimodal Reasoning, Vision-language Understanding
VGGT: One Model to Reconstruct 3D Scenes Instantly—No Post-Processing Required

Reconstructing accurate 3D geometry from 2D images has long been a fragmented, multi-step process—requiring separate models for camera pose estimation,…

12/19/2025 | 3D Reconstruction, Camera Pose Estimation, Multi-view Geometry
VBench: The Definitive Benchmark Suite for Evaluating Realism and Faithfulness in AI-Generated Videos

As AI-generated videos grow increasingly convincing—featuring smooth motion, vivid aesthetics, and coherent narratives—a critical question emerges: How do we reliably…

12/19/2025 | Intrinsic Faithfulness Benchmarking, Multimodal Model Assessment, Video Generation Evaluation
Visual-RFT: Boost Vision-Language Model Performance with Minimal Data Using Reinforcement Fine-Tuning

When labeled visual data is scarce—think dozens or hundreds of examples per category—traditional supervised fine-tuning (SFT) often falls short. Enter…

12/19/2025 | Few-shot Object Detection, Fine-grained Image Classification, Visual Reasoning Grounding
EAGLE-3: Accelerate LLM Inference Up to 6.5× Without Sacrificing Output Quality

For teams deploying large language models (LLMs) in production—whether for chatbots, reasoning APIs, or batch processing—latency and inference cost are…

12/19/2025 | Efficient Language Model Serving, LLM Inference Acceleration, Speculative Decoding
AI-Researcher: Automate End-to-End AI Research from Idea to Publication

Scientific research in artificial intelligence is increasingly complex, time-consuming, and resource-intensive. From synthesizing hundreds of papers to prototyping novel algorithms…

12/19/2025 | Algorithm Prototyping, Autonomous AI Research, Scientific Manuscript Generation
Less-to-More Generalization: Unlock Controllable, Consistent Multi-Subject Image Generation with UNO

Subject-driven image generation—where users provide one or more reference images of specific objects to guide the creation of new scenes—is…

12/19/2025 | Controllable Diffusion Models, Multi-subject Image Synthesis, Subject-driven Image Generation

Copyright © 2026 PaperCodex.