PaperCodex

LMDrive: The First Language-Guided, Closed-Loop Autonomous Driving System for Human-Centric Navigation

Autonomous driving has made remarkable strides, yet it still falters in complex urban settings—especially when confronted with rare, ambiguous, or…

01/13/2026 Autonomous Driving, Embodied AI, Language-guided Control

ControlVideo: Training-Free, Controllable Text-to-Video Generation with Consistent Motion and Structure

Generating high-quality videos from text has long been a challenging frontier in generative AI—especially compared to the rapid advances in…

01/13/2026 Controllable Video Synthesis, Structure-conditioned Generative Models, Text-to-Video Generation

Matcher: One-Shot Segmentation Without Training—Unlock Flexible, Label-Free Perception for Real-World Applications

In modern computer vision workflows, deploying accurate segmentation models often demands large annotated datasets, task-specific architectures, and costly retraining—barriers that…

01/13/2026 One-shot Segmentation, Open-world Perception, Zero-shot Learning

SAD: Geometry-Aware RGBD Segmentation That Fixes SAM’s Over-Segmentation Problem

The Segment Anything Model (SAM) revolutionized 2D image segmentation by enabling zero-shot, promptable mask generation from RGB images. However, SAM’s…

01/13/2026 3D Panoptic Segmentation, RGBD Segmentation, Zero-shot Semantic Segmentation

Prompt-Free Diffusion: Generate Images Without Writing a Single Text Prompt

Text-to-image (T2I) diffusion models have revolutionized creative workflows—but they come with a hidden bottleneck: prompt engineering. Describing an image in…

01/13/2026 Prompt-free Diffusion, Text-to-Image Generation, Visual-conditioned Image Synthesis

Uni-ControlNet: Unified Visual Control for Text-to-Image Generation Without Retraining Everything

Generating high-quality images from text prompts has become remarkably powerful thanks to diffusion models like Stable Diffusion. Yet, for many…

01/13/2026 Controllable Diffusion Models, Multimodal Conditioning, Text-to-Image Generation

FuseChat: Build Smarter, Smaller Chatbots by Fusing Top Open-Source LLMs—No Training From Scratch Needed

In today’s fast-moving AI landscape, teams need high-performing chat models that are both capable and cost-efficient. Yet training large language…

01/13/2026 Chatbot Deployment, Instruction Following, Knowledge Fusion

RAGChecker: Fine-Grained Diagnostics for Reliable Retrieval-Augmented Generation Evaluation

Retrieval-Augmented Generation (RAG) has become a cornerstone of modern AI applications, enabling systems to answer questions by combining external knowledge…

01/13/2026 Claim-level Factuality Assessment, RAG Diagnostics, Retrieval-Augmented Generation Evaluation