Handling long input sequences—ranging from tens of thousands to over a million tokens—is no longer a theoretical benchmark but a…
TextBox 2.0: A Unified Library for Rapid Text Generation with Pre-Trained Language Models
If you’ve ever struggled to compare BART, T5, and a custom Chinese language model on summarization, translation, or dialogue generation—only…
ZoeDepth: Metric-Accurate, Zero-Shot Monocular Depth Estimation for Real-World Applications
Depth estimation from a single RGB image—monocular depth estimation—is a foundational task in computer vision with far-reaching implications in robotics,…
VAD: Vectorized End-to-End Autonomous Driving for Faster, Safer Planning
Autonomous driving systems must balance accuracy, safety, and real-time performance. Traditional approaches often rely on dense rasterized representations of the…
InfiniteYou: High-Fidelity Identity-Preserving Image Generation with Flexible Prompt Control
Personalized image generation has long struggled with a fundamental trade-off: how to maintain strong identity fidelity while enabling flexible, high-quality…
OmniDocBench: A Real-World, Fine-Grained Benchmark for Fair and Comprehensive PDF Document Parsing Evaluation
Evaluating document parsing systems has long been a frustrating exercise in inconsistency. Many existing benchmarks focus narrowly on clean academic…
detrex: A Unified, Modular Benchmark for Detection Transformers—Accelerate Object Detection, Segmentation, and Pose Estimation Research
If you’re evaluating object detection frameworks for a new computer vision project, you’ve likely encountered the rise of DETR (Detection…
RepViT: Real-Time Mobile Vision with Pure CNN Speed and ViT-Level Accuracy
In the world of on-device computer vision, the tension between speed and accuracy has long defined what’s possible. Engineers building…
UniDepthV2: Zero-Shot Monocular Metric Depth Estimation That Works Across Real-World Domains
Monocular metric depth estimation (MMDE)—the task of predicting real-world depth values from a single RGB image—is foundational for 3D perception…
REINFORCE++: A Critic-Free RLHF Algorithm for Faster, More Robust LLM Alignment
Aligning large language models (LLMs) with human preferences is essential for building safe, helpful, and reliable AI systems. Reinforcement Learning…