Visual-Inertial Navigation Systems (VINS) are critical for applications like drones, robotics, and augmented reality, where precise real-time localization is required…
AgentLite: Build Task-Oriented LLM Agents Fast—Without the Framework Bloat
Developing effective, task-oriented agents powered by large language models (LLMs) has become a priority for researchers and developers alike. However,…
RepoAgent: Auto-Generate & Maintain Repository-Level Code Docs with LLMs
Keeping code documentation up to date is one of the most universally acknowledged yet consistently neglected tasks in software development.…
Sparse VideoGen2: Accelerate Video Diffusion Models 2.3x Without Retraining or Quality Loss
Video generation using diffusion transformers (DiTs) has reached remarkable visual fidelity—but at a steep computational cost. The quadratic complexity of…
SSLRec: A Unified, Plug-and-Play Framework for Self-Supervised Recommendation Systems
Recommender systems are foundational to modern digital experiences—from streaming platforms to e-commerce—but they face a persistent challenge: user interaction data…
MINIMA: Universal Cross-Modality Image Matching Without Custom Models for Every Sensor
In real-world computer vision systems—whether for autonomous vehicles, remote sensing, or robotic inspection—images rarely come from a single type of…
RL4CO: Accelerate Reinforcement Learning for Combinatorial Optimization with a Unified, Reproducible Benchmark
Combinatorial optimization (CO) lies at the heart of countless real-world challenges—from vehicle routing and job scheduling to chip design and…
DeepfakeBench: The First Standardized Benchmark for Fair, Reproducible Deepfake Detection Evaluation
Deepfake detection is rapidly becoming a critical component of digital trust and media integrity. Yet despite growing interest and investment,…
SARDet-100K: The First COCO-Scale Open Benchmark for Multi-Class SAR Object Detection
Synthetic Aperture Radar (SAR) imaging offers a unique advantage: it works reliably in all weather conditions, day or night, making…
OneLLM: Unify Images, Audio, Video, Sensors, and Even Brain Signals into One Language Model
Multimodal AI is no longer just about images and text—it’s about seamlessly blending diverse data streams like audio, video, 3D…