When it comes to deploying multimodal large language models (MLLMs) in real-world applications—especially on cost-sensitive or edge devices—lightweight models are…
ViTPose: High-Accuracy, Scalable Pose Estimation Without Complex Custom Designs
Human and animal pose estimation has long relied on hand-crafted convolutional architectures, intricate post-processing, or task-specific modules. ViTPose changes that…
PP-HumanSeg: Real-Time, Connectivity-Aware Human Portrait Segmentation for Video Conferencing and Edge Applications
In the era of remote collaboration, virtual meetings have become the norm—making clean, real-time human portrait segmentation essential for professional…
StrongSORT: A High-Performance, Plug-and-Play Multi-Object Tracker for Real-World Video Applications
Multi-object tracking (MOT) is a cornerstone of modern computer vision systems—powering everything from autonomous vehicles to retail analytics and security…
D-FINE: Real-Time Object Detection with DETR-Level Accuracy and No Inference Overhead
Object detection has long faced a fundamental trade-off: high accuracy or real-time speed—but rarely both. Enter D-FINE, a breakthrough real-time…
ClearerVoice-Studio: A Practical, All-in-One Toolkit for Real-World Speech Enhancement, Separation, and Speaker Extraction
In today’s audio-rich digital landscape—spanning call centers, video conferencing, voice assistants, and multimedia content—clean, high-quality speech isn’t a luxury; it’s…
OpenSTL: A Standardized, Reproducible Benchmark for Spatio-Temporal Forecasting Across Video, Weather, and Traffic Domains
Spatio-temporal predictive learning aims to forecast future states—like video frames, weather maps, or traffic patterns—based solely on past observations, typically…
AirSLAM: Robust Visual SLAM for Real-World Lighting Changes – Point-Line Fusion, Real-Time Speed, and Embedded Deployment
Imagine deploying an autonomous robot in a warehouse that shifts from bright daylight to dim artificial lighting—or a drone navigating…
DEIM: Slash DETR Training Time by 50% Without Sacrificing Accuracy for Real-Time Object Detection
Real-time object detection has become a cornerstone of modern computer vision applications—from autonomous vehicles and robotics to industrial inspection and…
Instruction Pre-Training: Boost Language Model Performance from Day One with Supervised Multitask Pre-Training
Traditional language model (LM) development follows a two-stage process: unsupervised pre-training on massive raw text corpora, followed by instruction tuning…