Remote sensing imagery—captured from satellites, drones, or aircraft—presents unique challenges for computer vision systems. Objects are often small, densely packed,…
Semantic Segmentation
Lite-HRNet: High-Accuracy Human Pose Estimation and Semantic Segmentation with Minimal Compute
When building real-time vision applications for mobile, embedded, or edge devices, developers often face a tough trade-off: accuracy versus efficiency.…
IDRNet: Boost Semantic Segmentation Accuracy with Smarter Context Modeling—No Heavy Priors Required
If you’re building computer vision systems that rely on pixel-perfect understanding—like autonomous driving, medical imaging analysis, or retail scene parsing—you’ve…
VMamba: A Linear-Time Vision Backbone for High-Resolution, Scalable Computer Vision Tasks
In the rapidly evolving landscape of computer vision, model efficiency and scalability are no longer optional—they’re essential. Enter VMamba, a…
OMG-Seg: One Unified Model for All Segmentation Tasks—No More Fragmented Pipelines
For years, computer vision practitioners have juggled a patchwork of specialized models to tackle different segmentation tasks—semantic, instance, panoptic, video,…
MambaVision: Achieve SOTA Image Classification & Downstream Vision Tasks with Hybrid Mamba-Transformer Efficiency
If you’re building computer vision systems that demand both high accuracy and real-world efficiency—without getting bogged down in architectural complexity—MambaVision…
FlexiViT: One Vision Transformer for All Patch Sizes—Deploy Faster or More Accurate Models Without Retraining
Vision Transformers (ViTs) have become a cornerstone of modern computer vision, offering strong performance across a wide range of tasks.…
FastViT: Achieve State-of-the-Art Speed and Accuracy for Vision Tasks on Mobile and Edge Devices
FastViT is a high-performance hybrid vision transformer designed to deliver exceptional speed and accuracy—especially on resource-constrained platforms like mobile phones…
AM-RADIO: Unify Vision Foundation Models into One High-Performance Backbone for Multimodal, Segmentation, and Detection Tasks
In modern computer vision, practitioners often juggle multiple foundation models—CLIP for vision-language alignment, DINOv2 for dense feature extraction, and SAM…
CARAFE: Boost Dense Prediction Accuracy with Content-Aware, Lightweight Feature Upsampling
Feature upsampling is a critical but often overlooked component in modern computer vision pipelines. Whether you’re building an object detector,…