PaperCodex

Image Classification

UniRepLKNet: A Universal Large-Kernel ConvNet for Faster, Stronger, and Truly Multimodal AI

In the era of Vision Transformers and increasingly complex multimodal architectures, convolutional neural networks (ConvNets) have often been written off…

01/04/2026 · Image Classification, Multimodal Perception, Time-series Forecasting

VMamba: A Linear-Time Vision Backbone for High-Resolution, Scalable Computer Vision Tasks

In the rapidly evolving landscape of computer vision, model efficiency and scalability are no longer optional—they’re essential. Enter VMamba, a…

12/26/2025 · Image Classification, Object Detection, Semantic Segmentation

MambaVision: Achieve SOTA Image Classification & Downstream Vision Tasks with Hybrid Mamba-Transformer Efficiency

If you’re building computer vision systems that demand both high accuracy and real-world efficiency—without getting bogged down in architectural complexity—MambaVision…

12/26/2025 · Image Classification, Object Detection, Semantic Segmentation

AutoTrain: No-Code, Multi-Modal Model Training for Technical Decision-Makers

In today’s fast-moving AI landscape, fine-tuning state-of-the-art models on custom data is no longer a luxury—it’s a necessity for building…

12/26/2025 · Image Classification, LLM Fine-tuning, Text Classification

MambaOut: High-Accuracy Vision Models Without the Mamba Overhead

The vision community has recently seen a surge in adopting sequence modeling architectures—especially Mamba—for image tasks. Inspired by its linear…

12/26/2025 · Efficient Deep Learning, Image Classification, Vision Backbone

FlexiViT: One Vision Transformer for All Patch Sizes—Deploy Faster or More Accurate Models Without Retraining

Vision Transformers (ViTs) have become a cornerstone of modern computer vision, offering strong performance across a wide range of tasks.…

12/22/2025 · Image Classification, Image-text Retrieval, Semantic Segmentation

FastViT: Achieve State-of-the-Art Speed and Accuracy for Vision Tasks on Mobile and Edge Devices

FastViT is a high-performance hybrid vision transformer designed to deliver exceptional speed and accuracy—especially on resource-constrained platforms like mobile phones…

12/22/2025 · Image Classification, Object Detection, Semantic Segmentation

GhostNet: High-Accuracy Vision Models with Minimal Compute for Edge Deployment

Deploying powerful computer vision models on resource-constrained devices—such as smartphones, IoT sensors, or drones—has long been a major engineering…

12/22/2025 · Edge AI, Image Classification, Object Detection

RepViT: Real-Time Mobile Vision with Pure CNN Speed and ViT-Level Accuracy

In the world of on-device computer vision, the tension between speed and accuracy has long defined what’s possible. Engineers building…

12/22/2025 · Image Classification, Instance Segmentation, Mobile Vision

Copyright © 2026 PaperCodex.