PaperCodex

MambaVision: Achieve SOTA Image Classification & Downstream Vision Tasks with Hybrid Mamba-Transformer Efficiency

If you’re building computer vision systems that demand both high accuracy and real-world efficiency—without getting bogged down in architectural complexity—MambaVision…

12/26/2025 | Image Classification, Object Detection, Semantic Segmentation
MobileVLM: High-Performance Vision-Language AI That Runs Fast and Privately on Mobile Devices

MobileVLM is a purpose-built vision-language model (VLM) engineered from the ground up for on-device deployment on smartphones and edge hardware.…

12/26/2025 | Multimodal Reasoning, On-Device AI, Visual Question Answering
Versatile Diffusion: One Unified Model for Text-to-Image, Image-to-Text, and Creative Variations

In today’s fast-evolving AI landscape, most generative systems are built for a single task—whether that’s turning text into images, editing…

12/26/2025 | Image-to-Text Captioning, Multimodal Diffusion, Text-to-Image Generation
Qwen2-VL: Process Any-Resolution Images and Videos with Human-Like Visual Understanding

Vision-language models (VLMs) are increasingly essential for tasks that require joint understanding of images, videos, and text—ranging from document parsing…

12/26/2025 | Document Understanding, Multimodal Reasoning, Vision-Language Modeling
CAMEL: Build Scalable, Autonomous Multi-Agent AI Systems Without Constant Human Oversight

In today’s AI landscape, large language models (LLMs) excel at solving complex tasks—but only when carefully guided by humans. This…

12/26/2025 | Autonomous Task Automation, Multi-agent Systems, Synthetic Data Generation
ONE-PEACE: A Single Model for Vision, Audio, and Language with Zero Pretraining Dependencies

In today’s AI landscape, most multimodal systems are built by stitching together specialized models—separate vision encoders, audio processors, and language…

12/26/2025 | Cross-Modal Retrieval, Multimodal Representation Learning, Zero-shot Transfer Learning
OOTDiffusion: High-Fidelity, Controllable Virtual Try-On Without Garment Warping

OOTDiffusion represents a significant leap forward in image-based virtual try-on (VTON) technology. Built on the foundation of pretrained latent diffusion…

12/26/2025 | Diffusion Models, Image Generation, Virtual Try-on
AutoTrain: No-Code, Multi-Modal Model Training for Technical Decision-Makers

In today’s fast-moving AI landscape, fine-tuning state-of-the-art models on custom data is no longer a luxury—it’s a necessity for building…

12/26/2025 | Image Classification, LLM Fine-tuning, Text Classification
LISA: Segment Anything by Understanding What You *Really* Mean

Imagine asking a computer vision system to “segment the object that makes the woman stand higher” or “show me the…

12/26/2025 | Multimodal Reasoning, Reasoning Segmentation, Visual Question Answering
PyABSA: Reproducible, Modular Aspect-Based Sentiment Analysis for Practitioners and Researchers

Aspect-Based Sentiment Analysis (ABSA) has become essential for extracting fine-grained opinions from text—such as determining whether a customer loves a…

12/26/2025 | Aspect Term Extraction, Aspect-Based Sentiment Analysis, Sentiment Classification

Copyright © 2026 PaperCodex.