PaperCodex

OOTDiffusion: High-Fidelity, Controllable Virtual Try-On Without Garment Warping 6482

OOTDiffusion represents a significant leap forward in image-based virtual try-on (VTON) technology. Built on the foundation of pretrained latent diffusion…

12/26/2025 | Diffusion Models, Image Generation, Virtual Try-on

AutoTrain: No-Code, Multi-Modal Model Training for Technical Decision-Makers 4541

In today’s fast-moving AI landscape, fine-tuning state-of-the-art models on custom data is no longer a luxury—it’s a necessity for building…

12/26/2025 | Image Classification, LLM Fine-tuning, Text Classification

LISA: Segment Anything by Understanding What You Really Mean 2523

Imagine asking a computer vision system to “segment the object that makes the woman stand higher” or “show me the…

12/26/2025 | Multimodal Reasoning, Reasoning Segmentation, Visual Question Answering

PyABSA: Reproducible, Modular Aspect-Based Sentiment Analysis for Practitioners and Researchers 1076

Aspect-Based Sentiment Analysis (ABSA) has become essential for extracting fine-grained opinions from text—such as determining whether a customer loves a…

12/26/2025 | Aspect Term Extraction, Aspect-Based Sentiment Analysis, Sentiment Classification

YOLOv6: Real-Time Object Detection Optimized for Speed, Accuracy, and Industrial Deployment 5869

YOLOv6 is a high-performance, single-stage object detection framework developed by Meituan with a strong emphasis on real-world industrial applications. Unlike…

12/26/2025 | Edge AI, Object Detection, Real-Time Inference

MME: The First Comprehensive Benchmark to Objectively Evaluate Multimodal Large Language Models 17004

Multimodal Large Language Models (MLLMs) have captured the imagination of researchers and developers alike—promising capabilities like generating poetry from images,…

12/26/2025 | Multimodal Evaluation, Multimodal Reasoning, Vision-Language Modeling

OpenAGI: Build Smarter AI Agents by Combining LLMs with Domain Experts 2224

In today’s AI landscape, building systems that handle real-world complexity often means stitching together language models, specialized tools, APIs, and…

12/26/2025 | AI Agent Orchestration, LLM-enhanced Automation, Multi-tool Reasoning

BEVFusion: Unified Bird’s-Eye View Fusion for Accurate, Efficient Multi-Sensor Perception in Autonomous Driving 2943

Building reliable perception systems for autonomous driving demands more than just collecting data from cameras and LiDARs—it requires intelligently fusing…

12/26/2025 | 3D Object Detection, BEV Map Segmentation, Multi-sensor Fusion

Magic Clothing: Generate Photorealistic Outfits with Exact Garment Control and Text Guidance 1535

Magic Clothing is a cutting-edge solution for a long-standing challenge in AI-powered visual content creation: how to generate realistic human…

12/26/2025 | Controllable Image Generation, Fashion-aware Diffusion Models, Garment-driven Image Synthesis