PaperCodex

YOLOv6: Real-Time Object Detection Optimized for Speed, Accuracy, and Industrial Deployment

YOLOv6 is a high-performance, single-stage object detection framework developed by Meituan with a strong emphasis on real-world industrial applications. Unlike…

12/26/2025 | Edge AI, Object Detection, Real-Time Inference
MME: The First Comprehensive Benchmark to Objectively Evaluate Multimodal Large Language Models

Multimodal Large Language Models (MLLMs) have captured the imagination of researchers and developers alike—promising capabilities like generating poetry from images,…

12/26/2025 | Multimodal Evaluation, Multimodal Reasoning, Vision-Language Modeling
OpenAGI: Build Smarter AI Agents by Combining LLMs with Domain Experts

In today’s AI landscape, building systems that handle real-world complexity often means stitching together language models, specialized tools, APIs, and…

12/26/2025 | AI Agent Orchestration, LLM-enhanced Automation, Multi-tool Reasoning
Agent-E: Reliable, Hierarchical Web Automation Powered by Proven Agentic Design Principles

In today’s fast-paced digital landscape, automating browser-based workflows—from filling forms to comparing products—has become essential for both individuals and enterprises.…

12/26/2025 | Agentic Systems, Browser Navigation, Web Automation
BEVFusion: Unified Bird’s-Eye View Fusion for Accurate, Efficient Multi-Sensor Perception in Autonomous Driving

Building reliable perception systems for autonomous driving demands more than just collecting data from cameras and LiDARs—it requires intelligently fusing…

12/26/2025 | 3D Object Detection, BEV Map Segmentation, Multi-sensor Fusion
Magic Clothing: Generate Photorealistic Outfits with Exact Garment Control and Text Guidance

Magic Clothing is a cutting-edge solution for a long-standing challenge in AI-powered visual content creation: how to generate realistic human…

12/26/2025 | Controllable Image Generation, Fashion-aware Diffusion Models, Garment-driven Image Synthesis
ESPnet-ST: Open-Source Toolkit for Offline, Simultaneous, and Speech-to-Speech Translation

In an increasingly multilingual and interconnected world, spoken language translation (SLT) has moved beyond academic curiosity to become a critical…

12/26/2025 | Simultaneous Speech Translation, Speech-to-Speech Translation, Speech-to-text Translation
Vocos: High-Quality, Real-Time Neural Vocoder Using Fourier Spectra for Efficient Audio Synthesis

If you’re building or evaluating text-to-speech (TTS), voice cloning, or generative audio systems, the choice of neural vocoder can make…

12/26/2025 | Audio Synthesis, Neural Vocoding, Speech Generation
VideoMamba: Efficient Long- and Short-Term Video Understanding Without the Compute Overhead

Video understanding has long been bottlenecked by two competing demands: capturing fine-grained local motion while simultaneously modeling long-range temporal dependencies.…

12/26/2025 | Action Recognition, Video Understanding, Video-text Retrieval
MoE-LLaVA: High-Performance Vision-Language Understanding with Sparse, Efficient Inference

MoE-LLaVA (Mixture of Experts for Large Vision-Language Models) redefines efficiency in multimodal AI by delivering performance that rivals much larger…

12/26/2025 | Multimodal Reasoning, Object Hallucination Reduction, Visual Question Answering

Copyright © 2026 PaperCodex.