PaperCodex

InfiniteYou: High-Fidelity Identity-Preserving Image Generation with Flexible Prompt Control

Personalized image generation has long struggled with a fundamental trade-off: how to maintain strong identity fidelity while enabling flexible, high-quality…

12/22/2025 Identity-Preserving Image Generation, Personalized Diffusion Models, Text-to-image Synthesis
OmniDocBench: A Real-World, Fine-Grained Benchmark for Fair and Comprehensive PDF Document Parsing Evaluation

Evaluating document parsing systems has long been a frustrating exercise in inconsistency. Many existing benchmarks focus narrowly on clean academic…

12/22/2025 Document Parsing, Layout Analysis, Multimodal Document Understanding
detrex: A Unified, Modular Benchmark for Detection Transformers—Accelerate Object Detection, Segmentation, and Pose Estimation Research

If you’re evaluating object detection frameworks for a new computer vision project, you’ve likely encountered the rise of DETR (Detection…

12/22/2025 Instance Segmentation, Object Detection, Pose Estimation
RepViT: Real-Time Mobile Vision with Pure CNN Speed and ViT-Level Accuracy

In the world of on-device computer vision, the tension between speed and accuracy has long defined what’s possible. Engineers building…

12/22/2025 Image Classification, Instance Segmentation, Mobile Vision
UniDepthV2: Zero-Shot Monocular Metric Depth Estimation That Works Across Real-World Domains

Monocular metric depth estimation (MMDE)—the task of predicting real-world depth values from a single RGB image—is foundational for 3D perception…

12/22/2025 3D Scene Reconstruction, Metric Depth Prediction, Monocular Depth Estimation
REINFORCE++: A Critic-Free RLHF Algorithm for Faster, More Robust LLM Alignment

Aligning large language models (LLMs) with human preferences is essential for building safe, helpful, and reliable AI systems. Reinforcement Learning…

12/22/2025 Large Language Model Alignment, Reasoning With Chain-of-Thought, Reinforcement Learning From Human Feedback (RLHF)
Mini-Monkey: Fixing Fragmented Vision in Lightweight Multimodal Models with Smart Multi-Scale Cropping

When it comes to deploying multimodal large language models (MLLMs) in real-world applications—especially on cost-sensitive or edge devices—lightweight models are…

12/22/2025 Document Understanding, Multimodal Reasoning, Optical Character Recognition (OCR)
ViTPose: High-Accuracy, Scalable Pose Estimation Without Complex Custom Designs

Human and animal pose estimation has long relied on hand-crafted convolutional architectures, intricate post-processing, or task-specific modules. ViTPose changes that…

12/22/2025 Animal Pose Estimation, Human Pose Estimation, Whole-Body Pose Estimation
PP-HumanSeg: Real-Time, Connectivity-Aware Human Portrait Segmentation for Video Conferencing and Edge Applications

In the era of remote collaboration, virtual meetings have become the norm—making clean, real-time human portrait segmentation essential for professional…

12/22/2025 Human Portrait Segmentation, Real-time Semantic Segmentation, Video Conferencing Background Replacement
StrongSORT: A High-Performance, Plug-and-Play Multi-Object Tracker for Real-World Video Applications

Multi-object tracking (MOT) is a cornerstone of modern computer vision systems—powering everything from autonomous vehicles to retail analytics and security…

12/22/2025 MOT, Multi-Object Tracking, Video Object Tracking

Copyright © 2026 PaperCodex.