PaperCodex

D-FINE: Real-Time Object Detection with DETR-Level Accuracy and No Inference Overhead

Object detection has long faced a fundamental trade-off: high accuracy or real-time speed—but rarely both. Enter D-FINE, a breakthrough real-time…

12/22/2025 | DETR-based Models, Object Detection, Real-Time Inference

ClearerVoice-Studio: A Practical, All-in-One Toolkit for Real-World Speech Enhancement, Separation, and Speaker Extraction

In today’s audio-rich digital landscape—spanning call centers, video conferencing, voice assistants, and multimedia content—clean, high-quality speech isn’t a luxury; it’s…

12/20/2025 | Speaker Extraction, Speech Enhancement, Speech Separation

OpenSTL: A Standardized, Reproducible Benchmark for Spatio-Temporal Forecasting Across Video, Weather, and Traffic Domains

Spatio-temporal predictive learning aims to forecast future states—like video frames, weather maps, or traffic patterns—based solely on past observations, typically…

12/20/2025 | Spatio-temporal Forecasting, Time-series Forecasting, Video Prediction

AirSLAM: Robust Visual SLAM for Real-World Lighting Changes – Point-Line Fusion, Real-Time Speed, and Embedded Deployment

Imagine deploying an autonomous robot in a warehouse that shifts from bright daylight to dim artificial lighting—or a drone navigating…

12/19/2025 | Illumination-Robust Localization, Point-Line Feature Fusion, Visual SLAM

DEIM: Slash DETR Training Time by 50% Without Sacrificing Accuracy for Real-Time Object Detection

Real-time object detection has become a cornerstone of modern computer vision applications—from autonomous vehicles and robotics to industrial inspection and…

12/19/2025 | DETR Acceleration, Real-time Object Detection, Transformer-based Detection

Instruction Pre-Training: Boost Language Model Performance from Day One with Supervised Multitask Pre-Training

Traditional language model (LM) development follows a two-stage process: unsupervised pre-training on massive raw text corpora, followed by instruction tuning…

12/19/2025 | Instruction Tuning, Language Model Pre-training, Multitask Learning

InstantStyle: Effortless, Tuning-Free Style Preservation for Text-to-Image Generation

InstantStyle is a breakthrough framework that enables high-fidelity, style-consistent image generation without requiring any model retraining or per-image tuning. Built…

12/19/2025 | Image Stylization, Style Transfer, Text-to-Image Generation

InternGPT: Solve Vision-Centric Tasks with Clicks, Scribbles, and ChatGPT-Level Reasoning

In today’s AI landscape, large language models (LLMs) like ChatGPT have transformed how we interact with software—through natural language. But…

12/19/2025 | Interactive Image Editing, Multimodal Reasoning, vision-language modeling

Marco-o1: Open-Source Reasoning Models That Reduce Hallucination and Over-Thinking in Complex Tasks

As large reasoning models (LRMs) like OpenAI’s o1 demonstrate unprecedented capabilities in math, code, and planning, a critical gap remains:…

12/19/2025 | Agentic Planning, Chain-of-thought Distillation, Reasoning Models

DragDiffusion: Precise, Interactive Image Editing for Real and AI-Generated Photos Using Diffusion Models

DragDiffusion is an open-source framework that brings pixel-precise, point-based image manipulation to both real-world photographs and AI-generated images—without requiring users…

12/19/2025 | Diffusion Models, Image Editing, Interactive Manipulation

Copyright © 2026 PaperCodex.