Object detection has long faced a fundamental trade-off: high accuracy or real-time speed—but rarely both. Enter D-FINE, a breakthrough real-time…
ClearerVoice-Studio: A Practical, All-in-One Toolkit for Real-World Speech Enhancement, Separation, and Speaker Extraction
In today’s audio-rich digital landscape—spanning call centers, video conferencing, voice assistants, and multimedia content—clean, high-quality speech isn’t a luxury; it’s…
OpenSTL: A Standardized, Reproducible Benchmark for Spatio-Temporal Forecasting Across Video, Weather, and Traffic Domains
Spatio-temporal predictive learning aims to forecast future states—like video frames, weather maps, or traffic patterns—based solely on past observations, typically…
AirSLAM: Robust Visual SLAM for Real-World Lighting Changes – Point-Line Fusion, Real-Time Speed, and Embedded Deployment
Imagine deploying an autonomous robot in a warehouse that shifts from bright daylight to dim artificial lighting—or a drone navigating…
DEIM: Slash DETR Training Time by 50% Without Sacrificing Accuracy for Real-Time Object Detection
Real-time object detection has become a cornerstone of modern computer vision applications—from autonomous vehicles and robotics to industrial inspection and…
Instruction Pre-Training: Boost Language Model Performance from Day One with Supervised Multitask Pre-Training
Traditional language model (LM) development follows a two-stage process: unsupervised pre-training on massive raw text corpora, followed by instruction tuning…
InstantStyle: Effortless, Tuning-Free Style Preservation for Text-to-Image Generation
InstantStyle is a breakthrough framework that enables high-fidelity, style-consistent image generation without requiring any model retraining or per-image tuning. Built…
InternGPT: Solve Vision-Centric Tasks with Clicks, Scribbles, and ChatGPT-Level Reasoning
In today’s AI landscape, large language models (LLMs) like ChatGPT have transformed how we interact with software—through natural language. But…
Marco-o1: Open-Source Reasoning Models That Reduce Hallucination and Over-Thinking in Complex Tasks
As large reasoning models (LRMs) like OpenAI’s o1 demonstrate unprecedented capabilities in math, code, and planning, a critical gap remains:…
DragDiffusion: Precise, Interactive Image Editing for Real and AI-Generated Photos Using Diffusion Models
DragDiffusion is an open-source framework that brings pixel-precise, point-based image manipulation to both real-world photographs and AI-generated images—without requiring users…