Developing general-purpose robots that can navigate, interact, and manipulate in real-world urban environments remains one of the most demanding challenges…
IMAGDressing: Generate Controllable, High-Fidelity Virtual Outfits Without Retraining Models
Online fashion retailers, digital content studios, and marketing teams increasingly rely on realistic human imagery to showcase garments—but traditional virtual…
MambaOut: High-Accuracy Vision Models Without the Mamba Overhead
The vision community has recently seen a surge in adopting sequence modeling architectures—especially Mamba—for image tasks. Inspired by its linear…
StudioGAN: A Unified, Reproducible Benchmark for Training and Evaluating GANs at Scale
Generative Adversarial Networks (GANs) have long been at the forefront of realistic image synthesis—but using them effectively in research or…
FlexiViT: One Vision Transformer for All Patch Sizes—Deploy Faster or More Accurate Models Without Retraining
Vision Transformers (ViTs) have become a cornerstone of modern computer vision, offering strong performance across a wide range of tasks.…
3D-Speaker-Toolkit: Multimodal Speaker Verification and Diarization with Acoustic, Semantic, and Visual Fusion
Speaker analysis—whether for verifying identity, recognizing who’s speaking, or separating voices in a multi-person conversation—is a fundamental task in speech…
TFB: The Fair, Comprehensive Benchmark for Time Series Forecasting That Solves Reproducibility and Bias Problems
Time series forecasting powers critical decisions across industries—from predicting electricity demand and traffic congestion to estimating disease spread and stock…
CKnowEdit: Fix Chinese Linguistic, Factual & Logical Errors in LLMs Without Retraining
Large language models (LLMs) have made remarkable progress in multilingual understanding—but their performance in Chinese remains uneven, especially when it…
FastViT: Achieve State-of-the-Art Speed and Accuracy for Vision Tasks on Mobile and Edge Devices
FastViT is a high-performance hybrid vision transformer designed to deliver exceptional speed and accuracy—especially on resource-constrained platforms like mobile phones…
iTransformer: Invert Your Time Series Forecasting Architecture for Better Scalability, Generalization, and Simplicity
Time series forecasting is a foundational task across finance, energy, logistics, and digital platforms—yet traditional Transformer-based models often struggle with…