Neural volumetric video—capturing and rendering dynamic 3D scenes that can be viewed from any angle and time—is no longer just…
AnyText: Generate and Edit Multilingual Text in AI Images with Pixel-Perfect Accuracy
If you’ve ever tried using a standard AI image generator to create a poster, product mockup, or social media banner…
DreamCraft3D: Generate Photorealistic, View-Consistent 3D Assets from a Single Image
Creating high-quality 3D assets has traditionally required expert modeling skills, extensive manual labor, or expensive capture setups—barriers that limit accessibility…
S1: Boost Reasoning Performance with Just 1,000 Examples and Smart Test-Time Scaling
In the rapidly evolving landscape of large language models (LLMs), achieving strong reasoning capabilities often comes at the cost of…
SwinIR: State-of-the-Art Image Restoration with Fewer Parameters and Higher Quality
Image quality degradation—whether from compression, noise, or low resolution—is a persistent challenge across industries ranging from medical imaging to consumer…
PP-PicoDet: Real-Time Object Detection with SOTA Accuracy on Mobile and Edge Devices
In today’s era of intelligent edge computing, deploying high-performance computer vision models on resource-constrained devices like smartphones, embedded sensors, and…
ESPnet-SpeechLM: Build Speech Language Models Faster with Unified, Reproducible Workflows
Building speech language models (SpeechLMs)—systems that jointly understand and generate both speech and text—is rapidly becoming essential for next-generation voice…
FinGPT: Open-Source Financial LLMs with Transparent, Global Data Pipelines for Real-World Finance Applications
Large language models (LLMs) are transforming how we interact with data—but in finance, high-quality, domain-specific language models have largely remained…
DeepSeek-VL2: High-Performance Vision-Language Understanding with Efficient Mixture-of-Experts Architecture
DeepSeek-VL2 is an open-source, advanced vision-language model (VLM) built on a Mixture-of-Experts (MoE) architecture, engineered for robust multimodal understanding across…
OmniSafe: Accelerate Safe Reinforcement Learning Research with a Unified, Modular Framework
Reinforcement learning (RL) holds transformative potential for real-world applications—from autonomous vehicles and surgical robots to industrial control systems. Yet, one…