Long-horizon reasoning is one of the toughest challenges in current AI agent development. Traditional agentic systems, which rely on steadily…
CARAFE: Boost Dense Prediction Accuracy with Content-Aware, Lightweight Feature Upsampling 32164
Feature upsampling is a critical but often overlooked component in modern computer vision pipelines. Whether you’re building an object detector,…
MNN: Run Large Language Models and Vision AI Offline on Mobile with a Lightweight, High-Performance Inference Engine 13694
Mobile Neural Network (MNN) is an open-source, lightweight deep learning inference engine developed by Alibaba Group to bring powerful AI…
The Well: 15TB of Diverse Physics Simulations for Training and Benchmarking Surrogate Models in Scientific Machine Learning 1582
If you’re working on machine learning models that aim to emulate or accelerate physics-based simulations—whether in fluid dynamics, astrophysics, or…
FastVLM: High-Resolution Vision-Language Inference with 85× Faster Time-to-First-Token and Minimal Compute Overhead 7052
Vision Language Models (VLMs) are increasingly central to real-world applications—from mobile assistants that read documents to AI systems that interpret…
Step-Audio: Unified Speech Understanding and Generation for Real-World Voice Applications 4571
Building intelligent voice interfaces used to mean stitching together separate speech recognition (ASR), text generation, and text-to-speech (TTS) systems—each with…
DreamTalk: Generate Emotionally Expressive Talking Heads from Audio Using Diffusion Models 1767
Creating lifelike digital avatars that speak naturally with accurate lip movements and rich emotional expression has long been a challenge…
SkyThought: Boost Code Generation Accuracy Without Retraining—Even Small Models Beat GPT-4o-mini 3358
SkyThought is an open-source framework built around S*—a breakthrough test-time scaling approach designed specifically to elevate code generation performance in…
AniTalker: Generate Lifelike, Expressive Talking Faces from a Single Image and Audio Clip 1588
Imagine turning a static portrait—like the Mona Lisa or a headshot from your LinkedIn profile—into a vivid, talking avatar that…
Colossal-Auto: Automate Large Model Training with Zero Expertise in Parallelization or Checkpointing 41290
Training large-scale AI models—whether language models like LLaMA or video generators like Open-Sora—has become increasingly common, yet remains bottlenecked by…