For engineers, researchers, and product teams building real-time vision systems—whether for surveillance cameras, autonomous drones, or mobile apps—achieving high detection…
UniAnimate-DiT: High-Fidelity Human Animation from a Single Image and Pose Sequence – No Full Retraining Needed
Animating a static human image into a realistic, temporally coherent video used to require massive datasets, complex pipelines, or retraining…
360-LLaMA-Factory: Plug-and-Play Sequence Parallelism for Long-Context SFT and DPO Without Rewriting Your Workflow
Training large language models (LLMs) on long sequences—whether for document-level instruction tuning, multi-modal reasoning, or complex alignment tasks—has long been…
DeepResearcher: Train AI Research Agents That Think, Verify, and Adapt in the Real Web Environment
In today’s AI landscape, many organizations rely on large language models (LLMs) to automate complex research tasks—such as competitive analysis,…
LLM×MapReduce: Generate Coherent Long-Form Articles from Extremely Long Inputs Using LLMs Efficiently
If you’ve ever tried using a large language model (LLM) to synthesize a detailed technical report from hundreds of research…
Waver: Generate Lifelike, High-Motion Videos in 1080p with One Unified Model
In the rapidly evolving world of generative AI, video generation has remained a particularly challenging frontier—especially when it comes to…
VGGT-Long: Scalable Monocular 3D Reconstruction for Kilometer-Scale Real-World Sequences Without Retraining or Calibration
Monocular 3D reconstruction has seen rapid advances thanks to foundation models capable of inferring rich geometric structure from single images.…
SimpleVLA-RL: Boost Robotic Task Performance with Minimal Data Using Reinforcement Learning
Building capable robotic systems that understand vision, language, and action—commonly referred to as Vision-Language-Action (VLA) models—has become a central goal…
PUSA: Generate High-Quality Video from Text or Images for $500—Not $100,000
Video generation has long been bottlenecked by two stubborn realities: astronomical training costs and rigid temporal modeling. Most state-of-the-art image-to-video…
Decoupled DMD: Unlock Ultra-Fast, High-Quality Image Generation with 8-Step Distillation
If you’re building or evaluating text-to-image systems that demand both speed and visual fidelity, Decoupled DMD offers a breakthrough in…