Handling long input sequences—ranging from tens of thousands to over a million tokens—is no longer a theoretical benchmark but a…
Sparse Attention
VSA: Accelerate Video Diffusion Models by 2.5× with Trainable Sparse Attention—No Quality Tradeoff
Video generation using diffusion transformers (DiTs) is rapidly advancing—but at a steep computational cost. Full 3D attention in these models…