PaperCodex

Sparse Attention

MoBA: Efficient Long-Context Attention for LLMs Without Compromising Reasoning Quality

Handling long input sequences—ranging from tens of thousands to over a million tokens—is no longer a theoretical benchmark but a…

12/22/2025 · Efficient Attention, Long-context Language Modeling, Sparse Attention
VSA: Accelerate Video Diffusion Models by 2.5× with Trainable Sparse Attention—No Quality Tradeoff

Video generation using diffusion transformers (DiTs) is rapidly advancing—but at a steep computational cost. Full 3D attention in these models…

12/19/2025 · Diffusion Models, Sparse Attention, Video Generation
Copyright © 2026 PaperCodex.