Deploying large language models (LLMs) to handle long documents, extensive chat histories, or detailed technical manuals remains a major bottleneck…
Long-context Language Modeling
MoBA: Efficient Long-Context Attention for LLMs Without Compromising Reasoning Quality
Handling long input sequences—ranging from tens of thousands to over a million tokens—is no longer a theoretical benchmark but a…