TEQ: Accurate 3- and 4-Bit LLM Quantization Without Inference Overhead
Deploying large language models (LLMs) in production often runs into a hard trade-off: reduce model size and latency through quantization,…