Awesome Post-training Quantization Papers and Source Codes

OmniQuant: Near-Lossless LLM Quantization for Real-World Deployment on GPUs and Mobile Devices 857

Deploying large language models (LLMs) in real-world applications remains a major engineering challenge. While models like LLaMA-2, Falcon, and Mixtral…

01/09/2026Efficient LLM Deployment, Large Language Model Quantization, Post-training Quantization

SmoothQuant: Accurate 8-Bit LLM Inference Without Retraining – Slash Memory and Boost Speed 1576

Deploying large language models (LLMs) in production is expensive—not just in dollars, but in compute and memory. While models like…

01/05/2026Efficient LLM Deployment, Large Language Model Inference, Post-training Quantization