PaperCodex

Efficient LLM Deployment

SmoothQuant: Accurate 8-Bit LLM Inference Without Retraining – Slash Memory and Boost Speed

Deploying large language models (LLMs) in production is expensive—not just in dollars, but in compute and memory. While models like…

01/05/2026 · Efficient LLM Deployment, Large Language Model Inference, Post-training Quantization

BitNet: Run 1.58-Bit LLMs Locally on CPUs with 6x Speedup and 82% Less Energy

Running large language models (LLMs) used to require powerful GPUs, expensive cloud infrastructure, or specialized hardware—until BitNet changed the game…

12/12/2025 · Efficient LLM Deployment, On-device Inference, Text Generation

Copyright © 2026 PaperCodex.