
PaperCodex



Punica: Serve Dozens of LoRA Models on One GPU—12x Faster, 2ms Overhead


Deploying multiple fine-tuned large language models (LLMs) used to mean multiplying your GPU costs—until Punica arrived. If you’re managing dozens…

01/04/2026 · Efficient Model Deployment, LoRA Inference, Multi-tenant LLM Serving
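
The batching idea behind this result: every LoRA adapter shares the same frozen base weights, so the expensive dense matmul runs once for a batch that mixes requests from many tenants, and only the small low-rank deltas are applied per request. Below is a minimal NumPy sketch of that idea; all sizes and names are illustrative, and the Python loop stands in for Punica's fused gather-and-multiply CUDA kernel (SGMV), which is where the low overhead actually comes from.

```python
import numpy as np

rng = np.random.default_rng(0)

d_in, d_out, rank = 64, 64, 8       # hypothetical layer sizes and LoRA rank
n_adapters, batch = 4, 6            # adapters resident on the GPU; requests in flight

W = rng.standard_normal((d_in, d_out))                     # shared base weight (frozen)
A = rng.standard_normal((n_adapters, d_in, rank)) * 0.01   # per-tenant LoRA "down" matrices
B = rng.standard_normal((n_adapters, rank, d_out)) * 0.01  # per-tenant LoRA "up" matrices

x = rng.standard_normal((batch, d_in))           # one token per in-flight request
adapter_id = rng.integers(0, n_adapters, batch)  # which tenant's LoRA each request uses

# Dense part runs once for the whole mixed batch: every tenant shares
# the base GEMM, so adding tenants does not multiply GPU cost.
y = x @ W

# LoRA part: gather each request's adapter and add its low-rank delta.
# Punica fuses this gather + two small matmuls into one kernel; a plain
# loop stands in for it here.
for i in range(batch):
    a = adapter_id[i]
    y[i] += (x[i] @ A[a]) @ B[a]
```

Because the per-adapter work scales with the rank (8 here) rather than the full hidden size, the delta pass stays cheap even with dozens of distinct adapters in one batch, which is consistent with the roughly 2ms overhead the article's title cites.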
