PaperCodex

Large Language Model Deployment

S-LoRA: Serve Thousands of Task-Specific LLMs Efficiently on a Single GPU

Deploying dozens—or even thousands—of fine-tuned large language models (LLMs) has traditionally been a costly and complex endeavor. Each adapter typically…

01/04/2026 | Large Language Model Deployment, Multi-adapter Serving, Parameter-Efficient Fine-Tuning

MNN: Run Large Language Models and Vision AI Offline on Mobile with a Lightweight, High-Performance Inference Engine

Mobile Neural Network (MNN) is an open-source, lightweight deep learning inference engine developed by Alibaba Group to bring powerful AI…

12/18/2025 | Large Language Model Deployment, Multimodal AI, On-device Inference

Copyright © 2026 PaperCodex.