PaperCodex

Speculative Decoding

TinyLlama: A Fast, Efficient 1.1B Open Language Model for Edge Deployment and Speculative Decoding

TinyLlama is a compact yet powerful open-source language model with just 1.1 billion parameters—but trained on an impressive 3 trillion…

12/22/2025 · On-device Inference, Speculative Decoding, Text Generation
EAGLE-3: Accelerate LLM Inference Up to 6.5× Without Sacrificing Output Quality

For teams deploying large language models (LLMs) in production—whether for chatbots, reasoning APIs, or batch processing—latency and inference cost are…

12/19/2025 · Efficient Language Model Serving, LLM Inference Acceleration, Speculative Decoding
Copyright © 2026 PaperCodex.
