Awesome Speech Language Modeling Papers and Source Codes

VITA-Audio: Real-Time Speech Generation with Ultra-Low Latency for End-to-End Voice AI 636

Voice interaction is becoming a cornerstone of modern human-computer interfaces—whether through smart assistants, customer service bots, or real-time translation tools.…

01/09/2026 Real-time TTS, Speech Language Modeling, Spoken Question Answering

ESPnet-SpeechLM: Build Speech Language Models Faster with Unified, Reproducible Workflows 9639

Building speech language models (SpeechLMs)—systems that jointly understand and generate both speech and text—is rapidly becoming essential for next-generation voice…

12/18/2025 Multimodal Sequence Modeling, Speech Language Modeling, Voice-Driven Agent Development