Awesome Speech-to-speech Generation Papers and Source Codes

Moshi: A Real-Time, Full-Duplex Speech-to-Speech Foundation Model for Natural Human-Like Dialogue

Traditional spoken dialogue systems—like those used in virtual assistants or customer service bots—rely on a cascade of disconnected components: voice…

12/11/2025 | Full-duplex Dialogue, Speech-to-speech Generation, Spoken Language Modeling