PaperCodex

Speech-to-speech Generation

Moshi: A Real-Time, Full-Duplex Speech-to-Speech Foundation Model for Natural Human-Like Dialogue

Traditional spoken dialogue systems—like those used in virtual assistants or customer service bots—rely on a cascade of disconnected components: voice…

12/11/2025
Full-duplex Dialogue, Speech-to-speech Generation, Spoken Language Modeling
Copyright © 2026 PaperCodex.