PaperCodex

Speech Generation

SpeechAlign: Bridging the Gap Between Realistic and Human-Preferred Speech Generation

Recent advances in speech language models (SLMs) have made it possible to generate highly realistic speech—often indistinguishable from human voices…

12/26/2025 | Neural Codec Modeling, Preference Alignment, Speech Generation

Vocos: High-Quality, Real-Time Neural Vocoder Using Fourier Spectra for Efficient Audio Synthesis

If you’re building or evaluating text-to-speech (TTS), voice cloning, or generative audio systems, the choice of neural vocoder can make…

12/26/2025 | Audio Synthesis, Neural Vocoding, Speech Generation

Step-Audio: Unified Speech Understanding and Generation for Real-World Voice Applications

Building intelligent voice interfaces used to mean stitching together separate speech recognition (ASR), text generation, and text-to-speech (TTS) systems—each with…

12/18/2025 | Multimodal Language Modeling, Speech Generation, Speech Understanding

Copyright © 2026 PaperCodex.