Awesome Multimodal Sequence Modeling Papers and Source Codes

LeVo: Generate Full-Length, High-Fidelity Songs with Perfect Vocal-Instrument Harmony—Even on Consumer GPUs 1005

LeVo is a breakthrough in open-source AI music generation. Unlike many existing tools that produce fragmented, low-quality, or inconsistent audio,…

01/04/2026 | AI Music Generation, Multimodal Sequence Modeling, Text-to-music Synthesis

ESPnet-SpeechLM: Build Speech Language Models Faster with Unified, Reproducible Workflows 9639

Building speech language models (SpeechLMs)—systems that jointly understand and generate both speech and text—is rapidly becoming essential for next-generation voice…

12/18/2025 | Multimodal Sequence Modeling, Speech Language Modeling, Voice-Driven Agent Development