Awesome Audio Understanding Papers and Source Codes

Kimi-Audio: A Unified, Open-Source Foundation Model for Speech, Sound, and Spoken Dialogue 4373

Building voice-enabled applications today often means stitching together separate models for speech recognition, sound classification, audio captioning, and spoken response…

12/27/2025Audio Understanding, Speech Recognition, Spoken Dialogue Generation

Step-Audio 2: Open-Source Multimodal LLM for Paralinguistic-Aware, Tool-Enhanced Speech Understanding and Conversation 1252

Step-Audio 2 is an open-source, end-to-end multimodal large language model (MLM) purpose-built for real-world audio understanding and natural speech conversation.…

12/27/2025Audio Understanding, Paralinguistic Reasoning, Speech-to-speech Conversation