PaperCodex

End-to-end Voice Assistant

Mini-Omni2: Unified Vision, Speech, and Text Interaction Without External ASR/TTS Pipelines

In today’s open-source AI landscape, building truly multimodal applications often means stitching together separate models for vision, speech recognition (ASR),…

12/26/2025 · End-to-end Voice Assistant, Multimodal Understanding, Speech-to-speech Interaction
Copyright © 2026 PaperCodex.