Awesome Multimodal AI Papers and Source Codes

Cube: Generate 3D Assets from Text Prompts—No Modeling Skills Required

Imagine describing a “mechanical lobster with tank treads” in plain English and instantly getting a usable 3D model—no Blender expertise,…

AudioGPT is a multimodal AI system that bridges the gap between large language models (LLMs) like ChatGPT and the rich…

Mobile Neural Network (MNN) is an open-source, lightweight deep learning inference engine developed by Alibaba Group to bring powerful AI…