Emu3.5: A Native Multimodal World Model for Unified Vision-Language Generation and Reasoning
Imagine a single AI model that doesn’t just “see” or “read” but seamlessly blends images and text in both input and…
01/04/2026
Multimodal Generation, vision-language modeling, World Modeling