Awesome World Modeling Papers and Source Codes

Emu3.5: A Native Multimodal World Model for Unified Vision-Language Generation and Reasoning

Imagine a single AI model that doesn’t just “see” or “read” but seamlessly blends images and text in both input and…