FlowTok reimagines cross-modal generation by collapsing the traditionally complex boundary between text and images into a streamlined, efficient process. Unlike…
Text-to-Image Generation
Lumina-mGPT 2.0: A Standalone Autoregressive Image Generator That Unifies Multimodal Tasks Without Diffusion Dependencies 1076
In the ever-evolving landscape of generative AI, image synthesis has long been dominated by diffusion models—powerful, yet often complex, resource-intensive,…
AnyText: Generate and Edit Multilingual Text in AI Images with Pixel-Perfect Accuracy 4822
If you’ve ever tried using a standard AI image generator to create a poster, product mockup, or social media banner…
OmniGen2: Unified Open-Source Multimodal Generation for Text-to-Image, Editing, and In-Context Creation 3962
OmniGen2 is an open-source, unified generative model that seamlessly bridges text and vision in a single architecture. Unlike many multimodal…
StoryDiffusion: Generate Consistent Long-Form Visual Stories from Text Without Retraining Models 6351
Creating visually coherent sequences of images or videos from text prompts has long been a bottleneck in AI-powered storytelling. While…
MMaDA: One Unified Model for Text Reasoning, Multimodal Understanding, and Image Generation 1518
Imagine running a single model that can answer complex reasoning questions, understand images and text together, and generate high-quality images…
InstantCharacter: Generate Consistent, High-Fidelity Character Images from a Single Photo—No Fine-Tuning Required 1044
Creating personalized, visually consistent characters is a common need across gaming, animation, virtual avatars, and digital storytelling—but until recently, doing…