Awesome Text-to-Image Generation Papers and Source Codes | Page 2 of 3

Qwen-Image: Generate and Edit Images with Perfect Text—Even in Chinese (6339)

If you’ve ever struggled to generate marketing visuals with legible multilingual text—or tried to edit a product image only to…

12/26/2025 | Image Editing, Multimodal Text Rendering, Text-to-Image Generation

HunyuanImage-3.0: The Largest Open-Source Multimodal Image Generator with Native Reasoning and MoE Architecture (2562)

HunyuanImage-3.0 is a groundbreaking open-source image generation model developed by Tencent. Unlike traditional diffusion-based approaches, it builds a native multimodal…

12/26/2025 | Mixture-of-Experts (MoE), Multimodal Reasoning, Text-to-Image Generation

Versatile Diffusion: One Unified Model for Text-to-Image, Image-to-Text, and Creative Variations (1334)

In today’s fast-evolving AI landscape, most generative systems are built for a single task—whether that’s turning text into images, editing…

12/26/2025 | Image-to-Text Captioning, Multimodal Diffusion, Text-to-Image Generation

InstantStyle: Effortless, Tuning-Free Style Preservation for Text-to-Image Generation (1969)

InstantStyle is a breakthrough framework that enables high-fidelity, style-consistent image generation without requiring any model retraining or per-image tuning. Built…

12/19/2025 | Image Stylization, Style Transfer, Text-to-Image Generation

OmniGen: One Unified Model for All Image Generation Tasks—No Plugins, No Preprocessing, Just Prompts (4282)

Modern image generation is powerful—but fragmented. Depending on your goal—generating from text, editing existing images, preserving a person’s identity, or…

12/19/2025 | Image Editing, Subject-driven Generation, Text-to-Image Generation

Flow-GRPO: Boost Text-to-Image Accuracy with Online RL—Without Sacrificing Quality or Diversity (1720)

If you’ve ever struggled with diffusion models failing to follow detailed prompts—like “a golden retriever sitting to the left of…

12/19/2025 | Controllable Diffusion Models, Reinforcement Learning for Generative Models, Text-to-Image Generation

FlowTok: Unified Text-to-Image and Image-to-Text Generation with Compact 1D Tokens (1082)

FlowTok reimagines cross-modal generation by collapsing the traditionally complex boundary between text and images into a streamlined, efficient process. Unlike…

12/19/2025 | Image-to-Text Generation, Multimodal Representation Learning, Text-to-Image Generation

Lumina-mGPT 2.0: A Standalone Autoregressive Image Generator That Unifies Multimodal Tasks Without Diffusion Dependencies (1076)

In the ever-evolving landscape of generative AI, image synthesis has long been dominated by diffusion models—powerful, yet often complex, resource-intensive,…

12/19/2025 | Controllable Image Synthesis, Image Editing, Text-to-Image Generation