PaperCodex

Text-to-Image Generation

FlowTok: Unified Text-to-Image and Image-to-Text Generation with Compact 1D Tokens

FlowTok reimagines cross-modal generation by collapsing the traditionally complex boundary between text and images into a streamlined, efficient process. Unlike…

12/19/2025 · Image-to-text Generation, Multimodal Representation Learning, Text-to-Image Generation

Lumina-mGPT 2.0: A Standalone Autoregressive Image Generator That Unifies Multimodal Tasks Without Diffusion Dependencies

In the ever-evolving landscape of generative AI, image synthesis has long been dominated by diffusion models—powerful, yet often complex, resource-intensive,…

12/19/2025 · Controllable Image Synthesis, Image Editing, Text-to-Image Generation

AnyText: Generate and Edit Multilingual Text in AI Images with Pixel-Perfect Accuracy

If you’ve ever tried using a standard AI image generator to create a poster, product mockup, or social media banner…

12/18/2025 · Multilingual Image Synthesis, Text-to-Image Generation, Visual Text Editing

OmniGen2: Unified Open-Source Multimodal Generation for Text-to-Image, Editing, and In-Context Creation

OmniGen2 is an open-source, unified generative model that seamlessly bridges text and vision in a single architecture. Unlike many multimodal…

12/17/2025 · In-context Generation, Instruction-guided Image Editing, Text-to-Image Generation

StoryDiffusion: Generate Consistent Long-Form Visual Stories from Text Without Retraining Models

Creating visually coherent sequences of images or videos from text prompts has long been a bottleneck in AI-powered storytelling. While…

12/17/2025 · Text-to-Image Generation, Video Generation, Visual Storytelling

MMaDA: One Unified Model for Text Reasoning, Multimodal Understanding, and Image Generation

Imagine running a single model that can answer complex reasoning questions, understand images and text together, and generate high-quality images…

12/17/2025 · Diffusion Language Models, Multimodal Reasoning, Text-to-Image Generation

InstantCharacter: Generate Consistent, High-Fidelity Character Images from a Single Photo—No Fine-Tuning Required

Creating personalized, visually consistent characters is a common need across gaming, animation, virtual avatars, and digital storytelling—but until recently, doing…

12/11/2025, 12/15/2025 · Character Personalization, Diffusion Transformer Adaptation, Text-to-Image Generation

Copyright © 2026 PaperCodex.