PaperCodex

Image Generation

SpargeAttention: Universal, Training-Free Sparse Attention for Faster LLM, Image & Video Inference Without Retraining

Large AI models—from language generators to video diffusion systems—are bottlenecked by the attention mechanism, whose computational cost scales quadratically with…

01/13/2026 | Image Generation, Language Modeling, Video Generation

HART: Generate 1024×1024 Images Faster and More Efficiently Than Diffusion Models

For teams building AI-powered visual applications—whether in creative tools, digital content platforms, or rapid prototyping—the trade-off between image quality, speed,…

01/13/2026 | Autoregressive Modeling, High-resolution Synthesis, Image Generation

LightningDiT: Break the Reconstruction-Generation Trade-Off with 21.8x Faster, SOTA Image Diffusion

Latent diffusion models (LDMs) have become a cornerstone of modern high-fidelity image generation. However, a persistent challenge has limited their…

12/27/2025 | Diffusion Transformers, Image Generation, Latent Diffusion Models

GANformer: Compositional, Controllable Image Generation with Fewer Training Steps

Traditional generative adversarial networks (GANs) often act as “black boxes”—they produce compelling images but offer little insight into how those…

12/27/2025 | Compositional Scene Modeling, Image Generation, Layout-to-image Synthesis

OOTDiffusion: High-Fidelity, Controllable Virtual Try-On Without Garment Warping

OOTDiffusion represents a significant leap forward in image-based virtual try-on (VTON) technology. Built on the foundation of pretrained latent diffusion…

12/26/2025 | Diffusion Models, Image Generation, Virtual Try-on

Show-o: One Unified Transformer for Multimodal Understanding and Generation Across Text, Images, and Videos

In today’s AI landscape, developers and researchers often juggle separate models for vision, language, and video—each with its own architecture,…

12/18/2025 | Image Generation, Multimodal Understanding, Video Understanding

Copyright © 2026 PaperCodex.