Image super-resolution (SR) remains a critical capability across computer vision applications—from upscaling smartphone photos to enhancing AI-generated content (AIGC). However, many modern diffusion-based SR techniques either sacrifice quality for speed or require dozens of sampling steps to produce usable results. InvSR changes this by delivering state-of-the-art image quality in just 1 to 5 sampling steps, thanks to a novel diffusion inversion framework grounded in the CVPR 2025 paper “Arbitrary-steps Image Super-resolution via Diffusion Inversion.”
Designed for practitioners who need fast, flexible, and high-fidelity upscaling, InvSR leverages rich priors from large pre-trained diffusion models while introducing a lightweight deep noise predictor and a Partial Noise Prediction strategy. This combination enables rapid initialization along the diffusion trajectory, bypassing the need for slow iterative sampling—without sacrificing visual fidelity.
Whether you’re working on real-world image enhancement, AIGC post-processing, or building scalable SR pipelines, InvSR offers a practical balance of speed, quality, and deployment simplicity.
Why InvSR Stands Out: Speed, Flexibility, and Performance
Arbitrary-Step Sampling with Consistent Quality
Unlike conventional diffusion-based SR methods that lock you into a fixed number of steps (often 20+), InvSR supports any number of sampling steps between 1 and 5. Remarkably, even with just one step, it achieves performance that matches or exceeds recent state-of-the-art approaches. This flexibility is invaluable for latency-sensitive applications—such as mobile apps, real-time video enhancement, or batch processing in production environments.
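In practice, this makes it easy to sweep the supported step counts and compare latency against quality for your own data. A minimal sketch, assuming the repo's `inference_invsr.py` entry point and illustrative input/output paths:

```shell
# Sweep the supported step counts (1-5) to compare speed vs. quality.
# The ./inputs and ./outputs paths are illustrative; adjust to your data.
for steps in 1 2 3 4 5; do
  python inference_invsr.py -i ./inputs -o "./outputs/steps_${steps}" --num_steps "${steps}"
done
```

Each output folder can then be inspected side by side to pick the smallest step count that meets your quality bar.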
Partial Noise Prediction: Smarter Initialization
At the core of InvSR is its Partial Noise Prediction mechanism. Instead of starting from pure noise (as in standard diffusion sampling), InvSR uses a trained noise predictor to estimate an optimal intermediate state in the forward diffusion process. This state becomes the starting point for reverse sampling, dramatically reducing the number of steps needed to reconstruct a high-resolution image with fine details and natural textures.
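The idea can be illustrated with a toy variance-preserving diffusion in NumPy. Everything here is a simplified sketch, not InvSR's actual implementation: `noise_predictor` stands in for the trained deep noise predictor, and the schedule and timestep values are arbitrary illustrations.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy "low-res observation" already lifted to the HR grid (e.g., bicubic upsampling).
y_up = rng.normal(size=(8, 8))

# Simple linear noise schedule: alpha_bar[t] is the signal fraction at step t.
T = 1000
alpha_bar = np.linspace(1.0, 1e-4, T)

def noise_predictor(x):
    """Hypothetical stand-in for InvSR's trained noise predictor.
    A real model is a learned network; here we just sample Gaussian
    noise so the sketch is self-contained."""
    return rng.normal(size=x.shape)

# Partial Noise Prediction: instead of starting reverse sampling from pure
# noise at t = T-1, jump directly to an intermediate timestep t_start by
# combining the upsampled input with the predicted noise.
t_start = 200  # intermediate point on the forward diffusion trajectory
eps = noise_predictor(y_up)
x_t = np.sqrt(alpha_bar[t_start]) * y_up + np.sqrt(1.0 - alpha_bar[t_start]) * eps

# Reverse sampling now only needs to cover t_start..0 (a handful of steps),
# rather than the full T-step trajectory from pure noise.
print(x_t.shape)  # (8, 8)
```

Because `x_t` already carries the structure of the low-resolution input, the reverse process has far less work to do, which is what makes 1-5 step sampling viable.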
Built on Strong Priors, Trained for Efficiency
InvSR taps into the visual knowledge embedded in pre-trained models like SD-Turbo, but augments them with a compact, task-specific noise predictor. This design avoids costly retraining of the entire diffusion backbone while achieving superior real-world SR performance.
Real-World Use Cases: Where InvSR Adds Immediate Value
Enhancing Real-World Low-Resolution Images
Many off-the-shelf SR tools fail on real-world photos due to complex degradations (e.g., motion blur, JPEG artifacts, sensor noise). InvSR excels here—its noise predictor is trained to handle such imperfections, making it ideal for upscaling images from smartphones, surveillance cameras, or legacy photo archives.
AIGC Image Refinement
AI-generated images often lack crisp details or exhibit minor artifacts. InvSR can serve as a lightweight post-enhancement module in generative pipelines, instantly boosting resolution and perceptual quality—without re-running the generative model or requiring manual correction.
Rapid Prototyping and Production Deployment
Thanks to its ultra-fast inference (1–5 steps), InvSR integrates easily into existing workflows. Whether you’re a researcher validating SR ideas or an engineer building a web demo, InvSR’s minimal latency makes experimentation and deployment frictionless.
Problems InvSR Solves for Practitioners
Traditional diffusion-based SR methods suffer from three key pain points:
- Slow inference: Requiring 20–100 steps makes them impractical for real-time use.
- Inflexible pipelines: Most methods don’t support variable step counts, limiting tuning options.
- Poor real-world robustness: Synthetic training data often leads to unrealistic textures or color distortions on actual photos.
InvSR directly addresses all three:
- Speed via minimal-step sampling,
- Flexibility through arbitrary-step support (1–5),
- Robustness by training the noise predictor on diverse, real-world-like degradation patterns.
Getting Started with InvSR: Simple and Practical Usage
InvSR is designed for quick adoption. You don’t need deep expertise in diffusion models to run it.
Quick Command-Line Inference
Run super-resolution on a single image or folder with one command:
python inference_invsr.py -i [image_path] -o [output_folder] --num_steps 1
For large images (e.g., 1K → 4K), use tiling to avoid memory issues:
--chopping_size 256 --chopping_bs 1
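Putting the two together, a full tiled run on a large input might look like the following (the placeholder paths are kept from above; tune `--chopping_size` downward if you still hit out-of-memory errors):

```shell
# Single-step SR with tiling for large inputs (e.g., 1K -> 4K).
python inference_invsr.py -i [image_path] -o [output_folder] \
    --num_steps 1 --chopping_size 256 --chopping_bs 1
```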
Try It Locally or in Docker
- Launch a Gradio web demo:
python app.py
- Or run via Docker:
docker compose up -d (accessible at http://127.0.0.1:7860)
Environment Setup
InvSR requires Python 3.10 and PyTorch 2.4.0. A ready-to-use conda environment can be created in minutes using the provided environment.yaml or installation script.
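A typical conda-based setup might look like the following sketch; the environment name `invsr` is an assumption, and the `environment.yaml` file is the one described above:

```shell
# Create the required Python 3.10 / PyTorch 2.4.0 environment from the
# provided file, then activate it (environment name is an assumption).
conda env create -f environment.yaml
conda activate invsr
```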
Reproducing Paper Results
The repo includes instructions and links to evaluation datasets (e.g., ImageNet-Test, RealSRV3) for researchers aiming to benchmark against reported metrics. Color-fixing options like --color_fix wavelet ensure faithful reproduction of visual quality.
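For benchmark runs, the color-fixing flag is passed to the same entry point; a sketch, with an illustrative dataset path standing in for wherever you downloaded the evaluation data:

```shell
# Benchmark run with wavelet-based color fixing enabled.
# ./testdata/RealSRV3 is an illustrative path, not the repo's layout.
python inference_invsr.py -i ./testdata/RealSRV3 -o ./results/realsr \
    --num_steps 5 --color_fix wavelet
```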
Limitations and Practical Considerations
While InvSR offers exceptional speed and quality within its design scope, users should note the following:
- Step range is limited to 1–5: It’s optimized for fast inference, not ultra-high-step refinement.
- Depends on pre-trained models: You’ll need SD-Turbo and the provided noise predictor checkpoint for best results.
- Specific environment requirements: Python 3.10 and PyTorch 2.4.0 are mandatory; older setups may require updates.
- Training requires preparation: Reproducing the noise predictor involves downloading a finetuned LPIPS model and configuring data paths, though inference works out-of-the-box with pre-trained weights.
These constraints are typical for cutting-edge diffusion-based methods, but InvSR’s modularity and clear documentation minimize setup friction.
Summary
InvSR redefines what’s possible in fast, high-quality image super-resolution. By combining diffusion inversion with a smart Partial Noise Prediction strategy, it delivers state-of-the-art results in as few as one sampling step—making it uniquely suited for real-world deployment, AIGC enhancement, and rapid prototyping. With support for batch processing, Docker deployment, and memory-efficient tiling, InvSR is not just a research novelty but a practical tool for developers, designers, and researchers alike. If you need fast, flexible, and visually compelling super-resolution without the computational overhead, InvSR is worth integrating into your next project.