SwinIR: The Ultimate Guide to Image Restoration and Super-Resolution

Allan Kouidri

In our previous article, Unveiling the Power of SwinIR: A Deep Dive into Image Restoration Using Swin Transformer, we explored the architecture and theoretical foundations of this image restoration model. Building on that knowledge, this article guides you through the practical applications of SwinIR, demonstrating its super-resolution capabilities and how it can fit into your image processing workflow.

Deep Dive into SwinIR’s Architecture and Advantages →

SwinIR is an open-source model that ranks among the best for various super-resolution tasks, showing remarkable effectiveness and adaptability across diverse real-world degradation scenarios.

While you can use SwinIR directly from the source code, here we will see how to streamline the process with the Ikomia API, for those keen on avoiding the intricacies of managing dependencies and versions.

The Ikomia API not only simplifies the development of computer vision workflows but also makes it easy to experiment with various parameters to achieve optimal results.

This can be particularly advantageous for developers and researchers working on image super-resolution tasks, by allowing them to concentrate on experimentation and solution development instead of technical setup.

SwinIR: Image restoration with transformers

In the broad world of deep learning, transformers have changed how we tackle tasks from natural language processing to computer vision. The release of the Swin Transformer marked a significant step forward in the field of image processing. 

In this context, the emergence of SwinIR (August 2021), a model leveraging the Swin Transformer for image restoration, marks a significant milestone. In this article, we dive into SwinIR, exploring its architecture, capabilities, and the impact it has had on the domain of super-resolution.

The Swin transformer

Before diving into SwinIR, it's important to understand its backbone—the Swin Transformer. Originally designed for vision tasks, the Swin Transformer dissects an image into non-overlapping patches, processing them in a hierarchical manner.

This architecture enables it to grasp both local details and broader contextual information, a combination crucial for image-related tasks.
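To make the idea of non-overlapping patches concrete, here is a minimal sketch in plain Python. The helper name `partition_into_patches` and the toy 4x4 "image" are illustrative assumptions, not part of the Swin Transformer code base, which operates on tensors rather than nested lists.

```python
def partition_into_patches(image, patch_size):
    """Split a 2D grid (list of rows) into non-overlapping square patches.

    Patches are returned in row-major order; each patch is a list of rows.
    Assumes the height and width are multiples of patch_size.
    """
    height, width = len(image), len(image[0])
    if height % patch_size or width % patch_size:
        raise ValueError("image dimensions must be multiples of patch_size")
    patches = []
    for top in range(0, height, patch_size):
        for left in range(0, width, patch_size):
            patch = [row[left:left + patch_size]
                     for row in image[top:top + patch_size]]
            patches.append(patch)
    return patches


# A 4x4 toy "image" whose pixel values are 0..15.
image = [[4 * r + c for c in range(4)] for r in range(4)]
patches = partition_into_patches(image, patch_size=2)
```

Each of the four resulting 2x2 patches covers a distinct region of the grid, and no pixel belongs to two patches.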

SwinIR architectural blueprint

SwinIR architecture - Original image source (1)

SwinIR has a hybrid structure made up of three modules:

  1. Shallow feature extraction (utilizing Convolutional Neural Networks or CNN)
  2. Deep feature extraction (leveraging the Swin Transformer)
  3. High-quality image reconstruction (through CNN)

Shallow feature extraction

This phase acts as a preparatory step, mapping the Low-Resolution (LR) input image, $I_{LQ} \in \mathbb{R}^{H \times W \times C_{in}}$, to a higher-dimensional feature space with $C$ channels.

The mapping is performed by a convolutional layer, denoted $H_{SF}$, with a 3×3 kernel:

            $F_0 = H_{SF}(I_{LQ})$

Adding a small convolutional layer at the start of a Vision Transformer has been reported to stabilize training and speed up convergence.

Deep feature extraction

Following the shallow HSF layer, the deep feature extraction phase unfolds. 

This segment comprises K distinct Residual Swin Transformer Blocks (RSTB) coupled with a CNN.

Initially, the RSTB blocks sequentially compute the intermediate features $F_1, F_2, \ldots, F_K$:

            $F_i = H_{RSTB_i}(F_{i-1}), \quad i = 1, 2, \ldots, K$

Here, $H_{RSTB_i}$ denotes the i-th RSTB. At the end, a convolutional layer, $H_{CONV}$, extracts the deep feature $F_{DF}$:

            $F_{DF} = H_{CONV}(F_K)$

Placing a convolutional layer at the end of the feature extraction process brings the inductive bias of the convolution operation into the Transformer-based network and lays a better foundation for the later aggregation of shallow and deep features.

High-quality image reconstruction

Finally, the reconstruction module, $H_{REC}$, produces the high-resolution output by aggregating the shallow and deep features:

            $I_{HR} = H_{REC}(F_0 + F_{DF})$

The shallow features mainly carry low-frequency information, while the deep features focus on recovering lost high-frequency details. A long skip connection from $F_0$ to the reconstruction module passes the low-frequency information through directly, so the deep feature extraction can concentrate on the high frequencies.
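The data flow through the three modules can be sketched as plain function composition. This is only an illustration of how the features are chained and summed: the function `swinir_forward` and the toy stand-in layers below are assumptions made for this example, not the real convolutions or Residual Swin Transformer Blocks.

```python
from typing import Callable, List, Sequence

Feature = List[float]
Layer = Callable[[Feature], Feature]


def swinir_forward(i_lq: Feature, h_sf: Layer, rstbs: Sequence[Layer],
                   h_conv: Layer, h_rec: Layer) -> Feature:
    """Chain the three stages: shallow features, deep features, reconstruction."""
    f0 = h_sf(i_lq)                    # shallow features F_0
    f = f0
    for h_rstb in rstbs:               # F_i = H_RSTB_i(F_{i-1}), i = 1..K
        f = h_rstb(f)
    f_df = h_conv(f)                   # deep features F_DF
    fused = [a + b for a, b in zip(f0, f_df)]  # long skip connection: F_0 + F_DF
    return h_rec(fused)                # reconstructed output


# Toy stand-ins (hypothetical): each "layer" is a simple element-wise function.
double = lambda x: [2.0 * v for v in x]
add_one = lambda x: [v + 1.0 for v in x]
halve = lambda x: [0.5 * v for v in x]
identity = lambda x: list(x)

result = swinir_forward([1.0, 2.0], h_sf=double, rstbs=[add_one] * 3,
                        h_conv=halve, h_rec=identity)
```

Note how `f0` is used twice: once as the input to the deep branch and once in the final sum, which is exactly the role of the long skip connection.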

Tokenization and self-attention in SwinIR

SwinIR's prowess can be attributed to the core principles of the Swin Transformer. The model starts by dividing an image into patches, treating each patch as an individual token. 

These tokens are then fed into the transformer layers, where the magic of self-attention comes into play. Each token is evaluated in relation to others, allowing the model to determine the significance of each patch based on the broader image context.
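As a rough illustration of how each token is weighted against the others, here is a minimal scaled dot-product self-attention in plain Python. It is a simplified sketch: it uses the tokens directly as queries, keys, and values, whereas the Swin Transformer applies learned projections, a relative position bias, and restricts attention to local (shifted) windows.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(tokens):
    """Scaled dot-product self-attention with identity Q/K/V projections.

    Returns the attended tokens and the attention weights of each token.
    """
    dim = len(tokens[0])
    outputs, all_weights = [], []
    for query in tokens:
        # Similarity of this token to every token, scaled by sqrt(dim).
        scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(dim)
                  for key in tokens]
        weights = softmax(scores)
        # Each output is a weighted average of all tokens.
        attended = [sum(w * value[d] for w, value in zip(weights, tokens))
                    for d in range(dim)]
        outputs.append(attended)
        all_weights.append(weights)
    return outputs, all_weights


tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
outputs, weights = self_attention(tokens)
```

For the first token, the weights assigned to tokens 0 and 2 are equal and larger than the weight assigned to token 1, because token 1 is orthogonal to it.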

Hierarchical processing and multi-scale approach

Furthermore, the hierarchical structure of the Swin Transformer ensures that the model processes these patches at different resolutions. 

This multi-scale approach ensures that SwinIR captures details at various granularities, making it effective for a wide range of restoration tasks.

SwinIR: pioneering new frontiers in image restoration

SwinIR is a good example of the adaptability of transformers in the world of image processing. With its architecture rooted in the Swin Transformer, SwinIR can tackle many image restoration challenges, ranging from super-resolution to denoising.

Key features

SwinIR's design makes it a Swiss Army knife in the image restoration domain. Whether you're upscaling a low-res image, cleaning up a noisy photograph, or removing rain streaks from a snapshot, SwinIR has you covered.

State-of-the-art performance

Benchmarks don't lie. SwinIR has outperformed many of its peers, establishing itself as a frontrunner in various image restoration tasks.

Long-range dependency modeling

Transformers excel at recognizing and modeling long-range dependencies in data. For image restoration, where a distant part of an image might hold the key to restoring another section, this capability is invaluable.

Attention-driven processing

Where a convolution applies the same fixed kernel weights at every position, SwinIR's self-attention layers weight image patches differently depending on their content and context.


For those looking to adapt SwinIR to specific challenges, the model's architecture allows for fine-tuning, ensuring optimal performance for specialized tasks.

SwinIR in action: real-world applications

The practical implications of SwinIR are vast and varied:

Media & entertainment

In film restoration, SwinIR can rejuvenate old classics, enhancing their resolution and cleaning up artifacts. Similarly, photographers can salvage noisy shots, ensuring that every click is picture-perfect.


In the world of digital forensics, image clarity can be the difference between solving a case and hitting a dead end. SwinIR's denoising and super-resolution capabilities can aid forensic experts in analyzing crucial evidence.


Outdoor surveillance cameras often capture rain-affected footage. SwinIR's de-raining feature ensures clear footage, irrespective of the weather.

SwinIR and beyond: the future of image restoration

SwinIR, with its foundation in the Swin Transformer, has heralded a new era in image restoration. Its versatility, performance, and adaptability make it a game-changer in the field.

Get started with Ikomia API

Using the Ikomia API, you can effortlessly restore your favorite images in just a few lines of code.

To get started, all you need is to install the API in a virtual environment.

How to install a virtual environment

pip install ikomia

API documentation

API repo

Run SwinIR, an image restoration model with a few lines of code

You can also directly load the notebook we have prepared.

For a step-by-step guide with detailed information on the algorithm's parameters, refer to this section.

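from ikomia.dataprocess.workflow import Workflow
from ikomia.utils.displayIO import display

# Init your workflow
wf = Workflow()

# Add the SwinIR algorithm
swinir = wf.add_task(name="infer_swinir_super_resolution", auto_connect=True)

# Run on your image (placeholder path, replace it with your own file;
# run_on() also accepts url="...")
wf.run_on(path="path/to/your/image.png")

# Inspect your results: the input image, then the restored output
display(swinir.get_input(0).get_image())
display(swinir.get_output(0).get_image())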

SwinIR example

Step by step SwinIR image restoration with the Ikomia API

In this section, we will demonstrate how to utilize the Ikomia API to create a workflow for image restoration with SwinIR as presented above.

Step 1: import

from ikomia.dataprocess.workflow import Workflow
from ikomia.utils.displayIO import display

  • The ‘Workflow’ class is the base object for creating a workflow. It provides methods for setting inputs (image, video, directory), configuring task parameters, obtaining time metrics, and retrieving specific task outputs, such as graphics, segmentation masks, and texts.
  • The ‘display’ function offers a flexible and customizable way to display images (input/output) and graphics, such as bounding boxes and segmentation masks.

Step 2: create workflow

wf = Workflow()

We initialize a workflow instance. The “wf” object can then be used to add tasks to the workflow instance, configure their parameters, and run them on input data.

Step 3: add and connect SwinIR

swinir = wf.add_task(name="infer_swinir_super_resolution", auto_connect=True)

Step 4: set the parameters

swinir.set_parameters({
    "use_gan": "True",
    "large_model": "False",
    "cuda": "True",
    "tile": "256",
    "overlap_ratio": "0.1",
    "scale": "4"
})

  • use_gan (bool) - Default ‘True’: If True, algorithm will use GAN method to upscale image, else will use PSNR method.
  • large_model (bool) - Default ‘False’: If True, algorithm will use the large model, else will use medium model.
  • cuda (bool) - Default ‘True’: If True, run on the GPU with CUDA, else run on the CPU.
  • tile (int) - Default ‘256’: Size of tile. Instead of passing the whole image to the deep learning model, which consumes a lot of memory, the model is fed with square tiles of fixed size one by one.
  • overlap_ratio (float) - Default ‘0.1’: Overlap between tiles, as a fraction of the tile size. Overlapping the tiles and then blending the results leads to a smoother image. Set it to 0 to have no overlap, as in the original repo; 1.0 is the maximum overlap.
  • scale (int) - Default ‘4’: Scale factor. Must be 2 or 4. Scale 2 is not available for large models.
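To see how ‘tile’ and ‘overlap_ratio’ interact, the sketch below computes where tiles would start along one image axis. The helper `tile_starts` is hypothetical and written for this article only; it is not the Ikomia plugin's actual implementation, which may place or blend tiles differently.

```python
def tile_starts(length: int, tile: int, overlap_ratio: float) -> list:
    """Start offsets of fixed-size tiles covering `length` pixels on one axis.

    Consecutive tiles overlap by roughly overlap_ratio * tile pixels, and the
    last tile is aligned with the image border so nothing is left uncovered.
    """
    if not 0.0 <= overlap_ratio <= 1.0:
        raise ValueError("overlap_ratio must be between 0 and 1")
    if length <= tile:
        return [0]  # the image fits in a single tile
    # Distance between tile starts; at least 1 pixel to guarantee progress.
    stride = max(1, int(tile * (1.0 - overlap_ratio)))
    starts = list(range(0, length - tile, stride))
    starts.append(length - tile)
    return starts


# A 600-pixel-wide image with the default tile size and overlap.
default_starts = tile_starts(600, tile=256, overlap_ratio=0.1)
# The same image without overlap.
no_overlap_starts = tile_starts(600, tile=256, overlap_ratio=0.0)
```

A larger overlap means more tiles and therefore more computation, in exchange for less visible seams between tiles.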

Step 5: execute your workflow to enhance your image.

You can apply the workflow to your image using the ‘run_on()’ function. In this example, the input is an image URL.
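A minimal sketch of this step is shown below. The wrapper `run_swinir` and its argument names are assumptions made for illustration; only the `run_on()` call with its `path` and `url` keywords comes from the Ikomia API. The `wf` argument is expected to be the workflow built in the previous steps.

```python
def run_swinir(wf, image_path=None, image_url=None):
    """Run an already-built workflow on either a local file or an image URL.

    `wf` is assumed to be an Ikomia Workflow to which the SwinIR task
    has already been added.
    """
    if (image_path is None) == (image_url is None):
        raise ValueError("provide exactly one of image_path or image_url")
    if image_path is not None:
        wf.run_on(path=image_path)   # local image file
    else:
        wf.run_on(url=image_url)     # remote image


# Usage (placeholder location, replace it with your own image):
# run_swinir(wf, image_path="path/to/your/image.png")
```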


Step 6: display your results

Finally, you can display your image results using the ‘display’ function.
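One way to compare the result with the original is to fetch both images from the task and display them in turn. The helper `get_before_after` is a hypothetical convenience written for this article; it assumes the task's first input and first output are image objects exposing `get_image()`, as in the Ikomia API.

```python
def get_before_after(task):
    """Return (input image, restored image) from a task that has been run."""
    before = task.get_input(0).get_image()
    after = task.get_output(0).get_image()
    return before, after


# Usage, assuming `swinir` is the task returned by wf.add_task(...)
# and the workflow has already been run:
# from ikomia.utils.displayIO import display
# before, after = get_before_after(swinir)
# display(before)
# display(after)
```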


SwinIR results

Build your own workflow with Ikomia

In this tutorial, we have explored the process of creating a workflow for image restoration with SwinIR. 

The Ikomia API streamlines the development of Computer Vision workflows, facilitating easy experimentation with different parameters to attain the best outcomes.

For information on the API, check out the documentation. Additionally, browse the list of cutting-edge algorithms available on Ikomia HUB and explore Ikomia STUDIO, which provides a user-friendly interface with the same functionalities as the API.

(1) https://github.com/JingyunLiang/SwinIR
