MODNet: Pioneering the Future of Digital Image Matting

Allan Kouidri
MODNet background removal output

In the dynamic world of digital imaging, the quest for efficient and accurate background removal has been a persistent challenge. Traditionally, achieving high-quality results has required complex setups like green screens or labor-intensive post-processing.

However, with the advent of MODNet, a revolutionary technique in the field of Deep Image Matting, the landscape of image processing is being reshaped.

Source [1]

MODNet: The Evolution of Deep Image Matting

Deep Image Matting, pioneered by Adobe Research in 2017, marked a significant leap in digital imaging. This technique, now ubiquitous across various websites for automatic background removal, relies on a dual-input system: an image and its corresponding trimap. 

A trimap divides the image into three distinct zones: the foreground, background, and a transitional area where pixels are a mix of both. Despite its effectiveness, this method demands substantial computing resources and struggles with real-time applications.

Image sample (left) and its corresponding trimap (right) [2]
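To make the three zones concrete, here is a minimal sketch that derives a trimap from a known alpha matte with NumPy. The function name, the thresholds and the 0/128/255 encoding are illustrative assumptions, not something prescribed by Deep Image Matting or MODNet.

```python
import numpy as np

def trimap_from_alpha(alpha, low=0.05, high=0.95):
    """Label each pixel as background (0), unknown (128) or foreground (255).

    alpha is a float array in [0, 1]; low and high are hypothetical thresholds.
    """
    # Start with every pixel marked as the transitional (unknown) zone
    trimap = np.full(alpha.shape, 128, dtype=np.uint8)
    # Almost fully transparent pixels are treated as background
    trimap[alpha <= low] = 0
    # Almost fully opaque pixels are treated as foreground
    trimap[alpha >= high] = 255
    return trimap

# Tiny example: one background, one mixed and one foreground pixel
example_alpha = np.array([[0.0, 0.3, 1.0]])
example_trimap = trimap_from_alpha(example_alpha)
```

In practice a trimap is usually drawn by hand or produced by another model, which is exactly the extra input MODNet avoids.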

What is MODNet?

MODNet stands out as a lightweight matting objective decomposition network. This innovative system can process portrait matting from just a single input image, eliminating the need for a trimap, and operates in real-time.

  • High-speed performance: MODNet runs at 67 frames per second on a 1080Ti GPU, fast enough for real-time matting.
  • Exceptional results: The network consistently achieves impressive results in processing daily photos and videos, showcasing its practicality and reliability.
  • Ease of training: Designed for simplicity, MODNet can be easily trained in an end-to-end manner, making it accessible for a wide range of users.

MODNet's simplicity, speed, and effectiveness make it an ideal solution for real-time portrait matting, offering a viable alternative to traditional green screen techniques.

Source [1]

The Architecture of MODNet

At its core, MODNet comprises three main branches:

  • Low-Resolution Branch: This branch estimates human semantics, forming the foundation for further processing.
  • High-Resolution Branch: Focused on detecting precise human boundaries, this branch leverages the initial semantic estimation for enhanced accuracy.
  • Fusion Branch: Combining the inputs from the other branches, this segment predicts the final alpha matte for background removal.
Architecture of MODNet [3]

Each branch plays a crucial role in ensuring the efficiency and accuracy of the network. By using MobileNetV2 as its backbone, MODNet remains lightweight yet powerful, suitable for mobile devices and high-speed processing.
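Once an alpha matte is available, replacing the background comes down to the standard compositing equation I = αF + (1 − α)B. The sketch below applies it with NumPy; the function name and the array shapes are assumptions chosen for illustration, and the code is independent of MODNet itself.

```python
import numpy as np

def composite(foreground, background, alpha):
    """Blend a foreground over a new background using an alpha matte.

    foreground, background: float arrays of shape (H, W, 3) in [0, 1]
    alpha: float array of shape (H, W) in [0, 1]
    """
    # Add a channel axis so the matte broadcasts over the RGB channels
    alpha = alpha[..., np.newaxis]
    return alpha * foreground + (1.0 - alpha) * background

# White subject over a black background with a soft edge
fg = np.ones((2, 2, 3))
bg = np.zeros((2, 2, 3))
matte = np.array([[1.0, 0.0], [0.5, 0.25]])
result = composite(fg, bg, matte)
```

Pixels with alpha 1 keep the foreground, pixels with alpha 0 take the new background, and fractional values produce the soft edges seen around hair.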

Innovative techniques in MODNet

MODNet stands out for its innovative techniques that enhance efficiency and adaptability in real-world scenarios.

Here's a closer look at the key techniques employed in MODNet:

Efficient Atrous Spatial Pyramid Pooling (e-ASPP)

A core component of MODNet is the e-ASPP module, designed for effective multi-scale feature fusion. This ingenious adaptation of the traditional Atrous Spatial Pyramid Pooling (ASPP) significantly reduces computational demands while maintaining high performance.

By optimizing the way features at various scales are integrated, e-ASPP ensures that MODNet can process images more quickly without compromising on the quality of the matting results.
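For readers unfamiliar with the term, "atrous" convolution is a dilated convolution: the kernel taps are spaced apart so the receptive field grows without adding weights. The one-dimensional NumPy sketch below only illustrates that building block. It is not MODNet's e-ASPP module, and the function name is hypothetical.

```python
import numpy as np

def dilated_conv1d(signal, kernel, dilation):
    """Valid 1D correlation where kernel taps are `dilation` samples apart."""
    # Distance between the first and last kernel tap
    span = (len(kernel) - 1) * dilation
    out_len = len(signal) - span
    out = np.zeros(out_len)
    for i in range(out_len):
        for k, weight in enumerate(kernel):
            out[i] += weight * signal[i + k * dilation]
    return out

signal = np.arange(10, dtype=float)
kernel = np.array([1.0, 1.0, 1.0])

# Same three weights, two different receptive fields (3 and 7 samples)
dense = dilated_conv1d(signal, kernel, dilation=1)
atrous = dilated_conv1d(signal, kernel, dilation=3)
```

A pyramid pooling module runs several such convolutions with different dilation rates in parallel and fuses their outputs, which is how features at multiple scales are combined.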

Self-Supervised Sub-Objectives Consistency (SOC)

Addressing one of the primary challenges in trimap-free matting methods – the domain shift problem – MODNet incorporates the SOC strategy. 

This self-supervised approach leverages the consistency between various sub-objectives of the matting process, enhancing the network's ability to perform reliably on real-world data. SOC is a testament to MODNet's adaptability, enabling it to deliver consistent, high-quality results even in varying and unpredictable real-world conditions.
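The idea of penalizing disagreement between sub-objectives can be sketched in a few lines. The version below is a simplified illustration and not the loss from the MODNet paper: the average pooling, the choice of error terms, the weighting and all function names are assumptions.

```python
import numpy as np

def downsample_mean(x, factor):
    """Average-pool a 2D array by an integer factor (assumed to divide H and W)."""
    h, w = x.shape
    return x.reshape(h // factor, factor, w // factor, factor).mean(axis=(1, 3))

def consistency_loss(alpha, semantic, detail, boundary_mask, factor=4):
    """Penalize disagreement between the three predictions on unlabeled data.

    alpha:         final matte, shape (H, W)
    semantic:      coarse prediction, shape (H/factor, W/factor)
    detail:        boundary prediction, shape (H, W)
    boundary_mask: 1 inside the transition region, 0 elsewhere
    """
    # The coarse semantic map should match a downsampled version of the matte
    semantic_term = np.mean((downsample_mean(alpha, factor) - semantic) ** 2)
    # The detail map should match the matte inside the transition region
    n_boundary = max(boundary_mask.sum(), 1)
    detail_term = np.sum(boundary_mask * np.abs(alpha - detail)) / n_boundary
    return 0.5 * semantic_term + detail_term

# Predictions that agree with each other give a zero loss
alpha = np.zeros((8, 8))
alpha[:, 4:] = 1.0
semantic = downsample_mean(alpha, 4)
detail = alpha.copy()
mask = np.zeros((8, 8))
mask[:, 3:5] = 1.0
consistent = consistency_loss(alpha, semantic, detail, mask)

# Making the detail prediction disagree at the boundary raises the loss
bad_detail = detail.copy()
bad_detail[:, 3] = 1.0
inconsistent = consistency_loss(alpha, semantic, bad_detail, mask)
```

Because no ground-truth matte appears anywhere in the loss, it can be minimized on unlabeled real-world images.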

One-Frame Delay (OFD)

To combat the issue of flickering in the alpha matte sequence – a common problem in video matting – MODNet utilizes a technique called One-Frame Delay (OFD). This method involves using information from adjacent frames (both preceding and following) to correct flickering pixels. 

The underlying assumption is that the corresponding pixels in nearby frames are likely to be accurate, allowing for a smoother and more stable portrayal of the subject across the video sequence. OFD is a clever solution that enhances the visual continuity and coherence in video matting applications, making MODNet's output more seamless and professional.

Results on a Real-World Video [2]
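The OFD rule described above can be sketched with NumPy as follows. This is a minimal illustration under stated assumptions: the threshold value and the function name are hypothetical, and a pixel is treated as flickering when its two neighbouring frames agree with each other while the current frame differs from both.

```python
import numpy as np

def one_frame_delay(prev_alpha, cur_alpha, next_alpha, threshold=0.1):
    """Replace flickering pixels in the current matte using adjacent frames."""
    # The previous and next frames agree with each other...
    neighbours_agree = np.abs(prev_alpha - next_alpha) <= threshold
    # ...while the current frame differs from both of them
    current_differs = (np.abs(cur_alpha - prev_alpha) > threshold) & (
        np.abs(cur_alpha - next_alpha) > threshold
    )
    flicker = neighbours_agree & current_differs

    smoothed = cur_alpha.copy()
    # Flickering pixels take the average of their two temporal neighbours
    smoothed[flicker] = 0.5 * (prev_alpha[flicker] + next_alpha[flicker])
    return smoothed

prev_frame = np.array([1.0, 0.0, 0.5])
cur_frame = np.array([0.2, 0.0, 0.5])   # first pixel flickers
next_frame = np.array([1.0, 1.0, 0.5])  # second pixel genuinely changes
fixed = one_frame_delay(prev_frame, cur_frame, next_frame)
```

Since the correction needs the following frame, the output lags the input by one frame, hence the name.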

MODNet Real-time matting and self-supervision

A standout feature of MODNet is its ability to handle real-time portrait matting under changing scenes. This capability is augmented by its self-supervised learning strategy, which uses the consistency between sub-objectives to adapt to varied data without requiring labeled examples. 

The network can thus learn from the consistency across frames, significantly reducing artifacts in the predicted alpha matte.

MODNet Real-world adaptability and benchmarking

MODNet's real-world adaptability is further showcased in its performance on challenging datasets like the Adobe Matting Dataset and the PPM-100 benchmark. It consistently surpasses prior trimap-free methods, demonstrating its robustness and versatility in diverse scenarios.

Easily run MODNet for background removal

With the Ikomia API, you can remove the background from your image in just a few lines of code.


To get started, you need to install the API in a virtual environment [4].

pip install ikomia

Run MODNet with a few lines of code

You can also directly open the notebook we have prepared.

from ikomia.dataprocess.workflow import Workflow
from ikomia.utils.displayIO import display

# Init your workflow
wf = Workflow()    

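# Add the MODNet algorithm to the workflow
det = wf.add_task(name="infer_modnet_portrait_matting", auto_connect=True)

# Set process parameters
det.set_parameters({
    "input_size": "800",  # Use a multiple of 32
    "cuda": "False"})     # Set to "True" for GPU inference

# Run the workflow on your image (placeholder path, replace with your own)
wf.run_on(path="path/to/your/image.jpg")

# Inspect your results (output index 1 is assumed to hold the matted image)
display(det.get_input(0).get_image())
display(det.get_output(1).get_image())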

  • input_size (int) - default: '800': Size of the input image (a multiple of 32)
  • cuda (bool): If True, CUDA-based inference (GPU). If False, run on CPU.
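Since input_size is documented with a stride of 32, it can be convenient to snap an arbitrary image size to a compatible value before setting the parameter. The helper below is a hypothetical convenience function, not part of the Ikomia API, and it assumes that "stride of 32" means the size should be a multiple of 32.

```python
def snap_to_stride(size, stride=32):
    """Round a size down to the nearest multiple of `stride` (at least one stride)."""
    return max(stride, size - size % stride)

# Values are passed to set_parameters as strings, as in the snippet above
input_size = str(snap_to_stride(810))
```

Here a requested size of 810 becomes "800", the same value used in the example.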

Explore further with the Ikomia API

Throughout this article, we've delved into the intricacies of background removal with MODNet. However, the exploration doesn't end there. The Ikomia platform broadens the horizon with an array of image matting algorithms, including the innovative Privacy-Preserving Portrait Matting (P3M).

The Ikomia HUB not only showcases these algorithms but also simplifies their testing and comparison. It provides accessible code snippets, allowing users to effortlessly experiment with different techniques and assess their capabilities.

For those seeking a comprehensive understanding of how to leverage these tools, the Ikomia documentation serves as a valuable resource. It offers detailed guidance on utilizing the API to its fullest potential.

Beyond the API, the Ikomia ecosystem includes Ikomia STUDIO. This platform presents a user-friendly interface that retains the full spectrum of functionalities offered by the API. It's an ideal solution for users who prefer a more intuitive and visually guided approach to image processing.




[3] MODNet: Real-Time Trimap-Free Portrait Matting via Objective Decomposition

[4] How to create a virtual environment in Python
