Advanced usage of Detectron2: best practices and troubleshooting

Allan Kouidri
Detectron2 panoptic segmentation on street

This blog post will provide a comprehensive overview of Detectron2, highlighting its key features and advantages. We'll guide you through the installation and the practical usage of Detectron2.  We'll address common challenges such as installation issues, compatibility concerns, and algorithmic intricacies.

The deep learning landscape is enriched with numerous tools and libraries designed to simplify complex tasks. In Computer Vision, object detection is one of the tasks that has attracted the most attention. With the release of Detectron2, Facebook AI Research (FAIR) took the challenge head-on, offering a cutting-edge platform for the task.

What is Detectron2?

Detectron2 is an open-source project from Facebook AI Research (FAIR) and represents the second version of the Detectron library. Unlike its predecessor, Detectron2 is written in PyTorch, one of the most popular deep learning libraries. This transition provides developers and researchers with greater flexibility, extensibility, and ease of use.

Detectron2 object detection on cyclists

Key features of Detectron2

    1. Modular and flexible design: Detectron2 is built with modularity in mind. This allows researchers and developers to easily plug in new components or tweak existing ones without much hassle.

    2. Extensive model zoo: It comes with a plethora of pre-trained models. Whether you are looking to implement instance segmentation, panoptic segmentation, or plain object detection, Detectron2 has a pre-trained model available.

    3. Native PyTorch implementation: Unlike its predecessor, which was built on Caffe2, Detectron2 leverages the capabilities of PyTorch, making it much easier to use and integrate with other PyTorch-based tools.

    4. Training and evaluation utilities: Detectron2 provides out-of-the-box functionalities that streamline the process of training, evaluating, and fine-tuning models.

Detectron2 model zoo: models for every computer vision task

Detectron2 provides a wide range of models in its model zoo, each tailored for specific computer vision tasks. Here's a breakdown of the main models Detectron2 offers for different tasks:

1. Object detection

  • Faster R-CNN: This is a pioneering model that combines Region Proposal Networks (RPN) with Fast R-CNN for end-to-end object detection.
  • TridentNet: An object detection model that introduces multi-branch architectures, called "tridents," to handle objects of various scales more effectively.
  • RetinaNet: This model uses a Feature Pyramid Network (FPN) backbone and a novel focal loss to address the problem of class imbalance during object detection.
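To make the focal loss concrete, here is a minimal pure-Python sketch of the idea (RetinaNet's actual implementation operates on tensors over all anchors at once): the modulating factor (1 − p_t)^γ down-weights well-classified examples so training focuses on hard ones.

```python
import math

def focal_loss(p, y, gamma=2.0, alpha=0.25):
    """Binary focal loss for a single prediction.

    p: predicted probability of the positive class
    y: ground-truth label (1 or 0)
    gamma: focusing parameter; gamma=0 recovers cross-entropy
    alpha: class-balancing weight
    """
    p_t = p if y == 1 else 1.0 - p
    alpha_t = alpha if y == 1 else 1.0 - alpha
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)

# A confidently correct prediction contributes almost nothing...
easy = focal_loss(0.95, 1)
# ...while a hard, misclassified one keeps a large loss.
hard = focal_loss(0.10, 1)
```

With gamma=0 and alpha=1 this reduces to the ordinary cross-entropy term, which is a handy sanity check.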
Detectron2 object detection baby dog

2. Semantic segmentation

  • DeepLabv3+: An encoder-decoder model known for strong performance in semantic segmentation, leveraging atrous (dilated) convolutions and atrous spatial pyramid pooling (ASPP).
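The atrous (dilated) convolution behind DeepLabv3+ is easy to illustrate: the kernel taps are spaced `rate` samples apart, enlarging the receptive field without adding parameters. A minimal 1-D sketch in plain Python (real models apply this in 2-D over feature maps):

```python
def atrous_conv1d(signal, kernel, rate=1):
    """1-D 'valid' convolution whose kernel taps are `rate` samples apart.

    With rate=1 this is an ordinary convolution; larger rates widen the
    receptive field without adding any weights.
    """
    span = (len(kernel) - 1) * rate  # receptive field minus one
    return [
        sum(kernel[k] * signal[i + k * rate] for k in range(len(kernel)))
        for i in range(len(signal) - span)
    ]

x = [1, 2, 3, 4, 5, 6]
print(atrous_conv1d(x, [1, 1], rate=1))  # adjacent taps: [3, 5, 7, 9, 11]
print(atrous_conv1d(x, [1, 1], rate=2))  # taps 2 apart:  [4, 6, 8, 10]
```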

Deeplabv3+ semantic segmentation detectron2 road

3. Instance segmentation

  • Mask R-CNN: Building upon Faster R-CNN, Mask R-CNN adds a segmentation mask prediction branch, allowing it to predict object masks along with bounding boxes.
  • PointRend: A technique that iteratively refines segmentation masks by focusing on uncertain regions and employs point-based rendering to produce high-resolution and detailed object boundaries.
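Mask quality in instance segmentation is typically scored with mask IoU (intersection over union). A minimal sketch, representing masks as sets of pixel coordinates (real libraries use bitmaps or run-length encodings instead):

```python
def mask_iou(mask_a, mask_b):
    """IoU of two binary masks given as sets of (row, col) pixels."""
    union = len(mask_a | mask_b)
    return len(mask_a & mask_b) / union if union else 0.0

a = {(0, 0), (0, 1), (1, 0), (1, 1)}   # 2x2 square
b = {(0, 1), (1, 1), (0, 2), (1, 2)}   # same square shifted right by one
print(mask_iou(a, b))  # 2 shared pixels out of 6 total -> ~0.333
```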
Detectron2 PointRend instance segmentation people laughing.

4. Panoptic segmentation

  • Panoptic FPN: A model that unifies the typically distinct semantic segmentation and instance segmentation tasks under a single framework.
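Panoptic results are commonly evaluated with the Panoptic Quality (PQ) metric, which combines how well matched segments overlap with how many segments are missed or hallucinated. A minimal sketch, taking the per-match IoUs as already computed (a predicted/ground-truth pair counts as matched only when IoU > 0.5):

```python
def panoptic_quality(matched_ious, num_fp, num_fn):
    """PQ = sum of IoUs over true positives / (TP + FP/2 + FN/2).

    matched_ious: IoU of each matched (predicted, ground-truth) segment pair
    num_fp: predicted segments with no ground-truth match
    num_fn: ground-truth segments with no predicted match
    """
    tp = len(matched_ious)
    denom = tp + 0.5 * num_fp + 0.5 * num_fn
    return sum(matched_ious) / denom if denom else 0.0

# Two good matches, one spurious prediction, one missed segment:
print(panoptic_quality([0.9, 0.8], num_fp=1, num_fn=1))  # 1.7 / 3 ≈ 0.567
```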
Detectron2 Panoptic segmentation group of people with umbrellas

5. Keypoint detection

  • Keypoint R-CNN: An extension of Mask R-CNN, it predicts object keypoints in addition to bounding boxes and masks, making it useful for tasks such as human pose estimation.
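Keypoint predictions are usually evaluated with Object Keypoint Similarity (OKS), a keypoint analogue of IoU. The sketch below is simplified: the `kappas` tolerances and the plain area normalization are illustrative assumptions, and COCO's official OKS additionally handles visibility flags and standardized per-keypoint constants.

```python
import math

def oks(pred, gt, area, kappas):
    """Simplified Object Keypoint Similarity.

    pred, gt: lists of (x, y) keypoints in matching order
    area: object scale (e.g. bounding-box area)
    kappas: per-keypoint tolerance constants (illustrative values here)
    """
    sims = []
    for (px, py), (gx, gy), k in zip(pred, gt, kappas):
        d2 = (px - gx) ** 2 + (py - gy) ** 2
        sims.append(math.exp(-d2 / (2 * area * k ** 2)))
    return sum(sims) / len(sims)

gt = [(10, 10), (20, 20)]
print(oks(gt, gt, area=100.0, kappas=[0.5, 0.5]))  # perfect match -> 1.0
```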
Detectron2 keypoint detection on rugby players

6. Dense pose estimation

  • DensePose R-CNN: A model that maps all human pixels in an RGB image to the 3D surface of the human body. It's useful for detailed human pose estimation.
Detectron2 dense pose estimation

How to use Detectron2?

For this section, we will navigate through the Detectron2 documentation for instance segmentation. Before jumping in, we recommend reviewing the entire process, as we ran into a few problematic steps along the way. Browsing through our various attempts will save you time and energy.


The Detectron2 documentation recommends specific OS, Python, and PyTorch versions for optimal results:

  • Linux or macOS with Python ≥ 3.7
  • PyTorch ≥ 1.8 and a torchvision version that matches the PyTorch installation. Install them together to make sure the versions are compatible.
  • OpenCV, optional but needed by the demo and visualization scripts.

That said, we are venturing forward on a Windows machine, and for all the Windows users reading this, let's make it happen!

Environment setup

Setting up a working environment begins with creating a Python virtual environment and then installing the Torch dependencies.

Creating the virtual environment

Python 3.7 or higher is required; we are opting for Python 3.10:

python -m virtualenv venvdetectron2 --python=python3.10

If you're new to virtual environments, here's a comprehensive guide to help you set up one.

Installing Torch, Torchvision and OpenCV

After activating the ‘venvdetectron2’ virtual environment, we proceed to install the PyTorch and OpenCV dependencies:

pip install torch==1.13.1+cu116 torchvision==0.14.1+cu116 --extra-index-url
pip install opencv-python

How to install Detectron2?

Now it’s time to build Detectron2 from source.

Attempt 1: installing from Git repository

Following the official recommendation, we initially tried installing Detectron2 directly from its Git repository:

python -m pip install git+

Unfortunately, we were met with an error message.

Attempt 2: installing from a local clone

We proceeded to try a different approach by cloning the repository locally and then installing:

git clone
python -m pip install -e detectron2

This time, the compilation error stack was so extensive that the command prompt interface wouldn't even display the beginning of it.

It was disappointing to discover that the Detectron2 documentation did not provide any further installation alternatives, except for Docker containers, which are ephemeral and complex to set up. Additionally, the 'Common Installation Issues' section didn't address the specific error we encountered.

Windows Support: an oversight?

When facing issues with a particular repository, the 'issues' section is typically a reliable resource for potential solutions. 

At the time of writing this post, there were several open issues related to support for Windows users. Unfortunately, the lack of response from the facebookresearch Detectron2 team suggests that Windows support may not be forthcoming.

Considering the 2023 Stack Overflow survey indicates Windows remains the dominant operating system for developers (both in personal and professional spheres), the absence of Windows support is indeed perplexing.

2023 stack overflow survey on developer operating system usage

Attempt 3: successful installation of Detectron2

Given the lack of Windows support, we forked the repository and edited it so that it compiles correctly across operating systems.

pip install git+

Inference with a pre-trained model

We selected the ‘mask_rcnn_R_50_FPN_3x’ model and its corresponding config file from the model zoo. To demonstrate the built-in configurations, we used the demo script provided in the ‘detectron2/demo’ directory.

python --config-file ../configs/COCO-InstanceSegmentation/mask_rcnn_R_50_FPN_3x.yaml --input YOUR_INPUT.png --opts MODEL.WEIGHTS detectron2://COCO-InstanceSegmentation/mask_rcnn_R_50_FPN_3x/137849600/model_final_f10217.pkl
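For reference, the same model zoo entry can also be driven from Python via Detectron2's DefaultPredictor. This is a configuration sketch, assuming a successful Detectron2 install, OpenCV, and an input image named ‘YOUR_INPUT.png’:

```python
import cv2
from detectron2 import model_zoo
from detectron2.config import get_cfg
from detectron2.engine import DefaultPredictor

# Build a config from the same model zoo entry as the demo command
cfg = get_cfg()
cfg.merge_from_file(model_zoo.get_config_file(
    "COCO-InstanceSegmentation/mask_rcnn_R_50_FPN_3x.yaml"))
cfg.MODEL.WEIGHTS = model_zoo.get_checkpoint_url(
    "COCO-InstanceSegmentation/mask_rcnn_R_50_FPN_3x.yaml")
cfg.MODEL.ROI_HEADS.SCORE_THRESH_TEST = 0.5  # confidence threshold

# Run single-image inference and inspect the detected classes
predictor = DefaultPredictor(cfg)
outputs = predictor(cv2.imread("YOUR_INPUT.png"))
print(outputs["instances"].pred_classes)
```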

Detectron2 panoptic segmentation couple biking

Using Detectron2: insights from a user

Navigating the Detectron2 setup proved to be a time-consuming challenge, taking over an hour and significant effort to get working.

Although the demo was executed seamlessly, identifying the right combination of model weights and configuration files for more extensive testing is less than intuitive.

In the following section, we'll demonstrate how to simplify the installation and usage of Detectron2 via the Ikomia API, significantly reducing both the steps and time needed to execute object detection tasks.

Easier Detectron2 object detection with a Python API 

With the Ikomia team, we've been working on a prototyping tool to avoid dependencies and compatibility issues, thereby speeding up the often tedious processes of installation and testing.

We wrapped it in an open-source Python API. Now we're going to explain how to use all the Detectron2 models in less than 5 minutes.

If you have any questions, please join our Discord.

Environment setup

As usual, we will use a virtual environment.

Then the only thing you need to install is Ikomia API:

pip install ikomia

Detectron2 instance segmentation inference

You can also run the notebook we have prepared directly.

from ikomia.dataprocess.workflow import Workflow
from ikomia.utils.displayIO import display

# Init your workflow
wf = Workflow()

# Add algorithm
algo = wf.add_task(name="infer_detectron2_instance_segmentation", auto_connect=True)

# Set parameters
algo.set_parameters({
    "model_name": "COCO-InstanceSegmentation/mask_rcnn_R_50_FPN_3x",
})

# Run on your image
wf.run_on(path="YOUR_INPUT.png")

# Display the results
display(algo.get_image_with_mask_and_graphics())
Detectron2 panoptic segmentation couple biking

Fast Detectron2 execution: from setup to results in just 5 minutes

To carry out instance segmentation, we simply installed Ikomia and ran the workflow code snippets. All dependencies were seamlessly handled in the background. We progressed from setting up a virtual environment to obtaining results in approximately 5 minutes.

Explore further with the Detectron2 Algorithms

We've implemented all the Detectron2 algorithms for both inference and training. You can conveniently find code snippets tailored to your needs on the Ikomia HUB.

Crafting production-ready Computer Vision applications with ease

Real-world object detection applications often require fine-tuning your model and combining it with other algorithms, such as object tracking.

One of the standout benefits of the API, aside from simplifying dependency installation, is its ability to seamlessly chain algorithms from diverse frameworks, such as Detectron2, OpenMMLab, YOLO, and Hugging Face.

Once you have crafted your solution with Ikomia's Python API, you can deploy it yourself, or opt for SCALE, our automated deployment SaaS platform.

