Mastering Object Detection with YOLOR: A Comprehensive Guide

Allan Kouidri
YOLOR PPE detection on construction workers

In the world of Computer Vision and deep learning, the race to develop the most efficient and high-performing models is never-ending. From the famous YOLO (You Only Look Once) series to various other architectures, the realm of object detection has seen numerous innovations.

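Enter YOLOR (You Only Learn One Representation), an approach that takes the idea of YOLO further by combining it with the concept of a unified representation: a single network that encodes both explicit knowledge, learned from the input, and implicit knowledge, learned independently of any particular input, so that one representation can serve several tasks.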
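
The YOLOR paper models implicit knowledge in several ways. The simplest is a learned vector that is combined with the explicit features the network computes, for example by addition or multiplication. The pure-Python sketch below illustrates only that simplest case on a toy feature map stored as nested lists. The function names and numbers are hypothetical, and this is not the official YOLOR code, which works on PyTorch tensors with trained parameters.

```python
from typing import List

# Toy feature map: one list of activations per channel (C channels x N positions).
FeatureMap = List[List[float]]


def implicit_add(features: FeatureMap, implicit: List[float]) -> FeatureMap:
    """Add one learned value per channel to every position of that channel.

    `implicit` stands in for input-independent knowledge: the same vector is
    applied to every image, unlike the explicit features computed from it.
    """
    if len(features) != len(implicit):
        raise ValueError("need exactly one implicit value per channel")
    return [[x + z for x in channel] for channel, z in zip(features, implicit)]


def implicit_mul(features: FeatureMap, implicit: List[float]) -> FeatureMap:
    """Scale every position of each channel by that channel's learned value."""
    if len(features) != len(implicit):
        raise ValueError("need exactly one implicit value per channel")
    return [[x * z for x in channel] for channel, z in zip(features, implicit)]


# Explicit features computed from one image: 2 channels x 3 positions.
explicit = [[1.0, 2.0, 3.0], [0.5, -1.0, 4.0]]

# Hypothetical implicit vectors; in a real network these are trained parameters.
shift = [0.1, -0.5]
scale = [2.0, 1.0]

refined = implicit_mul(implicit_add(explicit, shift), scale)
print(refined)
```

Because `shift` and `scale` do not depend on the image, every input is adjusted by the same learned knowledge, which is what lets one representation be shared across tasks.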

In this blog post, we will dive deep into YOLOR, its key features, and how it stands out in the crowded AI landscape.

Additionally, we'll show you how to easily train and test YOLOR with just a few Python code snippets.

What is YOLOR?

Before discussing YOLOR, it's essential to understand the foundation upon which it's built. YOLO was a game-changer in the object detection space because of its unique approach.

Instead of generating potential bounding boxes and then classifying them (as done by models like Faster R-CNN), YOLO divided the image into a grid and predicted bounding boxes and class probabilities in a single forward pass. 

This approach made YOLO extremely fast and efficient, albeit at the cost of some accuracy.
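
To make the grid idea concrete: in the original YOLO, the grid cell that contains the center of an object is the one responsible for predicting it. The sketch below is a simplified illustration rather than code from any YOLO release; it computes that cell for a box given in pixel coordinates (x_min, y_min, x_max, y_max). The function name and box convention are our own assumptions.

```python
from typing import Tuple


def responsible_cell(
    box: Tuple[float, float, float, float],
    image_width: int,
    image_height: int,
    grid_size: int,
) -> Tuple[int, int]:
    """Return (row, col) of the grid cell containing the box center.

    `box` is (x_min, y_min, x_max, y_max) in pixels; the image is split into a
    grid_size x grid_size grid, as in the original YOLO formulation.
    """
    x_min, y_min, x_max, y_max = box
    cx = (x_min + x_max) / 2 / image_width   # normalized center x in [0, 1]
    cy = (y_min + y_max) / 2 / image_height  # normalized center y in [0, 1]
    # Clamp so a center lying exactly on the right/bottom edge stays in the grid.
    col = min(int(cx * grid_size), grid_size - 1)
    row = min(int(cy * grid_size), grid_size - 1)
    return row, col


# A 448x448 image with a 7x7 grid (the original YOLO setting).
print(responsible_cell((100, 200, 180, 300), 448, 448, 7))
```

Here the box center is (140, 250), which falls in row 3, column 2 of the 7x7 grid. That cell predicts the box coordinates and class probabilities in the same forward pass as every other cell.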

Horse detection pre-trained YOLOR

What are the advantages of YOLOR?

  • Efficiency: YOLOR builds on YOLO and inherits its speed. Thanks to its unified representation, it can also handle multiple tasks without needing a separate model for each.
  • Accuracy: YOLOR often achieves higher accuracy than its predecessors, particularly on multi-task benchmarks.
  • Flexibility: Given its architecture, YOLOR can be adapted for a wide range of vision tasks, making it versatile.

What is the YOLOR architecture?

The key strength of YOLOR lies in its versatility, which is a direct outcome of its unique architectural components. This versatility allows it to efficiently bridge the gap between various vision tasks, from detection to classification, and even segmentation. Let’s break down its architecture to better understand the underlying mechanics and innovations.

YOLOR single unified network

Embedding dynamic convolutions

YOLOR incorporates dynamic convolutions instead of the typical static ones. Unlike standard convolutions with fixed weights, dynamic convolutions adapt their weights to the input. This lets the model respond differently to different spatial contexts, which helps in cluttered, complex scenes.
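
Dynamic convolutions are commonly built by keeping several candidate kernels and mixing them with attention weights computed from the input, so the effective kernel changes from one input to the next. The one-dimensional sketch below shows that general mechanism under simplifying assumptions: the input is summarized by its mean, the gating coefficients and candidate kernels are fixed placeholder numbers instead of trained parameters, and all names are hypothetical. It is a generic illustration, not YOLOR's implementation.

```python
import math
from typing import List, Sequence, Tuple


def softmax(scores: Sequence[float]) -> List[float]:
    """Numerically stable softmax."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def conv1d_valid(signal: Sequence[float], kernel: Sequence[float]) -> List[float]:
    """1D cross-correlation without padding (the 'convolution' of deep learning)."""
    k = len(kernel)
    return [sum(signal[i + j] * kernel[j] for j in range(k))
            for i in range(len(signal) - k + 1)]


def dynamic_conv1d(
    signal: Sequence[float],
    kernels: Sequence[Sequence[float]],
    gate_coeffs: Sequence[float],
) -> Tuple[List[float], List[float]]:
    """Convolve with a kernel that is mixed on the fly from the input itself.

    1. Summarize the input (here: its mean, like global average pooling).
    2. Turn that summary into one attention weight per candidate kernel.
    3. Build the effective kernel as the weighted sum of the candidates.
    """
    context = sum(signal) / len(signal)
    weights = softmax([c * context for c in gate_coeffs])
    size = len(kernels[0])
    mixed = [sum(w * kern[j] for w, kern in zip(weights, kernels)) for j in range(size)]
    return conv1d_valid(signal, mixed), weights


# Two hypothetical candidate kernels: an edge detector and a smoother.
kernels = [[1.0, 0.0, -1.0], [1 / 3, 1 / 3, 1 / 3]]
gate_coeffs = [1.0, -1.0]  # placeholder values for what would be learned weights

out_bright, w_bright = dynamic_conv1d([1.0, 2.0, 3.0, 4.0], kernels, gate_coeffs)
out_dark, w_dark = dynamic_conv1d([-1.0, -2.0, -3.0, -4.0], kernels, gate_coeffs)
print(w_bright, w_dark)  # the mixing weights differ because the inputs differ
```

With these placeholder values, an input with a positive mean leans toward the edge kernel and one with a negative mean leans toward the smoother. A static convolution would apply the same kernel to both.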

Integrating vision transformers

YOLOR integrates Vision Transformers (ViT), capitalizing on recent advances in Computer Vision. ViT splits an image into patches, treats them as tokens, and processes them with transformer blocks, enabling YOLOR to capture long-range dependencies across the image. This is key in complex scenes where objects are contextually intertwined.
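
The patching step that ViT performs before its transformer blocks can be sketched in a few lines. The example below is written for a grayscale image stored as nested lists. It only splits the image into non-overlapping square patches and flattens each one into a token. The linear projection, position embeddings and transformer layers are omitted, and the helper name is hypothetical.

```python
from typing import List

Image = List[List[float]]  # grayscale image: rows of pixel values


def image_to_patch_tokens(image: Image, patch: int) -> List[List[float]]:
    """Split an H x W image into non-overlapping patch x patch tiles (row-major)
    and flatten each tile into a token of length patch * patch."""
    height, width = len(image), len(image[0])
    if height % patch or width % patch:
        raise ValueError("image size must be a multiple of the patch size")
    tokens = []
    for top in range(0, height, patch):
        for left in range(0, width, patch):
            token = [image[top + dy][left + dx]
                     for dy in range(patch) for dx in range(patch)]
            tokens.append(token)
    return tokens


# A 4x4 image split into 2x2 patches gives (4 / 2) * (4 / 2) = 4 tokens of length 4.
img = [[float(4 * r + c) for c in range(4)] for r in range(4)]
tokens = image_to_patch_tokens(img, 2)
print(len(tokens), tokens[0])
```

For the common ViT configuration of a 224x224 image with 16x16 patches, the same function yields 14 x 14 = 196 tokens. Self-attention then lets every token attend to every other token, which is where the long-range dependencies come from.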

Scale-equivariant representation

To address the challenge of varying object scales, YOLOR employs scale-equivariant layers. The goal is consistent recognition regardless of object size, pursued by using convolutional layers with different kernel sizes to capture features at several resolutions.
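
The multi-kernel idea described here can be illustrated with parallel branches, similar in spirit to Inception-style blocks. A minimal one-dimensional sketch runs the same input through kernels of sizes 1, 3 and 5 with zero "same" padding so the outputs align position by position, then stacks them as channels. The averaging kernels are placeholders for learned weights, the helper names are hypothetical, and this is not YOLOR's actual layer.

```python
from typing import List, Sequence


def conv1d_same(signal: Sequence[float], kernel: Sequence[float]) -> List[float]:
    """1D cross-correlation with zero padding so output length == input length.
    Assumes an odd kernel size."""
    k = len(kernel)
    if k % 2 == 0:
        raise ValueError("use an odd kernel size for symmetric padding")
    pad = k // 2
    padded = [0.0] * pad + list(signal) + [0.0] * pad
    return [sum(padded[i + j] * kernel[j] for j in range(k))
            for i in range(len(signal))]


def multi_kernel_branch(signal: Sequence[float], sizes=(1, 3, 5)) -> List[List[float]]:
    """Run parallel branches with different receptive fields, stacked as channels.
    Each kernel is a plain average (a placeholder for learned weights)."""
    return [conv1d_same(signal, [1.0 / k] * k) for k in sizes]


signal = [0.0, 0.0, 6.0, 0.0, 0.0]
channels = multi_kernel_branch(signal)
for size, out in zip((1, 3, 5), channels):
    print(size, out)
```

The size-1 branch keeps the spike sharp, while the wider branches spread it over more positions. A following layer can therefore draw on both fine and coarse context at every location.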

Unified multi-task learning

Beyond its architectural design, YOLOR's training process emphasizes a unified approach. It employs a compound loss function that jointly optimizes detection, classification, and potentially segmentation. Training the tasks together can speed up training and refine the feature representation they share.
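
A compound loss of this kind is usually a weighted sum of per-task terms, L = w_det·L_det + w_cls·L_cls + w_seg·L_seg, so a single backward pass updates the shared features for every task. The sketch below computes such a sum for one batch. The task names, loss values and weights are hypothetical placeholders, not the terms or weights YOLOR actually uses.

```python
from typing import Dict


def compound_loss(task_losses: Dict[str, float], weights: Dict[str, float]) -> float:
    """Weighted sum of per-task losses; a task without an explicit weight counts with 1.0."""
    return sum(weights.get(task, 1.0) * loss for task, loss in task_losses.items())


# Hypothetical per-task loss values for one batch and hand-picked weights.
losses = {"detection": 1.2, "classification": 0.4, "segmentation": 0.6}
weights = {"detection": 1.0, "classification": 0.5, "segmentation": 0.25}
print(compound_loss(losses, weights))
```

The weights balance tasks whose losses live on different scales. Dropping a task (for example segmentation) simply removes its term from the sum.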

Flexible backbone choices

YOLOR's flexibility shines in its backbone compatibility. It can seamlessly integrate with a variety of backbones, from CSPDarknet53 to Vision Transformers, allowing customization based on specific needs and ensuring robust performance.

YOLOR performance

YOLOR, an object detection algorithm released in 2021, matches and even outperforms Scaled-YOLOv4. With its promise of "learning once" for multiple tasks, YOLOR represents a significant leap in the evolution of object detection and Computer Vision models.

Its unified representation approach not only simplifies the model landscape but also holds the potential for improved efficiency and accuracy.

Easily train YOLOR object detection with a Python API 

At Ikomia, we've been working on a prototyping tool that removes tedious installation steps and speeds up testing.

We wrapped it in an open source Python API. Now we're going to explain how to train and test YOLOR in just a few lines of code.

Environment setup

To get started, you need to install the API in a virtual environment [1].

pip install ikomia

Construction safety dataset

In this tutorial, we will be working with the construction safety dataset from Roboflow. This dataset contains the following classes: ‘person’, ‘helmet’, ‘vest’, ‘no-vest’ and ‘no-helmet’.

Two men working in construction site with bounding boxes targeting PPE.
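
Before training, it can help to check how many boxes each class has, since PPE datasets are often imbalanced. The sketch below reads a COCO-format annotation file (the format the dataset_coco loader takes as input) using only the Python standard library. The helper name is hypothetical.

```python
import json
from collections import Counter
from typing import Dict


def count_boxes_per_class(annotation_file: str) -> Dict[str, int]:
    """Count bounding boxes per category name in a COCO-format JSON file."""
    with open(annotation_file, encoding="utf-8") as f:
        coco = json.load(f)
    # COCO stores class names in "categories" and boxes in "annotations".
    id_to_name = {cat["id"]: cat["name"] for cat in coco["categories"]}
    counts = Counter(id_to_name[ann["category_id"]] for ann in coco["annotations"])
    return dict(counts)
```

With this tutorial's data, you would call count_boxes_per_class('Path/To/construction safety.v1-release.coco/train/_annotations.coco.json'). Classes without any box in the file do not appear in the result.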

Run the YOLOR training algorithm with a few lines of code

You can also directly load the notebook we have prepared.

from ikomia.dataprocess.workflow import Workflow

# Initialize the workflow
wf = Workflow()

# Add the dataset loader to load your custom data and annotations
dataset = wf.add_task(name='dataset_coco')

# Set the parameters of the dataset loader
dataset.set_parameters({
    'json_file': 'Path/To/construction safety.v1-release.coco/train/_annotations.coco.json',
    'image_folder': 'Path/To/construction safety.v1-release.coco/train',
    'task': 'detection',
})

# Add the YOLOR training algorithm, connected to the dataset loader
train = wf.add_task(name='train_yolor', auto_connect=True)

# Set the training parameters (values are passed as strings)
train.set_parameters({
    'model_name': 'yolor_p6',
    'batch_size': '8',
    'epochs': '50',
    'train_imgsz': '512',
    'test_imgsz': '512',
    'dataset_split_ratio': '80',
    'eval_period': '5',
})

# Launch your training on your data
wf.run()

  • model_name (str) - default 'yolor_p6': Name of the pre-trained model. The other available model is 'yolor_w6'.
  • epochs (int) - default '50': Number of complete passes through the training dataset.
  • batch_size (int) - default '8': Number of samples processed before the model is updated.
  • train_imgsz (int) - default '512': Input image size used during training.
  • test_imgsz (int) - default '512': Input image size used during evaluation.
  • dataset_split_ratio (float) - default '90': Percentage of the dataset used for training; the remainder is used for evaluation. Must be strictly between 0 and 100.
  • eval_period (int) - default '5': Interval between evaluations.
  • output_folder (str, optional): Path to the folder where the model will be saved.
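
To make dataset_split_ratio and batch_size concrete: with a ratio of 80, roughly 80% of the images are used for training and the rest for evaluation. Assuming the last partial batch is kept, the number of optimizer steps per epoch is the training-set size divided by the batch size, rounded up. The sketch below assumes a simple shuffled split. The exact way train_yolor partitions the data may differ, and both helper names are hypothetical.

```python
import math
import random
from typing import List, Sequence, Tuple


def split_dataset(image_ids: Sequence[str], split_ratio: float,
                  seed: int = 0) -> Tuple[List[str], List[str]]:
    """Shuffle ids and keep split_ratio percent (0 < ratio < 100) for training."""
    if not 0 < split_ratio < 100:
        raise ValueError("split_ratio must be strictly between 0 and 100")
    ids = list(image_ids)
    random.Random(seed).shuffle(ids)  # seeded so the split is reproducible
    n_train = int(len(ids) * split_ratio / 100)
    return ids[:n_train], ids[n_train:]


def steps_per_epoch(n_train: int, batch_size: int) -> int:
    """Number of batches per epoch, keeping the last partial batch."""
    return math.ceil(n_train / batch_size)


# Hypothetical dataset of 1,000 images, with the ratio and batch size used in the training code above.
train_ids, eval_ids = split_dataset([f"img_{i}" for i in range(1000)], split_ratio=80)
print(len(train_ids), len(eval_ids), steps_per_epoch(len(train_ids), batch_size=8))
```

With 1,000 images, this gives 800 training images, 200 evaluation images and 100 steps per epoch. With eval_period set to 5, an evaluation would run every 5th interval over the 50 epochs.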

The training process for 50 epochs took approximately one hour on an NVIDIA GeForce RTX 3060 Laptop GPU with 6 GB of VRAM (6143.5 MB).

Test your fine-tuned YOLOR model

First, we can run the pre-trained YOLOR model:

from ikomia.dataprocess.workflow import Workflow
from ikomia.utils.displayIO import display

# Initialize the workflow
wf = Workflow()

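# Add the YOLOR object detection algorithm
yolor = wf.add_task(name='infer_yolor', auto_connect=True)
yolor.set_parameters({'conf_thres': '0.5'})

# Run on your image (replace the placeholder path with your own)
wf.run_on(path='Path/To/your/image.jpg')

# Inspect your results
display(yolor.get_image_with_graphics())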

Woman in construction site detected with YOLOR pre-trained model

We can observe that the default pre-trained YOLOR model only detected the person in the image. This is because it was trained on the COCO dataset, which has no classes for safety equipment such as helmets or vests.
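
The conf_thres value of 0.5 passed to the inference task also shapes what appears in the output image, because predictions below that confidence are discarded. Conceptually, the filtering step looks like the sketch below. The Detection container and the use of >= are assumptions for illustration, not how infer_yolor stores or compares its results internally.

```python
from typing import List, NamedTuple, Tuple


class Detection(NamedTuple):
    """Hypothetical container for one prediction (not the Ikomia result type)."""
    label: str
    confidence: float
    box: Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max) in pixels


def filter_by_confidence(detections: List[Detection], conf_thres: float) -> List[Detection]:
    """Keep only the detections whose confidence reaches the threshold."""
    return [d for d in detections if d.confidence >= conf_thres]


# Hypothetical raw predictions for one image.
raw = [
    Detection("person", 0.91, (120.0, 40.0, 260.0, 400.0)),
    Detection("person", 0.32, (10.0, 10.0, 40.0, 90.0)),
]
kept = filter_by_confidence(raw, conf_thres=0.5)
print(kept)
```

Lowering the threshold shows more boxes at the cost of more false positives. Raising it keeps only the model's most confident predictions.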

To test the model we just trained, we specify the path to our custom model using the ‘model_weight_file’ and ‘config_file’ parameters. We then run the workflow on the same image we used previously.

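yolor.set_parameters({
    'model_weight_file': 'Path/To/[timestamp]/weights/',
    'config_file': 'Path/To/[timestamp]/yolor_p6.cfg',
    'conf_thres': '0.5'})

wf.run_on(path='Path/To/your/image.jpg')
display(yolor.get_image_with_graphics())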

Woman in construction site with PPE detected with custom YOLOR model

Start training easily with Ikomia

To learn more about the API, you can consult the documentation. Furthermore, you can explore our collection of cutting-edge algorithms on Ikomia HUB and experience Ikomia STUDIO, a user-friendly interface that offers the same capabilities as the API.


[1] How to create a virtual environment
