YOLOP v2 Explained: Mastering Panoptic Driving Perception

YOLO, which stands for "You Only Look Once," is an influential series of object detection models. Each iteration has improved on its predecessors in speed, accuracy, or both.

The recent YOLOP v2, introduced in the paper "YOLOPv2: Better, Faster, Stronger for Panoptic Driving Perception"^[1], is no exception.

This article delves into the technicalities of YOLOP v2, its advancements, and its significance in driving perception.

What is panoptic driving perception?

Before diving into YOLOP v2, it's essential to understand what panoptic driving perception means. In computer vision, "panoptic" usually refers to panoptic segmentation, which combines semantic segmentation (assigning a class to each pixel of an image) and instance segmentation (distinguishing individual object instances of the same class).

In the context of driving, this kind of perception is crucial because it captures both the type of objects (like cars, pedestrians, or road signs) and individual object instances (like two different cars of the same model). The YOLOP papers use "panoptic driving perception" more broadly, for a single network that jointly handles traffic object detection, drivable area segmentation, and lane detection.

What is YOLOP v2?

YOLOP v2 is a highly efficient multi-task network designed to carry out three critical functions for autonomous driving systems in real time: detecting traffic objects, segmenting drivable areas of the road, and identifying lane markings.

YOLOP v2: key features

Unified architecture: Unlike approaches that use separate networks for detection and segmentation, YOLOP v2 uses a single, unified architecture. This not only simplifies the model but also reduces computational overhead.
Enhanced backbone: YOLOP v2 employs E-ELAN as its backbone, replacing the CSPDarknet backbone of the original YOLOP. This lets the model process images quickly without compromising on accuracy.
Multi-task learning: The model is trained for multiple tasks simultaneously: traffic object detection, drivable area segmentation, and lane detection. Sharing one encoder across these tasks helps the model generalize across them.
Task-specific evaluation: Rather than a single panoptic score such as Panoptic Quality (PQ), the paper reports a separate metric for each task: mAP50 and recall for detection, mIoU for drivable area segmentation, and accuracy and IoU for lane detection.

YOLOP v2 architecture

The backbone of YOLOP v2 (shared encoder)

The backbone, or feature extractor, produces the features that every task head consumes. For YOLOP v2:

E-ELAN (Extended ELAN) is used as the feature extractor. E-ELAN employs group convolution, a technique that divides the input channels into groups and applies convolutions to each group separately. This not only reduces the computational cost but also lets each group learn different features.
After the backbone, the neck uses concatenation to fuse the features. This ensures that the model captures both low-level and high-level features, which are crucial for tasks like object detection and segmentation.
Spatial Pyramid Pooling (SPP) is employed to capture multi-scale contextual information by pooling features at different scales. This is especially useful for detecting objects of varying sizes.
The Feature Pyramid Network (FPN) further enhances the multi-scale feature extraction by fusing features at different semantic levels.
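
To make the cost argument for group convolution concrete, here is a minimal sketch that counts the weights of a standard versus a grouped 2D convolution. The function name and the channel sizes are illustrative assumptions; they are not taken from the YOLOP v2 paper or its code.

```python
def conv2d_weight_count(in_channels: int, out_channels: int,
                        kernel_size: int, groups: int = 1) -> int:
    """Number of weights (bias excluded) in a 2D convolution.

    With groups > 1, each output channel only sees
    in_channels / groups input channels.
    """
    if in_channels % groups or out_channels % groups:
        raise ValueError("channel counts must be divisible by groups")
    return (in_channels // groups) * out_channels * kernel_size * kernel_size


# Hypothetical layer sizes, chosen for illustration only
standard = conv2d_weight_count(256, 256, kernel_size=3, groups=1)
grouped = conv2d_weight_count(256, 256, kernel_size=3, groups=2)

print(standard)            # 589824
print(grouped)             # 294912
print(standard / grouped)  # 2.0
```

Under these assumptions, splitting the channels into two groups halves the number of weights, and the multiply-accumulate count shrinks by the same factor.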


YOLOP v2 decoders (task heads)

YOLOP v2 has three distinct task heads:

Traffic object detection: This head is responsible for detecting various traffic objects like vehicles, pedestrians, and traffic signs.
Drivable area segmentation: This head segments the areas in an image where a vehicle can drive.
Lane detection (segmentation): This head detects and segments the lanes on the road.
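
The shared-encoder, multi-head layout can be sketched with plain Python functions. This is a toy illustration only: the class name, the head functions, and the thresholds are hypothetical stand-ins with no neural network involved. The point it shows is that the encoder runs once per image and all three heads reuse its output.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Features = List[float]


@dataclass
class MultiTaskNet:
    """Toy stand-in for a shared-encoder, multi-head network."""
    encoder: Callable[[List[float]], Features]
    heads: Dict[str, Callable[[Features], object]]

    def forward(self, image: List[float]) -> Dict[str, object]:
        features = self.encoder(image)  # computed once, shared by all heads
        return {name: head(features) for name, head in self.heads.items()}


encoder_calls = 0


def toy_encoder(image: List[float]) -> Features:
    """Hypothetical encoder: counts its calls and scales the input."""
    global encoder_calls
    encoder_calls += 1
    return [2.0 * pixel for pixel in image]


net = MultiTaskNet(
    encoder=toy_encoder,
    heads={
        # Hypothetical heads: simple thresholds instead of real decoders
        "traffic_objects": lambda f: [i for i, v in enumerate(f) if v > 1.0],
        "drivable_area": lambda f: [v > 0.5 for v in f],
        "lane_lines": lambda f: [v > 1.5 for v in f],
    },
)

outputs = net.forward([0.1, 0.4, 0.9])
print(sorted(outputs))  # ['drivable_area', 'lane_lines', 'traffic_objects']
print(encoder_calls)    # 1
```

With three separate networks, the feature extraction would run three times per image; here it runs once, which is where the computational saving of a unified design comes from.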


Advancements over YOLOP

YOLOP v2 improves on its predecessor, YOLOP, in several ways:

Speed and efficiency: YOLOP v2 is designed to be faster and more efficient than its predecessor. The unified architecture and the optimized backbone contribute to its speed, making it suitable for real-time applications.
Improved accuracy: With its multi-task learning approach, YOLOP v2 achieves better accuracy than YOLOP in both detection and segmentation tasks, detecting objects with higher precision and segmenting road areas and lanes more effectively.
Robustness in driving scenarios: YOLOP v2 is specifically optimized for driving scenarios. It is designed to handle challenges such as occlusions, varying lighting conditions, and diverse object types, making it well suited to autonomous driving applications.

Practical Applications

The advancements in YOLOP v2 make it a strong candidate for various applications in the automotive industry:

Autonomous driving: With its ability to detect and segment objects in real time, YOLOP v2 can be integrated into autonomous driving systems to help vehicles navigate safely.
Traffic analysis: The model can be employed in traffic monitoring systems to analyze vehicle flow, detect traffic violations, and assess road conditions.

YOLOP v2 in autonomous driving

Get started with Ikomia API

Using the Ikomia API, you can create a workflow for road, lane, and vehicle detection with YOLOP v2 in just a few lines of code.

To get started, you need to install the API in a virtual environment^[2].


pip install ikomia

Run YOLOP v2 with a few lines of code

You can also directly open the notebook we have prepared, or run it in Colab.


from ikomia.dataprocess.workflow import Workflow
from ikomia.utils.displayIO import display


# Initialize the workflow
wf = Workflow()

# Add the YOLOP v2 inference algorithm and connect it to the workflow input
algo = wf.add_task(name="infer_yolop_v2", auto_connect=True)

# Set the algorithm parameters (values are passed as strings)
algo.set_parameters({
    "input_size": "640",
    "conf_thres": "0.2",
    "iou_thres": "0.45",
    "object": "True",
    "road_lane": "True"
})

# Run on your image
wf.run_on(url="https://github.com/Ikomia-dev/notebooks/blob/main/examples/img/img_cars_road.jpg?raw=true")

# Inspect your results: the image with graphics, then the overlay mask
display(algo.get_image_with_graphics())
display(algo.get_output(0).get_overlay_mask())

input_size (int) - default '640': Size of the input image.
conf_thres (float) - default '0.2': Confidence threshold for predicted boxes, in [0, 1].
iou_thres (float) - default '0.45': Intersection over Union (IoU) threshold, i.e. the degree of overlap between two boxes, in [0, 1].
object (bool) - default 'True': Detect vehicles.
road_lane (bool) - default 'True': Detect drivable road areas and lane lines.
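
To give an intuition for conf_thres and iou_thres, here is a minimal sketch of how such thresholds are commonly applied in YOLO-style post-processing: low-confidence boxes are dropped first, then greedy non-maximum suppression removes boxes that overlap a higher-scoring box too much. This is an assumption about typical usage, not the actual Ikomia or YOLOP v2 implementation; the function names and the sample boxes are hypothetical.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def iou(a: Box, b: Box) -> float:
    """Intersection over Union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def filter_and_nms(boxes: List[Box], scores: List[float],
                   conf_thres: float = 0.2,
                   iou_thres: float = 0.45) -> List[int]:
    """Return the indices of the kept boxes, highest score first."""
    # 1) Drop boxes whose confidence is below conf_thres
    candidates = [i for i, s in enumerate(scores) if s >= conf_thres]
    # 2) Greedy NMS: visit boxes by decreasing score and keep a box only
    #    if it does not overlap an already kept box by more than iou_thres
    candidates.sort(key=lambda i: scores[i], reverse=True)
    kept: List[int] = []
    for i in candidates:
        if all(iou(boxes[i], boxes[k]) <= iou_thres for k in kept):
            kept.append(i)
    return kept


# Hypothetical detections, for illustration only
boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30), (50, 50, 60, 60)]
scores = [0.9, 0.8, 0.7, 0.1]

print(round(iou(boxes[0], boxes[1]), 3))  # 0.681
print(filter_and_nms(boxes, scores))      # [0, 2]
```

In this example the second box is suppressed because it overlaps the first one with an IoU above 0.45, and the last box is discarded because its score is below 0.2. Raising conf_thres yields fewer but more certain detections; lowering iou_thres suppresses overlapping boxes more aggressively.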

YOLOPv2, road lane and vehicle detection — Note: this output was generated with Ikomia STUDIO, which provides a user-friendly interface with the same functionalities as the API.

Build your own workflow with Ikomia

In this tutorial, we have explored the process of creating a workflow for road, lane, and vehicle detection with YOLOP v2.

The Ikomia API streamlines the development of Computer Vision workflows, facilitating easy experimentation with various parameters to achieve the best outcomes.

For a comprehensive presentation of the API, consult the documentation. Additionally, browse the list of cutting-edge algorithms available on Ikomia HUB and explore Ikomia STUDIO, which provides a user-friendly interface with the same functionalities as the API.

^[1] YOLOPv2: Better, Faster, Stronger for Panoptic Driving Perception - https://arxiv.org/abs/2208.11434

^[2] How to create a virtual environment