Understanding ResNet: A Milestone in Deep Learning and Image Recognition

Allan Kouidri

What is ResNet?

ResNet, short for Residual Network, is a type of convolutional neural network (CNN) architecture that was introduced by Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun in their 2015 paper "Deep Residual Learning for Image Recognition." It represents a significant advancement in the field of deep learning, particularly in the context of Computer Vision.

How does ResNet work?

ResNet was a response to the challenges faced by deeper networks. As networks grow deeper, they tend to suffer from vanishing gradients, where the gradient becomes so small it ceases to make a meaningful impact during training. ResNet, through its innovative architecture, tackled this issue head-on, enabling the construction of networks that are deeper yet more efficient than their predecessors.

Residual Blocks: the building blocks of ResNet

Residual blocks are the cornerstone of ResNet architecture. Each block consists of a few layers of convolutional neural networks (CNNs), usually followed by batch normalization and a ReLU activation function. 

The key aspect of a residual block is the addition of a skip connection that bypasses these layers.

In a typical CNN layer, the input x would be transformed by a function F(x), representing the convolutional operations. However, in a residual block, the output becomes F(x)+x. This means the block is learning the residual function with respect to the input, hence the name 'Residual Network'.

The residual connection creates a shortcut path by adding the value at the beginning of the block, x, directly to the end of the block (F(x) + x) [1].

This design allows the network to learn identity functions effectively, ensuring that higher layers can perform at least as well as lower layers, and often better. By stacking these blocks, ResNet can form deep networks.
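The F(x) + x computation can be sketched in a few lines of Python. This is a minimal NumPy illustration, with the convolutions replaced by plain matrix multiplies for brevity; only the residual structure matters here:

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def residual_block(x, w1, w2):
    """Minimal residual block: output = ReLU(F(x) + x).

    F is two 'layers' (matrix multiplies standing in for convolutions);
    the skip connection adds the input x back before the final ReLU.
    """
    out = relu(x @ w1)    # first layer + activation
    out = out @ w2        # second layer (no activation yet)
    return relu(out + x)  # skip connection: F(x) + x

# If F collapses to zero (all-zero weights), the block reduces to the
# identity on non-negative inputs: a deeper stack of such blocks can do
# no worse than a shallower one.
x = np.abs(np.random.randn(4, 8))
zeros = np.zeros((8, 8))
assert np.allclose(residual_block(x, zeros, zeros), x)
```

This is why residual blocks make identity mappings easy to learn: driving F toward zero is enough, whereas a plain block would have to learn the identity transform explicitly.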

Addressing the vanishing gradient problem 

Contrary to the expectation that adding more layers to a neural network improves its ability to recognize complex functions and features, it has been observed that this is not always the case. After a certain depth, training accuracy in plain neural networks starts to decline rather than improve, revealing a disconnect between theoretical assumptions and practical outcomes.

Understanding the vanishing gradient problem

In the realms of deep learning and data science, the vanishing gradient problem is a well-known challenge. This issue arises predominantly in the training of artificial neural networks that utilize backpropagation and gradient-based learning methods. 

The problem surfaces when the gradient, crucial for adjusting the network's weights, becomes exceedingly small—almost to the point of disappearing. This diminutive gradient inhibits the network's ability to effectively update its weights, resulting in a stagnation of training. The network gets trapped, repetitively propagating the same values without any significant learning progress.

Manifestation in Deep Neural Networks

Deep neural networks, due to their extensive depth, are particularly susceptible to this problem. As the loss function's gradient is propagated backward through the network's layers, its magnitude decreases exponentially with each layer. This reduction in gradient size leads to an extremely slow learning process or even a complete halt in the early layers of the network, impeding the overall training effectiveness.
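The exponential decay is easy to see with a back-of-the-envelope calculation. If each layer contributes a constant factor below 1 to the backpropagated gradient (for a sigmoid activation, the derivative is at most 0.25), the gradient reaching the early layers shrinks geometrically with depth:

```python
# Toy model of gradient decay: each of n layers multiplies the
# backpropagated gradient by sigmoid'(0) = 0.25 (weights ignored).
for n in (5, 20, 56):
    print(f"{n:2d} layers -> gradient factor ~ {0.25 ** n:.3e}")
```

By 56 layers the factor is vanishingly small, which is consistent with the degradation observed when training deep plain networks.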

Training error and test error with 20-layer and 56-layer “plain” networks. The deeper network has higher training error [1]

ResNet's innovative solution: skip-connection

ResNet, a seminal architecture in neural network design, introduces an innovative solution to this pervasive issue—skip connections. These connections create an alternate pathway for the gradient during the backpropagation process. Instead of navigating through every single layer, the gradient can traverse these skip connections, effectively bypassing multiple layers. 

34-layer ResNet model [1]

This architecture allows the gradient to maintain a substantial magnitude, circumventing the problem of becoming too small for practical training purposes. By implementing these skip connections, ResNet enables the efficient training of much deeper networks than previously possible, overcoming a critical barrier in deep neural network development. 

This approach not only addresses the vanishing gradient problem but also enhances the network's ability to learn from complex data sets, marking a significant advancement in the field of deep learning.
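Why the shortcut helps is visible in the derivative. For a block computing y = F(x) + x, the local gradient is dy/dx = F'(x) + 1: even when F'(x) is near zero, the added 1 gives the gradient a direct path backward. A toy comparison, using a hypothetical per-layer derivative of 0.1:

```python
f_prime = 0.1  # assumed tiny per-layer derivative
depth = 56

# Plain network: gradients multiply through every layer and collapse.
plain = f_prime ** depth

# Residual network: each block contributes F'(x) + 1, so the identity
# term keeps the product from vanishing.
residual = (f_prime + 1.0) ** depth

print(f"plain: {plain:.1e}, residual: {residual:.1e}")
```

The exact numbers are illustrative, but the contrast is the point: the identity term in each block prevents the gradient from decaying to zero with depth.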

ResNet Architecture

Below is a detailed description of its various architectures:


ResNet-34

  • Inspiration: ResNet-34 was inspired by VGG neural networks, notably VGG-16 and VGG-19, known for their use of 3×3 convolutional filters.
  • Design simplicity: Compared to VGGNets, ResNet-34 is designed with fewer filters and lower complexity. It follows two design rules: maintaining the same number of filters for layers with the same output feature map size and doubling the number of filters when the feature map size is halved, ensuring consistent time complexity per layer.
  • Performance: The 34-layer ResNet requires 3.6 billion FLOPs (floating-point operations) per forward pass, compared to 1.8 billion FLOPs for the smaller 18-layer variant.
  • Shortcut connections: These are integrated into the network to enable identity mapping, with direct usage when input and output dimensions are the same. For increased dimensions, two approaches are used: padding extra zero entries for identity mapping or employing projection shortcuts for dimension matching.
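The zero-padding approach can be sketched as follows: when the feature map is halved spatially and the channel count doubles, the shortcut subsamples the input with stride 2 and fills the missing channels with zeros, adding no extra parameters. This is a simplified NumPy sketch, not the library implementation:

```python
import numpy as np

def identity_shortcut_pad(x, out_channels):
    """Parameter-free shortcut for increased dimensions (x: N, C, H, W).

    Subsample spatially with stride 2 and append zero channels so the
    shortcut matches the residual branch's output shape.
    """
    n, c, h, w = x.shape
    x = x[:, :, ::2, ::2]  # stride-2 spatial subsampling
    pad = np.zeros((n, out_channels - c, x.shape[2], x.shape[3]), dtype=x.dtype)
    return np.concatenate([x, pad], axis=1)  # zero-pad the extra channels

x = np.random.randn(1, 64, 56, 56)
y = identity_shortcut_pad(x, 128)
print(y.shape)  # (1, 128, 28, 28)
```

The projection alternative replaces the zero-padding with a learned 1×1 convolution, trading a few extra parameters for a richer mapping.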


ResNet-50

  • Bottleneck design: Building upon the ResNet-34 architecture, ResNet-50 introduces a bottleneck design to reduce the time needed for training layers. This is achieved by replacing the 2-layer blocks in ResNet-34 with 3-layer bottleneck blocks.
  • Enhanced accuracy: This change has led to improved accuracy compared to the 34-layer model.
  • Performance: The 50-layer ResNet requires 3.8 billion FLOPs per forward pass.
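The saving from the bottleneck design is easy to estimate. Counting multiply-accumulates for a convolution as C_in × C_out × k² × H × W, a 3-layer bottleneck (1×1 reduce, 3×3 in the narrow space, 1×1 restore) is far cheaper than two 3×3 convolutions at full width. The figures below are illustrative, assuming a 256-channel, 14×14 feature map:

```python
def conv_ops(c_in, c_out, k, h, w):
    """Multiply-accumulate count for a k x k convolution."""
    return c_in * c_out * k * k * h * w

h = w = 14

# Two 3x3 convs at 256 channels (a plain 2-layer block at full width).
plain = 2 * conv_ops(256, 256, 3, h, w)

# Bottleneck: 1x1 reduce to 64, 3x3 at 64, 1x1 restore to 256.
bottleneck = (conv_ops(256, 64, 1, h, w)
              + conv_ops(64, 64, 3, h, w)
              + conv_ops(64, 256, 1, h, w))

print(f"plain: {plain:,} ops, bottleneck: {bottleneck:,} ops "
      f"(~{plain / bottleneck:.0f}x fewer)")
```

This is how ResNet-50 and its deeper siblings add layers without a proportional blow-up in compute.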

ResNet-101 and ResNet-152

  • More Layers: These larger variants, ResNet-101 and ResNet-152, are constructed by adding more 3-layer blocks, following the design introduced in ResNet-50.
  • Balancing complexity and depth: Despite their increased depth, these networks maintain lower complexity compared to VGG-16 or VGG-19 networks. For instance, the 152-layer ResNet requires 11.3 billion FLOPs, which is still lower than the 15.3 to 19.6 billion FLOPs of the VGG models.

Applications of ResNet

ResNet's ability to learn deep, complex representations makes it a powerful tool, pushing the boundaries of what's possible in computer vision and related fields.

Image Recognition

  • Versatile use cases: ResNet excels in image recognition tasks across various domains, from recognizing objects in everyday photos to classifying images in specialized datasets.
  • Benchmark performance: It has set new performance benchmarks on standard datasets like ImageNet.

Object Detection

  • Integration with detection frameworks: ResNet is often integrated into object detection frameworks like Faster R-CNN, providing the backbone network that extracts features for detecting objects.
  • Enhanced accuracy: This integration significantly improves accuracy in detecting and classifying objects within an image.

Video Analysis

  • Temporal data processing: ResNet can be adapted for processing video data, leveraging its deep architecture to understand and analyze temporal information in video frames.
  • Applications in surveillance and entertainment: Its use in video analysis spans from surveillance systems to video content analysis in the entertainment industry.

Medical Image Analysis

  • Diagnostic tool: ResNet is instrumental in medical imaging, aiding in the diagnosis of diseases from medical scans like X-rays, MRIs, and CT scans.
  • Pattern recognition: It helps in identifying patterns and anomalies that are indicative of various medical conditions, thereby assisting healthcare professionals in diagnosis and treatment planning.

Easily run ResNet for image classification

The Ikomia API lets you run image classification with ResNet with minimal coding.


To begin, it's important to first install the API in a virtual environment [2]. This setup ensures a smooth and efficient start to using the API's capabilities.

pip install ikomia

Run ResNet with a few lines of code

You can also run the notebook we have prepared directly.

from ikomia.dataprocess.workflow import Workflow
from ikomia.utils import ik
from ikomia.utils.displayIO import display

# Init your workflow
wf = Workflow()    

# Add ResNet to the workflow
resnet = wf.add_task(ik.infer_torchvision_resnet(model_name="resnet50"), auto_connect=True)

# Run on your image
wf.run_on(path="path/to/your/image.png")

# Inspect your results
display(resnet.get_image_with_graphics())

Train your own ResNet model

In this article, we have explored the intricacies of ResNet, a highly effective deep learning model. We've also seen how the Ikomia API facilitates the use of ResNet algorithms, eliminating the hassle of managing dependencies.

The API enhances the development of Computer Vision workflows, offering flexibility in adjusting parameters for both training and testing phases. 

To dive deeper, explore how to train ResNet models on your custom dataset →

For more information on the API and its capabilities, you can refer to the documentation. Additionally, Ikomia HUB presents a range of advanced algorithms, and Ikomia STUDIO offers an intuitive interface for accessing these functionalities, catering to users who prefer a more graphical approach.


[1] Deep Residual Learning for Image Recognition

[2] How to create a virtual environment in Python
