Understanding ResNeXt: Revolutionizing Convolutional Neural Networks

Allan Kouidri

Building upon the successes of its predecessor, ResNet, ResNeXt introduces a novel approach to convolutional neural networks (CNNs) that enhances their performance and efficiency. 

This post delves into the core concepts of ResNeXt, its unique features, and its implications in the field of AI and deep learning.

What is ResNeXt?

ResNeXt is a residual network built on aggregated transformations; the "NeXt" in its name points to the next dimension it adds to network design. It's a type of CNN that was introduced to address some of the limitations found in traditional CNN architectures.

The key innovation in ResNeXt is its use of "cardinality" – the size of the set of transformations – as a new dimension, along with depth and width, for scaling up neural networks.

Key features of ResNeXt

The ResNeXt model, introduced in the 2016 paper 'Aggregated Residual Transformations for Deep Neural Networks' by Saining Xie, Kaiming He, and colleagues [1], blends the layer-stacking approach of VGG/ResNet with Inception's split-transform-merge strategy.

Its primary aim is to improve accuracy without making the network deeper or more computationally expensive:

Cardinality as a key factor

In ResNeXt, cardinality refers to the number of parallel transformation paths within a network layer.

This concept introduces a new dimension for scaling the complexity of the network, different from the traditional methods of increasing the depth (number of layers) or width (number of units per layer).

Cardinality allows for more diverse and rich feature transformations without significantly complicating the network's structure.
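As a rough illustration of what cardinality means in practice, the sketch below partitions the channels of one layer into independent groups, one per path. The helper name `split_channels` and the 128-channel, 32-path sizes are assumptions chosen for the example, not part of any library.

```python
def split_channels(num_channels: int, cardinality: int) -> list[range]:
    """Partition channel indices into `cardinality` equal, contiguous groups."""
    if num_channels % cardinality != 0:
        raise ValueError("num_channels must be divisible by cardinality")
    width = num_channels // cardinality  # channels handled by each path
    return [range(i * width, (i + 1) * width) for i in range(cardinality)]


# Hypothetical layer: 128 channels split across 32 parallel paths.
groups = split_channels(128, 32)
print(len(groups))       # number of paths
print(list(groups[0]))   # channels seen by the first path
print(list(groups[-1]))  # channels seen by the last path
```

Raising the cardinality here adds paths while shrinking the width of each one, which is a different knob from adding layers (depth) or channels (width).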

Split-Transform-Merge strategy

The architecture employs a strategy where the input is first split into multiple parallel paths. Each path applies a transformation with the same topology (the same layer design) but learns its own weights. Finally, the outputs of these paths are aggregated (merged).

This strategy enhances the learning capability of the network without a significant increase in complexity.

In the illustration above, the architecture on the left represents a ResNet block, while on the right you see a ResNeXt block. The ResNet block can be read as the single-path case (cardinality 1) of the same split-transform-merge design.


Within each path, a 1x1 convolutional layer first projects the input to a lower dimension, 3x3 convolutional filters then transform it, and a second 1x1 layer restores the original dimension. The outputs of all paths are finally combined through a summation operation.

The key aspect of this strategy is that the transformations are derived from the same structural design, facilitating ease of implementation without necessitating specialized architectural modifications. The primary goal of ResNeXt is to effectively manage large input sizes and enhance network accuracy. 

This is achieved not by adding more layers, but by increasing the cardinality, that is, the number of parallel paths in the network. This approach boosts performance while keeping the design simpler than that of a deeper network.
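The split-transform-merge idea can be sketched in plain Python. This is a simplified, fully connected analogue rather than a real convolutional block: the 3x3 spatial convolution is left out, and the function names, toy sizes, and random weights are all assumptions made for illustration.

```python
import random


def make_path(in_dim: int, bottleneck: int, rng: random.Random):
    """Build one path: project down, apply ReLU, project back up.

    Every path has the same shape (topology) but its own random weights.
    """
    down = [[rng.gauss(0, 0.1) for _ in range(in_dim)] for _ in range(bottleneck)]
    up = [[rng.gauss(0, 0.1) for _ in range(bottleneck)] for _ in range(in_dim)]

    def transform(x):
        hidden = [max(0.0, sum(w * v for w, v in zip(row, x))) for row in down]
        return [sum(w * h for w, h in zip(row, hidden)) for row in up]

    return transform


def aggregated_transform(x, paths):
    """Split-transform-merge: feed x to every path, then sum the outputs."""
    outputs = [path(x) for path in paths]          # split + transform
    return [sum(vals) for vals in zip(*outputs)]   # merge by summation


rng = random.Random(0)
cardinality, in_dim, bottleneck = 4, 8, 2  # toy sizes, chosen for readability
paths = [make_path(in_dim, bottleneck, rng) for _ in range(cardinality)]

x = [1.0] * in_dim
y = aggregated_transform(x, paths)
print(len(paths), len(y))  # number of paths, size of the merged output
```

Because every path is built by the same `make_path` call, changing the cardinality only means changing the length of the `paths` list; no path needs a hand-designed structure of its own.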

Residual connections

Like ResNet, ResNeXt utilizes residual connections, which help in avoiding the vanishing gradient problem in deep networks. These connections allow the network to learn identity functions, ensuring that deeper models can perform at least as well as shallower ones.

The residual connection creates a shortcut path by adding the value at the beginning of the block, x, directly to the end of the block (F(x) + x) [2]
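A minimal sketch of the shortcut, assuming vectors are plain Python lists and using two made-up transformations to stand in for F:

```python
def residual_block(x, transform):
    """Return F(x) + x, where `transform` plays the role of F."""
    fx = transform(x)
    return [f + xi for f, xi in zip(fx, x)]


def zero_transform(x):
    """A transformation that has learned nothing: F(x) = 0."""
    return [0.0 for _ in x]


def scale_transform(x):
    """A toy learned transformation: F(x) = 0.5 * x."""
    return [0.5 * v for v in x]


x = [1.0, -2.0, 3.0]
print(residual_block(x, zero_transform))   # [1.0, -2.0, 3.0]: the block acts as identity
print(residual_block(x, scale_transform))  # [1.5, -3.0, 4.5]
```

When F outputs zeros, the block simply passes its input through, which is why stacking more residual blocks does not have to hurt a network that already works.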

Increased efficiency: cardinality vs width

ResNeXt demonstrates increased performance without significantly raising computational complexity. This efficiency stems from its shared block design (every path has the same topology) and from grouped convolutions, which let all paths be computed in a single layer.
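To make the trade-off between cardinality and width concrete, the sketch below counts the weights in one bottleneck block with 256 input and output channels. The (cardinality, width) pairs follow the settings compared in the ResNeXt paper [1]; the function name is hypothetical, and biases and batch-norm parameters are ignored as a simplifying assumption.

```python
def block_params(cardinality: int, width: int, channels: int = 256) -> int:
    """Weights in one bottleneck block made of `cardinality` parallel paths.

    Each path is 1x1 (channels -> width), 3x3 (width -> width),
    1x1 (width -> channels). Biases and batch-norm parameters are ignored.
    """
    per_path = channels * width + 3 * 3 * width * width + width * channels
    return cardinality * per_path


# (cardinality, bottleneck width per path)
settings = [(1, 64), (2, 40), (4, 24), (8, 14), (32, 4)]
for cardinality, width in settings:
    print(f"{cardinality:>2} x {width:>2}d -> {block_params(cardinality, width):,} weights")
```

The first row corresponds to a ResNet bottleneck (one path of width 64) and the last to the 32×4d template used in the comparison that follows. All five settings land close to 70,000 weights, so cardinality rises while the parameter budget stays roughly fixed.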

Comparative analysis on ImageNet-1K
  • ResNeXt-50 vs. ResNet-50: With similar complexity (about 4.1 billion FLOPs and 25 million parameters), the 32×4d ResNeXt-50 outperforms ResNet-50, with a validation error 1.7 percentage points lower (22.2% vs. 23.9%).
  • ResNeXt-101 vs. ResNet-101: In a similar complexity bracket (approximately 7.8 billion FLOPs and 44 million parameters), the 32×4d ResNeXt-101 achieves a validation error 0.8 percentage points lower than ResNet-101.
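As a quick check on how to read these figures, the snippet below separates the absolute gap (in percentage points) from the relative reduction in error, using the ResNet-50 and ResNeXt-50 numbers quoted above. The helper name `error_gap` is made up for this example.

```python
def error_gap(baseline_error: float, new_error: float) -> tuple[float, float]:
    """Return (absolute gap in percentage points, relative reduction in %)."""
    absolute = baseline_error - new_error
    relative = 100.0 * absolute / baseline_error
    return round(absolute, 1), round(relative, 1)


# Validation errors quoted above for ResNet-50 and ResNeXt-50 (32x4d).
absolute, relative = error_gap(23.9, 22.2)
print(absolute, relative)
```

A gap of 1.7 points on a 23.9% baseline amounts to roughly a 7% relative reduction in errors, obtained at about the same FLOPs and parameter count.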

Applications and implications

  • Image Recognition: ResNeXt has shown outstanding performance in image recognition tasks, even outperforming more complex models in some instances.
  • Transfer Learning: Due to its efficiency and generalizability, ResNeXt is an excellent candidate for transfer learning applications, where a pre-trained model is adapted to new tasks.
  • Efficient Resource Utilization: Its ability to achieve high accuracy without a proportional increase in computational resources makes it suitable for applications where efficiency is crucial.

Easily run ResNeXt for image classification

The Ikomia API simplifies the process of image classification using ResNeXt, requiring minimal coding effort.


Start by setting up a virtual environment [3] and then install the Ikomia API within it for an optimized workflow:

pip install ikomia

Run ResNeXt with a few lines of code

You can also directly open the notebook we have prepared.

from ikomia.dataprocess.workflow import Workflow
from ikomia.utils.displayIO import display

# Init your workflow
wf = Workflow()

# Add algorithm
algo = wf.add_task(name="infer_torchvision_resnext", auto_connect=True)

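# Set the algorithm parameters
algo.set_parameters({
    "model_name": "resnext101",
    "input_size": "224",
})

# Run directly on your image (placeholder path: replace it with your own file)
wf.run_on(path="path/to/your/image.jpg")

# Inspect your result
display(algo.get_image_with_graphics())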

Train your own ResNeXt model

This article provided an in-depth look at ResNeXt, a highly efficient deep learning model.

We also discussed how the Ikomia API eases the integration of ResNeXt algorithms, reducing the complexity of handling dependencies.

The API optimizes Computer Vision workflows, providing adaptable parameter settings for training and testing stages.

To dive deeper, explore how to train ResNet/ResNeXt models on your custom dataset.

  • Refer to the API documentation for detailed information on its features.
  • Explore Ikomia HUB for advanced algorithms.
  • Use Ikomia STUDIO for a graphical interface with the same functionalities as the API.


[1] Aggregated Residual Transformations for Deep Neural Networks

[2] Deep Residual Learning for Image Recognition

[3] How to create a virtual environment in Python

