Complete OpenPose guide [Updated Mar 2024]

Guillaume Demarcq
-
8/2/2023
OpenPose for injury prevention

Discover the World of OpenPose: Revolutionizing Pose Estimation

Welcome to the fascinating world of OpenPose, the cutting-edge technology transforming how machines understand human body language. Have you ever wondered how computers can interpret complex human movements in real-time? OpenPose is the answer, and this guide will take you through its incredible capabilities, applications, and how it stands out from the crowd.

OpenPose is one of the most popular pose estimation libraries. Its 2D and 3D keypoint detection features are widely used by data science researchers all over the world.

Here is an analysis of its features, application fields, cost for commercial use and alternatives. This should help you decide whether OpenPose is the right choice for your project in artificial intelligence.

What is OpenPose?

At its core, OpenPose is a groundbreaking pose estimation tool. It uses advanced neural networks to detect human bodies, hands, and facial keypoints in images and videos. Imagine a system that can track every movement of a dancer or the subtle expressions of a speaker – that's OpenPose in action. To make it more relatable, think of it as teaching computers to understand and interpret human body language in a way that was never possible before.

OpenPose is a real-time multi-person keypoint detection library for body, face, and hand estimation. It is capable of detecting 135 keypoints.

It is a deep learning-based approach that can infer the 2D location of key body joints (such as elbows, knees, shoulders, and hips), facial landmarks (such as eyes, nose, mouth), and hand keypoints (such as fingertips, wrist, and palm) from RGB images or videos.

The library was created by a group of researchers from Carnegie Mellon University and is now maintained by two of its initial creators.

OpenPose is known for its robustness in multi-person pose estimation settings and is the winner of the COCO 2016 Keypoints Challenge.


How does OpenPose work?

OpenPose's magic lies in its complex algorithms and neural network models. It processes visual data, breaking down images into key body points, and then maps these points to create a digital skeleton. This process, known as pose estimation, is not just about detecting where a limb is; it's about understanding the movement and posture in a dynamic environment. For instance, in sports analytics, OpenPose can analyze an athlete's posture to enhance performance or prevent injuries.

The first step extracts a set of feature maps from the image using the initial layers of a convolutional network (the first ten layers of VGG-19 in the original paper).

These features are then fed into two parallel branches of convolutional layers. The first branch predicts 18 confidence maps, each representing a specific part of the human pose skeleton.

Simultaneously, the second branch predicts a set of 38 Part Affinity Fields (PAFs) that encode the degree of association between different body parts. Subsequent stages refine the predictions of both branches.

Confidence maps are used to build bipartite graphs between pairs of body part candidates, while PAF values are used to prune the weaker connections in these bipartite graphs.

By following these steps, it becomes possible to estimate and allocate human pose skeletons to each individual depicted in the image.

OpenPose Pipeline Steps

So in summary, OpenPose will do these tasks in sequence:

  1. First, the whole image or video frame is taken as input.
  2. Next, two-branch Convolutional Neural Networks (CNNs) work together to predict confidence maps, which aid in body part detection.
  3. The estimation of Part Affinity Fields (PAFs) comes next, which enables the association of different body parts.
  4. A collection of bipartite matchings is then created to link body part candidates.
  5. Finally, these matched body parts are assembled to form complete full-body poses for all individuals present in the image.
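
Steps 3 and 4 can be illustrated with a small sketch. This is not OpenPose's own code: it assumes a toy PAF exposed as a function that returns a 2D vector at any point, uses hypothetical helper names (`paf_score`, `greedy_match`, `toy_paf`), and pairs candidates greedily by score as a stand-in for the bipartite matching step.

```python
import math

def paf_score(paf, a, b, samples=10):
    """Average alignment between the PAF and the segment a -> b.

    `paf` is a hypothetical callable (x, y) -> (vx, vy); in a real network
    it would be a lookup into the predicted field for one limb type.
    """
    dx, dy = b[0] - a[0], b[1] - a[1]
    length = math.hypot(dx, dy)
    if length == 0:
        return 0.0
    ux, uy = dx / length, dy / length  # unit vector along the candidate limb
    total = 0.0
    for i in range(samples):
        t = i / (samples - 1)
        vx, vy = paf(a[0] + t * dx, a[1] + t * dy)
        total += vx * ux + vy * uy  # dot product: high when the field follows the limb
    return total / samples

def greedy_match(paf, starts, ends, min_score=0.5):
    """Pair each start candidate with at most one end candidate, best scores first."""
    scored = sorted(
        ((paf_score(paf, a, b), i, j)
         for i, a in enumerate(starts)
         for j, b in enumerate(ends)),
        reverse=True,
    )
    used_i, used_j, pairs = set(), set(), []
    for score, i, j in scored:
        if score < min_score:
            break
        if i not in used_i and j not in used_j:
            pairs.append((i, j))
            used_i.add(i)
            used_j.add(j)
    return pairs

# Toy field: points along +x in the upper half (y < 50), along +y in the lower half.
def toy_paf(x, y):
    return (1.0, 0.0) if y < 50 else (0.0, 1.0)

shoulders = [(10, 10), (10, 60)]  # made-up candidates for two people
elbows = [(10, 90), (40, 10)]
print(greedy_match(toy_paf, shoulders, elbows))
```

In this toy scene each shoulder ends up paired with the elbow that the field points toward, even though the other elbow lies in a plausible position, which is the role PAFs play when several people appear in the same image.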

OpenPose features

OpenPose allows computer science professionals across the globe to use a vast selection of features for different computer vision applications.

2D real-time multi-person keypoint detection

2D human pose estimation is one of the most widely used OpenPose capabilities. Here are a few of the estimations it supports:

  • 15-, 18-, or 25-keypoint body/foot keypoint estimation; the 25-keypoint model includes 6 foot keypoints. Runtime invariant to the number of detected people.
  • 2x21-keypoint hand keypoint estimation. Runtime depends on the number of detected people.
  • 70-keypoint face keypoint estimation. Runtime depends on the number of detected people. See OpenPose Training for a runtime invariant alternative.

OpenPose body keypoints

3D real-time single-person keypoint detection

3D pose estimation is another OpenPose feature that makes this a very powerful library of algorithms.

  • 3D triangulation from multiple single views.
  • Synchronization of Flir cameras handled.
  • Compatible with Flir/Point Grey cameras.

Calibration toolbox

Estimation of distortion, intrinsic, and extrinsic camera parameters.

Single-person tracking for further speedup or visual smoothing.

OpenPose input

Input can be an image, video, webcam, Flir/Point Grey camera, or IP camera, and you can add your own custom input source (e.g., a depth camera). This means you can estimate human movement in real time as well as analyze still images.

OpenPose output

Basic image + keypoint display/saving (PNG, JPG, AVI, ...), keypoint saving (JSON, XML, YML, ...), keypoints as array class, and support to add your own custom output code (e.g., some fancy UI).

OpenPose can output the keypoints as 2D coordinates, 3D coordinates, or heatmap values, providing flexibility for different applications.
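
As a minimal sketch of consuming the JSON output, the snippet below reads one frame and groups the values per person. It assumes the layout written by the `--write_json` option, where each entry in `people` carries a flat `pose_keypoints_2d` list of `x, y, confidence` triplets; the sample string and the helper name `load_people` are made up for illustration.

```python
import json

def load_people(json_text, min_confidence=0.1):
    """Return, per person, a list of (x, y, confidence) tuples, or None for weak keypoints."""
    data = json.loads(json_text)
    people = []
    for person in data.get("people", []):
        flat = person.get("pose_keypoints_2d", [])
        # Regroup the flat list into (x, y, confidence) triplets.
        triplets = [tuple(flat[i:i + 3]) for i in range(0, len(flat) - 2, 3)]
        people.append([kp if kp[2] >= min_confidence else None for kp in triplets])
    return people

# Made-up sample with one person and three keypoints (a real BODY_25 file has 25).
sample = (
    '{"version": 1.3, "people": [{"pose_keypoints_2d": '
    '[320.5, 110.2, 0.93, 318.0, 180.7, 0.88, 0, 0, 0]}]}'
)
for keypoints in load_people(sample):
    print(keypoints)
```

Keypoints that were not detected are reported with zero confidence, so filtering on a threshold before any downstream computation avoids treating them as real positions at the image origin.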

OpenPose OS

Ubuntu (20, 18, 16, 14), Windows (10, 8), Mac OSX, Nvidia TX2.

OpenPose hardware compatibility

CUDA (Nvidia GPU), OpenCL (AMD GPU), and non-GPU (CPU-only) versions.

OpenPose APIs

OpenPose has APIs in several programming languages such as Python, C++, and MATLAB, and can be integrated with other machine learning libraries and frameworks such as TensorFlow, PyTorch, and Caffe.

OpenPose applications

Before looking at the industries where OpenPose is used, let's first review the most important tasks it can perform.

Multi-person pose estimation 

OpenPose can detect the poses of multiple people in the same image or video stream simultaneously, making it ideal for applications such as action recognition, gesture recognition, and human-computer interaction.


Real-time performance 

OpenPose can process images and videos in real-time on modern GPUs, making it suitable for real-time applications such as sports analysis, gaming, and virtual reality.

Accurate keypoint detection

OpenPose can detect key body, face, and hand keypoints with high accuracy, even in challenging scenarios such as occlusion and cluttered backgrounds.

OpenPose has a wide range of applications in various fields. Here are some examples in different domains.

OpenPose in different industries

Due to its outstanding ability to find and track human poses, OpenPose became a Computer Vision staple in many different industries.

OpenPose for sports Analysis

The OpenPose algorithm can be used in many sports-related applications, such as performance analysis, injury prevention, and gaming.

Human kinetics analysis

Analyzing movements and techniques of athletes to improve their performance in sports like basketball, tennis, and golf.

Injury prevention

Identifying improper posture or movement that could lead to injuries in sports like running, weightlifting, and football.
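
Posture checks of this kind usually start from joint angles computed on the detected keypoints. The sketch below computes a knee angle from three 2D keypoints. It assumes the BODY_25 ordering in which indices 9, 10, and 11 are the right hip, knee, and ankle, and it uses made-up pixel coordinates; which angles indicate risk is a biomechanics question that the code does not address.

```python
import math

# Assumed BODY_25 indices for the right leg.
R_HIP, R_KNEE, R_ANKLE = 9, 10, 11

def joint_angle(a, b, c):
    """Angle at b, in degrees, formed by the segments b->a and b->c."""
    v1 = (a[0] - b[0], a[1] - b[1])
    v2 = (c[0] - b[0], c[1] - b[1])
    n1, n2 = math.hypot(*v1), math.hypot(*v2)
    if n1 == 0 or n2 == 0:
        raise ValueError("coincident keypoints, angle undefined")
    cos_angle = (v1[0] * v2[0] + v1[1] * v2[1]) / (n1 * n2)
    # Clamp to [-1, 1] to guard against floating-point drift before acos.
    return math.degrees(math.acos(max(-1.0, min(1.0, cos_angle))))

# Made-up pixel coordinates for one frame: {keypoint index: (x, y)}.
frame = {R_HIP: (100.0, 100.0), R_KNEE: (100.0, 200.0), R_ANKLE: (200.0, 200.0)}
knee = joint_angle(frame[R_HIP], frame[R_KNEE], frame[R_ANKLE])
print(f"right knee angle: {knee:.1f} degrees")
```

Because these are 2D image coordinates, the measured angle depends on the camera viewpoint; a side view of the movement, or the 3D keypoints from a multi-camera setup, gives more meaningful values.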

Gaming 

Using motion tracking to control game characters using the player's body movements, as seen in games like Kinect Sports and Just Dance.

OpenPose for robotics

As you might imagine, OpenPose has multiple applications within the robotics industry.

Human-Robot interaction

Developing robots that can interact with humans using natural body movements, like in personal assistance robots, factory automation, and social robots.

Object manipulation

Controlling robotic arms using hand and finger movements detected by OpenPose, like in manufacturing and assembly line robots.

Gesture recognition 

Detecting and recognizing human gestures, like waving, pointing, and hand signals, to control robots, like in home automation and virtual assistants.

OpenPose for healthcare

Healthcare is another area where OpenPose can help with many tasks.

Physical therapy 

Monitoring patients' movements during rehabilitation exercises and providing real-time feedback to improve their posture and technique.

Elderly care

Detecting falls and monitoring the activities of elderly people in their homes using OpenPose-based cameras.

Surgery

Providing surgeons with real-time feedback on the positioning and movement of their hands during surgical procedures.

OpenPose for security and surveillance

When it comes to security and surveillance, OpenPose finds many application fields for humans, objects and animals. 

Intrusion detection

Detecting and tracking human movements in restricted areas or identifying suspicious activities in real-time.

Crowd monitoring

Analyzing crowd behavior, detecting anomalies, and providing insights for crowd management and public safety.

Perimeter security

Monitoring and analyzing human presence along the perimeter of secure areas, detecting unauthorized entry attempts or potential breaches.

Crowd behavior analysis

Analyzing crowd dynamics, crowd density, and movement patterns in crowded public spaces, assisting in crowd management, event planning, and emergency response.

Traffic surveillance

Tracking and analyzing pedestrian movements at intersections, crosswalks, or public transportation hubs, facilitating traffic management and improving pedestrian safety.


OpenPose for Entertainment

OpenPose is used by the entertainment industry for various applications.

Virtual reality

Tracking body movements to provide an immersive experience in virtual reality environments, like in VR games and simulations.

Animation

Capturing the motion of actors' bodies and facial expressions to create realistic and expressive animated characters.

Film and TV 

Tracking actors' movements during motion capture sessions and applying them to digital characters in movies and TV shows.

OpenPose for retail and e-commerce

Virtual try-on

Helping customers virtually try on clothes, accessories, or makeup, providing a more personalized and engaging shopping experience.

Customer behavior analysis

Tracking and analyzing customers' movements within a store, allowing retailers to optimize store layouts and product placements.

How much does OpenPose cost?

OpenPose is freely available for non-commercial use and may be redistributed under the same conditions.

This license covers noncommercial research by academic or non-profit organizations only.

There is a non-exclusive commercial license. It requires a non-refundable $25,000 USD annual royalty.

Note that the commercial license cannot be used in the field of sports. 

How to use OpenPose?

The code base is open-sourced on GitHub and is very well documented.

You can read the official installation documentation.

Install OpenPose

The first step is to install OpenPose on your system. OpenPose is available for various platforms, including Windows, Linux, and macOS. 

You can download the latest version of the OpenPose package from the official website.

The package includes pre-trained models and configurations that are ready to use, but can also be further customized according to your application needs.

Prepare the input data

OpenPose requires input data in the form of images or video streams. The input data can be captured using a camera or loaded from a file. 

Preprocessing the data before inputting it into OpenPose is necessary to ensure the best performance and accuracy of the model. This can be done through resizing, cropping, and filtering.

Configure OpenPose

Configuring OpenPose is an essential step in optimizing the model's performance and accuracy. OpenPose provides various configuration options that can be adjusted. 

The configuration options include model type, output format, resolution, and keypoint detection threshold. These options can be selected according to your application's specific requirements to achieve the best results.

Run OpenPose

Once the input data is prepared and the configuration options are set, OpenPose can be run on the data. OpenPose will analyze the input data and detect the keypoints of the human body, including the position, orientation, and movement of various body parts.
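
As a sketch of how the configuration and run steps fit together, the snippet below assembles a command line for the OpenPose demo program. The flags shown (`--video`, `--model_pose`, `--net_resolution`, `--write_json`, `--display`, `--render_pose`) are options of the OpenPose demo; the binary path, the file names, and the helper `build_openpose_command` are assumptions for a Linux source build and need adjusting to your installation.

```python
from pathlib import Path

def build_openpose_command(video, json_dir,
                           binary="./build/examples/openpose/openpose.bin",
                           model="BODY_25", net_resolution="-1x368"):
    """Assemble the demo command; paths are placeholders for a local install."""
    return [
        binary,
        "--video", str(video),
        "--model_pose", model,
        "--net_resolution", net_resolution,  # -1 lets the width follow the aspect ratio
        "--write_json", str(json_dir),       # one JSON file per frame
        "--display", "0",                    # no GUI window
        "--render_pose", "0",                # skip drawing skeletons, only save keypoints
    ]

command = build_openpose_command(Path("input.mp4"), Path("output_json"))
print(" ".join(command))

# Once OpenPose is built and the paths are valid, the command can be run with:
#   import subprocess
#   subprocess.run(command, check=True)
```

Disabling display and rendering is a common choice for batch processing, where only the saved keypoints are needed.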

Visualize the output

The final step is to visualize the output of OpenPose. OpenPose can save keypoints in several formats, including JSON, XML, and YML, which can be used to display the detected keypoints in real time or for post-processing analysis. The output can be visualized using tools such as OpenCV, Matplotlib, or Unity.

OpenPose Alternatives and Comparisons

As powerful as OpenPose is, it's always worth exploring alternative pose estimation algorithms to determine which is best suited for your use case. 

Here are a few OpenPose alternatives to consider.

OpenPose vs MediaPipe

Lightweight, cross-platform framework for mobile devices and desktops that enables real-time, high-accuracy hand, facial, and pose tracking. 

One of the major advantages of MediaPipe is that it is optimized for mobile devices and can run on resource-constrained devices. 

However, it has limited support for 3D pose estimation and requires a significant amount of preprocessing for input data.

OpenPose vs Detectron2

Provides pre-trained models for keypoint detection and pose estimation. Detectron2 is highly customizable and supports a wide range of models, including Mask R-CNN and RetinaNet. 

However, it is more complex than other libraries, and its performance may be affected by hardware limitations.

OpenPose vs MMPose

A high-accuracy pose estimation framework that includes support for multi-person, 3D, and hand pose estimation. It also includes a variety of pre-trained models and data augmentation techniques for improved performance. 

However, it may require more computational resources than some of the other algorithms, and it is currently only available in PyTorch.

OpenPose vs Lightweight-human-pose-estimation.pytorch

PyTorch-based pose estimation algorithm that is designed to be lightweight and fast. It uses a human pose estimation model that has been optimized for running on devices with limited computational resources, such as mobile devices and Raspberry Pi boards. 

It can achieve real-time performance, making it suitable for applications such as human-computer interaction and sports analysis. 

However, its accuracy may be lower than some of the more complex algorithms.

OpenPose vs Freemocap

Open-source, markerless motion capture system that uses computer vision techniques to estimate the 3D position of a person's joints from a video stream. It includes support for multi-person pose estimation, as well as body and facial expression recognition. 

It can be used for a variety of applications, including animation, gaming, and biomechanics research. 

However, it may require more computational resources than some of the other algorithms, and its accuracy may be lower in challenging lighting conditions or with occlusions.

OpenPose vs AlphaPose

Offers faster performance than OpenPose and can detect multiple people in a single image or video stream. 

However, it may have lower accuracy for small or occluded body parts because it relies on a top-down pipeline: a person detector runs first, and any missed or poorly localized bounding box degrades the pose estimate.

OpenPose vs DeeperCut

Offers higher accuracy than OpenPose, making it a good choice for fine-grained pose estimation and occluded body parts. 

However, it is slower than OpenPose due to its reliance on graphical models and requires careful tuning of its hyperparameters.

OpenPose vs HRNet 

Boasts state-of-the-art accuracy and fast inference time, making it well-suited for real-time pose estimation and multi-person scenarios. 

However, it requires more computational resources than OpenPose due to its use of a deeper network architecture.

OpenPose vs EfficientPose

Offers efficient inference time and improved accuracy compared to other lightweight models, making it ideal for mobile and embedded applications. 

However, it may not be as accurate as some of the more complex algorithms due to its lightweight nature.

OpenPose vs DensePose 

Can handle more complex poses and motions and estimate detailed body part textures, making it a good choice for fashion and retail applications, virtual try-ons, and gaming and animation. 

However, it requires higher quality input images and is only available for non-commercial use due to licensing restrictions.

Compare OpenPose to Other Human Pose Estimation Algorithms

Here is a table with these OpenPose alternatives:

| Algorithm Name | Downsides vs. OpenPose | Upsides vs. OpenPose | Best Fitted Use Cases | License Type and Cost |
|---|---|---|---|---|
| MediaPipe | Lower accuracy than OpenPose | Lightweight and real-time performance | Augmented reality, gaming, sports analysis | Open-source, free for commercial use |
| Detectron2 | Higher computational requirements and complexity | State-of-the-art accuracy and customization options | Object detection, pose estimation in complex scenes | Apache License 2.0, free for commercial use |
| MMPose | Lower accuracy on complex scenes | High accuracy and multi-person pose estimation | Human-robot interaction, motion capture, 3D animation | Open-source, free for commercial use |
| lightweight-human-pose-estimation.pytorch | Lower accuracy than some complex algorithms | Lightweight and real-time performance on mobile devices and Raspberry Pi boards | Mobile human-computer interaction, sports analysis | Open-source, free for commercial use |
| freemocap | Higher computational requirements and lower accuracy in challenging lighting conditions or occlusions | Markerless motion capture and multi-person pose estimation | Animation, gaming, biomechanics research | Open-source, free for commercial use |
| AlphaPose | Lower accuracy on complex scenes and multi-person pose estimation | High accuracy on single-person pose estimation and real-time performance | Sports analysis, action recognition | Open-source, free for non-commercial use |
| DeeperCut | Lower accuracy on multi-person pose estimation and challenging poses | High accuracy on single-person pose estimation and robustness to occlusions | Medical imaging, biomechanics research | Open-source, free for non-commercial use |
| HRNet | Higher computational requirements and complexity | High accuracy and multi-scale feature representation | 3D human pose estimation, hand pose estimation | Open-source, free for non-commercial use |
| EfficientPose | Not as widely tested or validated as some other algorithms | Efficient inference time, improved accuracy compared to other lightweight models | Mobile and embedded applications, resource-constrained environments | Open-source, free to use |
| DensePose | Requires higher quality input images | Can estimate detailed body part textures, can handle more complex poses and motions | Fashion and retail applications, virtual try-ons, gaming and animation | Open-source, free to use (non-commercial only) |



Note: The license type and cost may vary depending on the specific use case and the terms of the license agreement. Please refer to the individual project websites for more information.

Best alternatives to OpenPose for commercial use

If you are planning to build a commercial solution that requires multi-person keypoint detection, the Ikomia team advises choosing either Detectron2 or MMPose.

Both of these alternatives are freely available for commercial use under the Apache 2.0 license and are actively maintained by a strong community. You can also find them in the Ikomia HUB.
