Journal "Software Engineering" (Programmnaya Ingeneria) | Preparing Data for the Human Pose Estimation Task

Issue No. 8, 2025

DOI: 10.17587/prin.16.421-432

Preparing Data for the Human Pose Estimation Task

M. A. Potenko, Postgraduate Student, potenkog@gmail.com, Moscow Aviation Institute (National Research University), Moscow, 125993, Russian Federation

Corresponding author: Maxim A. Potenko, Postgraduate Student, Moscow Aviation Institute (National Research University), Moscow, 125993, Russian Federation. E-mail: potenkog@gmail.com

Received on December 13, 2024

Accepted on May 05, 2025

A method for collecting and preparing data for training neural networks for Human Pose Estimation is presented. The task is highly relevant in computer vision because it enables applications such as automatic background replacement, motion capture, augmented reality, and virtual reality. The article reports on the challenges encountered when collecting real and synthetic data. It highlights the limitations of existing datasets, which often fail to cover rare or complex scenarios such as extreme poses, rapid movements, or actions in unconventional conditions. To address these gaps, the study explores synthetic data generated through procedural modeling, physically accurate rendering, and animation in 3D environments such as Blender. It investigates how the amount of synthetic data affects training quality and identifies the ratio of synthetic to real data that gives the best training results: a 1:2 ratio of synthetic to real data was found to provide the most balanced performance, although this may vary with the dataset and the neural network architecture. The work gives graphical and textual representations of human body joints and proposes a topology of 33 keypoints that includes additional points on the hands and feet to better capture body orientation. A segmented model of the human body is also presented, which is important for tasks such as image segmentation and pose estimation. Approaches to increasing the volume of such data and to processing it are proposed, with the aim of improving the accuracy of trained models in various scenarios. These include data augmentation techniques (rotation, scaling, brightness adjustment, and noise addition) and tools for efficient annotation. Existing datasets and annotation tools are also listed as a resource for researchers in the field. The study underscores the importance of meticulous data preparation for achieving high training accuracy, particularly when analyzing complex objects such as the human body.
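
The augmentation steps named in the abstract (rotation, scaling, brightness adjustment, noise addition) and the 1:2 synthetic-to-real ratio can be illustrated with a minimal Python sketch. It is not the author's implementation: the function names, parameters, and the grayscale list-of-lists image representation are hypothetical and chosen only for illustration. The sketch shows the coordinate side of a geometric augmentation (keypoint labels must move with the pixels) and the photometric side (which leaves keypoints unchanged). Warping the image itself with the same rotation and scale is assumed to be done with an image library and is not shown.

```python
import math
import random


def transform_keypoints(keypoints, angle_deg, scale, center):
    """Rotate and scale (x, y) keypoints about `center`.

    Assumption: the image is warped with the same angle, scale and center
    elsewhere, so that the labels stay aligned with the pixels. With the
    image y axis pointing down, a positive angle appears clockwise.
    """
    theta = math.radians(angle_deg)
    cos_t, sin_t = math.cos(theta), math.sin(theta)
    cx, cy = center
    transformed = []
    for x, y in keypoints:
        dx, dy = x - cx, y - cy
        transformed.append((
            cx + scale * (cos_t * dx - sin_t * dy),
            cy + scale * (sin_t * dx + cos_t * dy),
        ))
    return transformed


def adjust_brightness_and_noise(pixels, gain, noise_std, rng):
    """Multiply intensities by `gain`, add Gaussian noise, clamp to [0, 255].

    `pixels` is a grayscale image stored as a list of rows (hypothetical
    representation). Keypoint coordinates are not affected by this step.
    """
    return [
        [min(255.0, max(0.0, p * gain + rng.gauss(0.0, noise_std))) for p in row]
        for row in pixels
    ]


def synthetic_count(num_real, synthetic_parts=1, real_parts=2):
    """Synthetic samples needed for a synthetic:real ratio (1:2 by default)."""
    return num_real * synthetic_parts // real_parts


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so the run is repeatable
    keypoints = [(60.0, 50.0), (50.0, 30.0)]
    print(transform_keypoints(keypoints, angle_deg=90.0, scale=1.0,
                              center=(50.0, 50.0)))
    print(adjust_brightness_and_noise([[100.0, 250.0]], gain=1.2,
                                      noise_std=5.0, rng=rng))
    print(synthetic_count(1000))
```

As the abstract notes, the best ratio may differ for another dataset or network architecture, so the default in `synthetic_count` is a starting point rather than a fixed rule.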

Keywords: neural networks, convolutional neural networks, computer vision, keypoints, image segmentation, object recognition, synthetic data, Human Pose Estimation, datasets, image processing

pp. 421—432

For citation:
Potenko M. A. Preparing Data for the Human Pose Estimation Task, Programmnaya Ingeneria, 2025, vol. 16, no. 8, pp. 421—432. DOI: 10.17587/prin.16.421-432 (in Russian).
