EHFusion : an efficient heterogeneous fusion model for group-based 3D human pose estimation

Peng, J; Zhou, Y; Mok, PY

doi:10.1007/s00371-024-03724-5

Please use this identifier to cite or link to this item: http://hdl.handle.net/10397/112581

Title:	EHFusion : an efficient heterogeneous fusion model for group-based 3D human pose estimation
Authors:	Peng, J; Zhou, Y; Mok, PY
Issue Date:	Jun-2025
Source:	Visual computer, June 2025, v. 41, no. 8, p. 5323–5345
Abstract:	Stimulated by its important applications in animation, gaming, virtual reality, augmented reality, and healthcare, 3D human pose estimation has received considerable attention in recent years. To improve the accuracy of 3D human pose estimation, most approaches have converted this challenging task into a local pose estimation problem by dividing the body joints of the human body into different groups based on the human body topology. The body joint features of different groups are then fused to predict the overall pose of the whole body, which requires a joint feature fusion scheme. Nevertheless, the joint feature fusion schemes adopted in existing methods involve the learning of extensive parameters and hence are computationally very expensive. This paper reports a new topology-based grouped method ‘EHFusion’ for 3D human pose estimation, which involves a heterogeneous feature fusion (HFF) module that integrates grouped pose features. The HFF module reduces the computational complexity of the model while achieving promising accuracy. Moreover, we introduce motion amplitude information and a camera intrinsic embedding module to provide better global information and 2D-to-3D conversion knowledge, thereby improving the overall robustness and accuracy of the method. In contrast to previous methods, the proposed new network can be trained end-to-end in one single stage. Experimental results not only demonstrate the advantageous trade-offs between estimation accuracy and computational complexity achieved by our method but also showcase the competitive performance in comparison with various existing state-of-the-art methods (e.g., transformer-based) when evaluated on two public datasets, Human3.6M and HumanEva. The data and code are available at doi:10.5281/zenodo.11113132
Keywords:	3D human pose estimation; Efficient network; Feature fusion; Topology-based grouping strategy
Publisher:	Springer
Journal:	Visual computer
ISSN:	0178-2789
EISSN:	1432-2315
DOI:	10.1007/s00371-024-03724-5
Rights:	© The Author(s) 2024 This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The following publication Peng, J., Zhou, Y. & Mok, P.Y. EHFusion: an efficient heterogeneous fusion model for group-based 3D human pose estimation. Vis Comput 41, 5323–5345 (2025) is available at https://doi.org/10.1007/s00371-024-03724-5.
Appears in Collections:	Journal/Magazine Article

Files in This Item:

File	Description	Size	Format
s00371-024-03724-5.pdf		2.31 MB	Adobe PDF

Open Access Information

Status	open access
File Version	Version of Record
