References
[1] Witkin A. Scale-space filtering: A new approach to multi-scale description[C]//ICASSP'84. IEEE international conference on acoustics, speech, and signal processing. IEEE, 1984, 9: 150-153.
[2] Dalal N, Triggs B. Histograms of oriented gradients for human detection[C]//2005 IEEE computer society conference on computer vision and pattern recognition (CVPR'05). IEEE, 2005, 1: 886-893.
[3] Toshev A, Szegedy C. DeepPose: Human pose estimation via deep neural networks[C]//Proceedings of the IEEE conference on computer vision and pattern recognition. 2014: 1653-1660.
[4] Li J, Chen T, Shi R, et al. Localization with sampling-argmax[J]. Advances in Neural Information Processing Systems, 2021, 34: 27236-27248.
[5] Wei S E, Ramakrishna V, Kanade T, et al. Convolutional pose machines[C]//Proceedings of the IEEE conference on Computer Vision and Pattern Recognition. 2016: 4724-4732.
[6] Xu T, Takano W. Graph stacked hourglass networks for 3D human pose estimation[C]//Proceedings of the IEEE/CVF conference on computer vision and pattern recognition. 2021: 16105-16114.
[7] Wang J, Sun K, Cheng T, et al. Deep high-resolution representation learning for visual recognition[J]. IEEE transactions on pattern analysis and machine intelligence, 2020, 43(10): 3349-3364.
[8] He K, Zhang X, Ren S, et al. Deep residual learning for image recognition[C]//Proceedings of the IEEE conference on computer vision and pattern recognition. 2016: 770-778.
[9] Yu C, Xiao B, Gao C, et al. Lite-HRNet: A lightweight high-resolution network[C]//Proceedings of the IEEE/CVF conference on computer vision and pattern recognition. 2021: 10440-10450.
[10] Wang Y, Li M, Cai H, et al. Lite Pose: Efficient architecture design for 2D human pose estimation[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2022: 13126-13136.
[11] Wang J, Long X, Gao Y, et al. Graph-PCNN: Two stage human pose estimation with graph pose refinement[C]//Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part XI 16. Springer International Publishing, 2020: 492-508.
[12] Li Y, Zhang S, Wang Z, et al. TokenPose: Learning keypoint tokens for human pose estimation[C]//Proceedings of the IEEE/CVF International conference on computer vision. 2021: 11313-11322.
[13] Li W, Liu M, Liu H, et al. Hourglass Tokenizer for Efficient Transformer-Based 3D Human Pose Estimation[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2024: 604-613.
[14] Girshick R, Donahue J, Darrell T, et al. Rich feature hierarchies for accurate object detection and semantic segmentation[C]//Proceedings of the IEEE conference on computer vision and pattern recognition. 2014: 580-587.
[15] Wang A, Chen H, Liu L, et al. YOLOv10: Real-time end-to-end object detection[J]. arXiv preprint arXiv:2405.14458, 2024.
[16] Liu W, Anguelov D, Erhan D, et al. SSD: Single shot multibox detector[C]//Computer Vision–ECCV 2016: 14th European Conference, Amsterdam, The Netherlands, October 11–14, 2016, Proceedings, Part I 14. Springer International Publishing, 2016: 21-37.
[17] He K, Gkioxari G, Dollár P, et al. Mask R-CNN[C]//Proceedings of the IEEE international conference on computer vision. 2017: 2961-2969.
[18] Ren S, He K, Girshick R, et al. Faster R-CNN: Towards real-time object detection with region proposal networks[J]. IEEE transactions on pattern analysis and machine intelligence, 2016, 39(6): 1137-1149.
[19] Fang H S, Li J, Tang H, et al. AlphaPose: Whole-body regional multi-person pose estimation and tracking in real-time[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2022, 45(6): 7157-7173.
[20] Chen Y, Wang Z, Peng Y, et al. Cascaded pyramid network for multi-person pose estimation[C]//Proceedings of the IEEE conference on computer vision and pattern recognition. 2018: 7103-7112.
[21] Cao Z, Simon T, Wei S E, et al. Realtime multi-person 2D pose estimation using part affinity fields[C]//Proceedings of the IEEE conference on computer vision and pattern recognition. 2017: 7291-7299.
[22] Simonyan K, Zisserman A. Very deep convolutional networks for large-scale image recognition[J]. arXiv preprint arXiv:1409.1556, 2014.
[23] Osokin D. Real-time 2D multi-person pose estimation on CPU: Lightweight OpenPose[J]. arXiv preprint arXiv:1811.12004, 2018.
[24] Howard A G, Zhu M, Chen B, et al. MobileNets: Efficient convolutional neural networks for mobile vision applications[J]. arXiv preprint arXiv:1704.04861, 2017.
[25] Maji D, Nagori S, Mathew M, et al. YOLO-Pose: Enhancing YOLO for multi person pose estimation using object keypoint similarity loss[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2022: 2637-2646.
[26] Nguyen H C, Nguyen T H, Scherer R, et al. Unified end-to-end YOLOv5-HR-TCM framework for automatic 2D/3D human pose estimation for real-time applications[J]. Sensors, 2022, 22(14): 5419.
[27] Wang C Y, Bochkovskiy A, Liao H Y M. YOLOv7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors[C]//Proceedings of the IEEE/CVF conference on computer vision and pattern recognition. 2023: 7464-7475.
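[28] Fu Y, Gao S H. Research on a lightweight multi-person pose estimation model based on improved YOLOv8s-Pose[J/OL]. Journal of Frontiers of Computer Science and Technology, 1-17[2025-01-16]. http://kns.cnki.net/kcms/detail/11.5602.TP.20240507.1148.002.html. (in Chinese)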
[29] Fang X K, Huang J. Human pose detection model based on YOLOv8-pose[J/OL]. Laser Journal, 1-9[2025-01-17]. http://kns.cnki.net/kcms/detail/50.1085.tn.20240902.1533.007.html. (in Chinese)
[30] Zhu X, Hu H, Lin S, et al. Deformable ConvNets v2: More deformable, better results[C]//Proceedings of the IEEE/CVF conference on computer vision and pattern recognition. 2019: 9308-9316.
[31] Doherty J, Gardiner B, Kerr E, et al. BiFPN-YOLO: One-stage object detection integrating Bi-Directional Feature Pyramid Networks[J]. Pattern Recognition, 2025, 160: 111209.
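[32] Luo Z J, Wang Z Y, Cen P, et al. Research on exercise posture recognition for campus physical fitness tests based on improved YOLOv8pose[J]. Electronic Measurement Technology, 2024, 47(19): 24-33. (in Chinese)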
[33] Yu Z, Huang H, Chen W, et al. YOLO-FaceV2: A scale and occlusion aware face detector[J]. Pattern Recognition, 2024, 155: 110714.
[34] Wu T, Tang S, Zhang R, et al. CGNet: A light-weight context guided network for semantic segmentation[J]. IEEE Transactions on Image Processing, 2020, 30: 1169-1179.
[35] Yu H, Wan C, Liu M, et al. Real-Time Image Segmentation via Hybrid Convolutional-Transformer Architecture Search[J]. arXiv preprint arXiv:2403.10413, 2024.
[36] Ionescu C, Papava D, Olaru V, et al. Human3.6M: Large scale datasets and predictive methods for 3D human sensing in natural environments[J]. IEEE transactions on pattern analysis and machine intelligence, 2013, 36(7): 1325-1339.
[37] Johnson S, Everingham M. Clustered pose and nonlinear appearance models for human pose estimation[C]//BMVC. 2010, 2(4): 5.
[38] Sapp B, Taskar B. MODEC: Multimodal decomposable models for human pose estimation[C]//Proceedings of the IEEE conference on computer vision and pattern recognition. 2013: 3674-3681.
[39] Wang J, Yang F, Gou W, et al. FreeMan: Towards benchmarking 3D human pose estimation in the wild[J]. arXiv preprint arXiv:2309.05073, 2023.
[40] Lin T Y, Maire M, Belongie S, et al. Microsoft COCO: Common objects in context[C]//Computer Vision–ECCV 2014: 13th European Conference, Zurich, Switzerland, September 6-12, 2014, Proceedings, Part V 13. Springer International Publishing, 2014: 740-755.
[41] Lin W, Liu H, Liu S, et al. HiEve: A large-scale benchmark for human-centric video analysis in complex events[J]. International Journal of Computer Vision, 2023, 131(11): 2994-3018.
[42] Andriluka M, Pishchulin L, Gehler P, et al. 2D human pose estimation: New benchmark and state of the art analysis[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2014: 3686-3693.
[43] Wu J, Zheng H, Zhao B, et al. AI Challenger: A large-scale dataset for going deeper in image understanding[J]. arXiv preprint arXiv:1711.06475, 2017.
[44] Li J, Wang C, Zhu H, et al. CrowdPose: Efficient crowded scenes pose estimation and a new benchmark[C]//Proceedings of the IEEE/CVF conference on computer vision and pattern recognition. 2019: 10863-10872.
[45] Andriluka M, Iqbal U, Insafutdinov E, et al. PoseTrack: A benchmark for human pose estimation and tracking[C]//Proceedings of the IEEE conference on computer vision and pattern recognition. 2018: 5167-5176.