[3D Pose Estimation Task]

3D pose estimation aims to reproduce the 3D coordinates of a person appearing in an untrimmed 2D video. It has been studied extensively in the literature and has many real-world applications, e.g., sports [1], healthcare [2], games [3], movies [4], and video compression [5]. Instead of fully rendering 3D voxels, this work narrows the scope to reconstructing a handful of body key-points (e.g., neck, knees, or ankles), which concisely represent the dynamics of human motion in the real world.


3D multi-person pose estimation (3DMPPE) from a monocular video is particularly challenging and remains largely uncharted, still far from being applicable to in-the-wild scenarios. Existing methods have three unresolved issues: lack of robustness to views unseen during training, vulnerability to occlusion, and severe jittering in the output.

[Proposed Method]

The proposed method, POTR-3D, is the first to realize a seq2seq 2D-to-3D lifting model for 3DMPPE. It also devises a simple but effective data augmentation strategy that can generate an unlimited amount of occlusion-aware augmented data with diverse views. Together, these two components effectively tackle the three challenges above and adapt well to in-the-wild videos.
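
To make the idea of view-diverse, occlusion-aware augmentation more concrete, the following is a minimal illustrative sketch and not the authors' implementation. It assumes a simple pinhole camera, a rotation about the vertical axis, and a crude occlusion rule (a key-point counts as hidden when another person's key-point projects nearby and is closer to the camera). The function names `rotate_y`, `project`, and `augment_view`, and the parameters `focal`, `depth_offset`, and `radius`, are all hypothetical.

```python
import math


def rotate_y(point, angle):
    """Rotate a 3D point (x, y, z) about the vertical y-axis by `angle` radians."""
    x, y, z = point
    c, s = math.cos(angle), math.sin(angle)
    return (c * x + s * z, y, -s * x + c * z)


def project(point, focal=1000.0, depth_offset=5.0):
    """Pinhole projection (assumed camera model, hypothetical parameters).

    The scene is pushed `depth_offset` units in front of the camera so that
    the depth stays positive for the small scenes used here.
    Returns (u, v, depth).
    """
    x, y, z = point
    depth = z + depth_offset
    return (focal * x / depth, focal * y / depth, depth)


def augment_view(people, angle, radius=20.0):
    """Re-render a multi-person 3D scene from a new viewing angle.

    people: list of persons, each a list of (x, y, z) key-points.
    Returns, per person, a list of (u, v, visible) tuples. A key-point is
    marked not visible when a key-point of a different person lies within
    `radius` pixels and is closer to the camera (a crude occlusion rule
    chosen only for illustration).
    """
    projected = [[project(rotate_y(p, angle)) for p in person] for person in people]
    result = []
    for i, person in enumerate(projected):
        keypoints = []
        for (u, v, d) in person:
            occluded = any(
                math.hypot(u - u2, v - v2) < radius and d2 < d
                for j, other in enumerate(projected)
                if j != i
                for (u2, v2, d2) in other
            )
            keypoints.append((u, v, not occluded))
        result.append(keypoints)
    return result


# Two single-key-point "people", one standing directly behind the other.
scene = [[(0.0, 0.0, 0.0)], [(0.0, 0.0, 1.0)]]
front = augment_view(scene, 0.0)         # the rear person is hidden
side = augment_view(scene, math.pi / 2)  # from the side, both are visible
```

Sampling many angles (and, under the same assumptions, many person placements) from one set of 3D poses would yield arbitrarily many 2D inputs paired with 3D targets and visibility flags, which is the general kind of data a 2D-to-3D lifting model consumes.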

Sungchan Park, Eunyi Lyou, Inhoe Lee, Joonseok Lee.

Proceedings of the 19th IEEE/CVF International Conference on Computer Vision (ICCV), 2023.

  1. Lewis Bridgeman, Marco Volino, Jean-Yves Guillemaut, and Adrian Hilton. Multi-person 3D pose estimation and tracking in sports. In CVPR Workshops, 2019.
  2. Qingqiang Wu, Guanghua Xu, Sicong Zhang, Yu Li, and Fan Wei. Human 3D pose estimation in a lying position by rgb-d images for medical diagnosis and rehabilitation. In Proc. of the Annual International Conference of the IEEE Engineering in Medicine & Biology Society (EMBC), 2022.
  3. Shian-Ru Ke, LiangJia Zhu, Jenq-Neng Hwang, Hung-I Pai, Kung-Ming Lan, and Chih-Pin Liao. Real-time 3D human pose estimation from monocular view with applications to event detection and video gaming.
  4. Karteek Alahari, Guillaume Seguin, Josef Sivic, and Ivan Laptev. Pose estimation and segmentation of people in 3D movies. In ICCV, 2013.
  5. Ting-Chun Wang, Arun Mallya, and Ming-Yu Liu. One-shot free-view neural talking-head synthesis for video conferencing. arXiv:2011.15126, 2020.