[3D pose estimation task]
3D pose estimation aims to estimate the 3D coordinates of a person appearing in an untrimmed 2D video. It has been extensively studied in the literature, with many real-world applications, e.g., sports [1], healthcare [2], games [3], movies [4], and video compression. Instead of fully rendering 3D voxels, this work narrows the scope of discussion to reconstructing a small number of body keypoints (e.g., neck, knees, or ankles), which concisely represent the dynamics of human motion in the real world.
[Challenges]
3D multi-person pose estimation (3DMPPE) from a monocular video is particularly challenging and still largely uncharted, far from being applicable to in-the-wild scenarios. Existing methods face three unresolved issues: lack of robustness to views unseen during training, vulnerability to occlusion, and severe jittering in the output.
[Proposed Method]
The proposed method, POTR-3D, realizes a seq2seq 2D-to-3D lifting model for 3DMPPE for the first time, and devises a simple but effective data augmentation strategy that can generate an unlimited amount of occlusion-aware augmented data with diverse views. Together, these components effectively tackle the three challenges above and adapt well to in-the-wild videos.
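To make the seq2seq lifting idea concrete, the sketch below shows only the input/output contract such a model satisfies: an entire sequence of 2D keypoints for all persons goes in, and the corresponding 3D sequence comes out in one pass. All shapes, names, and the toy linear map are illustrative assumptions, not POTR-3D's actual architecture (which would be a learned spatio-temporal network, e.g., a transformer).

```python
import numpy as np

# Illustrative dimensions (assumptions, not the paper's configuration):
# T frames, P persons, J body keypoints per person.
T, P, J = 27, 2, 17

def lift_2d_to_3d(kpts_2d: np.ndarray, W: np.ndarray) -> np.ndarray:
    """Toy stand-in for a seq2seq 2D-to-3D lifting model.

    A real model would attend over all frames and persons jointly;
    here, a single shared linear map per keypoint only demonstrates
    the interface: a whole 2D sequence in, a whole 3D sequence out.
    """
    # kpts_2d: (T, P, J, 2) image coordinates -> (T, P, J, 3) 3D coordinates.
    return kpts_2d @ W  # W: (2, 3), applied to the last axis

rng = np.random.default_rng(0)
kpts_2d = rng.standard_normal((T, P, J, 2))  # e.g., from an off-the-shelf 2D detector
W = rng.standard_normal((2, 3))              # placeholder for learned parameters
kpts_3d = lift_2d_to_3d(kpts_2d, W)
print(kpts_3d.shape)  # (27, 2, 17, 3)
```

Operating on the whole sequence at once, rather than frame by frame, is what lets a seq2seq design smooth out the jittering mentioned above.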


Sungchan Park, Eunyi Lyou, Inhoe Lee, Joonseok Lee.
Proceedings of the 19th IEEE/CVF International Conference on Computer Vision (ICCV), 2023.