[Potential of Multi-accelerator Edge Devices]

Running deep learning models on resource-constrained devices has drawn significant attention due to fast response times, privacy preservation, and robust operation regardless of Internet connectivity [1, 2, 3]. While these devices already handle various intelligent tasks, the latest edge devices equipped with multiple types of low-power accelerators (i.e., both a mobile GPU and an NPU) open up another opportunity: 3D object detection, which used to be too heavy for an edge device in the single-accelerator world, may become viable in the upcoming heterogeneous-accelerator world.

[Challenges]

Even with the latest edge devices containing both a GPU and an NPU, enabling on-device 3D object detection without sacrificing accuracy is challenging. (1) 3D object detection is typically designed as a sequential process [4], making it hard to utilize the GPU and NPU in parallel. (2) Since the GPU and NPU have different strengths, a 3D object detection model must be analyzed thoroughly to distribute its computation across the two processors synergistically. (3) Fusing 2D vision information with a 3D point cloud can improve detection performance [5, 6, 7] but makes the computational burden on the edge device even heavier. (4) Quantization is necessary both to reduce computation and to utilize the NPU, but given that 3D object detection is a sophisticated task, a naïve approach would significantly degrade accuracy (see the sketch below).
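To make challenge (4) concrete, below is a minimal NumPy sketch, not taken from the paper, that assumes a simplified symmetric 8-bit quantizer and synthetic weights. When two weight groups differ greatly in magnitude, a single per-tensor scale is dominated by the larger group and destroys the resolution of the smaller one, which is the intuition behind quantizing groups separately:

```python
import numpy as np

def fake_quantize(x, scale):
    """Simulated symmetric 8-bit quantization: snap values to the grid,
    clamp to the int8 range, then dequantize so the error is measurable."""
    q = np.clip(np.round(x / scale), -128, 127)
    return q * scale

rng = np.random.default_rng(0)
# Two synthetic weight groups with very different magnitudes, a stand-in
# for layers that play different roles in a 2D/3D fusion network.
small = rng.normal(0.0, 0.02, 4096)
large = rng.normal(0.0, 1.0, 4096)

# Naive per-tensor scale: one scale for all weights, set by the largest.
s_tensor = np.abs(np.concatenate([small, large])).max() / 127
# Groupwise scale: the small group gets its own, much finer grid.
s_small = np.abs(small).max() / 127

err_naive = np.mean((small - fake_quantize(small, s_tensor)) ** 2)
err_group = np.mean((small - fake_quantize(small, s_small)) ** 2)
print(f"small-group MSE with per-tensor scale: {err_naive:.1e}")
print(f"small-group MSE with groupwise scale:  {err_group:.1e}")
```

On this synthetic example, the small group's quantization error drops by several orders of magnitude once it receives its own scale, hinting at why naïve whole-model quantization can sink a precision-sensitive task like 3D box regression.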

[Proposal]

To tackle these issues, Prof. Kim's team proposed PointSplit, which consists of three main components: (1) 2D-semantics-aware biased farthest point sampling (sketched below), (2) parallelized 3D feature extraction, and (3) role-based groupwise quantization. The team implemented PointSplit on a resource-constrained test platform combining an NVIDIA Jetson Nano (which includes a mobile GPU) and a Google EdgeTPU (a type of NPU). Extensive experiments on this platform verify the effectiveness of PointSplit in terms of both accuracy and latency.
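As a rough illustration of component (1), the following NumPy sketch biases classic farthest point sampling toward points that a 2D semantic model flags as foreground. The `fg_mask` input and the multiplicative `bias` weight are assumptions for illustration only; the paper's actual sampling rule may differ:

```python
import numpy as np

def biased_fps(points, fg_mask, n_samples, bias=2.0):
    """Farthest point sampling biased toward 2D-semantic foreground.

    points:    (N, 3) point cloud
    fg_mask:   (N,) bool, True where a point projects onto a foreground
               pixel of a 2D detector/segmenter (assumed precomputed)
    n_samples: number of points to keep
    bias:      hypothetical weight (>1) favoring foreground points
    """
    n = points.shape[0]
    weights = np.where(fg_mask, bias, 1.0)
    min_dist = np.full(n, np.inf)     # distance to nearest selected point
    selected = np.empty(n_samples, dtype=np.int64)
    idx = 0                           # arbitrary starting point
    for i in range(n_samples):
        selected[i] = idx
        d = np.sum((points - points[idx]) ** 2, axis=1)
        min_dist = np.minimum(min_dist, d)
        idx = int(np.argmax(min_dist * weights))  # biased farthest point
    return selected

# Usage: 5,000 synthetic points, ~10% flagged as foreground.
rng = np.random.default_rng(42)
pts = rng.random((5000, 3)).astype(np.float32)
mask = rng.random(5000) < 0.1
keep = biased_fps(pts, mask, n_samples=1024)
print(mask[keep].mean())  # foreground fraction among samples, typically > 0.1
```

Because a selected point's minimum distance drops to zero, it is never re-picked; the bias only reweights the "farthest" criterion, so scene coverage is preserved while foreground regions are sampled more densely.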

Figure. PointSplit’s hardware platform, parallelized pipeline, and role-based quantization

Keondo Park, You Rim Choi, Inhoe Lee, and Hyung-Sin Kim.

IPSN 2023 (ACM/IEEE International Conference on Information Processing in Sensor Networks).

https://dl.acm.org/doi/abs/10.1145/3583120.3587045

References
  1. Kittipat Apicharttrisorn et al. 2019. Frugal following: Power thrifty object detection and tracking for mobile augmented reality. In Proceedings of the 17th ACM Conference on Embedded Networked Sensor Systems. 96–109.
  2. Yuxuan Cai et al. 2021. YOLObile: Real-Time Object Detection on Mobile Devices via Compression-Compilation Co-Design. Proceedings of the AAAI Conference on Artificial Intelligence 35, 2 (May 2021), 955–963.
  3. Kaifei Chen et al. 2018. Marvel: Enabling mobile augmented reality with low energy and low latency. In Proceedings of the 16th ACM Conference on Embedded Networked Sensor Systems. 292–304.
  4. Charles R Qi et al. 2019. Deep Hough voting for 3D object detection in point clouds. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV). 9277–9286.
  5. Jintai Chen et al. 2020. A Hierarchical Graph Network for 3D Object Detection on Point Clouds. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
  6. Charles Qi et al. 2020. ImVoteNet: Boosting 3D object detection in point clouds with image votes. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
  7. Charles R Qi et al. 2017. PointNet++: Deep hierarchical feature learning on point sets in a metric space. arXiv preprint arXiv:1706.02413 (2017).