Pseudo-LiDAR 3D detectors have made remarkable progress in monocular 3D detection by strengthening depth perception with depth estimation networks and by reusing LiDAR-based 3D detection architectures. Advanced stereo 3D detectors can also localize 3D objects accurately, and the gap in image-to-image generation for stereo views is much smaller than that in image-to-LiDAR generation. Motivated by this, we propose a Pseudo-Stereo 3D detection framework with three novel virtual view generation methods (image-level generation, feature-level generation, and feature-clone) for detecting 3D objects from a single image. Our analysis of depth-aware learning shows that, in our framework, the depth loss is effective only in feature-level virtual view generation, whereas the estimated depth map is effective in both image-level and feature-level generation. We propose a disparity-wise dynamic convolution whose dynamic kernels are sampled from the disparity feature map to adaptively filter the features of a single image and generate virtual image features, which eases the feature degradation caused by depth estimation errors. As of submission (November 18, 2021), our Pseudo-Stereo 3D detection framework ranks 1st on car, pedestrian, and cyclist among published monocular 3D detectors on the KITTI-3D benchmark. The code is released at https://github.com/revisitq/Pseudo-Stereo-3D. © 2022, CC BY-NC-SA.
Object detection, Object recognition, Stereo image processing, 3-D detectors, 3D object, Depth Estimation, Detection framework, Feature level, Objects detection, Pseudo stereos, Single images, View generation, Virtual view, Optical radar, Computer Vision and Pattern Recognition (cs.CV)
Y.N. Chen, H. Dai, and Y. Ding, "Pseudo-Stereo for Monocular 3D Object Detection in Autonomous Driving", arXiv, Mar 2022, doi:10.48550/arXiv.2203.02112
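The image-level virtual view generation mentioned in the abstract can be illustrated with standard stereo geometry. The sketch below is not taken from the released code: it assumes a rectified stereo setup in which disparity follows d = f * b / z, and it uses a simple nearest-pixel forward warp with a z-buffer. The function names `depth_to_disparity` and `forward_warp_to_right`, and the parameters `focal_px` and `baseline_m`, are hypothetical and chosen only for illustration.

```python
import numpy as np


def depth_to_disparity(depth, focal_px, baseline_m, eps=1e-6):
    """Convert metric depth z to disparity in pixels via d = f * b / z."""
    return focal_px * baseline_m / np.maximum(depth, eps)


def forward_warp_to_right(left, disparity):
    """Forward-warp a left image into a virtual right view.

    Assumes rectified views: a pixel at column u in the left image
    lands at column u - d in the right image. When several source
    pixels land on the same target, the one with the largest disparity
    (the nearest surface) wins. Targets that receive no pixel stay
    zero and are reported in the returned hole mask.
    """
    h, w = disparity.shape
    right = np.zeros_like(left)
    best = np.full((h, w), -np.inf)  # z-buffer holding the winning disparity
    for v in range(h):
        for u in range(w):
            d = float(disparity[v, u])
            t = int(round(u - d))  # target column in the right view
            if 0 <= t < w and d > best[v, t]:
                best[v, t] = d
                right[v, t] = left[v, u]
    holes = np.isinf(best)  # True where no source pixel arrived
    return right, holes


# Toy usage with made-up camera values (focal length and baseline are assumptions).
left = np.array([[10, 20, 30, 40, 50]], dtype=np.uint8)
depth = np.full((1, 5), 25.0)
disp = depth_to_disparity(depth, focal_px=100.0, baseline_m=0.5)
right, holes = forward_warp_to_right(left, disp)
```

In this toy case every pixel has the same disparity, so the row simply shifts left and the uncovered columns on the right are flagged as holes. A real pipeline would use an estimated depth map and would need some strategy for filling those holes, which this sketch leaves out.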