Research on Vehicle Recognition Based on Unpacking 3D Bounding Boxes of Monocular Camera in Traffic Scene 2020-01-5196
Currently, most vehicle recognition methods are realized by deep convolutional neural networks (DCNNs) trained directly on images. Due to the perspective distortion and scale changes in images taken by a monocular camera, a large number of multi-scale images are needed for training, and the physical information of vehicles cannot be obtained at the same time. To address these problems, we present a method of vehicle recognition based on unpacking 3D bounding boxes in this paper. Firstly, camera calibration information and geometric constraints are used to build 3D bounding boxes around vehicles in the monocular projection. Then, the 3D bounding boxes are unpacked to obtain 3D normalized spatial data free of perspective distortion. Finally, VGG-16 is chosen as the backbone of our network, whose output is divided into five common vehicle types: hatchback, sedan, SUV, truck, and bus. The experimental results indicate that the accuracy of our method is improved by 8.74% for hatchbacks and 7.49% for sedans with less training data, outperforming traditional end-to-end deep learning methods of vehicle recognition while obtaining physical information simultaneously.
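The "unpacking" step can be read as rectifying each visible face of the 3D bounding box into a normalized, fronto-parallel patch. A minimal sketch of one such face rectification, assuming it is a per-face planar homography solved by the direct linear transform (the corner coordinates and patch size below are hypothetical, and the paper's exact formulation may differ):

```python
import numpy as np

def face_homography(src, dst):
    # Solve the 8-DOF homography H (direct linear transform) that maps
    # the four image-plane corners of one 3D-box face onto a normalized,
    # perspective-free rectangle. h33 is fixed to 1.
    A, b = [], []
    for (x, y), (u, v) in zip(src, dst):
        A.append([x, y, 1, 0, 0, 0, -u * x, -u * y]); b.append(u)
        A.append([0, 0, 0, x, y, 1, -v * x, -v * y]); b.append(v)
    h = np.linalg.solve(np.asarray(A, float), np.asarray(b, float))
    return np.append(h, 1.0).reshape(3, 3)

# Hypothetical image corners of a vehicle's side face (perspective-skewed)
src = [(120, 310), (480, 260), (500, 420), (110, 450)]
# Target: a 128x64 normalized patch with the distortion removed
dst = [(0, 0), (127, 0), (127, 63), (0, 63)]
H = face_homography(src, dst)

# Each source corner now lands on the normalized rectangle
p = H @ np.array([120, 310, 1.0])
print(p[:2] / p[2])  # approximately (0, 0)
```

In practice the rectified face patches would be resampled with this homography and fed to the VGG-16 backbone in place of the raw, distorted image crops.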
Citation: Wang, W., Tang, X., Tian, S., Zhang, C. et al., "Research on Vehicle Recognition Based on Unpacking 3D Bounding Boxes of Monocular Camera in Traffic Scene," SAE Technical Paper 2020-01-5196, 2020, https://doi.org/10.4271/2020-01-5196.