Journal "Software Engineering" (Programmnaya Ingeneria) | 2D-to-3D Projection for Monocular and Multi-View 3D Object Detection in Outdoor Scenes

Issue No. 7, 2021

DOI: 10.17587/prin.12.373-384

2D-to-3D Projection for Monocular and Multi-View 3D Object Detection in Outdoor Scenes

D. D. Rukhovich, daniel-rukhovich@yandex.ru, Faculty of Mechanics and Mathematics, Moscow State University, Moscow, 119234, Russian Federation

Corresponding author: Rukhovich Danila D., Postgraduate Student, Faculty of Mechanics and Mathematics, Moscow State University, Moscow, 119234, Russian Federation, E-mail: daniel-rukhovich@yandex.ru

Received on June 16, 2021

Accepted on July 07, 2021

In this article, we formulate multi-view RGB-based 3D object detection as an end-to-end optimization problem. In the multi-view formulation, several images of a static scene are used to detect the objects in that scene. To solve this problem, we propose a novel 3D object detection method named ImVoxelNet, which is based on a fully convolutional neural network. Unlike existing 3D object detection methods, ImVoxelNet operates directly on 3D representations and does not derive 3D detections from intermediate 2D object detections. The proposed method accepts multi-view inputs, and the number of monocular images in each multi-view input may vary during both training and inference; in fact, this number may differ for every multi-view input. Moreover, we propose to treat a single RGB image as a special case of a multi-view input, so the same method accepts monocular inputs without any modifications. Extensive evaluation shows that the proposed method handles a variety of outdoor scenes. Specifically, among all methods that take RGB images as input, it achieves state-of-the-art results in car detection on the KITTI (monocular) and nuScenes (multi-view) benchmarks. The method runs in real time, which makes it possible to integrate it into the navigation systems of autonomous devices. The results of this study can be applied to navigation, path planning, and semantic scene mapping.
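The abstract does not spell out how 2D image features are lifted into a 3D representation, or how a varying number of views is combined. The following is a minimal sketch of one common way to do this, written under our own assumptions rather than taken from the article: each voxel center is projected into every view with a 3x4 pinhole projection matrix P = K[R | t], the feature at the pixel containing the projection is sampled (nearest pixel, no interpolation), and the samples are averaged over the views in which the voxel is visible. A monocular input is simply a list containing one view. The names `project_point` and `build_voxel_features` and the nested-list feature layout are hypothetical and chosen for illustration only; this is not the ImVoxelNet implementation.

```python
import math
from typing import List, Optional, Sequence, Tuple

# Hypothetical types for this sketch (not from the article):
# a 3x4 projection matrix P = K [R | t], and a feature map indexed as fmap[v][u] -> C-dim vector.
Matrix3x4 = Sequence[Sequence[float]]
FeatureMap = Sequence[Sequence[Sequence[float]]]
Point3 = Tuple[float, float, float]


def project_point(P: Matrix3x4, point: Point3) -> Optional[Tuple[float, float]]:
    """Project a 3D point to continuous pixel coordinates (u, v).

    Returns None if the point lies on or behind the camera plane (depth <= 0).
    """
    x, y, z = point
    # Homogeneous image coordinates: P @ [x, y, z, 1]
    hom = [P[r][0] * x + P[r][1] * y + P[r][2] * z + P[r][3] for r in range(3)]
    if hom[2] <= 0:
        return None
    return hom[0] / hom[2], hom[1] / hom[2]


def build_voxel_features(
    voxel_centers: Sequence[Point3],
    views: Sequence[Tuple[Matrix3x4, FeatureMap]],
) -> List[List[float]]:
    """Lift 2D features into a voxel grid by projecting voxel centers into each view.

    `views` may contain any number of (P, feature_map) pairs; a monocular
    input is just a list with a single view. Each voxel receives the mean of
    the features sampled in the views where it projects inside the image;
    voxels visible in no view get a zero vector.
    """
    if not views:
        raise ValueError("at least one view is required")
    channels = len(views[0][1][0][0])
    volume: List[List[float]] = []
    for center in voxel_centers:
        acc = [0.0] * channels
        count = 0
        for P, fmap in views:
            uv = project_point(P, center)
            if uv is None:
                continue
            # Nearest-pixel sampling: take the pixel that contains the projection.
            u, v = math.floor(uv[0]), math.floor(uv[1])
            height, width = len(fmap), len(fmap[0])
            if 0 <= v < height and 0 <= u < width:
                acc = [a + f for a, f in zip(acc, fmap[v][u])]
                count += 1
        # Mean over visible views; zero vector if the voxel is seen by no camera.
        volume.append([a / count for a in acc] if count else acc)
    return volume
```

Averaging over the visible views keeps voxel features on the same scale whether a voxel is seen by one camera or by several, which is one way a single model could accept a varying number of views per input. A learned 3D network would then operate on the resulting volume; that part is outside the scope of this sketch.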

Keywords: machine learning, deep learning, 3D object detection

pp. 373–384

For citation:
Rukhovich D. D. 2D-to-3D Projection for Monocular and Multi-View 3D Object Detection in Outdoor Scenes, Programmnaya Ingeneria, 2021, vol. 12, no. 7, pp. 373–384.