Scene disparity estimation with convolutional neural networks

Anas, Essa (ORCID: 0000-0002-2932-5867), Guo, Li (ORCID: 0000-0003-1272-8480), Onsy, Ahmed (ORCID: 0000-0003-0803-5374) and Matuszewski, Bogdan (ORCID: 0000-0001-7195-2509) (2019) Scene disparity estimation with convolutional neural networks. SPIE Proceedings Multimodal Sensing: Technologies and Applications, 11059, 110590T1-110590T9. ISSN 0277-786X

PDF (Version of Record) - Published Version, available under a Creative Commons Attribution Non-commercial No Derivatives licence.

Official URL: https://doi.org/10.1117/12.2527628

Abstract

Estimation of stereovision disparity maps is important for many applications that require information about objects' position and geometry. For example, as a depth surrogate, disparity maps are essential for objects' 3D shape reconstruction and, indeed, for any other application that requires a three-dimensional representation of a scene. Recently, deep learning (DL) methodology has enabled novel approaches to disparity estimation, with some focus on the real-time processing requirement that is critical for applications in robotics and autonomous navigation; previously, that constraint was not always addressed. Furthermore, for robust disparity estimation the occlusion effects should be explicitly modelled. In the described method, effective detection of occlusion regions is achieved through disparity estimation in both the forward and backward correspondence models, using two matching deep subnetworks. These two subnetworks are trained jointly in a single training process. Initially the subnetworks are trained on simulated data with known ground truth; then, to improve generalisation, the whole model is fine-tuned in an unsupervised fashion on real data. During the unsupervised training the model is equipped with a bilinear interpolation warping function to directly measure the quality of the correspondence given the disparity maps estimated for both the left and right images. Also during this phase, a forward-backward consistency constraint loss function is applied to regularise the disparity estimators for non-occluded pixels. The described network model computes, at the same time, the forward and backward disparity maps as well as the corresponding occlusion masks. It showed improved results on simulated and real images with occluded objects, compared with the results obtained without the forward-backward consistency constraint loss function.
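The forward-backward consistency check underlying the loss described above can be sketched as follows. This is an illustrative NumPy implementation, not the authors' code: the sign convention (positive disparities, with left pixel x matching right pixel x − d), the 1D linear interpolation used to sample the backward disparity, and the threshold `tau` are all assumptions made for the sketch.

```python
import numpy as np

def fb_consistency(d_lr, d_rl, tau=1.0):
    """Forward-backward disparity consistency check (a sketch).

    d_lr : (H, W) forward (left-to-right) disparity map.
    d_rl : (H, W) backward (right-to-left) disparity map.
    tau  : consistency threshold in pixels (an assumed value).

    Returns the per-pixel consistency error in the left view and a
    boolean occlusion mask (True where the check fails).
    """
    H, W = d_lr.shape
    xs = np.tile(np.arange(W, dtype=np.float64), (H, 1))
    # Position in the right image that each left pixel maps to.
    x_r = np.clip(xs - d_lr, 0.0, W - 1.0)
    # Linear interpolation of d_rl at the non-integer positions x_r
    # (the 2D analogue would use a bilinear warp, as in the paper).
    x0 = np.floor(x_r).astype(int)
    x1 = np.minimum(x0 + 1, W - 1)
    w1 = x_r - x0
    rows = np.arange(H)[:, None]
    d_rl_warped = (1.0 - w1) * d_rl[rows, x0] + w1 * d_rl[rows, x1]
    # For consistent, non-occluded pixels d_lr(x) ~= d_rl(x - d_lr(x)).
    err = np.abs(d_lr - d_rl_warped)
    occlusion = err > tau
    return err, occlusion

def fb_loss(d_lr, d_rl, tau=1.0):
    """Mean consistency error over pixels deemed non-occluded."""
    err, occ = fb_consistency(d_lr, d_rl, tau)
    valid = ~occ
    return err[valid].mean() if valid.any() else 0.0
```

In the paper the same idea serves two roles at once: pixels failing the check form the occlusion masks, while the error on the remaining pixels regularises both disparity estimators during unsupervised fine-tuning.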

