Deep Models for Rigid Objects Real-Time Pose Estimation

Zhao, Jianyu (ORCID: 0000-0002-1531-8658) (2024) Deep Models for Rigid Objects Real-Time Pose Estimation. Doctoral thesis, University of Central Lancashire.

PDF (Thesis), Submitted Version, 67 MB
Available under License Creative Commons Attribution Non-commercial.

Digital ID: http://doi.org/10.17030/uclan.thesis.00053726

Abstract

Accurate and robust six-degrees-of-freedom (6-DoF) pose estimation of rigid objects is a fundamental task in computer vision, with applications ranging from industrial automation to augmented reality and medical intervention. However, most existing methods rely on the objects' 3D models and on depth measurements, and many require time-consuming iterative refinement to reach high accuracy. These requirements limit their broader application.
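For reference, a 6-DoF pose is conventionally written as a rigid transformation from the object's coordinate frame to the camera's: a rotation contributing three degrees of freedom and a translation contributing three. The notation below is the standard textbook form and is an assumption here, not taken from the thesis.

```latex
\mathbf{x}_{\mathrm{cam}} = \mathbf{R}\,\mathbf{x}_{\mathrm{obj}} + \mathbf{t},
\qquad \mathbf{R} \in SO(3), \quad \mathbf{t} \in \mathbb{R}^{3}
```

Here $\mathbf{x}_{\mathrm{obj}}$ is a point on the object in its own frame and $\mathbf{x}_{\mathrm{cam}}$ is the same point in camera coordinates; estimating the pose means recovering $\mathbf{R}$ and $\mathbf{t}$.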

This PhD thesis is motivated primarily by the goal of overcoming these limitations and presents a comprehensive study of the 6-DoF pose estimation problem. Drawing inspiration from recent deep learning pose estimation methods, it proposes a novel 6-DoF pose estimation framework, Auto-Pose, which combines latent space representations learned by deep neural networks with supervised learning algorithms. The framework comprises three novel autoencoder-based methods: DALSR-Pose, CVML-Pose, and CVAM-Pose. These methods are designed specifically to address the limitations of existing approaches. They estimate the 6-DoF poses of rigid objects in real time from a single colour image, without explicit 3D models of the objects, without depth data, and without post-refinement.

The fundamental idea is to learn intermediate representations of objects implicitly in the latent space from colour images alone, and then to estimate the 6-DoF poses from these latent representations using regression-based algorithms such as a multilayer perceptron (MLP), k-nearest neighbours (KNN), and random forest (RF). The proposed methods run in real time and are applicable in complex scenarios, including textured and texture-less objects in low-resolution images with heavy occlusion and clutter.
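To make the second stage (latent code to pose) more concrete, the following is a minimal sketch of KNN regression in plain Python. It is not the thesis's implementation. The latent codes are assumed to come from an already trained encoder (not shown). The pose parameterisation (a unit quaternion plus a translation vector), the function names `knn_pose` and `euclidean`, and the quaternion-averaging rule are all illustrative assumptions.

```python
import math
from typing import List, Sequence, Tuple

# Illustrative sketch only: names and pose format are assumptions, not from the thesis.
# Hypothetical pose: unit quaternion (w, x, y, z) plus translation (tx, ty, tz).
Pose = Tuple[Tuple[float, float, float, float], Tuple[float, float, float]]


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two latent codes of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn_pose(query: Sequence[float],
             train_codes: List[Sequence[float]],
             train_poses: List[Pose],
             k: int = 3) -> Pose:
    """Estimate a pose for `query` from its k nearest training latent codes."""
    if not 1 <= k <= len(train_codes):
        raise ValueError("k must be between 1 and the number of training samples")
    # Indices of the k training codes closest to the query (nearest first).
    order = sorted(range(len(train_codes)),
                   key=lambda i: euclidean(query, train_codes[i]))[:k]

    # Translation: plain average of the neighbours' translations.
    t = tuple(sum(train_poses[i][1][j] for i in order) / k for j in range(3))

    # Rotation: flip each quaternion onto the same hemisphere as the nearest
    # neighbour's (q and -q encode the same rotation), sum, and renormalise.
    # This approximates a mean rotation only when the neighbours are close.
    ref = train_poses[order[0]][0]
    acc = [0.0, 0.0, 0.0, 0.0]
    for i in order:
        q = train_poses[i][0]
        sign = 1.0 if sum(a * b for a, b in zip(q, ref)) >= 0 else -1.0
        for j in range(4):
            acc[j] += sign * q[j]
    norm = math.sqrt(sum(c * c for c in acc))  # > 0 because ref itself is included
    q_mean = tuple(c / norm for c in acc)
    return q_mean, t


if __name__ == "__main__":
    # Toy 2-D "latent codes" standing in for encoder outputs.
    codes = [[0.0, 0.0], [0.1, 0.0], [5.0, 5.0]]
    poses = [((1.0, 0.0, 0.0, 0.0), (0.0, 0.0, 1.0)),
             ((-1.0, 0.0, 0.0, 0.0), (0.0, 0.0, 1.2)),  # same rotation, opposite sign
             ((0.0, 1.0, 0.0, 0.0), (1.0, 1.0, 3.0))]
    print(knn_pose([0.02, 0.0], codes, poses, k=2))
```

KNN needs no training beyond storing the encoded training images with their poses, which makes it a simple baseline alongside MLP and RF regressors. The rotation averaging used here is only reasonable when the neighbours agree closely; with widely spread neighbours a more careful rotation-averaging scheme would be needed.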

Extensive experiments on multiple publicly available benchmark datasets show that the proposed framework outperforms existing methods that likewise use latent space representations, improving pose estimation accuracy by 30%. It also achieves results comparable to other state-of-the-art methods that use 3D models.

The thesis makes significant contributions to the field of 6-DoF pose estimation, facilitating the development of model-free estimation algorithms. Its novelty lies in the proposed autoencoder-based methods, which achieve performance competitive with the state of the art using only data from a monocular camera, without the object's 3D model, depth measurements, or the iterative refinement on which existing methods often depend.

