
CVML-Pose: Convolutional VAE based Multi-Level Network for Object 3D Pose Estimation


Zhao, Jianyu (ORCID: 0000-0002-1531-8658), Sanderson, Edward (ORCID: 0000-0002-3794-5513) and Matuszewski, Bogdan (ORCID: 0000-0001-7195-2509) (2023) CVML-Pose: Convolutional VAE based Multi-Level Network for Object 3D Pose Estimation. IEEE Access.


PDF (VOR) - Published Version
Available under License Creative Commons Attribution.

Official URL: https://doi.org/10.1109/ACCESS.2023.3243551

Abstract

Most vision-based 3D pose estimation approaches rely on knowledge of the object’s 3D model and on depth measurements, and often require time-consuming iterative refinement to improve accuracy. These requirements limit broader real-life application, and addressing them is the main motivation for this paper. To this end, a novel Convolutional Variational Auto-Encoder based Multi-Level Network for object 3D pose estimation (CVML-Pose) is proposed. Unlike most other methods, CVML-Pose implicitly learns an object’s 3D pose from RGB images alone, encoded in its latent space, without knowing the object’s 3D model, using depth information, or performing post-refinement. CVML-Pose consists of two main modules: (i) CVML-AE, a convolutional variational autoencoder whose role is to extract features from RGB images, and (ii) Multi-Layer Perceptron and K-Nearest Neighbor regressors, which map the latent variables to the object’s 3D pose, namely rotation and translation, respectively. CVML-Pose has been evaluated on the LineMod and LineMod-Occlusion benchmark datasets. It outperforms other methods based on latent representations and achieves results comparable to the state of the art, without using a 3D model or depth measurements. Visualization with the t-Distributed Stochastic Neighbor Embedding algorithm shows that the CVML-Pose latent space successfully represents objects’ category and topology. This opens up the prospect of integrated estimation of pose and other attributes (possibly including surface finish or shape variations), which, together with the real-time processing made possible by the absence of iterative refinement, can facilitate various robotic applications.
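
The abstract assigns translation to a K-Nearest Neighbor regressor operating on the latent variables. As a rough illustration only, the following is a minimal sketch of that idea in plain Python. The function name `knn_regress`, the Euclidean distance, the unweighted averaging, the value of `k`, and the toy data are all assumptions made for this example and are not taken from the paper.

```python
import math


def knn_regress(train_latents, train_targets, query, k=3):
    """Average the targets of the k training latents closest to `query`.

    Hypothetical helper: the distance metric and the unweighted mean are
    illustrative assumptions, not the authors' configuration.
    """
    if not 1 <= k <= len(train_latents):
        raise ValueError("k must be between 1 and the number of training samples")
    # Euclidean distance from the query to every training latent vector.
    dists = [math.dist(z, query) for z in train_latents]
    # Indices of the k smallest distances.
    nearest = sorted(range(len(dists)), key=dists.__getitem__)[:k]
    dim = len(train_targets[0])
    # Component-wise mean of the neighbors' target vectors.
    return [sum(train_targets[i][d] for i in nearest) / k for d in range(dim)]


# Toy data, invented for this example: 2-D "latent" vectors paired with
# 3-D translations (x, y, z).
latents = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [5.0, 5.0]]
translations = [[0.0, 0.0, 1.0], [0.1, 0.0, 1.0], [0.0, 0.1, 1.0], [0.5, 0.5, 2.0]]

estimate = knn_regress(latents, translations, query=[0.2, 0.2], k=3)
print(estimate)
```

In this toy setup the query lies near the first three latent vectors, so the estimate is the mean of their translations and the distant fourth sample has no influence. In practice the latent vectors would come from the trained encoder rather than being written by hand.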


Summary Table

Item Type:	Article
Creators (Authors or editors):	Zhao, Jianyu (jzhao12@uclan.ac.uk, https://orcid.org/0000-0002-1531-8658); Sanderson, Edward (esanderson4@uclan.ac.uk, https://orcid.org/0000-0002-3794-5513); Matuszewski, Bogdan (bmatuszewski1@uclan.ac.uk, https://orcid.org/0000-0001-7195-2509)
Uncontrolled Keywords (separate with ;):	3D pose estimation; Deep learning; Variational autoencoder; Synthetic data
Subjects:	H - Engineering > H990 - Engineering not elsewhere classified
Schools:	School of Engineering and Computing > Computing; School of Engineering and Computing > Engineering, Construction, Maths and Physics
Research Institutes:	Institute for Engineering & Technology Innovation (InETI)
Funders:	Engineering and Physical Sciences Research Council (http://dx.doi.org/10.13039/501100000266)
Projects:	ID EP/K019368/1
ID Code:	45533
Depositing User ID:	Paul Harrison
Date Deposited:	08 Feb 2023 08:56
Last Modified:	16 Jun 2025 20:00
