
CVAM-Pose: Conditional Variational Autoencoder for Multi-Object Monocular Pose Estimation

Zhao, Jianyu ORCID: 0000-0002-1531-8658, Quan, Wei ORCID: 0000-0003-2099-9520 and Matuszewski, Bogdan ORCID: 0000-0001-7195-2509 (2024) CVAM-Pose: Conditional Variational Autoencoder for Multi-Object Monocular Pose Estimation. In: 35th British Machine Vision Conference 2024, 25-28 November 2024, Glasgow, Scotland, United Kingdom.

PDF (AAM) - Accepted Version. Available under License Creative Commons Attribution. 2MB
PDF - Supplemental Material. 3MB

Official URL: https://bmvc2024.org/proceedings/967/

Abstract

Estimating rigid objects' poses is one of the fundamental problems in computer vision, with a range of applications across automation and augmented reality. Most existing approaches adopt a one-network-per-object-class strategy, depend heavily on objects' 3D models and depth data, and employ time-consuming iterative refinement, which can be impractical for some applications. This paper presents a novel approach, CVAM-Pose, for multi-object monocular pose estimation that addresses these limitations. The CVAM-Pose method employs a label-embedded conditional variational autoencoder network to implicitly abstract regularised representations of multiple objects in a single low-dimensional latent space. This autoencoding process uses only images captured by a projective camera and is robust to object occlusion and scene clutter. The classes of objects are one-hot encoded and embedded throughout the network. The proposed label-embedded pose regression strategy interprets the learnt latent space representations using continuous pose representations. Ablation tests and systematic evaluations demonstrate the scalability and efficiency of the CVAM-Pose method for multi-object scenarios. The proposed CVAM-Pose outperforms competing latent space approaches. For example, it is 25% and 20% better than the AAE and Multi-Path methods, respectively, when evaluated using the AR_VSD metric on the Linemod-Occluded dataset. It also achieves results somewhat comparable to methods reliant on 3D models reported in BOP challenges.
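
To illustrate the conditioning idea described in the abstract, the following is a minimal sketch of how a one-hot class label can be attached to a sampled latent vector in a conditional variational autoencoder. It is not the authors' implementation: the function names, the latent size of 4, and the class count of 8 are assumptions chosen for illustration, and the encoder outputs are placeholder values rather than network predictions.

```python
import math
import random


def one_hot(class_index, num_classes):
    """Return a one-hot list encoding the object class."""
    if not 0 <= class_index < num_classes:
        raise ValueError("class_index out of range")
    return [1.0 if i == class_index else 0.0 for i in range(num_classes)]


def reparameterise(mu, log_var, rng):
    """Sample z = mu + sigma * eps with eps ~ N(0, 1) (standard VAE trick)."""
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0)
            for m, lv in zip(mu, log_var)]


def embed_label(features, label):
    """Condition a feature vector on the class by concatenating the label."""
    return list(features) + list(label)


rng = random.Random(0)           # fixed seed so the sketch is repeatable
label = one_hot(2, 8)            # assumed: 8 object classes, class index 2
mu = [0.0, 0.0, 0.0, 0.0]        # placeholder encoder mean (assumed latent size 4)
log_var = [0.0, 0.0, 0.0, 0.0]   # placeholder encoder log-variance
z = reparameterise(mu, log_var, rng)
decoder_input = embed_label(z, label)
print(len(decoder_input))        # latent size plus number of classes
```

In this sketch a single set of weights could serve every object class because the label travels with the latent vector; the abstract states that the label is embedded throughout the network, which would correspond to repeating the concatenation at further layers.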


Summary Table

Item Type:	Conference or Workshop Item Paper
Additional Information:	According to the arXiv comments, this work will be given as an oral presentation. A full paper has also been submitted to arXiv, so the conference item type is marked as 'Paper'.
Creators (Authors or editors):
Creators	Email	ORCID	ORCID Put Code
Zhao, Jianyu	jzhao12@uclan.ac.uk	https://orcid.org/0000-0002-1531-8658	UNSPECIFIED
Quan, Wei	wquan@uclan.ac.uk	https://orcid.org/0000-0003-2099-9520	UNSPECIFIED
Matuszewski, Bogdan	bmatuszewski1@uclan.ac.uk	https://orcid.org/0000-0001-7195-2509	UNSPECIFIED
Subjects:	I - Computer science > I440 - Computer vision
Schools:	School of Engineering and Computing > Engineering, Construction, Maths and Physics
Research Institutes:	Institute for Engineering & Technology Innovation (InETI)
Related URLs:	https://uclandata.uclan.ac.uk/472
ID Code:	53319
Depositing User ID:	Christopher Waddington
Date Deposited:	18 Oct 2024 15:16
Last Modified:	09 Jan 2025 11:25
