Multi-resolution Fine-Tuning of Vision Transformers

Fitzgerald, Kerr, Law, Meng, Seah, Jarrel, Tang, Jennifer and Matuszewski, Bogdan ORCID: 0000-0001-7195-2509 (2022) Multi-resolution Fine-Tuning of Vision Transformers. In: Medical Image Understanding and Analysis, 27/7/2022-29/7/2022, Cambridge.

Full text not available from this repository.

Official URL: https://doi.org/10.1007/978-3-031-12053-4_40

Abstract

For computer vision systems based on artificial neural networks, increasing the resolution of images typically improves the performance of the network. However, ImageNet pre-trained Vision Transformer (ViT) models are typically only openly available for 224² and 384² image resolutions. To determine the impact of using higher resolution images with ViT systems the performance differences between ViT-B/16 models (designed for 384² and 544² image resolutions) were evaluated. The multi-label classification RANZCR CLiP challenge dataset, which contains over 30,000 high resolution labelled chest X-ray images, was used throughout this investigation. The performance of the ViT 384² and ViT 544² models with no ImageNet pre-training (i.e. models were only trained using RANZCR data) was firstly compared to see if using higher resolution images increases performance. After this, a multi-resolution fine-tuning approach was investigated for transfer learning. This approach was achieved by transferring learned parameters from ImageNet pre-trained ViT 384² models, which had undergone further training on the 384² RANZCR data, to ViT 544² models which were then trained on the 544² RANZCR data. Learned parameters were transferred via a tensor slice copying technique. The results obtained provide evidence that using larger image resolutions positively impacts ViT network performance and that multi-resolution fine-tuning can lead to performance gains. The multi-resolution fine-tuning approach used in this investigation could potentially improve the performance of other computer vision systems which use ViT based networks. The results of this investigation may also warrant the development of new ViT variants optimized to work with high resolution image datasets.
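
The abstract does not spell out how the tensor slice copying was done, so the following is only a minimal sketch of one plausible reading, not the authors' implementation. It assumes that parameters whose shapes match are copied whole, and that a larger destination tensor (such as the position embeddings, which grow with the number of patches) receives the source values in its leading slice and keeps its own initial values elsewhere. The parameter names `pos_embed` and `proj`, the helper names, and the small hidden size are all hypothetical; NumPy stands in for whatever framework was actually used.

```python
import numpy as np

PATCH = 16  # patch size of ViT-B/16


def num_tokens(image_size: int, patch: int = PATCH) -> int:
    """Patch tokens plus one class token for a square image."""
    if image_size % patch != 0:
        raise ValueError("image size must be a multiple of the patch size")
    side = image_size // patch
    return side * side + 1


def slice_copy(src: np.ndarray, dst: np.ndarray) -> np.ndarray:
    """Return a copy of dst whose leading slice is overwritten by src.

    Along each axis only the overlapping extent is copied, so a tensor
    of identical shape is copied in full.
    """
    if src.ndim != dst.ndim:
        raise ValueError("source and destination must have the same rank")
    out = dst.copy()
    region = tuple(slice(0, min(s, d)) for s, d in zip(src.shape, dst.shape))
    out[region] = src[region]
    return out


def transfer(src_params: dict, dst_params: dict) -> dict:
    """Slice-copy every parameter that exists in both models (assumed scheme)."""
    return {
        name: slice_copy(src_params[name], value) if name in src_params else value.copy()
        for name, value in dst_params.items()
    }


# Toy example: hidden size 8 instead of the 768 used by ViT-B.
dim = 8
rng = np.random.default_rng(0)
src = {
    "pos_embed": rng.normal(size=(1, num_tokens(384), dim)),
    "proj": rng.normal(size=(dim, dim)),
}
dst = {
    "pos_embed": np.zeros((1, num_tokens(544), dim)),
    "proj": np.zeros((dim, dim)),
}
new = transfer(src, dst)
print(num_tokens(384), num_tokens(544), new["pos_embed"].shape)
```

With a patch size of 16, a 384² input yields 24 × 24 patches and a 544² input yields 34 × 34, so the position-embedding table is the tensor that changes shape between the two models; under the assumption above, the weights that do not depend on resolution carry over unchanged.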

Summary Table

Item Type:	Conference or Workshop Item Paper
Creators (Authors or editors):	Fitzgerald, Kerr; Law, Meng; Seah, Jarrel; Tang, Jennifer; Matuszewski, Bogdan (email: bmatuszewski1@uclan.ac.uk, ORCID: https://orcid.org/0000-0001-7195-2509)
Uncontrolled Keywords:	Computer vision; Vision transformer; ViT; Fine-tuning; Transfer learning; Medical data; RANZCR CLiP
Subjects:	B - Subjects allied to medicine > B800 - Medical technology
Schools:	School of Engineering and Computing > Computing; School of Engineering and Computing > Engineering, Construction, Maths and Physics
Research Institutes:	Institute for Engineering & Technology Innovation (InETI)
ID Code:	43337
Depositing User ID:	Victoria Le Quelenec
Date Deposited:	13 Sep 2022 15:09
Last Modified:	01 Aug 2024 12:22
