Polyp Segmentation Generalisability of Pretrained Backbones

Sanderson, Edward (ORCID: 0000-0002-3794-5513) and Matuszewski, Bogdan (ORCID: 0000-0001-7195-2509) (2024) Polyp Segmentation Generalisability of Pretrained Backbones. In: Medical Image Understanding and Analysis. Frontiers Media, pp. 224-231. ISBN 978-2-8325-1244-9

PDF (VOR) - Published Version
Available under License Creative Commons Attribution.

Official URL: https://doi.org/10.3389/978-2-8325-1244-9

Abstract

Annotated data for training polyp segmentation models is scarce. These models, e.g. Sanderson and Matuszewski (2022), typically take the form of an autoencoder with UNet-style skip connections (Ronneberger et al., 2015), and it is therefore common practice to pretrain the encoder, also known as the backbone. This has almost exclusively been done in a supervised manner with ImageNet-1k (Deng et al., 2009). However, we recently demonstrated that pretraining backbones in a self-supervised manner generally provides better fine-tuned performance, and that models with ViT-B (Dosovitskiy et al., 2020) backbones typically perform better than models with ResNet50 (He et al., 2016) backbones (Sanderson and Matuszewski, 2024).

In this paper, we extend this work to consider generalisability. That is, we assess performance on a different dataset to that used for fine-tuning, accounting for variation in network architecture and pretraining pipeline (algorithm and dataset). This reveals how well models generalise to a distribution somewhat different from that of the training data, a shift that arises in deployment as a result of different cameras, patient demographics, and other factors. Our results provide further insights into the strengths and weaknesses of existing architectures and pretraining pipelines that should inform the future development of polyp segmentation models.
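
The cross-dataset assessment described above can be illustrated with a minimal sketch. The abstract does not name the evaluation metric, so the Dice coefficient used here is an assumption (it is a common choice for segmentation), and the function names and toy masks are hypothetical rather than taken from the paper.

```python
from typing import Sequence

# A binary mask as rows of 0/1 values (hypothetical toy representation).
Mask = Sequence[Sequence[int]]


def dice(pred: Mask, target: Mask, eps: float = 1e-8) -> float:
    """Dice coefficient 2|P∩T| / (|P| + |T|) for two binary masks."""
    inter = sum(p & t for pr, tr in zip(pred, target) for p, t in zip(pr, tr))
    total = sum(map(sum, pred)) + sum(map(sum, target))
    # eps keeps the ratio defined (and equal to 1) when both masks are empty.
    return (2 * inter + eps) / (total + eps)


def mean_dice(preds: Sequence[Mask], targets: Sequence[Mask]) -> float:
    """Average Dice over a dataset of predicted/ground-truth mask pairs."""
    scores = [dice(p, t) for p, t in zip(preds, targets)]
    return sum(scores) / len(scores)


def generalisation_gap(in_dist_score: float, out_dist_score: float) -> float:
    """Drop in score when moving from the fine-tuning dataset's test split
    to a different dataset (assumed definition, for illustration only)."""
    return in_dist_score - out_dist_score
```

Under these assumptions, a model fine-tuned on one dataset would be scored with `mean_dice` on both its own test split and a second, unseen dataset, and the two scores compared across backbones and pretraining pipelines.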

