Impact of attention mechanisms for organ segmentation in chest x-ray images over U-Net model

Tomás de la Sotta*, Violeta Chang*, Benjamín Pizarro*, Héctor Henriquez*, Nicolás Alvear*, Jose M. Saavedra*

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

Abstract

Chest X-ray imaging is one of the most commonly performed imaging tests, providing crucial clinical information about structures such as the heart, lungs, ribs, bones, and blood vessels. In this context, image segmentation is a critical stage, as it aims to separate the significant parts of an image. However, manual image segmentation presents serious difficulties that can be tackled using deep learning-based methods, such as U-Net. Furthermore, attention mechanisms have attracted the interest of the machine-learning community and can bring improvements in medical image segmentation. This research assesses the impact of attention-based mechanisms on segmenting different organs in chest X-ray images (heart, lungs, clavicles and ribs), using the well-known U-Net architecture as a baseline. We study five U-Net encoder variations, replacing the U-Net encoder with ResNet 18, 34 and 50, Swin Transformer and a simple residual structure comprising a single ResNet-50 prior to each U-Net encoder layer. In the original U-Net, the skip layers are identity layers connecting each encoder block with its corresponding decoder block. Here, we replace these layers with different attention mechanisms: spatial attention, full spatial attention, double spatial attention, multiple spatial attention, spatial cross-attention and Swin spatial cross-attention. Each encoder variation and attention mechanism was evaluated both when trained from scratch and in a pre-trained scenario, independently for lungs, ribs, heart and clavicles. ResNet-UNet-18 achieves an average overlap of up to 0.96 and 0.53 with hand-segmented masks of lungs and clavicles, respectively. The best encoder for rib and heart segmentation was Residual U-Net, with overlaps of 0.88 and 0.83, respectively. Furthermore, the most suitable attention mechanisms, selected according to overlap with hand-segmented masks, were Spatial Attention U-Net for lungs (0.96), Three-Head Attention for ribs (0.88), Full Spatial Attention for the heart (0.82) and Spatial Cross-Attention for clavicles (0.54). Encoder variations and attention mechanisms have a positive impact over a plain U-Net for segmentation of lungs, ribs, heart and clavicles when training from scratch, without transfer learning. Moreover, no single encoder or attention mechanism is universally best for segmenting different organs in chest X-ray images. Although some organs share the same best backbone architecture (lungs and clavicles), the most appropriate attention mechanism is different for each organ, with overlap reaching up to 0.99 in lung segmentation.
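The results above are reported as overlap between predicted and hand-segmented masks. The abstract does not state which overlap measure was used, so the following is only an illustrative sketch of two common choices, the Dice coefficient and intersection over union (IoU), computed on small binary masks. The function name `overlap_scores` and the toy masks are hypothetical and are not taken from the article.

```python
def overlap_scores(pred, truth):
    """Return (dice, iou) for two binary masks given as equal-sized 2D lists of 0/1.

    Assumption: the masks are already binarised; the article's exact metric
    and thresholding are not specified in the abstract.
    """
    # Count pixels that are foreground in both masks.
    inter = sum(p & t for p_row, t_row in zip(pred, truth)
                for p, t in zip(p_row, t_row))
    pred_area = sum(sum(row) for row in pred)
    truth_area = sum(sum(row) for row in truth)
    union = pred_area + truth_area - inter

    # Two empty masks agree perfectly; this also avoids division by zero.
    if union == 0:
        return 1.0, 1.0

    dice = 2 * inter / (pred_area + truth_area)
    iou = inter / union
    return dice, iou


# Toy example: a predicted mask and a hand-segmented mask of 3 pixels each,
# 2 of which coincide.
pred = [[0, 1, 1],
        [0, 1, 0]]
truth = [[0, 1, 0],
         [0, 1, 1]]

dice, iou = overlap_scores(pred, truth)
print(round(dice, 3), iou)  # prints: 0.667 0.5
```

For the same pair of masks, Dice is never lower than IoU, so a reported overlap value can only be compared across studies once the metric is known.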

Original language: English
Pages (from-to): 1-23
Number of pages: 23
Journal: Multimedia Tools and Applications
State: Published - 31 Oct 2023

Bibliographical note

Publisher Copyright:
© 2023, The Author(s), under exclusive licence to Springer Science+Business Media, LLC, part of Springer Nature.

Keywords

  • Attention mechanism
  • Chest x-ray segmentation
  • Encoder variation
  • U-Net
