2133

Prompt guided multi-organ segmentation of the total body

Meiyuan Wen¹, Yunlong Gao¹, Yaping Wu², Zhenxing Huang¹, Wenbo Li¹, Wenjie Zhao¹, Yongfeng Yang¹, Hairong Zheng¹, Dong Liang¹, Meiyun Wang², and Zhanli Hu¹
¹Institute of Biomedical and Health Engineering, Shenzhen Institute of Advanced Technology, CAS, Shenzhen, China, ²Department of Medical Imaging, Henan Provincial People's Hospital & People's Hospital of Zhengzhou University, Zhengzhou, China

Synopsis

Keywords: Segmentation, Whole Body

Motivation: Numerous studies have made significant strides in the field of medical image segmentation. However, most studies have focused on specific localized regions rather than addressing the challenge of unified segmentation across the entire human body.

Goal(s): To advance multi-organ segmentation so as to improve the efficiency and accuracy of disease diagnosis and treatment.

Approach: In this paper, we present a prompt guided multi-organ segmentation model on total-body images, which can be adapted for CT, PET and MRI modalities.

Results: Our extensive experiments demonstrate the superior performance of our model in accurately segmenting 21 organs.

Impact: Our research leverages the power of prompts to tackle the challenge of multi-organ segmentation. It has potentially wide applications in the fields of CT, MRI and PET, enabling the simultaneous segmentation of multiple organs and images from diverse modalities.

Introduction

Multi-organ segmentation is a pivotal technique that combines holistic information, effectively delineating diverse tissues and anatomical structures. This leads to enhanced disease diagnosis precision and the elucidation of comprehensive treatment objectives[1]. One of the most popular segmentation models is UNet[2]; however, it often exhibits limitations in capturing long-range dependency features. Consequently, a series of studies have been dedicated to designing transformer-based UNet variants[3-6], known for their global information extraction capabilities. Within the realm of image segmentation, many researchers have explored leveraging prompts to augment model performance[7-9]. In our pursuit of extending segmentation across the entire human body, we introduce the Prompt-Guided Multi-Organ Segmentation model (PGMOSeg), which integrates prompt-based learning with the capabilities of TransUNet[6].

Methodology

Datasets
The dataset was provided by Henan Provincial People's Hospital. Total-body PET scans (uEXPLORE, United Imaging Healthcare, Shanghai) were acquired from head to toe in a single bed position. The dataset contains 110 patient cases with careful annotations of 21 anatomical structures: adrenal gland, aorta, pelvic bone, clavicle, esophagus, femoral head, heart, humeral head, thyroid, kidney, liver, lungs, pancreas, rectum, ribs, scapula, spinal canal, spinal cord, spleen, sternum and vertebrae.
Evaluation Metrics
To quantitatively evaluate model performance, we used four commonly adopted segmentation metrics[10]: Jaccard, Recall, Dice, and Hausdorff Distance (HD). Higher Jaccard, Dice, and Recall values indicate closer agreement between the segmentation and the ground truth, while a lower HD indicates a smaller deviation from the ground truth.
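As an illustration of the four metrics, the following minimal sketch computes them for a pair of binary masks with NumPy. It is not the evaluation code used in this work: the function names `overlap_metrics` and `hausdorff_distance` are hypothetical, HD is computed by brute force over all foreground voxels (in voxel units, without spacing), and the choice of returning 1.0 for two empty masks is an assumption.

```python
import numpy as np

def overlap_metrics(pred, truth):
    """Return (dice, jaccard, recall) for two binary masks of equal shape."""
    pred = np.asarray(pred, dtype=bool)
    truth = np.asarray(truth, dtype=bool)
    tp = np.logical_and(pred, truth).sum()    # voxels labeled in both masks
    fp = np.logical_and(pred, ~truth).sum()   # predicted but not in ground truth
    fn = np.logical_and(~pred, truth).sum()   # in ground truth but missed
    union = tp + fp + fn
    # Assumption: two empty masks count as perfect agreement.
    dice = 2 * tp / (2 * tp + fp + fn) if union else 1.0
    jaccard = tp / union if union else 1.0
    recall = tp / (tp + fn) if (tp + fn) else 1.0
    return float(dice), float(jaccard), float(recall)

def hausdorff_distance(pred, truth):
    """Symmetric Hausdorff distance between foreground voxels (voxel units)."""
    p = np.argwhere(np.asarray(pred, dtype=bool))
    t = np.argwhere(np.asarray(truth, dtype=bool))
    if len(p) == 0 or len(t) == 0:
        return float("inf")  # undefined when one mask is empty
    # Pairwise Euclidean distances; only practical for small masks.
    d = np.linalg.norm(p[:, None, :] - t[None, :, :], axis=-1)
    return float(max(d.min(axis=1).max(), d.min(axis=0).max()))

# Toy example: the prediction misses one column of the ground truth.
truth = np.zeros((4, 4), dtype=bool)
truth[1:3, 1:4] = True
pred = np.zeros((4, 4), dtype=bool)
pred[1:3, 1:3] = True
print(overlap_metrics(pred, truth))     # dice 0.8, jaccard and recall 2/3
print(hausdorff_distance(pred, truth))  # 1.0
```

For multi-organ evaluation, such functions would typically be applied to each organ label separately and the results averaged over organs; whether the reported values were aggregated that way is not stated here.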
Model Implementation
The overall architecture of our model, PGMOSeg, is shown in Figure 1. It comprises three primary components: an encoder, a feature enhancer, and a decoder. The workflow is as follows. First, the total-body image is input into the encoder to extract image features. Subsequently, the image features, along with the prompt extracted by the encoder, are passed to the feature enhancer, which employs multiple attention modules to process both the image and prompt features. Finally, the processed features are forwarded to the prompt-guided decoder, where the prompt-attention layer fuses the prompt information with the image information into new features. This process reinforces the mapping between the image features and the target organs.
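The fusion of prompt and image features can be pictured with a generic cross-attention step. The sketch below is only an assumption about how such a layer might look, not the PGMOSeg implementation, whose details are not given here: image tokens act as queries, prompt tokens act as keys and values, and the attended prompt information is added back to the image features. The function name `prompt_cross_attention`, the projection matrices, and all shapes are hypothetical.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def prompt_cross_attention(image_tokens, prompt_tokens, w_q, w_k, w_v):
    """Fuse prompt features into image features (hypothetical layer).

    image_tokens: (N, d) flattened image features
    prompt_tokens: (M, d) prompt features, e.g. one per target organ
    w_q, w_k, w_v: (d, d) projection matrices
    """
    q = image_tokens @ w_q                     # queries from the image
    k = prompt_tokens @ w_k                    # keys from the prompt
    v = prompt_tokens @ w_v                    # values from the prompt
    scores = q @ k.T / np.sqrt(q.shape[-1])    # scaled dot-product, (N, M)
    weights = softmax(scores, axis=-1)         # each image token attends over prompts
    fused = image_tokens + weights @ v         # residual fusion, (N, d)
    return fused, weights

# Toy shapes: 16 image tokens, 21 prompt tokens, feature size 8 (all assumed).
rng = np.random.default_rng(0)
image_tokens = rng.normal(size=(16, 8))
prompt_tokens = rng.normal(size=(21, 8))
w_q, w_k, w_v = (rng.normal(size=(8, 8)) for _ in range(3))
fused, weights = prompt_cross_attention(image_tokens, prompt_tokens, w_q, w_k, w_v)
print(fused.shape, weights.shape)  # (16, 8) (16, 21)
```

In this sketch each row of `weights` indicates how strongly an image location draws on each prompt token, which is one way to read the statement that prompts reinforce the mapping between image features and target organs.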

Experimental Results

We trained several models for comparison with ours: UNet, Attention UNet with VGG (Att UNet-v)[11], Attention UNet with ResNet (Att UNet-r), UNet++[12], and TransUNet. As seen in Figure 2, our model performs well in this complex multi-organ segmentation task, and its results closely resemble the ground truth. Notably, by using prompts to guide training, our model avoids the problem of indistinguishable organs during segmentation. The quantitative evaluation metrics allow a more precise comparison of the models. Figure 3 shows that, for 21-organ segmentation, our model achieves higher Jaccard, Dice and Recall and lower HD than the other models. To verify model performance under different numbers of segmented organs, we selected 13 large, easily recognizable organs out of the 21, and from these we selected 6 important organs that are commonly examined by clinicians. Experiments were then performed on each of the three segmentation tasks. Table 1 shows that, compared with the other models, PGMOSeg not only performs well on 21-organ segmentation but also has some advantages on 13-organ and 6-organ segmentation. Finally, the ablation study in Table 2 examines the effectiveness of each component of our model. Performance drops significantly when the prompt-guided decoder or the PGFE module is removed, confirming the necessity of each module.

Discussion and Conclusion

This research introduces a prompt-guided multi-organ segmentation method. Comparative experiments and robustness analysis across 21-organ, 13-organ, and 6-organ segmentation tasks validate our model's advantages, particularly in scenarios with more organs and smaller targets. An ablation study further examined the impact of each key component and confirmed the effectiveness of the model. In the next step of our research, we will concentrate on optimizing the model for increased efficiency and broader application, including low-dose CT and 5T MRI.

Acknowledgements

This work was supported by the National Natural Science Foundation of China (82372038 and 62101540), the Shenzhen Excellent Technological Innovation Talent Training Project of China (RCJC20200714114436080), the Key Laboratory for Magnetic Resonance and Multimodality Imaging of Guangdong Province (2023B1212060052) and the Shenzhen Science and Technology Program (JCYJ20220818101804009 and RCBS20210706092218043).

References

[1] P.-H. Conze, A. E. Kavur, E. Cornec-Le Gall, N. S. Gezer, Y. Le Meur, M. A. Selver, and F. Rousseau, “Abdominal multi-organ segmentation with cascaded convolutional and adversarial deep networks,” Artificial Intelligence in Medicine, vol. 117, p. 102109, 2021.

[2] O. Ronneberger, P. Fischer, and T. Brox, “U-net: Convolutional networks for biomedical image segmentation,” in Medical Image Computing and Computer-Assisted Intervention–MICCAI 2015: 18th International Conference, Munich, Germany, October 5-9, 2015, Proceedings, Part III 18. Springer, 2015, pp. 234–241.

[3] B. Chen, Y. Liu, Z. Zhang, G. Lu, and A. W. K. Kong, “Transattunet: Multi-level attention-guided u-net with transformer for medical image segmentation,” IEEE Transactions on Emerging Topics in Computational Intelligence, 2023.

[4] R. Azad, M. T. Al-Antary, M. Heidari, and D. Merhof, “Transnorm: Transformer provides a strong spatial normalization mechanism for a deep segmentation model,” IEEE Access, vol. 10, pp. 108205–108215, 2022.

[5] Y. Ji, R. Zhang, H. Wang, Z. Li, L. Wu, S. Zhang, and P. Luo, “Multi-compound transformer for accurate biomedical image segmentation,” in Medical Image Computing and Computer Assisted Intervention–MICCAI 2021: 24th International Conference, Strasbourg, France, September 27–October 1, 2021, Proceedings, Part I 24. Springer, 2021, pp. 326–336.

[6] J. Chen, Y. Lu, Q. Yu, X. Luo, E. Adeli, Y. Wang, L. Lu, A. L. Yuille, and Y. Zhou, “Transunet: Transformers make strong encoders for medical image segmentation,” arXiv preprint arXiv:2102.04306, 2021.

[7] T. Lüddecke and A. Ecker, “Image segmentation using text and image prompts,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022, pp. 7086–7096.

[8] M. Arsalan, T. M. Khan, S. S. Naqvi, M. Nawaz, and I. Razzak, “Prompt deep light-weight vessel segmentation network (plvs-net),” IEEE/ACM Transactions on Computational Biology and Bioinformatics, vol. 20, no. 2, pp. 1363–1371, 2022.

[9] Z. Huang, X. Liu, R. Wang et al., “Learning a deep cnn denoising approach using anatomical prior information implemented with attention mechanism for low-dose ct imaging on clinical patient data from multiple anatomical sites,” IEEE Journal of Biomedical and Health Informatics, vol. 25, no. 9, pp. 3416–3427, 2021.

[10] D. Müller, I. Soto-Rey, and F. Kramer, “Towards a guideline for evaluation metrics in medical image segmentation,” BMC Research Notes, vol. 15, no. 1, pp. 1–8, 2022.

[11] O. Oktay, J. Schlemper, L. L. Folgoc, M. Lee, M. Heinrich, K. Misawa, K. Mori, S. McDonagh, N. Y. Hammerla, B. Kainz et al., “Attention u-net: Learning where to look for the pancreas,” arXiv preprint arXiv:1804.03999, 2018.

[12] Z. Zhou, M. M. R. Siddiquee, N. Tajbakhsh, and J. Liang, “Unet++: Redesigning skip connections to exploit multiscale features in image segmentation,” IEEE Transactions on Medical Imaging, vol. 39, no. 6, pp. 1856–1867, 2019.

Figures

Figure 1. Schematic view of PGMOSeg. (a) Overall network architecture. (b) Details of the prompt-guided feature enhancer (PGFE) module. (c) Workflow of the prompt-guided decoder module.

Figure 2. Results of 21-organ segmentation in horizontal planes by different models.

Figure 3. Line chart of evaluation metrics changes for 21-organ segmentation.

Table 1. Results of different models in 6-organ, 13-organ, and 21-organ segmentation.

Table 2. Results of the ablation study (-D stands for dropping prompt-guided decoder, -E stands for dropping PGFE).

Proc. Intl. Soc. Mag. Reson. Med. 32 (2024)

DOI: https://doi.org/10.58530/2024/2133