0725

3D Kidney Segmentation in MRI using Transformers
Kanishka Sharma1,2, Kywe Kywe Soe2, Joao Periquito2, Francesco Santini2,3, Bashair Alhummiany4, David Shelley4, Andrew Forbes Brown5, Jonathan Fulford5, Mark Gilchrist5, Angela Shore5, Bixente Dilharreguy6, Nicolas Grenier6, Maria F. Gomez7, Kim Gooding5, and Steven Sourbron2
1Antaros Medical AB, Mölndal, Sweden, 2The University of Sheffield, Sheffield, United Kingdom, 3Basel Muscle MRI, Department of Biomedical Engineering, University of Basel, Basel, Switzerland, 4University of Leeds, Leeds, United Kingdom, 5University of Exeter, Exeter, United Kingdom, 6University of Bordeaux, Bordeaux, France, 7Department of Clinical Sciences in Malmö, Lund University Diabetes Centre, Malmö, Sweden

Synopsis

Keywords: Kidney, Segmentation, TKV, Transformers

Motivation: Convolutional Neural Networks (CNNs) have long been the go-to deep-learning architecture for medical image segmentation, but in recent years transformer-based architectures adapted from large language models are setting a new standard.

Goal(s): The aim of this study was to test if transformers are suitable for 3D kidney segmentation on high-resolution MRI.

Approach: A transformer-based deep-learning architecture (UNETR) was trained and tested against a supervised method on 82 patient datasets from the iBEAt study on diabetic kidney disease.

Results: UNETR provides fast segmentation with comparable results to the supervised method, but additional refinement is needed to reduce the limits of agreement.

Impact: Novel transformer-based architectures for medical image segmentation may be useful for fast 3D segmentation of individual kidneys.

INTRODUCTION

Convolutional Neural Networks (CNNs) and U-Nets have long been the go-to architectures for segmentation of medical images1-5, but in recent years transformer-based architectures adapted from large language models have increasingly set the standard6. In contrast to CNNs, transformers model long-range dependencies and thus capture more global contextual information7. They have been applied successfully to kidney segmentation on CT but have not yet been evaluated on MRI.

The aim of this work is to test whether a transformer-based architecture may be a suitable approach for 3D single-kidney segmentation on T1-weighted MRI.

METHODS

MRI Acquisition, Data Annotation: 82 patients with available, twice-reviewed kidney segmentation masks were selected from the 3 sites with the same vendor in the ongoing iBEAt study (www.beat-dkd.eu/) on diabetic kidney disease (DKD)8. Multiparametric MRI was performed on a MAGNETOM Prisma 3T (Siemens Healthcare GmbH, Erlangen, Germany). Whole-kidney segmentation was performed on a coronal, single breath-hold 3D Dixon acquisition of the abdomen (FOV = 400x400 mm2; TR = 4.01 ms; TE1 = 1.34 ms; TE2 = 2.57 ms; FA = 9°; slice thickness = 1.5 mm; Bandwidth1 = 820 Hz/Px; Bandwidth2 = 1040 Hz/Px; reconstructed voxel = 1.3x1.3x1.5 mm3; parallel imaging = CAIPIRINHA, PAT = 3). The Dixon was acquired 7 minutes after injection of a quarter dose of a macrocyclic contrast agent. The left and right kidneys were outlined semi-automatically by a nephrologist, using region-growing on masks created by K-means clustering of the fat and water images. Kidney masks were reviewed by two independent staff members.

Training, Validation, Testing, and Cross-Validation: Segmentation was performed using the UNETR7 implementation in MONAI9 (hyperparameters in Table 1). For the main experiment, 59 datasets were used for training, 15 for validation, and 8 for testing. The training was performed on a workstation with an Intel Xeon 10-core 2.2GHz CPU and NVIDIA GeForce GTX 1080 Ti 11GB GPU, taking approximately 24 hours. Additionally, three-fold cross-validation (CV1, CV2, CV3) was performed with 2 sets utilised for training and the remaining set used for testing (Table 2).
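The data partitioning described above can be sketched as follows. This is a minimal illustration, not the authors' code: the case identifiers, the random seed, and the use of a simple shuffle to assign cases to subsets are all assumptions, since the abstract does not state how cases were allocated.

```python
import random


def split_main(ids, n_train=59, n_val=15, n_test=8, seed=0):
    """Shuffle case IDs and split them into train/validation/test subsets."""
    if n_train + n_val + n_test != len(ids):
        raise ValueError("split sizes must add up to the number of cases")
    shuffled = list(ids)
    random.Random(seed).shuffle(shuffled)  # seed is an arbitrary assumption
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]
    return train, val, test


def three_fold(ids, seed=0):
    """Yield (train, test) pairs: two folds for training, one for testing."""
    shuffled = list(ids)
    random.Random(seed).shuffle(shuffled)
    folds = [shuffled[i::3] for i in range(3)]  # three near-equal subsets
    for k in range(3):
        train = [c for j, fold in enumerate(folds) if j != k for c in fold]
        yield train, folds[k]


# Hypothetical case identifiers standing in for the 82 datasets.
case_ids = [f"case_{i:03d}" for i in range(82)]
train, val, test = split_main(case_ids)
cv = list(three_fold(case_ids))
```

With 82 cases, the three test folds contain 27, 27 and 28 cases, which matches the fold sizes reported in Figure 2.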

Statistical Analyses: Total Kidney Volume (TKV) was computed as the sum of the left and right kidney volumes. Performance of the automated segmentation was assessed with the Dice Score Coefficient10 (DSC), Bland-Altman analysis, the Coefficient of Variation (CoV) for repeated measures11, and the non-parametric Wilcoxon signed-rank test for paired samples. Statistical significance was defined at p<0.05.
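As an illustration of the overlap and volume metrics named above, the sketch below computes DSC and TKV from binary masks. It assumes masks are given as flat sequences of 0/1 values and that volume is the voxel count multiplied by the reconstructed voxel size (1.3x1.3x1.5 mm3); the function names and the convention of returning 1.0 for two empty masks are choices made here, not taken from the study.

```python
# Reconstructed voxel size from the acquisition protocol, converted mm^3 -> mL.
VOXEL_VOLUME_ML = 1.3 * 1.3 * 1.5 / 1000.0


def dice(mask_a, mask_b):
    """Dice score between two binary masks given as flat sequences of 0/1."""
    a = [bool(v) for v in mask_a]
    b = [bool(v) for v in mask_b]
    if len(a) != len(b):
        raise ValueError("masks must have the same number of voxels")
    overlap = sum(x and y for x, y in zip(a, b))
    total = sum(a) + sum(b)
    # Convention assumed here: two empty masks agree perfectly.
    return 1.0 if total == 0 else 2.0 * overlap / total


def kidney_volume_ml(mask, voxel_volume_ml=VOXEL_VOLUME_ML):
    """Volume of one kidney mask: number of foreground voxels x voxel volume."""
    return sum(bool(v) for v in mask) * voxel_volume_ml


def total_kidney_volume_ml(left_mask, right_mask):
    """TKV as the sum of left and right kidney volumes."""
    return kidney_volume_ml(left_mask) + kidney_volume_ml(right_mask)


# Toy masks (made up for illustration only).
supervised = [1, 1, 1, 0, 0, 0]
automated = [0, 1, 1, 1, 0, 0]
dsc = dice(supervised, automated)
tkv = total_kidney_volume_ml(supervised, automated)
```

In practice the masks would be 3D arrays; flattening them does not change either metric.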

RESULTS

Figure 1 shows example segmentations for two cases where the automated results are considered successful.

Table 2 summarises DSC, TKV and CoV for all experiments. DSC ranged from 0.88 to 0.94 and CoV between supervised and automated TKV ranged from 8% to 22%. The lower limit of agreement (LoA) ranged from -74% to -24% and the upper LoA from +35% to +120% (Figure 2). The mean difference was significant for CV2 (p=0.01) and CV3 (p<0.001) but not for the main experiment (p=0.38) and CV1 (p=0.09).
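For readers unfamiliar with the agreement metrics quoted above, the following sketch shows one way to obtain percentage limits of agreement and a CoV for paired TKV values. It rests on assumptions: differences are expressed as a percentage of the pair mean, the limits are mean ± 2 SD (as in the Figure 2 caption), and CoV uses the root-mean-square formulation for duplicate measurements, which may differ from the exact formula used in the study. The input values are made up.

```python
import math
import statistics


def bland_altman_percent(auto, ref, k=2.0):
    """Return (mean % difference, lower LoA, upper LoA) with LoA = mean ± k*SD."""
    diffs = [100.0 * (a - r) / ((a + r) / 2.0) for a, r in zip(auto, ref)]
    mean_diff = statistics.mean(diffs)
    sd = statistics.stdev(diffs)  # sample standard deviation
    return mean_diff, mean_diff - k * sd, mean_diff + k * sd


def cov_repeated_percent(auto, ref):
    """Root-mean-square CoV (%) for paired measurements (assumed formulation)."""
    terms = [((a - r) / ((a + r) / 2.0)) ** 2 / 2.0 for a, r in zip(auto, ref)]
    return 100.0 * math.sqrt(statistics.mean(terms))


# Made-up TKV values in mL, for illustration only.
automated_tkv = [110.0, 90.0]
supervised_tkv = [90.0, 110.0]
mean_diff, lower_loa, upper_loa = bland_altman_percent(automated_tkv, supervised_tkv)
cov = cov_repeated_percent(automated_tkv, supervised_tkv)
```

Wide limits of agreement can coexist with a non-significant mean difference, since the former reflect the spread of individual differences and the latter only their average.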

Closer inspection of the outliers suggests that issues with automated segmentation were largely due to one of four causes (Figure 3): (1) gross morphological differences from a typical kidney due to large cysts; (2) pre-contrast Dixon with lower kidney/fat separation contrast; (3) lean participants with minimal levels of perirenal fat; (4) poor image quality due to breathing artefacts in the breath hold sequence.

To estimate the impact of these issues, key metrics were recalculated after excluding such cases. This improved DSC slightly to 0.90-0.94, CoV to the range [5%, 12%], the lower LoA to the range [-28%, -9%] and the upper LoA to the range [+14%, +48%]. The differences from supervised segmentation in CV2 (p=0.03) and CV3 (p<0.001) remained significant.

DISCUSSION

In most cases, UNETR generates promising single-kidney segmentation results in an automated fashion, with results for the key biomarker TKV that are close to those derived from supervised segmentation.

Substantial segmentation errors were identified in a smaller number of cases with characteristics that were atypical of the general training population. This indicates the need for retraining the model on a more representative training set and/or complementing the automated segmentation with additional post-processing and automated quality control (QC) steps to identify and exclude unreliable results.

However, the results also show that even after excluding the extreme outliers, the limits of agreement remain large compared to typical volumetric changes expected over time in a disease such as DKD. Future studies will investigate whether the results can be improved further by extending the training set.

CONCLUSION

Transformer-based architectures are a promising alternative approach for segmentation of individual kidneys on high-resolution T1-weighted MRI. However, careful further analysis, fine-tuning and post-processing are needed to better understand and reduce the differences from supervised results.

Acknowledgements

This project has received funding from the Innovative Medicines Initiative 2 Joint Undertaking under grant agreement No 115974. The JU receives support from the European Union’s Horizon 2020 research and innovation programme and EFPIA and JDRF.

Any dissemination of results reflects only the author's view; the JU is not responsible for any use that may be made of the information it contains.

References

  1. Daniel, Alexander J., et al. "Automated renal segmentation in healthy and chronic kidney disease subjects using a convolutional neural network." Magnetic resonance in medicine 86.2 (2021): 1125-1136.
  2. Klepaczko, Artur, et al. "Whole kidney and renal cortex segmentation in contrast-enhanced MRI using a joint classification and segmentation convolutional neural network." Biocybernetics and Biomedical Engineering 42.1 (2022): 295-311.
  3. Inoue, Kaiji, et al. "The utility of automatic segmentation of kidney MRI in chronic kidney disease using a 3D convolutional neural network." Scientific Reports 13.1 (2023): 17361.
  4. Rombolotti, Maria, et al. "Automatic cyst and kidney segmentation in autosomal dominant polycystic kidney disease: Comparison of U-Net based methods." Computers in Biology and Medicine 146 (2022): 105431.
  5. Taylor, Jonathan, et al. "An Artificial Intelligence generated Automated Algorithm to measure Total Kidney Volume in ADPKD." Kidney International Reports.
  6. Zhao, Dan, et al. "Current progress in artificial intelligence-assisted medical image analysis for chronic kidney disease: A literature review." Computational and Structural Biotechnology Journal (2023).
  7. Hatamizadeh, Ali, et al. "UNETR: Transformers for 3D medical image segmentation." Proceedings of the IEEE/CVF winter conference on applications of computer vision. 2022.
  8. Gooding, Kim M., et al. "Prognostic imaging biomarkers for diabetic kidney disease (iBEAt): study protocol." BMC nephrology 21 (2020): 1-11.
  9. Cardoso, M. Jorge, et al. "MONAI: An open-source framework for deep learning in healthcare." arXiv preprint arXiv:2211.02701 (2022).
  10. Dice, L. R. Measures of the amount of ecologic association between species. Ecology 26, 297–302, doi:10.2307/1932409 (1945).
  11. Jones, R. G. & Payne, R. B. Clinical investigation and statistics in laboratory medicine (American Association for Clinical Chemistry, 1997).

Figures

Figure 1. Example results for 2 patients in the test dataset of the main experiment with successful automated segmentations (one per row). The left column shows a representative slice in the opposed-phase Dixon, the middle column shows manual ground truth labels for each kidney, and the right column shows masks for the automated segmentation produced by UNETR.

Table 1. UNETR hyper-parameters utilised for the kidney segmentation.

Table 2. Main results for test cases in the main experiment and three-fold cross validation experiments (CV1, CV2, CV3). TKV estimated using expert supervised segmentation and automated UNETR for the test sets (Mean ± SD). Dice Score Coefficient (DSC) (Mean ± SD), and the Coefficient of Variation (CoV) for repeated measures are also indicated for test cases in all the experiments. All measurements are reported before and after outlier removal.

Figure 2. Bland-Altman (%) plots comparing automated versus supervised Total Kidney Volume (TKV) measurements for the main experiment (top-left, n=8), cross-validation 1 (top-right, n=27), cross-validation 2 (bottom-left, n=28), and cross-validation 3 (bottom-right, n=27). The thick line indicates the mean difference, while the dotted lines indicate ±2 standard deviations (±2 SD).

Figure 3. Central coronal slices on the Dixon (left column) from 3 DKD patients (one per row) with segmentation errors in the automated results (right column) compared to the supervised results (middle column). Cases show: the presence of a large cyst in the right kidney and next to the left kidney on a pre-contrast Dixon (top row); a lean patient with very limited perirenal fat, leading to most of the left kidney being included in the right kidney mask (middle row); and breathing artefacts (bottom row).

Proc. Intl. Soc. Mag. Reson. Med. 32 (2024)
DOI: https://doi.org/10.58530/2024/0725