3560

Comparison of 3D convolutional neural networks and loss functions for ventilated lung segmentation using multi-nuclear hyperpolarized gas MRI

Joshua R Astley¹,², Alberto M Biancardi¹, Paul J Hughes¹, Laurie J Smith¹, Helen Marshall¹, Guilhem J Collier¹, James Eaden¹, Nicholas D Weatherley¹, Jim M Wild¹, and Bilal A Tahir¹,²
¹POLARIS, University of Sheffield, Sheffield, United Kingdom, ²Oncology and Metabolism, University of Sheffield, Sheffield, United Kingdom

Synopsis

Deep learning has shown great promise for numerous medical image segmentation tasks, including the delineation of ventilated lung volumes from hyperpolarized gas MRI. We previously demonstrated that a VNet convolutional neural network (CNN), trained on a combination of ³He and ¹²⁹Xe scans, produces accurate segmentations that outperform conventional methods. In this work, we compared the performance of several 3D CNNs and loss functions, using several training strategies, for segmentation of ventilated lungs on a significantly larger and more diverse multi-nuclear hyperpolarized gas MRI dataset. We found that the UNet CNN provided the best-performing model for our dataset.

Introduction

Hyperpolarized gas MRI using the noble gases ³He and ¹²⁹Xe enables visualization of regional lung ventilation with high spatial and temporal resolution¹. Segmentation of ventilated lung regions is required to calculate clinical biomarkers such as the ventilation defect percentage (VDP). Recent deep learning (DL) research has shown that convolutional neural networks (CNNs) hold great promise for ventilation segmentation in hyperpolarized gas MRI.² We previously demonstrated that a VNet CNN, trained on a combination of ³He and ¹²⁹Xe scans, produces accurate segmentations that outperform conventional methods.³ Here, we compare the performance of several common 3D CNNs and loss functions on a significantly expanded dataset comprising a wider range of diseases. We also conducted parameterization experiments to justify the choice of CNN hyperparameters.
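Because VDP is derived directly from the ventilated lung segmentation, a small example may make that dependency concrete. The sketch below assumes a common definition, VDP = 100 × (1 − ventilated volume / lung cavity volume), and a separately obtained lung cavity mask on the same voxel grid (for example from registered proton MRI, as in ref 4); neither the definition nor the hypothetical function name `ventilation_defect_percentage` comes from this abstract.

```python
import numpy as np


def ventilation_defect_percentage(ventilated_mask, lung_cavity_mask):
    """VDP (%) = 100 * (1 - ventilated volume / lung cavity volume).

    Assumption: both masks are boolean 3D arrays on the same voxel grid,
    so the voxel volume cancels and voxel counts can be used directly.
    """
    ventilated = np.asarray(ventilated_mask, dtype=bool)
    cavity = np.asarray(lung_cavity_mask, dtype=bool)
    if ventilated.shape != cavity.shape:
        raise ValueError("masks must share the same voxel grid")
    cavity_voxels = np.count_nonzero(cavity)
    if cavity_voxels == 0:
        raise ValueError("lung cavity mask is empty")
    # Only ventilated voxels inside the lung cavity contribute.
    ventilated_voxels = np.count_nonzero(ventilated & cavity)
    return 100.0 * (1.0 - ventilated_voxels / cavity_voxels)
```

Any error in the ventilated segmentation propagates one-to-one into VDP, which is why segmentation accuracy matters clinically.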

Methods

Imaging data
All subjects underwent MRI on a 1.5T HDx scanner (GE Healthcare) using 3D steady-state free precession sequences.⁴ Flexible quadrature radiofrequency coils (CMRS) were used for transmission and reception of MR signals at the Larmor frequencies of ³He and ¹²⁹Xe. The imaging dataset was collected retrospectively from several clinical observational studies and from patients referred for clinical scans. It consisted of 743 volumetric hyperpolarized gas MRI scans (22890 slices) from 326 healthy subjects and patients with pulmonary pathologies, acquired with either ³He (248 scans, 11370 slices) or ¹²⁹Xe (495 scans, 11520 slices), together with corresponding expert segmentations.

Parameterization
Several experiments were conducted to assess the effect of varying the network architecture and loss function, using a subset of the data comprising 431 hyperpolarized gas MRI scans acquired with either ³He (n=173) or ¹²⁹Xe (n=258). Of these, 29 scans were held out as a parameterization testing set. Figure 1 shows a selection of the results. The VNet and UNet architectures significantly outperformed the other networks tested, so we tested these two networks with three loss functions. We found no significant differences between most of the loss functions tested across both networks.
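The abstract names cross-entropy and Dice among the loss functions compared; the third is not specified here. As a hedged illustration of how these two objectives differ, the sketch below gives generic binary forms in NumPy. NiftyNet's own implementations (multi-class, possibly weighted) may differ, and the function names are hypothetical.

```python
import numpy as np

EPS = 1e-7  # numerical guard against log(0) and division by zero


def binary_cross_entropy(prob, target):
    """Mean voxel-wise cross-entropy between predicted foreground
    probabilities and a binary expert mask."""
    p = np.clip(np.asarray(prob, dtype=float), EPS, 1.0 - EPS)
    g = np.asarray(target, dtype=float)
    return float(-np.mean(g * np.log(p) + (1.0 - g) * np.log(1.0 - p)))


def soft_dice_loss(prob, target):
    """1 - soft Dice overlap between probabilities and the binary mask."""
    p = np.asarray(prob, dtype=float)
    g = np.asarray(target, dtype=float)
    intersection = np.sum(p * g)
    denom = np.sum(p) + np.sum(g)
    return float(1.0 - (2.0 * intersection + EPS) / (denom + EPS))
```

Cross-entropy penalises every voxel independently, whereas the Dice loss scores the overlap of the whole volume, which changes how background-dominated volumes are weighted during training.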

DL segmentation
Five sets of experiments were performed to train the CNNs using the NiftyNet⁵ framework: (1,2) the model was trained on either ¹²⁹Xe or ³He images; (3,4) transfer learning was applied to the pre-trained models from (1,2) to fine-tune each network on images of the other gas⁷; (5) the model was trained on combined ³He and ¹²⁹Xe data. These experiments were performed for both the UNet⁶ and VNet⁷ architectures with a cross-entropy loss and a PReLU activation function, selected based on the parameterization experiments. Each trained model was evaluated on a combined testing dataset of ³He and ¹²⁹Xe images (n=75). Although repeat and longitudinal scans from the same patient were used during training, no such data were included in the testing set and no test patient appeared in the training set, so the testing set represents an independent validation cohort.
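The requirement that no test patient contributes training data can be enforced by splitting at the patient level rather than the scan level. A minimal sketch, assuming each scan carries a patient identifier and, as one reading of the text above, that only patients with a single scan are eligible for testing so that the test set contains no repeat or longitudinal data; `patient_level_split` and its parameters are hypothetical.

```python
import random
from collections import defaultdict


def patient_level_split(scans, n_test, seed=0):
    """Split (scan_id, patient_id) pairs so no patient is in both subsets.

    Hypothetical helper. Assumption: only patients with exactly one scan
    are eligible for testing, so repeat and longitudinal scans stay in
    the training set.
    """
    by_patient = defaultdict(list)
    for scan_id, patient_id in scans:
        by_patient[patient_id].append(scan_id)

    eligible = sorted(p for p, ids in by_patient.items() if len(ids) == 1)
    if n_test > len(eligible):
        raise ValueError("not enough single-scan patients for the test set")

    rng = random.Random(seed)  # seeded for a reproducible split
    test_patients = set(rng.sample(eligible, n_test))

    train = [s for p, ids in by_patient.items() if p not in test_patients for s in ids]
    test = [by_patient[p][0] for p in sorted(test_patients)]
    return train, test
```

Splitting by patient rather than by scan prevents anatomy from the same subject appearing in both sets, which would otherwise inflate the reported test accuracy.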

Data Analysis
To evaluate segmentation accuracy, the Dice similarity coefficient (DSC), average boundary Hausdorff distance (Avg-HD) and 95th percentile Hausdorff distance (HD95) were computed. The XOR metric was also used because of its sensitivity to false positives.⁸ Paired t-tests across the 75 testing scans were used to compare the DL training methods for both networks. Pearson correlation and Bland-Altman analyses were used to compare the volumes of the DL and expert segmentations.
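For readers implementing a similar evaluation, the sketch below computes DSC, Avg-HD and HD95 from two binary 3D masks using NumPy only. It assumes that boundary voxels are mask voxels with at least one 6-connected neighbour outside the mask, that Avg-HD is the mean of the two directed mean boundary distances, and that HD95 is the larger of the two directed 95th percentiles; other toolkits define these slightly differently, and the XOR metric of ref 8 is not reproduced. The brute-force nearest-point search is only suitable for small volumes.

```python
import numpy as np


def dice(a, b):
    """Dice similarity coefficient of two binary masks."""
    a = np.asarray(a, dtype=bool)
    b = np.asarray(b, dtype=bool)
    total = a.sum() + b.sum()
    if total == 0:
        return 1.0  # both masks empty: treat as perfect agreement
    return float(2.0 * np.logical_and(a, b).sum() / total)


def surface_points(mask, spacing):
    """Physical coordinates of boundary voxels of a 3D mask (6-connectivity)."""
    m = np.asarray(mask, dtype=bool)
    padded = np.pad(m, 1, mode="constant", constant_values=False)
    interior = m.copy()
    for axis in range(3):
        for shift in (-1, 1):
            # Neighbour along this axis; padding supplies False outside the volume.
            interior &= np.roll(padded, shift, axis=axis)[1:-1, 1:-1, 1:-1]
    boundary = m & ~interior
    return np.argwhere(boundary) * np.asarray(spacing, dtype=float)


def directed_distances(src, dst):
    """Distance from each point in src to its nearest point in dst."""
    diff = src[:, None, :] - dst[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1)).min(axis=1)


def boundary_distances(a, b, spacing):
    pa, pb = surface_points(a, spacing), surface_points(b, spacing)
    if len(pa) == 0 or len(pb) == 0:
        raise ValueError("both masks need a non-empty boundary")
    return directed_distances(pa, pb), directed_distances(pb, pa)


def avg_hd(a, b, spacing=(1.0, 1.0, 1.0)):
    """Average boundary Hausdorff distance: mean of the two directed means."""
    d_ab, d_ba = boundary_distances(a, b, spacing)
    return float((d_ab.mean() + d_ba.mean()) / 2.0)


def hd95(a, b, spacing=(1.0, 1.0, 1.0)):
    """95th percentile Hausdorff distance: larger directed 95th percentile."""
    d_ab, d_ba = boundary_distances(a, b, spacing)
    return float(max(np.percentile(d_ab, 95), np.percentile(d_ba, 95)))
```

Passing the voxel spacing in millimetres makes Avg-HD and HD95 come out in millimetres, while DSC is dimensionless.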

Results

Table 1 shows the results on the testing set for the five DL training methods using both the VNet and UNet CNNs. The UNet models outperform the VNet models for all evaluation metrics. The combined ¹²⁹Xe and ³He UNet model achieves the best mean DSC, XOR and Avg-HD values. This UNet model showed statistically significant improvements over the combined VNet model in both DSC and Avg-HD (see Figure 2).
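The paired comparison above pairs each model's metric on the same test scan. A minimal NumPy sketch of the paired t statistic is shown below, assuming per-scan metric arrays in matching scan order; in practice `scipy.stats.ttest_rel(x, y)` returns the same statistic together with a p-value. The helper name `paired_t_statistic` is hypothetical.

```python
import math

import numpy as np


def paired_t_statistic(x, y):
    """Paired t statistic and degrees of freedom for per-scan metric values
    (e.g. DSC) of two models evaluated on the same test scans."""
    d = np.asarray(x, dtype=float) - np.asarray(y, dtype=float)
    n = d.size
    if n < 2:
        raise ValueError("need at least two paired observations")
    sd = d.std(ddof=1)  # sample standard deviation of the differences
    if sd == 0:
        raise ValueError("differences have zero variance")
    t = d.mean() / (sd / math.sqrt(n))
    return float(t), n - 1
```

Pairing removes between-scan variability (for example, disease severity) from the comparison, which is why small but consistent per-scan gains can reach significance.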

Figure 3 shows the qualitative and quantitative performance of the best-performing VNet and UNet methods, namely the models trained on combined ¹²⁹Xe and ³He data, for three testing set cases with different diseases. For both networks, the DL segmentations of ventilated lung accurately follow the borders of the expert segmentations; the UNet outperforms the VNet in both DSC and Avg-HD, producing DSC values above 0.98.

Correlation and Bland-Altman analyses for the combined ¹²⁹Xe and ³He UNet and VNet models are shown in Figure 4. For both models, the DL segmentation volume is highly correlated with the expert segmentation volume and exhibits minimal bias. The UNet model marginally outperforms the VNet model in both analyses.
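The volume agreement analyses can be sketched as follows, assuming volumes are voxel counts scaled by voxel size and that the Bland-Altman limits of agreement are taken as bias ± 1.96 standard deviations of the DL-minus-expert differences (a common convention, not stated in the abstract). The function names are hypothetical.

```python
import numpy as np


def mask_volume_litres(mask, spacing_mm):
    """Volume of a binary mask in litres (1 L = 1e6 mm^3)."""
    voxel_mm3 = float(np.prod(spacing_mm))
    return np.count_nonzero(mask) * voxel_mm3 / 1e6


def volume_agreement(dl_volumes, expert_volumes):
    """Pearson r and Bland-Altman bias with 95% limits of agreement."""
    dl = np.asarray(dl_volumes, dtype=float)
    expert = np.asarray(expert_volumes, dtype=float)
    r = float(np.corrcoef(dl, expert)[0, 1])
    diff = dl - expert  # positive bias: DL volumes larger than expert
    bias = float(diff.mean())
    sd = float(diff.std(ddof=1))
    return {
        "pearson_r": r,
        "bias": bias,
        "loa_lower": bias - 1.96 * sd,
        "loa_upper": bias + 1.96 * sd,
    }
```

A Bland-Altman plot would place `dl - expert` on the vertical axis against `(dl + expert) / 2`; only the summary statistics are returned here. High correlation alone does not rule out a systematic offset, which is why both analyses are reported.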

Discussion

We comprehensively evaluated the segmentation performance of the UNet and VNet CNNs using five DL training methods. The combined ³He and ¹²⁹Xe UNet model significantly outperforms the corresponding VNet model. However, we did not observe statistically significant differences between the UNet models trained with different strategies, so we cannot conclude which UNet training strategy performs best.

Although we ensured that no patient was included in both the training and testing sets, an imbalance in repeat and longitudinal scans between ¹²⁹Xe and ³He meant that the testing set was weighted towards ¹²⁹Xe, potentially biasing the results. Future work will expand the testing set with the aim of maintaining equal distributions of the two gases.

Conclusion

We compared the performance of several 3D CNNs, loss functions and training strategies for segmentation of ventilated lungs on a large and diverse multi-nuclear hyperpolarized gas MRI dataset. The highest performing model was the UNet CNN trained on both ³He and ¹²⁹Xe data.

Acknowledgements

This work was supported by Yorkshire Cancer Research, Weston Park Cancer Charity, the National Institute for Health Research and the Medical Research Council.

References

1. Fain S, Korosec F, Holmes J, et al. Functional lung imaging using hyperpolarized gas MRI. J Magn Reson Imaging. 2007;25:910-923.

2. Tustison N, Avants B, Lin Z, et al. Convolutional Neural Networks with Template-Based Data Augmentation for Functional Lung Image Quantification. Acad Radiol. 2019;26(3):412-423.

3. Astley J, Biancardi A, Hughes P, et al. 3D deep convolutional neural network-based ventilated lung segmentation using multi-nuclear hyperpolarized gas MRI. Thoracic Image Analysis (TIA 2020), 8 Oct 2020, Lima, Peru. Lecture Notes in Computer Science, vol 12502. Springer; 2020:24-35.

4. Horn FC, Tahir BA, Stewart NJ, et al. Lung ventilation volumetry with same-breath acquisition of hyperpolarized gas and proton MRI. NMR Biomed. 2014;27(12):1461-1467.

5. Gibson E, Li W, Sudre C, et al. NiftyNet: a deep-learning platform for medical imaging. Comput Methods Programs Biomed. 2018;158:113-122.

6. Çiçek Ö, Abdulkadir A, Lienkamp SS, et al. 3D U-Net: Learning Dense Volumetric Segmentation from Sparse Annotation. MICCAI 2016. Lecture Notes in Computer Science, vol 9901. Springer, Cham; 2016.

7. Milletari F, Navab N, Ahmadi S. V-Net: Fully Convolutional Neural Networks for Volumetric Medical Image Segmentation. 2016 Fourth International Conference on 3D Vision (3DV). 2016:565-571. doi:10.1109/3DV.2016.79.

8. Biancardi A, Wild JM. New Disagreement Metrics Incorporating Spatial Detail – Applications to Lung Imaging. MIUA 2017. Communications in Computer and Information Science, vol 723. Springer, Cham; 2017.

Figures

Figure 1. Results from parameterization experiments, comparing a) four convolutional neural network architectures and b) three loss functions across the two best-performing CNN architectures. Note that training of the VNet with the Dice loss failed after 15,000 iterations.

Table 1. Comparison of segmentation performance of the VNet and UNet architectures across five DL training methods for all scans in the testing set. Means are given; the best result for each metric is in bold.

Figure 2. Comparison of DL performance for the two networks and five DL training strategies using DSC (top) and average boundary Hausdorff distance (bottom). The significance of differences between the best-performing VNet and UNet models was assessed using paired t-tests.

Figure 3. Example coronal slices of segmentations from the UNet and VNet models trained on combined ³He and ¹²⁹Xe data for three cases with different diseases, compared with the expert segmentations. DSC and Avg-HD values are given for each case.

Figure 4. Correlation and agreement analysis of lung volumes for 75 testing set cases compared to expert segmentations for combined ³He and ¹²⁹Xe a) VNet and b) UNet models.

Proc. Intl. Soc. Mag. Reson. Med. 29 (2021)
