2284

Automatic Segmentation of Hyperpolarized Gas MRI via Deep Learning

Joshua R Astley¹,², Alberto M Biancardi¹, Paul JC Hughes¹, Laurie J Smith¹, Helen Marshall¹, Grace T Mussell¹, James Eaden¹, Nicholas D Weatherley¹, Guilhem J Collier¹, Jim M Wild¹, and Bilal A Tahir¹,²
¹POLARIS, Department of Infection, Immunity & Cardiovascular Disease, University of Sheffield, Sheffield, United Kingdom, ²Department of Oncology and Metabolism, University of Sheffield, Sheffield, United Kingdom

Synopsis

Deep learning (DL)-based segmentation was conducted on a total of 431 ³He and ¹²⁹Xe 3D ventilation images using several training paradigms. Combined ³He and ¹²⁹Xe training showed a significant improvement over all other DL methods. In the majority of DL models, no significant difference was observed between ³He and ¹²⁹Xe testing data. Results suggest that ³He and ¹²⁹Xe images share important features that allow combined ³He and ¹²⁹Xe DL models to provide superior segmentations to single-gas models. In addition, DL generated segmentations faster than a state-of-the-art model-based solution and without requiring a proton MRI.

Introduction

Hyperpolarized gas MRI enables visualization of regional lung ventilation with high spatial and temporal resolution¹. Quantitative biomarkers derived from this modality, including the ventilated defect percentage, provide insights into pulmonary pathologies that are not currently available from alternative techniques². To facilitate the computation of such biomarkers, segmentation of the ventilated lung is required. Current approaches such as multichannel spatial fuzzy c-means (SFCM) thresholding³ are semi-automatic, require a corresponding proton image aligned with the ventilation image and require considerable time for manual editing of segmentations. Recent research in deep learning (DL) has shown promising results for numerous image segmentation problems⁴. Here, we evaluate several DL methods for the automatic segmentation of hyperpolarized gas MRI. We also investigate the effect of the noble gas used (³He or ¹²⁹Xe) on DL performance.
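As an illustration of why a ventilated-lung mask is needed, the sketch below computes a ventilated defect percentage from two binary masks. It assumes the commonly used definition (the percentage of the lung cavity volume that is not ventilated) and that a lung cavity mask is available; the function and argument names are hypothetical and are not taken from this work.

```python
import numpy as np

def ventilated_defect_percentage(ventilated_mask, lung_cavity_mask):
    """Percentage of lung cavity voxels that are NOT ventilated.

    Both inputs are binary arrays of the same shape (assumed definition;
    not necessarily the exact computation used in this study).
    """
    vent = np.asarray(ventilated_mask, dtype=bool)
    cavity = np.asarray(lung_cavity_mask, dtype=bool)
    if vent.shape != cavity.shape:
        raise ValueError("masks must have the same shape")
    cavity_voxels = cavity.sum()
    if cavity_voxels == 0:
        raise ValueError("lung cavity mask is empty")
    # Count only ventilated voxels that lie inside the lung cavity.
    ventilated_voxels = np.logical_and(vent, cavity).sum()
    return 100.0 * (1.0 - ventilated_voxels / cavity_voxels)
```

Because both masks share one voxel grid, the voxel volume cancels and voxel counts can be used directly.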

Methods

Imaging data:
All subjects underwent MRI at 1.5T. Flexible quadrature radiofrequency coils were employed for transmission and reception of MR signals at the Larmor frequencies of ³He and ¹²⁹Xe. The data comprised 431 3D hyperpolarized gas images, acquired with either ³He (n=173) or ¹²⁹Xe (n=258), from healthy subjects and patients with pulmonary pathologies (see Figure 1 for details).

DL segmentation:
DL-based segmentation was performed using NiftyNet⁵. A VNet architecture was used with a PReLU activation function⁴,⁶. Three sets of experiments were performed to train a convolutional neural network: (1) the model was trained on either ¹²⁹Xe or ³He images; (2) transfer learning was applied to the pre-trained models in (1) to fine-tune the network for the opposite gas images⁷; (3) the model was trained on the combined ³He and ¹²⁹Xe data. 10% of the training data was used for internal validation. Each trained model was evaluated on a combined testing dataset of ³He and ¹²⁹Xe images (n=33). Whilst same-patient longitudinal ventilation image data was employed during training, no such patient data was included in the testing phase, representing an independent validation cohort. The experiments are shown in Figure 1.
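The data partitioning described above can be sketched as follows. This is a minimal illustration, not the procedure used in the study: it assumes each scan is tagged with a patient identifier, that testing patients are excluded entirely from training, and that the 10% validation subset is drawn at random. The function name, the tuple layout and the seed are hypothetical.

```python
import random

def split_scans(scans, test_patients, val_fraction=0.10, seed=0):
    """Split (patient_id, scan_id) tuples into train, validation and test lists.

    All scans from patients in `test_patients` go to the test set, so no
    testing patient contributes to training (assumption of this sketch).
    """
    test_patients = set(test_patients)
    test = [s for s in scans if s[0] in test_patients]
    pool = [s for s in scans if s[0] not in test_patients]
    # Shuffle reproducibly, then hold out a fraction for internal validation.
    rng = random.Random(seed)
    rng.shuffle(pool)
    n_val = round(val_fraction * len(pool))
    return pool[n_val:], pool[:n_val], test
```

Splitting the test set by patient rather than by scan keeps repeat scans of one patient from appearing on both sides of the evaluation.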

Data analysis:
To evaluate segmentation accuracy, Dice Similarity Coefficients (DSCs) were computed between the DL-based ventilation masks and those generated by expert observers. For a random subset of 13 of the testing images, DSC values were compared with those of multichannel SFCM³. Paired t-tests were employed to assess differences between methods. To investigate the effect of noble gas on DL segmentation, the testing set was further split into ³He and ¹²⁹Xe and analysed by Mann-Whitney tests.
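For reference, the DSC between two binary masks A and B is 2|A∩B| / (|A| + |B|). The sketch below shows one way this metric and the paired t statistic could be computed with NumPy. The function names are hypothetical, the handling of two empty masks is an assumed convention, and the p-value (not computed here) would come from a t-distribution with n-1 degrees of freedom.

```python
import numpy as np

def dice_similarity(mask_a, mask_b):
    """Dice Similarity Coefficient between two binary masks of equal shape."""
    a = np.asarray(mask_a, dtype=bool)
    b = np.asarray(mask_b, dtype=bool)
    if a.shape != b.shape:
        raise ValueError("masks must have the same shape")
    total = a.sum() + b.sum()
    if total == 0:
        return 1.0  # assumed convention: two empty masks agree perfectly
    return 2.0 * np.logical_and(a, b).sum() / total

def paired_t_statistic(dsc_method_1, dsc_method_2):
    """t statistic for paired DSC values measured on the same test images."""
    d = np.asarray(dsc_method_1, dtype=float) - np.asarray(dsc_method_2, dtype=float)
    if d.size < 2:
        raise ValueError("need at least two paired observations")
    # Mean difference divided by its standard error (sample SD, ddof=1).
    return d.mean() / (d.std(ddof=1) / np.sqrt(d.size))
```

The pairing matters because each method is scored on the same images, so the test operates on per-image differences rather than on two independent groups.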

Results

Figure 2 shows example segmentations from all DL methods for a range of diseases and healthy participants. Transfer learning exhibited improved DSCs only when the pre-trained ³He model was fine-tuned with ¹²⁹Xe data, compared to training on ¹²⁹Xe only (p<0.0001). Combined training on ¹²⁹Xe and ³He yielded statistically significant improvements over all other DL methods (p<0.05). A full breakdown of results is shown in Figure 3.

A further comparison was conducted between multichannel SFCM³ and the DL model trained on combined ¹²⁹Xe and ³He data, using a subset of 13 testing images. No significant difference was observed between the methods (p=0.842) (see Figure 4).

Figure 5 shows the differences between ³He and ¹²⁹Xe testing images for each DL method. The majority of DL methods demonstrated no significant differences between ³He and ¹²⁹Xe; a significant difference in testing performance was observed for two of the methods (training on ¹²⁹Xe alone, and training on ³He followed by transfer learning on ¹²⁹Xe).

Discussion

The highest performing DL method evaluated incorporated both ³He and ¹²⁹Xe training data; the significant increase, and variability, in training data reduces overfitting and hence increases the generalisability of the model. Looking at the effect of the gas, we found significant differences in DSCs between ³He and ¹²⁹Xe testing images for two models, indicating that whilst both gases provide clinically comparable ventilation distributions⁸, DL segmentation requires both ³He and ¹²⁹Xe images during training to generate a robust, generalizable model that is agnostic to gas. One limitation is that the ³He and ¹²⁹Xe datasets differed in both the number of scans and the patients scanned, which may have contributed to differences in segmentation performance.

Analysis of multichannel SFCM and the highest performing DL method demonstrated no significant differences between the methods. Multichannel SFCM³ requires a corresponding, aligned proton image to generate a ventilation mask; this is not the case for the DL model, which requires only the ventilation image. The DL method also has a substantially shorter run time (approximately 7 seconds per 3D image on a GPU, compared with 5 minutes for multichannel SFCM³). By visual inspection (see Figure 4), the DL method erroneously segmented less of the trachea and bronchi, reducing the time taken for manual editing.

Conclusion

In this work, DL segmentation methods were capable of segmenting hyperpolarized gas MRI from both ³He and ¹²⁹Xe with accuracy not significantly different from that of current model-based segmentation methods. DL methods do not require a registered proton image and are expected to dramatically reduce the time taken to generate segmentations and manually edit ventilated masks. It was shown that combined learning on ³He and ¹²⁹Xe yields significant improvements in DSC over all other DL methods investigated.

Acknowledgements

This work was supported by Yorkshire Cancer Research, Weston Park Cancer Charity, the National Institute for Health Research, the Medical Research Council and GlaxoSmithKline (PJCH:BIDS3000032592).

References

1. Fain S, Korosec F, Holmes J, et al. Functional lung imaging using hyperpolarized gas MRI. J. Magn. Reson. Imaging, 2007;25:910-923.

2. Woodhouse N, Wild J, Paley M, et al. Combined helium‐3/proton magnetic resonance imaging measurement of ventilated lung volumes in smokers compared to never‐smokers. J. Magn. Reson. Imaging, 2005;21:365-369.

3. Biancardi AM, Acunzo L, Marshall H, et al. A paired approach to the segmentation of proton and hyperpolarized gas MR images of the lungs. ISMRM 2018.

4. Bakator M, Radosav D. Deep Learning and Medical Diagnosis: A Review of Literature. Multimodal Technologies Interact. 2018;2:47.

5. Gibson E, Li W, Sudre C, et al. NiftyNet: a deep-learning platform for medical imaging. Computer Methods and Programs in Biomedicine, 2018;158:113-122.

6. Tustison N, Avants B, Lin Z, et al. Convolutional Neural Networks with Template-Based Data Augmentation for Functional Lung Image Quantification. Academic Radiology, 2019;26(3):412-423.

7. Zha W, Fain S, Schiebler M, et al. Deep convolutional neural networks with multiplane consensus labeling for lung function quantification using UTE proton MRI. J. Magn. Reson. Imaging, 2019;50:1169-1181.

8. Stewart N, Chan H, Hughes P, et al. Comparison of 3He and 129Xe MRI for evaluation of lung microstructure and ventilation at 1.5T. J. Magn. Reson. Imaging, 2018;48:632-642.

Figures

Figure 1. Top: Summary of patient imaging data, showing total number of images (n=431), number of images acquired using ¹²⁹Xe (n=258) or ³He (n=173) in addition to the range of pulmonary pathologies of the subjects. Bottom: Deep learning segmentation experiments conducted in this study. The experiments show the relationship between training and testing data for each training paradigm.

Figure 2. Examples of segmentations from expert observers and the different DL methods for patients with a range of diseases and a healthy subject. Mean±SD DSC values are calculated across 33 testing images.

Figure 3. Comparison of different DL methods. P values were calculated using paired t-tests. All datapoints are shown. Means are indicated by a coloured line. Where P values are given above multiple black lines, the P values relate to all comparisons.

Figure 4. Comparison of multichannel SFCM³ to combined ¹²⁹Xe and ³He DL with example segmentation demonstrating erroneous mask pixels.

Figure 5. Graphs indicating the performance of DL methods on the testing data split by noble gas (¹²⁹Xe and ³He). All datapoints are shown. Means are indicated by a horizontal line. P values were calculated using Mann-Whitney tests, where quoted P values indicate a significant difference in performance between ¹²⁹Xe and ³He data.

Proc. Intl. Soc. Mag. Reson. Med. 28 (2020)
