2284

Automatic Segmentation of Hyperpolarized Gas MRI via Deep Learning
Joshua R Astley1,2, Alberto M Biancardi1, Paul JC Hughes1, Laurie J Smith1, Helen Marshall1, Grace T Mussell1, James Eaden1, Nicholas D Weatherley1, Guilhem J Collier1, Jim M Wild1, and Bilal A Tahir1,2
1POLARIS, Department of Infection, Immunity & Cardiovascular Disease, University of Sheffield, Sheffield, United Kingdom, 2Department of Oncology and Metabolism, University of Sheffield, Sheffield, United Kingdom

Synopsis

Deep learning (DL)-based segmentation was conducted on a total of 431 3He and 129Xe 3D ventilation images using several training paradigms. Combined 3He and 129Xe training showed a significant improvement over all other DL methods. In the majority of DL models, no significant difference was observed between 3He and 129Xe testing data. Results suggest that 3He and 129Xe images share important features that allow combined 3He and 129Xe DL models to provide superior segmentations to single-gas models. In addition, DL generated segmentations faster than state-of-the-art model-based solutions and without requiring proton MRI.

Introduction

Hyperpolarized gas MRI enables visualization of regional lung ventilation with high spatial and temporal resolution1. Quantitative biomarkers derived from this modality, including the ventilated defect percentage, provide further insights into pulmonary pathologies that are currently not possible with alternative techniques2. To facilitate the computation of such biomarkers, segmentation of the ventilated lung is required. Current approaches such as multichannel spatial fuzzy c-means (SFCM) thresholding3 are semi-automatic, require a corresponding proton image aligned with the ventilation image, and require significant time to manually edit segmentations. Recent research in deep learning (DL) has shown promising results for numerous image segmentation problems4. Here, we evaluate several DL methods for the automatic segmentation of hyperpolarized gas MRI. We also investigate the effect of the noble gas used (3He or 129Xe) on DL performance.

Methods

Imaging data:
All subjects underwent MRI at 1.5T. Flexible quadrature radiofrequency coils were employed for transmission and reception of MR signals at the Larmor frequencies of 3He and 129Xe. The data comprised 431 3D hyperpolarized gas images, acquired with either 3He (n=173) or 129Xe (n=258), from healthy subjects and patients with pulmonary pathologies (see Figure 1 for details).

DL segmentation:
DL-based segmentation was performed using NiftyNet5. A VNet architecture was used with a PReLU activation function4,6. Three sets of experiments were performed to train a convolutional neural network: (1) the model was trained on either 129Xe or 3He images; (2) transfer learning was applied to the pre-trained models in (1) to fine-tune the network for the opposite gas images7; (3) the model was trained on the combined 3He and 129Xe data. 10% of the training data was used for internal validation. Each trained model was evaluated on a combined testing dataset of 3He and 129Xe images (n=33). Whilst same-patient longitudinal ventilation image data was employed during training, no such patient data was included in the testing phase, so the testing set represents an independent validation cohort. The experiments are shown in Figure 1.
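As an illustration of the internal validation step, the following is a minimal sketch of holding out 10% of a training set at random. It is not the NiftyNet mechanism used in this work; the function name, the image identifiers, the seed and the training set size of 100 are all hypothetical.

```python
import random


def split_train_validation(image_ids, validation_fraction=0.1, seed=0):
    """Randomly hold out a fraction of the training images for validation.

    Hypothetical helper for illustration only; the study's actual split
    was handled within its own training pipeline.
    """
    ids = list(image_ids)
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    rng.shuffle(ids)
    n_val = max(1, round(len(ids) * validation_fraction))
    return ids[n_val:], ids[:n_val]  # (training ids, validation ids)


# Hypothetical identifiers; 100 is an arbitrary count, not the study's.
image_ids = [f"img_{i:03d}" for i in range(100)]
train_ids, val_ids = split_train_validation(image_ids)
print(len(train_ids), len(val_ids))
```

Where a dataset contains repeat scans of the same patient, a split by patient rather than by image would be needed to keep a held-out set independent, as was done for the testing set here.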

Data analysis:
To evaluate segmentation accuracy, Dice Similarity Coefficients (DSCs) were computed between the DL-based ventilation masks and those generated by expert observers. For a random subset of 13 of the testing images, DSC values were compared with those from multichannel SFCM3. Paired t-tests were employed to assess differences between methods. To investigate the effect of noble gas on DL segmentation, the testing set was further split into 3He and 129Xe and analysed by Mann-Whitney tests.
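For reference, the DSC between two binary masks A and B is 2|A∩B| / (|A| + |B|). The following minimal sketch computes it for masks given as flattened sequences of 0/1 voxels. The toy masks are hypothetical and are not data from this study, and returning 1.0 for two empty masks is an assumed convention.

```python
def dice_coefficient(mask_a, mask_b):
    """Dice similarity coefficient between two flattened binary masks."""
    if len(mask_a) != len(mask_b):
        raise ValueError("masks must contain the same number of voxels")
    intersection = sum(1 for a, b in zip(mask_a, mask_b) if a and b)
    size_a = sum(1 for a in mask_a if a)
    size_b = sum(1 for b in mask_b if b)
    if size_a + size_b == 0:
        return 1.0  # assumed convention: two empty masks agree perfectly
    return 2.0 * intersection / (size_a + size_b)


# Hypothetical toy masks (flattened), for illustration only.
dl_mask = [0, 1, 1, 1, 0, 0, 1, 0]
expert_mask = [0, 1, 1, 0, 0, 0, 1, 1]
print(dice_coefficient(dl_mask, expert_mask))  # 3 shared voxels, 4 + 4 total -> 0.75
```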

Results

Figure 2 shows example segmentations from all DL methods for a range of diseases and healthy participants. Transfer learning exhibited improved DSCs only when the pre-trained 3He model was fine-tuned with 129Xe data, compared to training on 129Xe only (p<0.0001). Combined training on 129Xe and 3He yielded statistically significant improvements over all other DL methods (p<0.05). A full breakdown of results is shown in Figure 3.

A further comparison was conducted between multichannel SFCM3 and the DL model trained on combined 129Xe and 3He data, using a subset of 13 testing images. No significant difference was observed between methods (p=0.842) (see Figure 4).
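The paired comparison above relies on each image being segmented by both methods, so the test operates on per-image DSC differences. The sketch below computes the paired t statistic and its degrees of freedom under that assumption. The DSC values are hypothetical, and the p-value step is omitted (a statistics package such as SciPy, via scipy.stats.ttest_rel, could supply it).

```python
import math
import statistics


def paired_t_statistic(scores_a, scores_b):
    """Return (t, degrees of freedom) for paired samples.

    Undefined when all paired differences are identical
    (zero standard deviation).
    """
    if len(scores_a) != len(scores_b):
        raise ValueError("paired samples must have equal length")
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    n = len(diffs)
    mean_diff = statistics.mean(diffs)
    sd_diff = statistics.stdev(diffs)  # sample SD (n - 1 denominator)
    t = mean_diff / (sd_diff / math.sqrt(n))
    return t, n - 1


# Hypothetical per-image DSCs for two methods, not values from the study.
dsc_method_a = [0.90, 0.92, 0.88, 0.95]
dsc_method_b = [0.89, 0.93, 0.86, 0.95]
t_stat, dof = paired_t_statistic(dsc_method_a, dsc_method_b)
print(round(t_stat, 3), dof)
```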

Figure 5 shows the differences between 3He and 129Xe testing images for each DL method. The majority of DL methods demonstrated no significant difference between 3He and 129Xe; a significant difference in testing performance was observed for two of the methods (training on 129Xe only; training on 3He followed by transfer learning on 129Xe).
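Because the 3He and 129Xe testing images form two independent groups of unequal size, a rank-based test such as Mann-Whitney is a natural choice. The sketch below computes only the U statistics by direct pairwise comparison. The DSC values are hypothetical, and the p-value step is omitted (scipy.stats.mannwhitneyu is one option for that).

```python
def mann_whitney_u(group_x, group_y):
    """Return (U_x, U_y) by counting pairwise wins; ties count as 0.5."""
    u_x = 0.0
    for x in group_x:
        for y in group_y:
            if x > y:
                u_x += 1.0
            elif x == y:
                u_x += 0.5
    u_y = len(group_x) * len(group_y) - u_x
    return u_x, u_y


# Hypothetical DSCs for two independent groups, not values from the study.
dsc_group_1 = [0.91, 0.93, 0.95]
dsc_group_2 = [0.88, 0.92, 0.90, 0.94]
print(mann_whitney_u(dsc_group_1, dsc_group_2))  # 9 of 12 pairs favour group 1 -> (9.0, 3.0)
```

The two U values always sum to the number of pairs (here 3 x 4 = 12), and the smaller of the two is the one conventionally compared against tables or a normal approximation.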

Discussion

The highest performing DL method evaluated incorporated both 3He and 129Xe training data; the increase in the quantity and variability of training data likely reduces overfitting and hence increases the generalisability of the model. Looking at the effect of the gas, we found significant differences in DSCs between 3He and 129Xe testing images for two models. This indicates that, whilst both gases provide clinically comparable ventilation distributions8, DL segmentation requires both 3He and 129Xe images during training to generate a robust, generalisable model that is agnostic to gas. One limitation is that the 3He and 129Xe datasets differed in both the number of scans and the patients scanned, which may have induced differences in segmentation performance.

Analysis of multichannel SFCM and the highest performing DL method demonstrated no significant difference between the methods. Multichannel SFCM3 requires a corresponding, aligned proton image to generate a ventilation mask, whereas the DL model requires only the ventilation image. The DL method also has a substantially shorter run time (approximately 7 seconds per 3D image on a GPU, compared to 5 minutes for multichannel SFCM3). By visual inspection (see Figure 4), the DL method erroneously segmented less of the trachea and bronchi, reducing the time taken for manual editing.

Conclusion

In this work, DL methods segmented hyperpolarized gas MRI from both 3He and 129Xe with no statistically significant difference in DSC from a current model-based segmentation method. DL methods do not require a registered proton image and are expected to dramatically reduce the time taken to generate segmentations and manually edit ventilated masks. Combined training on 3He and 129Xe yielded significant improvements in DSC over all other DL methods investigated.

Acknowledgements

This work was supported by Yorkshire Cancer Research, Weston Park Cancer Charity, the National Institute for Health Research, the Medical Research Council and GlaxoSmithKline (PJCH:BIDS3000032592).

References

1. Fain S, Korosec F, Holmes J, et al. Functional lung imaging using hyperpolarized gas MRI. J. Magn. Reson. Imaging, 2007;25:910-923.

2. Woodhouse N, Wild J, Paley M, et al. Combined helium‐3/proton magnetic resonance imaging measurement of ventilated lung volumes in smokers compared to never‐smokers. J. Magn. Reson. Imaging, 2005;21:365-369.

3. Biancardi AM, Acunzo L, Marshall H, et al. A paired approach to the segmentation of proton and hyperpolarized gas MR images of the lungs. ISMRM 2018.

4. Bakator M, Radosav D. Deep Learning and Medical Diagnosis: A Review of Literature. Multimodal Technologies Interact. 2018;2:47.

5. Gibson E, Li W, Sudre C, et al. NiftyNet: a deep-learning platform for medical imaging. Computer Methods and Programs in Biomedicine, 2018;158:113-122.

6. Tustison N, Avants B, Lin Z, et al. Convolutional Neural Networks with Template-Based Data Augmentation for Functional Lung Image Quantification. Academic Radiology, 2019;26(3):412-423.

7. Zha W, Fain S, Schiebler M, et al. Deep convolutional neural networks with multiplane consensus labeling for lung function quantification using UTE proton MRI. J. Magn. Reson. Imaging, 2019;50:1169-1181.

8. Stewart N, Chan H, Hughes P, et al. Comparison of 3He and 129Xe MRI for evaluation of lung microstructure and ventilation at 1.5T. J. Magn. Reson. Imaging, 2018;48:632-642.

Figures

Figure 1. Top: Summary of patient imaging data, showing total number of images (n=431), number of images acquired using 129Xe (n=258) or 3He (n=173) in addition to the range of pulmonary pathologies of the subjects. Bottom: Deep learning segmentation experiments conducted in this study. The experiments show the relationship between training and testing data for each training paradigm.

Figure 2. Examples of segmentations from expert observers and the different DL methods for patients with a range of diseases and a healthy subject. Mean±SD DSC values are calculated across 33 testing images.

Figure 3. Comparison of different DL methods. P values were calculated using paired t-tests. All datapoints are shown. Means are indicated by a coloured line. Where P values are given above multiple black lines, the P values relate to all comparisons.

Figure 4. Comparison of multichannel SFCM3 to combined 129Xe and 3He DL with example segmentation demonstrating erroneous mask pixels.

Figure 5. Graphs indicating the performance of DL methods on the testing data split by noble gas (129Xe and 3He). All datapoints are shown. Means are indicated by a horizontal line. P values were calculated using Mann-Whitney tests, where quoted P values indicate a significant difference in performance between 129Xe and 3He data.

Proc. Intl. Soc. Mag. Reson. Med. 28 (2020)