0391

Deep learning-based thoracic cavity segmentation for hyperpolarized ¹²⁹Xe MRI

Suphachart Leewiwatwong¹, Junlan Lu², David Mummy³, Isabelle Dummer³,⁴, Kevin Yarnall⁵, Ziyi Wang¹, and Bastiaan Driehuys¹,²,³
¹Biomedical Engineering, Duke University, Durham, NC, United States, ²Medical Physics, Duke University, Durham, NC, United States, ³Radiology, Duke University, Durham, NC, United States, ⁴Bioengineering, McGill University, Montréal, QC, Canada, ⁵Mechanical Engineering and Materials Science, Duke University, Durham, NC, United States

Synopsis

Quantifying hyperpolarized ¹²⁹Xe MRI of pulmonary ventilation and gas exchange requires accurate segmentation of the thoracic cavity. This is typically done manually or semi-automatically using an additional proton scan volume-matched to the gas image. These methods are vulnerable to operator subjectivity, image artifacts, alignment/registration errors, and low SNR. Here we demonstrate a 3D convolutional neural network (CNN) that automatically delineates the thoracic cavity directly from ¹²⁹Xe MRI alone. Trained with a combination of Dice-Focal and perceptual losses and with template-based data augmentation, this 3D-CNN segments the thoracic cavity with a Dice score of 0.955 vs. expert readers.

Introduction

Hyperpolarized ¹²⁹Xe MRI is increasingly used for 3D imaging of ventilation and gas exchange¹. However, quantifying these images relies on accurate delineation of the subject’s thoracic cavity. This is typically done using a separate breath-hold ¹H image that is segmented and registered to the functional scan². Nevertheless, these ¹H images take additional time to acquire, may not match the lung inflation volume of the ¹²⁹Xe scan, or may simply be unavailable. Moreover, segmentation is time-consuming and requires training, and it is a primary source of inter-analyst variability. It is therefore highly desirable to derive a high-quality thoracic cavity mask from the ¹²⁹Xe MRI scan alone. This task is particularly challenging because the lung boundary to be delineated is often adjacent to the most prominent ventilation defects, where there is little or no ¹²⁹Xe signal. Here, we demonstrate a 3D convolutional neural network (CNN), trained on expert segmentations of both ¹H and ¹²⁹Xe images, that reliably delineates the thoracic cavity from the ¹²⁹Xe MRI scan alone.

Methods

Dataset generation:
Our study used 232 image datasets, each including 3D radial acquisitions of both ¹²⁹Xe ventilation and ¹H thoracic cavity anatomy. Expert readers segmented the ¹H images to generate thoracic cavity masks. These data were divided into 185 sets for training and 47 for validation. Final testing of the 3D-CNN used an additional 33 “pristine” datasets, whose masks were generated by expert readers who segmented the ventilation images while using the ¹H images as a reference. Because the training data predominantly contained subjects with minor or no defects, and because severe defects pose the greatest segmentation challenge, this pool was expanded by template-based augmentation³ to create a more balanced dataset. Fig 1 shows representative expert-reader segmentations from the training data, displaying the ¹²⁹Xe ventilation and registered ¹H thoracic cavity images together with the superposed segmentation outline used to visually confirm accuracy.

Model architecture:
Model training involved two key modules. The first was a basic segmentation model (SM), adapted from V-Net for volumetric medical image segmentation⁴, which underlies all models described here; the models differed in how they were trained. The second was a perceptual loss model consisting of an encoder-decoder model (EDM) and a predictor model (PM), similar to Anatomically Constrained Neural Networks (ACNNs)⁵. The PM had a structure similar to that of the EDM encoder. Each model was first trained separately; after the losses converged, the models were combined and trained further as shown in Fig 2.

Loss:
The segmentation model was first trained with a Dice-Focal loss ($L_{Dice}$ and $L_{focal}$), after which a perceptual loss term $L_{perc}$ was added. The total loss function is:

$L_{tot} = L_{Dice} + 0.1\,L_{focal} + 0.001\,L_{perc}$
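The weighted combination above can be sketched as follows. This is a minimal sketch, not the authors' implementation: it assumes predictions are voxel probabilities flattened to a list, uses a soft Dice formulation with a small smoothing constant `eps` (an assumption, since the exact Dice variant is not stated), and takes the focal and perceptual terms as precomputed values. Only the weights 0.1 and 0.001 come from the equation.

```python
from typing import Sequence

# Weights from L_tot = L_Dice + 0.1*L_focal + 0.001*L_perc
W_FOCAL = 0.1
W_PERC = 0.001


def soft_dice_loss(pred: Sequence[float], target: Sequence[float], eps: float = 1e-6) -> float:
    """Soft Dice loss: 1 - (2*sum(p*g) + eps) / (sum(p) + sum(g) + eps).

    pred:   flattened voxel probabilities from the segmentation model (assumed format)
    target: flattened binary ground-truth mask (0/1)
    eps:    hypothetical smoothing constant that avoids division by zero
    """
    if len(pred) != len(target):
        raise ValueError("pred and target must have the same number of voxels")
    intersection = sum(p * g for p, g in zip(pred, target))
    denominator = sum(pred) + sum(target)
    return 1.0 - (2.0 * intersection + eps) / (denominator + eps)


def total_loss(l_dice: float, l_focal: float, l_perc: float) -> float:
    """Weighted sum of the Dice, focal, and perceptual loss terms."""
    return l_dice + W_FOCAL * l_focal + W_PERC * l_perc


if __name__ == "__main__":
    pred = [0.9, 0.8, 0.1, 0.2]
    target = [1, 1, 0, 0]
    l_dice = soft_dice_loss(pred, target)
    print(round(l_dice, 4))                         # 0.15
    print(round(total_loss(l_dice, 0.5, 20.0), 4))  # 0.22
```

The small weight on $L_{perc}$ reflects that a feature-space distance can be numerically much larger than the Dice and focal terms, so it acts as a gentle shape prior rather than the dominant objective.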

The focal loss function⁶ encourages the model to include ventilation defects in the segmentation by lowering the penalty for false positives relative to false negatives, and it makes the model focus on poorly ventilated areas by down-weighting well-classified voxels so that these hard voxels contribute relatively more to the loss. Fig 3 illustrates this: with binary cross entropy, ventilation defects shift the predicted lung boundary, but adding focal loss overcomes this. The perceptual loss is added to encourage the segmentation to follow the lung shape; it is the Euclidean distance (ED) between the EDM encoder outputs for the ground-truth mask and for the SM prediction. When training the perceptual loss model, the EDM was trained with binary cross entropy and the PM with ED.
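The two additional loss terms can be sketched in plain Python. This is an illustrative sketch, not the authors' code: `binary_focal_loss` follows the standard form of Lin et al.⁶ with hypothetical values `gamma=2.0` and `alpha=0.75` (the abstract states only that the parameters were optimized). Choosing `alpha > 0.5` penalizes missed lung voxels (false negatives) more than false positives, matching the behavior described above. `perceptual_loss` takes a hypothetical `encoder` callable standing in for the frozen EDM encoder.

```python
import math
from typing import Callable, Sequence

Encoder = Callable[[Sequence[float]], Sequence[float]]


def binary_focal_loss(pred: Sequence[float], target: Sequence[int],
                      gamma: float = 2.0, alpha: float = 0.75, eps: float = 1e-7) -> float:
    """Mean focal loss FL = -alpha_t * (1 - p_t)**gamma * log(p_t) over all voxels.

    gamma and alpha are hypothetical values. alpha > 0.5 weights false negatives
    (lung voxels predicted as background) more than false positives.
    """
    total = 0.0
    for p, g in zip(pred, target):
        p = min(max(p, eps), 1.0 - eps)         # clamp to avoid log(0)
        p_t = p if g == 1 else 1.0 - p          # probability assigned to the true class
        alpha_t = alpha if g == 1 else 1.0 - alpha
        # (1 - p_t)**gamma shrinks the loss of well-classified (easy) voxels
        total += -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)
    return total / len(pred)


def perceptual_loss(encoder: Encoder, pred: Sequence[float], target: Sequence[float]) -> float:
    """Euclidean distance between encoder features of the SM prediction and of the mask.

    During training the encoder weights would be frozen; here it is any callable.
    """
    f_pred = encoder(pred)
    f_true = encoder(target)
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(f_pred, f_true)))


def toy_encoder(mask: Sequence[float]) -> Sequence[float]:
    """Hypothetical stand-in encoder: total mask volume and first-minus-last voxel."""
    return [sum(mask), mask[0] - mask[-1]]


if __name__ == "__main__":
    # A missed lung voxel (false negative) costs more than a spurious one (false positive)
    print(binary_focal_loss([0.1], [1]) > binary_focal_loss([0.9], [0]))  # True
    print(round(perceptual_loss(toy_encoder, [0.9, 0.8, 0.1, 0.2], [1, 1, 0, 0]), 6))  # 0.3
```

Setting `gamma=0.0` and `alpha=0.5` reduces the focal loss to half the binary cross entropy, which makes the contrast with the BCE-only model in Fig 3 easy to reproduce.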

Training:
The SM, EDM, and PM were each trained individually for 126,000 steps with a batch size of 1. Subsequently, the frozen-weight encoder of the EDM was used to train the SM for an additional 126,000 steps. The SM took ventilation images as inputs and the ground-truth segmentations as targets. The EDM used the ground-truth segmentations as both inputs and targets. The PM took either ¹H structural or ¹²⁹Xe ventilation images as inputs; its targets were the EDM encoder outputs for the associated segmentation.
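The training order described above (and in Fig 2) can be written down as an explicit schedule. This is a hypothetical bookkeeping sketch, not the authors' training code: the `Phase` structure and phase names are illustrative, and the step count and combined loss of the joint EDM/PM phase shown in Fig 2 are not stated, so they are marked as unspecified.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

STEPS = 126_000  # steps per individually trained model, batch size 1 (from the text)


@dataclass(frozen=True)
class Phase:
    name: str
    trainable: Tuple[str, ...]  # modules whose weights are updated in this phase
    frozen: Tuple[str, ...]     # modules used only in the forward pass
    inputs: str
    targets: str
    loss: str
    steps: Optional[int]        # None where no step count is given


SCHEDULE = (
    # Stage 1: models trained separately
    Phase("SM", ("SM",), (), "129Xe ventilation image", "ground-truth mask",
          "Dice-Focal", STEPS),
    Phase("EDM", ("EDM",), (), "ground-truth mask", "ground-truth mask",
          "binary cross entropy", STEPS),
    Phase("PM", ("PM",), ("EDM",), "1H or 129Xe image", "EDM encoder output for the mask",
          "Euclidean distance", STEPS),
    Phase("EDM+PM joint", ("EDM", "PM"), (), "mask (EDM), image (PM)",
          "mask (EDM), EDM encoding (PM)", "BCE and ED; combination unspecified", None),
    # Stage 2: SM continues training; the frozen EDM encoder only computes L_perc
    Phase("SM + perceptual", ("SM",), ("EDM encoder",), "129Xe ventilation image",
          "ground-truth mask", "L_Dice + 0.1 L_focal + 0.001 L_perc", STEPS),
)

if __name__ == "__main__":
    for phase in SCHEDULE:
        steps = phase.steps if phase.steps is not None else "unspecified"
        frozen = ", ".join(phase.frozen) or "none"
        print(f"{phase.name}: train {', '.join(phase.trainable)}; frozen {frozen}; steps {steps}")
```

Writing the schedule this way makes explicit that the EDM encoder is never updated in Stage 2 and that only the SM is needed at inference.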

Results

We found that model performance improved substantially when the training data underwent an additional registration step (Fig 5, models 1 and 2). The registered dataset was created by registering the ventilation images to the ground-truth segmentations to improve alignment and SNR. However, although this facilitated training, the resulting network struggled to segment low-SNR images. We therefore added the perceptual loss component to constrain the network to predict segmentations that follow the known anatomical structure. Fig 4 illustrates this, showing incorrect segmentations of low-SNR images by the traditional 3D-CNN trained with binary cross entropy alone. Fig 5 compares the models quantitatively, further demonstrating that the proposed model outperforms the traditional model and performs comparably to expert readers (Dice score: 0.955). We did not observe significant differences between training the PM with ventilation images and with ¹H images.
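For reference, the Dice score used to compare model and expert-reader masks can be computed over whole binary 3D masks (as in Fig 4) and summarized by mean and median (as in Fig 5). A minimal sketch, assuming both masks are flattened to equal-length 0/1 sequences; returning 1.0 when both masks are empty is an assumed convention.

```python
from statistics import mean, median
from typing import Dict, Iterable, Sequence


def dice_score(mask_a: Sequence[int], mask_b: Sequence[int]) -> float:
    """Dice = 2|A & B| / (|A| + |B|) for two binary masks flattened to 1D."""
    if len(mask_a) != len(mask_b):
        raise ValueError("masks must have the same number of voxels")
    a = [bool(v) for v in mask_a]
    b = [bool(v) for v in mask_b]
    overlap = sum(1 for x, y in zip(a, b) if x and y)
    size = sum(a) + sum(b)
    if size == 0:
        return 1.0  # both masks empty: treated as perfect agreement (assumed convention)
    return 2.0 * overlap / size


def summarize(scores: Iterable[float]) -> Dict[str, float]:
    """Mean and median Dice over a test set (the statistics marked in Fig 5)."""
    values = list(scores)
    return {"mean": mean(values), "median": median(values)}


if __name__ == "__main__":
    model = [1, 1, 1, 0, 0, 0]
    expert = [1, 1, 0, 0, 0, 1]
    print(round(dice_score(model, expert), 3))  # 0.667
```

Reporting both statistics is useful here because a few low-scoring outliers (such as the low-SNR cases for model 2) pull the mean down while leaving the median nearly unchanged.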

Discussion

We have demonstrated that a 3D-CNN using only ¹²⁹Xe MRI inputs generates thoracic cavity segmentations that are similar to those of expert readers using both ¹²⁹Xe MRI and volume-matched ¹H scans. These methods may help reduce inter-reader subjectivity and allow quantification of pulmonary ¹²⁹Xe MRI when corresponding proton images are not volume-matched, poorly registered, or unavailable.

Acknowledgements

R01HL105643, R01HL12677, NSF GRFP DGE-1644868

References

1. Wang, Z., Robertson, S. H., Wang, J., He, M., Virgincar, R. S., Schrank, G. M., Bier, E. A., Rajagopal, S., Huang, Y. C., O’Riordan, T. G., Rackley, C. R., McAdams, H. P., & Driehuys, B. (2017). Quantitative analysis of hyperpolarized ¹²⁹Xe gas transfer MRI. Medical Physics, 44(6). https://doi.org/10.1002/mp.12264

2. He, M., Driehuys, B., Que, L. G., & Huang, Y. C. T. (2016). Using Hyperpolarized ¹²⁹Xe MRI to Quantify the Pulmonary Ventilation Distribution. Academic Radiology. https://doi.org/10.1016/j.acra.2016.07.014

3. Tustison, N. J., Avants, B. B., Lin, Z., Feng, X., Cullen, N., Mata, J. F., Flors, L., Gee, J. C., Altes, T. A., Mugler, J. P., & Qing, K. (2019). Convolutional Neural Networks with Template-Based Data Augmentation for Functional Lung Image Quantification. Academic Radiology. https://doi.org/10.1016/j.acra.2018.08.003

4. Milletari, F., Navab, N., & Ahmadi, S. A. (2016). V-Net: Fully convolutional neural networks for volumetric medical image segmentation. Proceedings - 2016 4th International Conference on 3D Vision, 3DV 2016. https://doi.org/10.1109/3DV.2016.79

5. Oktay, O., Ferrante, E., Kamnitsas, K., Heinrich, M., Bai, W., Caballero, J., Cook, S. A., De Marvao, A., Dawes, T., O’Regan, D. P., Kainz, B., Glocker, B., & Rueckert, D. (2018). Anatomically Constrained Neural Networks (ACNNs): Application to Cardiac Image Enhancement and Segmentation. IEEE Transactions on Medical Imaging. https://doi.org/10.1109/TMI.2017.2743464

6. Lin, T. Y., Goyal, P., Girshick, R., He, K., & Dollár, P. (2020). Focal Loss for Dense Object Detection. IEEE Transactions on Pattern Analysis and Machine Intelligence. https://doi.org/10.1109/TPAMI.2018.2858826

Figures

Figure 1. Representative expert-reader segmentations used for network training, shown in the format used to inspect their quality. Displaying the ¹H and ¹²⁹Xe images together with the mask outlines makes it easier to judge how well the segmentation aligns with both the registered ¹H image and the ¹²⁹Xe functional image.

Figure 2. Training scheme. In Stage 1, our segmentation model (SM) and the models for perceptual loss (encoder-decoder model (EDM) and predictor model (PM)) are trained separately. The EDM and PM are trained with a 3-step procedure: (1) the EDM is trained alone; (2) the PM is trained with the EDM weights frozen; (3) the PM and EDM are trained jointly. In Stage 2, we use the encoder from the EDM to train the SM. The encoder is not updated along with the SM and is used only to compute the perceptual loss. At evaluation, only the SM is used.

Figure 3. Comparison of segmentations from the SM trained with binary cross entropy and with Dice-Focal loss alone. The first row shows the input ventilation images. Rows 2 and 3 show segmentations of cases with prominent ventilation defects obtained using binary cross entropy alone, which badly miss the true mask. Adding focal loss with optimized parameters addresses this problem well (Rows 3-4): the ventilation defects are now correctly captured within the segmentation.

Figure 4. Segmentations from different models. For each model, the 4 best (left) and 4 worst (right) segmentations are shown with their Dice score/SNR, both calculated over the whole 3D volume. Model segmentations are shown in red, expert-reader segmentations in blue, and their overlap in purple. White arrows indicate areas where the models deviate from the ground truth. The model trained with registered data performs better than the model trained with unprocessed data but struggles with low-SNR and poorly ventilated images. Our proposed methods (models 3 and 4) help solve this problem.

Figure 5. Box plot and table of Dice scores for each model tested on the “pristine” test dataset. Orange lines indicate the median score and dashed green lines the mean. Comparing models 1 and 2 shows the improvement from training the 3D-CNN on the registered dataset, but also a drawback: low-score outliers caused by poorly ventilated and low-SNR images. Our proposed methods (models 3 and 4) solve this problem and eliminate these outliers.

Proc. Intl. Soc. Mag. Reson. Med. 29 (2021)
