4078

Automatic lung segmentation for hyperpolarized gas MRI using transferred generative adversarial network and three-view aggregation

Shih-Kang Chao¹, Ummul Afia Shammi², Lucia Flors-Blasco³, Talissa Altes⁴, John Mugler⁵,⁶, Craig Meyer⁵,⁶, Jaime Mata⁶, Wilson Miller⁶, and Robert Thomen²,⁴
¹Department of Statistics, University of Missouri, Columbia, MO, United States, ²Department of Biomedical, Biological & Chemical Engineering, University of Missouri, Columbia, MO, United States, ³Keck School of Medicine, University of Southern California, Los Angeles, CA, United States, ⁴Department of Radiology, University of Missouri, Columbia, MO, United States, ⁵Department of Biomedical Engineering, University of Virginia, Charlottesville, VA, United States, ⁶Department of Radiology and Medical Imaging, University of Virginia, Charlottesville, VA, United States

Synopsis

We evaluate an automatic lung segmentation approach that aggregates the predicted masks of the coronal, axial, and sagittal views generated by a deep conditional generative adversarial network (GAN) whose only input is the hyperpolarized gas (HPG) MRI. On five test subjects with ventilation defect percentages (VDP) of 25-38%, our method achieved an average Dice score of 87.72, and above 90 on a healthy control subject. The slice-wise Dice score had an average correlation of 0.72 with that of the human expert and a median correlation of -0.79 with VDP; both are significant at the 1% level for 4 out of 5 test patients.

Introduction

Hyperpolarized gas (HPG) MRI is a technique in which high-resolution images of lung function are obtained in a single breath-hold. Quantitative analysis of HPG images requires accurate segmentation of the signal within the lung boundaries. However, because HPG images are background-free, the lung boundary is not visible, and segmentation is often performed manually, which is time-consuming and prone to error, especially in cases with many large ventilation defects. Here, we model the variation of lung segmentation by a probability distribution parametrized with a transferred conditional generative adversarial neural network¹. We generate confidence masks from this distribution on a slice-by-slice basis and then stack and aggregate them to obtain a 3D mask. This procedure is inspired by the behavior of a human expert. The performance is assessed on test subjects with moderate to high ventilation defect percentage (VDP; 25-38%), defined as the percentage of voxels with <60% of the whole-lung HP gas signal mean². We measure performance by the Dice score between the predicted mask and the ground truth generated from the simultaneously acquired proton images, and compute the correlation of the Dice score between our method and a human expert.
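The VDP definition above can be written out as a short computation. The following is a minimal sketch, assuming the HPG image and a binary lung mask are available as NumPy arrays of the same shape; the function name `ventilation_defect_percentage` and its `fraction` parameter are illustrative and do not come from the original work.

```python
import numpy as np

def ventilation_defect_percentage(image, lung_mask, fraction=0.6):
    """Percentage of lung voxels whose signal is below `fraction` times
    the whole-lung mean signal (0.6 corresponds to the <60% rule)."""
    image = np.asarray(image, dtype=float)
    lung = np.asarray(lung_mask, dtype=bool)
    lung_signal = image[lung]              # signal inside the lung only
    if lung_signal.size == 0:
        raise ValueError("lung mask is empty")
    threshold = fraction * lung_signal.mean()
    n_defect = np.count_nonzero(lung_signal < threshold)
    return 100.0 * n_defect / lung_signal.size
```

Note that the result depends on the lung mask, which is why an accurate segmentation is needed before VDP can be quantified.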

Methods

HPG ³He MR images were acquired in 34 asthma patients using 3D-TrueFISP (TR/TE=1.9/0.8 ms, matrix=80x128, FA=9°, isotropic voxel dimension: 3.9 mm, acquisition time ≈10 s). Segmentation was performed slice by slice: each 2D ventilation slice $A_i^v$ yields a 2D segmentation mask, where $i=1,\cdots,n_v$ and $n_v$ is the number of slices of a given view $v\in\{\mathrm{coronal, axial, sagittal}\}$. We then stacked the masks into a 3D volume and aggregated the results of the three views.
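As an illustration of the slice-by-slice procedure, the sketch below extracts the 2D slices of one view from a 3D array, applies a per-slice segmenter, and stacks the resulting masks back into a volume. The axis convention in `VIEW_AXIS`, the function names, and the `segment_slice` callable are all hypothetical stand-ins; the original does not specify how the array is laid out.

```python
import numpy as np

# Assumed (hypothetical) axis convention: volume[coronal, axial, sagittal].
VIEW_AXIS = {"coronal": 0, "axial": 1, "sagittal": 2}

def slices_of_view(volume, view):
    """Return the 2D slices A_i^v, i = 1..n_v, of a 3D array for one view."""
    axis = VIEW_AXIS[view]
    return [np.take(volume, i, axis=axis) for i in range(volume.shape[axis])]

def stack_view(masks_2d, view):
    """Stack per-slice 2D masks back into a 3D volume along the view axis."""
    return np.stack(masks_2d, axis=VIEW_AXIS[view])

def segment_volume(volume, view, segment_slice):
    """Apply a 2D segmenter to every slice of one view and restack."""
    masks = [segment_slice(a) for a in slices_of_view(volume, view)]
    return stack_view(masks, view)
```

Because each view is restacked along its own axis, the three resulting volumes share the shape of the input and can be compared voxel by voxel.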

Probabilistic segmentation and conditional GAN

We model the segmentation uncertainty on a given ventilation slice $A$ with large defects by a conditional distribution $P_v(B|A)$, where $B$ is a candidate segmentation mask on $A$. We parameterize $P_v(B|A)$ by a U-net generator (Figure 1), which is trained with a modified conditional generative adversarial network (GAN) called BicycleGAN¹ together with transfer learning. Specifically, the loss function is an equal combination of Dice and binary cross-entropy losses, and the backbone of the generator is the residual network ResNet-18³ pretrained on ImageNet. Generators trained with BicycleGAN can perform multi-modal sampling¹, which is crucial given the irregular shape of lungs; transfer learning compensates for the small training set. For training, 2D multi-slice hyperpolarized MR images and proton-generated masks $(A_i^v,B_i^v)$ from 34 asthma patients were used (VDP=2%-40%).
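To make the "equal combination" of Dice and binary cross-entropy concrete, here is a minimal NumPy sketch of such a segmentation loss. It assumes equal weights of 0.5 each and a soft Dice term computed on predicted probabilities; the function name `dice_bce_loss`, the smoothing constant `eps`, and the exact weighting are assumptions, and the adversarial and latent terms of BicycleGAN are not shown.

```python
import numpy as np

def dice_bce_loss(pred_prob, target, eps=1e-7):
    """Equal-weight sum of soft Dice loss and binary cross-entropy.

    pred_prob: predicted foreground probabilities in [0, 1]
    target:    binary ground-truth mask of the same shape
    """
    p = np.clip(np.asarray(pred_prob, dtype=float), eps, 1.0 - eps)
    t = np.asarray(target, dtype=float)
    # Soft Dice: overlap of probabilities with the binary target.
    soft_dice = (2.0 * np.sum(p * t) + eps) / (np.sum(p) + np.sum(t) + eps)
    dice_loss = 1.0 - soft_dice
    # Binary cross-entropy averaged over pixels.
    bce = -np.mean(t * np.log(p) + (1.0 - t) * np.log(1.0 - p))
    # Assumed weighting: 0.5 each ("equal combination").
    return 0.5 * dice_loss + 0.5 * bce
```

The Dice term rewards overlap at the level of the whole mask, while the cross-entropy term penalizes each pixel independently, which is a common reason for combining the two.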

We denote the learned $P_v(B|A)$ by $\widehat P_v(B|A)$. The $\alpha$-confidence mask $B_{v,\alpha}(A)$ is the mask that covers the samples generated by $\widehat P_v(B|A)$ with probability $\alpha$, i.e., $B_{v,\alpha}(A)$ satisfies

$P\left(B\in B_{v,\alpha}(A)\middle| B\sim{\widehat P}_v\left(\cdot\middle| A\right)\right)=\alpha,\ v\in\{\mathrm{coronal, axial, sagittal}\}.\quad\quad\quad\mbox{(1)}$

By construction, $B_{v,\alpha_1}(A)\subset B_{v,\alpha_2}(A)$ when $\alpha_1\leq\alpha_2$. As shown in Figure 2, the $\alpha$-confidence masks $B_{coronal,\alpha}(A)$ for $\alpha$ = 0.05, 0.5 and 0.95 largely overlap when $A$ has a low defect percentage (Figure 2b), but differ when $A$ has a high defect percentage (Figure 2a). This shows that our model captures the segmentation uncertainty caused by ventilation defects.
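One way to approximate an $\alpha$-confidence mask in practice is to draw several masks from the generator and threshold the pixel-wise inclusion frequency. The sketch below assumes that reading of Eq. (1): a pixel is kept if it appears in at least a fraction $1-\alpha$ of the sampled masks, so the masks are nested and grow with $\alpha$. The function name `confidence_mask` and the sample array layout are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def confidence_mask(sampled_masks, alpha):
    """Pixel-wise surrogate for the alpha-confidence mask of Eq. (1).

    sampled_masks: binary array of shape (n_samples, H, W), masks drawn
                   from the learned conditional distribution for one slice
    alpha:         confidence level in (0, 1]
    """
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must lie in (0, 1]")
    samples = np.asarray(sampled_masks, dtype=bool)
    # Fraction of sampled masks that include each pixel.
    inclusion_freq = samples.mean(axis=0)
    # Assumption: keep pixels present in at least (1 - alpha) of samples;
    # the small tolerance guards against floating-point round-off.
    return inclusion_freq >= (1.0 - alpha) - 1e-12
```

Under this reading, slices whose sampled masks agree give nearly identical masks for all $\alpha$, while slices with large defects give masks that spread apart as $\alpha$ increases.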

Three-view aggregation

We separately obtain $B_{coronal,0.95}(A_{coronal})$, $B_{axial,0.95}(A_{axial})$ and $B_{sagittal,0.95}(A_{sagittal})$ by Eq. (1), where $A_{coronal}$, $A_{axial}$ and $A_{sagittal}$ are 2D image slices obtained from the 3D image array. Only slices with signal intensities greater than four times the mean of the HPG image set are considered. We stack the 2D segmentations, as shown in Figure 3, and aggregate the 3D segmentations of the three views using a median filter (3×3 kernel applied to each slice). We denote the aggregated mask as “Mask-agg”, and the mask formed by $B_{coronal,0.95}(A_{coronal})$ as “Mask-cor”.
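The aggregation step is described only briefly, so the following is one plausible reading rather than the authors' procedure: take the voxel-wise median of the three stacked view masks (a majority vote for binary masks), then apply a 3×3 median filter to each slice. The function names, the choice of `slice_axis=0` as the slicing direction, and the edge padding are assumptions.

```python
import numpy as np

def median_filter_3x3(slice_2d):
    """3x3 median filter on a binary 2D slice (edges padded by replication)."""
    slice_2d = np.asarray(slice_2d)
    h, w = slice_2d.shape
    padded = np.pad(slice_2d.astype(np.uint8), 1, mode="edge")
    # Gather the nine shifted copies that make up each 3x3 neighborhood.
    windows = np.stack([padded[r:r + h, c:c + w]
                        for r in range(3) for c in range(3)])
    return np.median(windows, axis=0) > 0.5

def aggregate_three_views(mask_cor, mask_ax, mask_sag, slice_axis=0):
    """Voxel-wise median of three 3D masks, then per-slice 3x3 median filter."""
    votes = np.stack([mask_cor, mask_ax, mask_sag]).astype(np.uint8)
    majority = np.median(votes, axis=0) > 0.5   # 2-of-3 agreement
    filtered = [median_filter_3x3(np.take(majority, i, axis=slice_axis))
                for i in range(majority.shape[slice_axis])]
    return np.stack(filtered, axis=slice_axis)
```

With this reading, a voxel survives only if at least two views agree on it and its in-slice neighborhood is mostly foreground, which suppresses isolated errors from any single view.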

Results

Performance is measured by the Dice score (DS), reported as a percentage:

$\mbox{Dice Score (DS)} = \frac{2\left|\mathrm{predicted\ mask}\cap\mathrm{ground\ truth\ mask}\right|}{\left|\mathrm{predicted\ mask}\right|+\left|\mathrm{ground\ truth\ mask}\right|}.$

In three out of five test subjects, Mask-agg had a higher overall DS than Mask-cor. The median correlation between the slice-wise DS and the slice-wise VDP of the five test subjects was -0.7863 (P value < 0.01 for four out of five subjects). The average correlation in the slice-wise DS between the human expert and Mask-agg was 0.7165. The DS of Mask-agg was lower (P value < 0.01) than that of the human expert on four of the five test subjects. For the healthy subject (VDP=2.43%), Mask-cor achieved a DS of 92.07%. Figure 4 and Figure 5 show the slice-wise DS and masks for three selected subjects and slices.
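The Dice score and the slice-wise correlations can be computed along the following lines. This is a minimal sketch: the function names and the slicing axis are illustrative, and the Pearson coefficient is an assumption, since the type of correlation and the significance test are not stated.

```python
import numpy as np

def dice_score(pred, truth):
    """Dice score 2|P ∩ T| / (|P| + |T|) for two binary masks, in [0, 1]."""
    pred = np.asarray(pred, dtype=bool)
    truth = np.asarray(truth, dtype=bool)
    denom = pred.sum() + truth.sum()
    if denom == 0:
        return float("nan")          # undefined when both masks are empty
    return 2.0 * np.logical_and(pred, truth).sum() / denom

def slicewise_dice(pred_vol, truth_vol, axis=0):
    """Dice score of every 2D slice along `axis` of two 3D masks."""
    n = np.asarray(pred_vol).shape[axis]
    return np.array([dice_score(np.take(pred_vol, i, axis=axis),
                                np.take(truth_vol, i, axis=axis))
                     for i in range(n)])

def pearson_correlation(x, y):
    """Pearson correlation of two slice-wise series, skipping NaN pairs."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    ok = np.isfinite(x) & np.isfinite(y)
    return np.corrcoef(x[ok], y[ok])[0, 1]
```

Multiplying `dice_score` by 100 gives the percentage form used in the reported results; `pearson_correlation` can be applied to the slice-wise DS against either the slice-wise VDP or the human expert's slice-wise DS.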

Discussion

The Mask-agg segmentation is highly correlated with, but generally underperforms, the human expert, with the exception of slices (a), (d) and (i) in Figure 5, which are close to the lung boundary. Because the predicted axial-view mask is inaccurate, Mask-agg is too large in Figure 5f; a more sophisticated filter than the median filter could improve this. The inaccuracy of Mask-agg may result from the image resolution of test Subjects 1 and 2 differing from that of the training set, and from the small backbone architecture in the GAN. We will address both aspects in future work.

Conclusion

We demonstrate the segmentation performance of the $\alpha$-confidence mask obtained by a conditional GAN and three-view aggregation. The performance is positively correlated with that of the human expert and negatively correlated with the VDP.

Acknowledgements

Funded by Novartis International AG.

References

1. Zhu J-Y, Zhang R, Pathak D, et al. Toward Multimodal Image-to-Image Translation. Advances in Neural Information Processing Systems. 2017.

2. Thomen RP, Sheshadri A, Quirk JD, et al. Regional ventilation changes in severe asthma after bronchial thermoplasty with (3)He MR imaging and CT. Radiology. Jan 2015;274(1):250-259.

3. He K, Zhang X, Ren S, et al. Deep Residual Learning for Image Recognition. IEEE Conference on Computer Vision and Pattern Recognition (CVPR). 2016: 770-778.

Figures

Figure 1. A simplified illustration of our transferred U-net conditional generator. The slanted red layers are trained with the BicycleGAN¹ to get ${\widehat P}_v(B|A)$, while the backbone is not altered during training. On the right, the green is a candidate lung segmentation sampled from ${\widehat P}_v(B|A)$ and the red is the ground truth obtained by a proton image. The randomness resulting from an isotropic d-dimensional Gaussian vector is concatenated with the first layer of the right arm, where d=10 for the coronal view and d=40 for the axial and sagittal views.

Figure 2. $\alpha$-confidence masks $B_{coronal,\alpha}(A)$ (defined in Eq. (1)) for two representative slices $A$ of high defect percentage (a) and low defect percentage (b) in coronal view for $\alpha=$ 0.05 (yellow), 0.5 (green) and 0.95 (blue). The distribution ${\widehat P}_v(B|A)$ shows a higher variation on the slice with a high defect percentage (a), leading to a higher variation in $B_{coronal,\alpha}(A)$ for different $\alpha$, whereas on the slice with a low defect percentage (b) the variation is low. The red contour is the ground truth.

Figure 3. Three-view aggregation. We aggregate, with a median filter, the 3D masks obtained by stacking the 0.95-confidence masks of each view to obtain the final predicted mask. We apply post-processing techniques including thresholding noisy slices and correcting the lung boundary using the 0.95-confidence mask from the coronal view.

Figure 4. Slice-wise and overall Dice scores for the aggregated mask, the mask using only the coronal view, and the human expert mask. All masks are produced using only the ventilation images. Our aggregated method tends to match or even outperform the human expert on boundary slices. The ventilation defect percentage is calculated with the ground truth mask obtained from the proton images taken within the same breath-hold.

Figure 5. Selected slices of test patients. Three-view aggregated masks outperform the masks based only on the coronal view, except for (f) and (g), where the three-view aggregated mask is too large. On (a), (d) and (i), the three-view aggregated mask (green) outperforms the human expert.

Proc. Intl. Soc. Mag. Reson. Med. 30 (2022)


DOI: https://doi.org/10.58530/2022/4078