
Estimating Model and Data Dependent Uncertainty for Synthetic-CTs obtained using Generative Adversarial Networks
Matt Hemsley1,2, Brige Chugh3, Mark Ruschin3, Young Lee3, Chia-Lin Tseng4, Greg Stanisz1,2, and Angus Lau1,2

1Medical Biophysics, University of Toronto, Toronto, ON, Canada, 2Physical Sciences Platform, Sunnybrook Research Institute, Toronto, ON, Canada, 3Medical Physics, Sunnybrook Health Sciences Centre, Toronto, ON, Canada, 4Radiation Oncology, Sunnybrook Health Sciences Centre, Toronto, ON, Canada

Synopsis

The feasibility of using a neural network model to place uncertainty estimates on synthetic-CTs created with a generative adversarial network was investigated. Dropout-based variational inference was employed to account for uncertainty in the trained model. The standard GAN loss function was also combined with an additional log-likelihood term, designed such that the network learns which regions of the input data lead to highly variable output. On a dataset of n=105 brain patients, our results demonstrate that the predicted uncertainty can be interpreted as an upper bound on the true error with a confidence of approximately 95%.

Introduction

Robust electron density assignment using MR is crucial for real-time MR-only radiation treatment planning. Recently, conditional generative adversarial neural networks (cGANs) have been proposed for synthetic-CT (sCT) generation solely from MRI[1,2]. While deep neural networks are useful for prediction, existing models do not provide confidence estimates, which leads to an inability to identify incorrect outputs or out-of-distribution inputs. In the context of sCT estimation, these uncertainties can be classified into two types: 1) data-dependent (DD), caused by incompleteness in the input data (e.g. noise, absence of visual features), and 2) model-dependent (MD), due to incompleteness in the trained model (e.g. a type of implant not seen in training)[3]. Uncertainty estimates would aid clinical sCT implementation by identifying failure cases where the automated output is likely incorrect and intervention (i.e. image re-acquisition or CT-based planning) is required. In this work, we investigated the feasibility of using a neural network model to place uncertainty estimates on cGAN-generated synthetic-CTs.

Methods

3D T1w pre-Gd, T1w post-Gd, T2 FLAIR (matrix size 480x480, FOV=240 mm, 215-250 slices, 1 mm thickness) and planning CTs (matrix size 512x512, FOV=450 mm, 215-250 slices, 1 mm thickness) of 105 brain patients were retrospectively analyzed. The MR and CT images were rigidly registered, resampled to a voxel size of (0.8x0.8x0.8) mm3, and grouped for training (n=85) and testing (n=20). The “Pix2Pix” cGAN architecture[4] was used. The model consists of two competing networks: (1) a generator (256x256 Unet), which generates candidate images based on a model distribution, and (2) a discriminator (70x70 patchGAN), which discriminates between the candidate and ground truth images. The loss function is $$$L = L_{cGAN}+L_{1}$$$, where $$$L_{cGAN}$$$ captures the effectiveness of the discriminator at classifying the image, and $$$L_{1}=|Output-GroundTruth|$$$ is a data consistency term. DD uncertainty was estimated by modifying the $$$L_{1}$$$ term to include the standard deviation, $$$\sigma$$$, between the output and ground truth, $$$L_{1,Unc} = \frac{L_{1}}{\sigma}+\ln(\sigma)$$$. The modified network learns both (1) the corresponding CT image and (2) which regions of the input image lead to highly variable output. MD uncertainty was modeled using Monte Carlo dropout sampling during testing to obtain a distribution of outputs, each created with a different configuration of neurons. Testing with dropout is an approximation of Bayesian variational inference[3]. Fifty sample sCTs were generated for each MR slice; the standard deviation of the output distribution reflects the uncertainty of the model.
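Both modifications are compact. The sketch below (PyTorch, illustrative only; the names `generator`, `log_sigma`, and `n_samples` are our assumptions, not the authors' implementation) shows a heteroscedastic L1 term of the form $$$\frac{L_{1}}{\sigma}+\ln(\sigma)$$$, with the network predicting a per-voxel log-sigma map, and a Monte Carlo dropout loop that keeps dropout stochastic at test time to sample MD uncertainty.

```python
import torch
import torch.nn as nn

def dd_uncertainty_l1(sct_pred, log_sigma, ct_true):
    """Heteroscedastic L1 term: |pred - true| / sigma + ln(sigma).

    Predicting log(sigma) keeps sigma positive and avoids division by zero.
    All tensors have shape (batch, 1, H, W).
    """
    sigma = torch.exp(log_sigma)
    return (torch.abs(sct_pred - ct_true) / sigma + log_sigma).mean()

def enable_mc_dropout(model):
    """Put the model in eval mode but keep dropout layers stochastic (MC dropout)."""
    model.eval()
    for m in model.modules():
        if isinstance(m, (nn.Dropout, nn.Dropout2d)):
            m.train()

def mc_dropout_sct(generator, mr_slice, n_samples=50):
    """Draw n_samples candidate sCTs and return their mean and per-voxel std.

    The per-voxel standard deviation is interpreted as MD uncertainty.
    """
    enable_mc_dropout(generator)
    with torch.no_grad():
        samples = torch.stack([generator(mr_slice) for _ in range(n_samples)])
    return samples.mean(dim=0), samples.std(dim=0)
```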

Results

Network training took approximately 12 hours. Once trained, an sCT brain volume with DD uncertainty could be generated in 7.5 seconds, and in 150 seconds with both DD and MD uncertainty. Fig. 1 shows the T1w MR, sCT, and traditional CT, and highlights a selected region in which the network has partially failed. Fig. 2 shows the absolute difference map between the traditional CT and sCT alongside the DD and MD uncertainty predictions, and indicates that areas where the network failed are regarded as uncertain. Fig. 3 shows the relationship between total predicted uncertainty and the true sCT-CT error in various selected regions of the output. The plots demonstrate that the predicted uncertainty can be interpreted as an upper bound on the true error with a confidence of approximately 95%. Fig. 4 shows the uncertainty heatmaps highlighting regions of spatial failure in an erroneous sCT prediction.
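The ~95% upper-bound behaviour can be verified per volume by counting the fraction of voxels whose total predicted uncertainty exceeds the observed |sCT − CT| error, i.e. points lying above the y=x line in Fig. 3. A minimal sketch, with hypothetical array names and an optional body mask, is shown below.

```python
import numpy as np

def uncertainty_coverage(sct, ct, sigma_total, mask=None):
    """Fraction of voxels where predicted uncertainty bounds the true error.

    sigma_total is the combined DD + MD standard deviation map (HU);
    mask optionally restricts the calculation to the patient volume.
    A value near 0.95 reproduces the behaviour reported in Fig. 3.
    """
    err = np.abs(sct - ct)
    if mask is not None:
        err, sigma_total = err[mask], sigma_total[mask]
    return float(np.mean(sigma_total >= err))
```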

Discussion

We have demonstrated that sCT uncertainty estimates can be determined with minor adjustments to standard cGANs. The MAE values obtained agree with previously published studies using similar networks, including studies at different anatomical sites[1,2]. No extra computational time was required to estimate DD uncertainty. However, MD uncertainty estimation increased computational time due to the need to generate many candidate samples; real-time use could be enabled by parallelization across multiple GPUs. Areas of high uncertainty are predominantly edges, stemming primarily from registration errors in the training data, and areas where MRI offers incomplete information, such as regions where bone is adjacent to air cavities, within the bone itself, and at the immobilization mask. As the only constraints placed on the network are training with dropout (MD) and a data regularization term in the loss function (DD), this method is not specific to cGANs and can be applied to CNNs with no adversarial component. Future work will study the use of uncertainty estimates to identify failure cases requiring re-acquisition of images or a fallback to CT-based planning, as well as integration into dose calculations.

Conclusion

This study demonstrated that a cGAN modified for uncertainty estimation, which identifies spatial regions of network failure, generates sCTs as robustly for real-time MR-only radiation treatment planning as the traditional cGANs used in previous studies. The new uncertainty estimates are anticipated to enable robust sCT generation in clinical applications.

Acknowledgements

The authors acknowledge funding from NSERC and Nvidia.

References

1. Maspero, M., Savenije, M. H., Dinkla, A. M., Seevinck, P. R., Intven, M. P., Jurgenliemk-Schulz, I. M., ... & van den Berg, C. A. T. (2018). Dose evaluation of fast synthetic-CT generation using a generative adversarial network for general pelvis MR-only radiotherapy. Physics in Medicine & Biology, 63(18), 185001. doi:10.1088/1361-6560/aada6d

2. Emami, H., Dong, M., Nejad-Davarani, S. P., & Glide-Hurst, C. K. (2018). Generating synthetic CTs from magnetic resonance images using generative adversarial networks. Medical Physics, 45(8), 3627-3636. doi:10.1002/mp.13047

3. Kendall, A., & Gal, Y. (2017). What uncertainties do we need in Bayesian deep learning for computer vision? arXiv:1703.04977 [cs.CV]

4. Isola, P., Zhu, J.-Y., Zhou, T., & Efros, A. A. (2016). Image-to-image translation with conditional adversarial networks. arXiv:1611.07004 [cs.CV]

Figures

Fig 1. sCT model outputs. Sample slices of A) T1w MRI, B) Planning CT, C) sCT. In the selected region, the nasal cavity was incorrectly represented, and bone was superficially imposed in the optic nerve region. The immobilization mask (found in the CT but not MRI) was also erroneously generated. Mean absolute error (MAE) between the sCT and real CT volumes in the validation set was found to be 93±10 HU.

Fig 2. Uncertainty estimates. A) True observed sCT-CT error, B) DD uncertainty map obtained by training with an additional log-uncertainty regression term, C) MD uncertainty map determined using Monte Carlo dropout sampling to obtain a distribution of outputs. The selected region where the sCT was qualitatively incorrect was depicted as highly uncertain, as were tissue interfaces, and the immobilization mask location. Mean predicted standard deviation (DD+MD) of the brain volumes in the dataset was found to be $$${\sigma}$$$=158±23 HU.

Fig 3. Total uncertainty predicted by the network (DD + MD) plotted against the true error between the sCT and traditional CT for individual voxel values from various selected regions. Generally, in this image as well as the other images in the validation set, 95% of points lie above the line y=x shown in red, demonstrating that the predicted uncertainty effectively serves as an upper bound on the true observed error.

Fig 4. Example of an sCT failure mode. The network incorrectly predicted the voxel values of several teeth in the highlighted region and was affected by the metallic susceptibility artifact caused by the dental implant. The regions of spatial failure in the sCT were regarded as highly uncertain by both model and data dependent uncertainty estimates.
