Pan Su1,2, Sijia Guo2,3, Steven Roys2,3, Florian Maier4, Thomas Benkert4, Himanshu Bhat1, Elias R. Melhem2, Dheeraj Gandhi2, Rao P. Gullapalli2,3, and Jiachen Zhuo2,3
1Siemens Medical Solutions USA, Inc., Malvern, PA, United States, 2Department of Diagnostic Radiology and Nuclear Medicine, University of Maryland, School of Medicine, Baltimore, MD, United States, 3Center for Metabolic Imaging and Therapeutics (CMIT), University of Maryland Medical Center, Baltimore, MD, United States, 4Siemens Healthcare GmbH, Erlangen, Germany
Synopsis
Transcranial MRI-guided focused ultrasound (tcMRgFUS) is a promising technique for treating multiple diseases. Simplifying the clinical workflow of tcMRgFUS treatment planning is desirable. The feasibility of using deep learning to generate a synthetic CT skull from ultrashort echo time (UTE) MRI for tcMRgFUS planning has been demonstrated previously. In this study, a 3D V-Net was used for skull estimation, taking advantage of 3D volumetric images. Furthermore, the feasibility of applying the pre-trained model to a new dataset was studied, demonstrating the potential for generalization across different sequences, protocols, and scanners.
INTRODUCTION
Transcranial MRI-guided focused ultrasound (tcMRgFUS) is a promising novel technique for treating multiple disorders and diseases1-3. tcMRgFUS planning requires both a CT scan, for skull density estimation and treatment-planning simulation, and an MRI, for target identification. Simplifying the clinical workflow of tcMRgFUS treatment planning is therefore desirable. The feasibility of using deep learning to generate a synthetic CT skull from ultrashort echo time (UTE) MRI for tcMRgFUS planning has been demonstrated previously4. However, a 2D U-Net5 lacks through-plane contextual information during training, which can lead to suboptimal estimation of the skull. The purpose of this study was, first, to leverage a 3D V-Net6 for skull estimation, taking advantage of 3D volumetric images, and, second, to study transfer learning of the trained model on a new dataset, demonstrating the feasibility of applying the pre-trained model to data acquired with different MRI/CT sequences and protocols.
METHODS
Image acquisition and data preprocessing
This study was approved by the local IRB. Data were obtained from 42 subjects (65.7±11.5 years old, 16 female). MR images were acquired on a 3T system (MAGNETOM Trio, Siemens Healthcare, Erlangen, Germany) with a prototype 3D radial UTE sequence7,8: TE1/TE2=0.07ms/4ms, resolution=1.3x1.3x1.3mm3, TA=5min. CT images were acquired on a CT scanner (Brilliance 64, Philips, WA) at a resolution of 0.48x0.48x1mm3. Both UTE and CT images were coregistered and resampled to MPRAGE space (resolution=1.0x1.0x1.0mm3). Preprocessing of the UTE and CT images is described in Ref. 4.
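As an illustration of the coregistration and resampling step, a minimal SimpleITK sketch is given below; the rigid-registration settings (mutual-information metric, gradient-descent optimizer) are generic assumptions and not the exact pipeline of Ref. 4.

```python
import SimpleITK as sitk

def coregister_to_mprage(moving_path, mprage_path):
    """Rigidly register a UTE or CT volume to the MPRAGE volume and
    resample it onto the 1.0x1.0x1.0 mm^3 MPRAGE grid (illustrative only)."""
    fixed = sitk.ReadImage(mprage_path, sitk.sitkFloat32)
    moving = sitk.ReadImage(moving_path, sitk.sitkFloat32)

    # Initialize with a geometry-centered rigid (6-DOF) transform
    initial = sitk.CenteredTransformInitializer(
        fixed, moving, sitk.Euler3DTransform(),
        sitk.CenteredTransformInitializerFilter.GEOMETRY)

    reg = sitk.ImageRegistrationMethod()
    reg.SetMetricAsMattesMutualInformation(numberOfHistogramBins=50)
    reg.SetOptimizerAsGradientDescent(learningRate=1.0, numberOfIterations=200)
    reg.SetInterpolator(sitk.sitkLinear)
    reg.SetInitialTransform(initial, inPlace=False)
    transform = reg.Execute(fixed, moving)

    # Resample the moving image onto the fixed (MPRAGE) grid
    return sitk.Resample(moving, fixed, transform, sitk.sitkLinear, 0.0,
                         moving.GetPixelID())
```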
3D V-Net
The 3D V-Net6 was originally proposed for volumetric segmentation of medical images, considering the whole volume at once. A diagram of the model used in this study is shown in Figure 1. Dual-echo UTE images were used as input to the neural network, and reference CT skull images were used as the prediction target. UTE-CT image pairs from 30 subjects were used for training, 10 for validation and hyperparameter tuning, and 2 for prospective testing. Network training was performed with TensorFlow: mean absolute error (MAE) loss, ADAM optimizer, learning rate=0.0001, PReLU activations, and 3x3x3 convolution filters. Sliding-window extraction of patches (size=192x192x64) was used for baseline training, and randomly centered patches extracted from the training dataset were used for data augmentation.
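A minimal TensorFlow sketch of this training setup is given below. The tiny one-level network is a stand-in for the full V-Net of Ref. 6, and data loading is omitted; the patch size, loss, optimizer, learning rate, and activation follow the settings above, while everything else is an illustrative assumption.

```python
import numpy as np
import tensorflow as tf

PATCH = (192, 192, 64)  # patch size used for baseline training

def random_patch(ute, ct, patch=PATCH):
    """Randomly centered patch from a dual-echo UTE volume (H x W x D x 2)
    and its CT target (H x W x D x 1), used for data augmentation."""
    starts = [np.random.randint(0, dim - p + 1)
              for dim, p in zip(ute.shape[:3], patch)]
    sl = tuple(slice(s, s + p) for s, p in zip(starts, patch))
    return ute[sl], ct[sl]

def tiny_vnet(in_shape=PATCH + (2,)):
    """Drastically reduced stand-in for the 3D V-Net of Ref. 6: 3x3x3
    convolutions, PReLU activations, strided down-/up-convolutions, and
    a skip connection."""
    L = tf.keras.layers
    x = inp = tf.keras.Input(shape=in_shape)
    x = L.PReLU(shared_axes=[1, 2, 3])(L.Conv3D(16, 3, padding="same")(x))
    skip = x
    x = L.PReLU(shared_axes=[1, 2, 3])(L.Conv3D(32, 2, strides=2)(x))           # down
    x = L.PReLU(shared_axes=[1, 2, 3])(L.Conv3DTranspose(16, 2, strides=2)(x))  # up
    x = L.Concatenate()([x, skip])
    out = L.Conv3D(1, 1)(x)  # single-channel synthetic-CT output
    return tf.keras.Model(inp, out)

model = tiny_vnet()
model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=1e-4),  # ADAM, lr 0.0001
              loss=tf.keras.losses.MeanAbsoluteError())                # MAE loss
# model.fit(...) over sliding-window and randomly centered patches
```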
Evaluation of model performance
Model performance was evaluated using four metrics comparing the synthetic CT skull with the reference CT skull: 1) Dice coefficient of the skull masks; 2) voxel-wise correlation coefficient; 3) mean voxel-wise absolute difference; and 4) skull density ratio (SDR)9,10 computed at 1024 transducer-element locations.
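The four metrics can be sketched as follows; the HU threshold for the skull mask and the simplified per-element SDR are assumptions, not values or procedures reported in the study.

```python
import numpy as np

def skull_mask(ct_hu, thresh=300.0):
    """Binary skull mask by thresholding Hounsfield units; 300 HU is an
    assumed cutoff, not necessarily the value used in the study."""
    return ct_hu > thresh

def dice_coefficient(a, b):
    """Dice coefficient between two binary skull masks."""
    return 2.0 * np.logical_and(a, b).sum() / (a.sum() + b.sum())

def voxelwise_metrics(syn_ct, ref_ct, mask):
    """Voxel-wise correlation coefficient and mean absolute difference,
    evaluated over a skull mask."""
    s, r = syn_ct[mask], ref_ct[mask]
    return np.corrcoef(s, r)[0, 1], np.abs(s - r).mean()

def sdr(hu_profile):
    """Skull density ratio for one transducer element: minimum (cancellous)
    over maximum (cortical) HU along the ray through the skull; the global
    SDR averages this over all 1024 elements (Refs. 9, 10). This is a
    simplification of the vendor's SDR computation."""
    hu_profile = np.asarray(hu_profile, dtype=float)
    return hu_profile.min() / hu_profile.max()
```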
Transfer learning for 0.8mm3 isotropic spiral UTE acquisition
The model trained above was based on a specific dual-echo radial UTE sequence with 1.3mm3 isotropic spatial resolution. It is desirable for this trained model to generalize to data acquired with different UTE sequences and protocols. To demonstrate the capability of transfer learning, data from two additional subjects (20/37 years old, F/M) were acquired with a different prototype 3D UTE stack-of-spirals high-resolution sequence11 on a different 3T MRI system (MAGNETOM Prisma Fit, Siemens Healthcare, Erlangen, Germany): spatial resolution=0.8x0.8x0.8mm3, TE1/TE2=0.05/3.10ms, TA=4:29min. CT images were also acquired on a different CT scanner (SOMATOM Force, Siemens Healthcare, Erlangen, Germany), spatial resolution=0.46x0.46x1mm3. Data from the first subject were used to retrain the V-Net (a sketch of this step is given below), and data from the second subject were used for testing. For comparison, the V-Net previously trained on the 30 training subjects was applied directly to the second subject's data without any retraining.
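A hedged sketch of the retraining step, continuing the TensorFlow setup above: the file names, fine-tuning learning rate, and epoch count are hypothetical; the study retrained on data from a single spiral UTE subject.

```python
import tensorflow as tf

# Load the V-Net weights trained on the 30 radial-UTE subjects
# ("vnet_radial.h5" is a hypothetical file name).
model = tf.keras.models.load_model("vnet_radial.h5")

# Re-compile with the original MAE loss; the reduced learning rate for
# fine-tuning is an assumption, not a reported detail of the study.
model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=1e-5),
              loss=tf.keras.losses.MeanAbsoluteError())

# `spiral_patches` would yield (UTE patch, CT patch) pairs extracted from
# the single spiral-UTE training subject; the epoch count is illustrative.
# model.fit(spiral_patches, epochs=20)
model.save("vnet_spiral_finetuned.h5")
```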
RESULTS and DISCUSSION
Figure 2 summarizes the metrics evaluating V-Net performance on the 10 validation datasets and the 2 prospective testing datasets. High spatial correlation, low mean absolute error, and high Dice scores were observed in the validation datasets. The two prospective testing datasets show results comparable to the validation datasets, suggesting that the model did not overfit the training/validation data.
Figure 3 demonstrates the skull-density-ratio results from the synthetic CT skulls. Figure 3a shows the positions of the 1024 transducer elements with respect to the skull for a prospective testing dataset. Figure 3b compares the regional SDR values between the reference CT skull and the synthetic CT skull: minimal spatial discrepancy is observed between the two. Figure 3c shows the correlation of the global SDR values between the reference CT skull and the deep-learning synthetic skull (R2=0.9985).
Figure 4 shows 3D cinematic renderings of the reference CT skull and the synthetic CT skull from a representative subject. The two renderings are very similar, except in the lower maxillary bone and teeth. This is likely due to a lack of training emphasis on these regions, as the current study focuses on the tcMRgFUS application.
Figure 5 shows the results of applying the V-Net model (trained on radial UTE) to dual-echo high-spatial-resolution (0.8mm3 isotropic) spiral UTE data acquired on a different 3T scanner. Without any retraining, the model generalizes approximately to new data it has never seen (Figure 5d). However, some artifacts appear (yellow arrows); these are substantially reduced by retraining the model with a single high-spatial-resolution spiral UTE dataset (Figure 5e). The zoomed-in window shows the improvement from retraining in the delineation of fine details of the posterior skull.
CONCLUSION
Deep learning can be used to generate a synthetic CT skull, thereby simplifying the workflow of tcMRgFUS. A 3D V-Net was used to take advantage of the contextual information in volumetric images. The feasibility of applying the pre-trained model to a new dataset was also demonstrated, showing the potential for generalization across different sequences, protocols, and scanners.
Acknowledgements
No acknowledgement found.
References
1. Elias WJ, Huss D, Voss T et al. A pilot study of focused ultrasound thalamotomy for essential tremor. N Engl J Med 2013;369:640-648.
2. Jeanmonod D, Werner B, Morel A, Michels L, Zadicario E, Schiff G, Martin E. Transcranial magnetic resonance imaging-guided focused ultrasound: noninvasive central lateral thalamotomy for chronic neuropathic pain. Neurosurg Focus 2012;32:E1.
3. Monteith S, Sheehan J, Medel R, Wintermark M, Eames M, Snell J, Kassell NF, Elias WJ. Potential intracranial applications of magnetic resonance-guided focused ultrasound surgery. J Neurosurg 2013;118:215-221.
4. Su P, Guo S, Roys S, Maier F, Bhat H, Melhem ER, Gandhi D, Gullapalli R, Zhuo J. Transcranial MR Imaging–Guided Focused Ultrasound Interventions Using Deep Learning Synthesized CT. AJNR Am J Neuroradiol 2020;41(10):1841-1848.
5. Ronneberger O, Fischer P, Brox T. U-net: Convolutional networks for biomedical image segmentation. In: International Conference on Medical Image Computing and Computer-Assisted Intervention Cham, Switzerland: Springer; 2015;234–241.
6. Milletari F, Navab N, Ahmadi SA. V-Net: Fully convolutional neural networks for volumetric medical image segmentation. In: 2016 Fourth International Conference on 3D Vision (3DV). IEEE; 2016:565-571.
7. Speier P, Trautwein F. Robust radial imaging with predetermined isotropic gradient delay correction. Proc Int Soc Mag Reson Med 2006:2379.
8. Speier P, Trautwein F. A calibration for radial imaging with large inplane shifts. Proc Int Soc Mag Reson Med 2005:2295.
9. Boutet A, Gwun D, Gramer R, Ranjan M, Elias GJ, Tilden D, Huang Y, Li SX, Davidson B, Lu H, Tyrrell P. The relevance of skull density ratio in selecting candidates for transcranial MR-guided focused ultrasound. J Neurosurg 2019;132(6):1785-1791.
10. D’Souza M, Chen KS, Rosenberg J, Elias WJ, Eisenberg HM, Gwinn R, Taira T, Chang JW, Lipsman N, Krishna V, Igase K. Impact of skull density ratio on efficacy and safety of magnetic resonance–guided focused ultrasound treatment of essential tremor. J Neurosurg 2019;132(5):1392-1397.
11. Mugler JP, Fielden S, Meyer CH, Altes TA, Miller GW, Stemmer A, Pfeuffer J, Kiefer B. Breath-hold UTE Lung Imaging using a Stack-of-Spirals Acquisition. Proc Int Soc Mag Reson Med 2015:1476.