4809

Volumetric Segmentation of Acute Brain Infarcts on Diffusion-Weighted Imaging using Deep Learning

Ken Chang¹, James Brown¹, Andrew L Beers¹, Katharina Hoebel¹, Jay Patel¹, Otto Rapalino², Bruce Rosen¹, Hakan Ay¹, and Jayashree Kalpathy-Cramer²

¹Radiology, Massachusetts General Hospital, Boston, MA, United States, ²Massachusetts General Hospital, Boston, MA, United States

Synopsis

Rapid and accurate evaluation of stroke is imperative because currently available treatments are constrained by a narrow time window. Diffusion-weighted magnetic resonance imaging (DWI) is a key imaging modality in stroke evaluation as it allows for assessment of the extent of acute ischemic brain injury. Nonetheless, manual delineation of stroke regions is expensive, time-consuming, and subject to inter-rater variability. In this study, we sought to develop a deep learning approach for volumetric segmentation of ischemic stroke, utilizing only DWI, in a large clinical dataset of 1,205 patients from the NIH-funded Heart-Brain Interactions in Human Acute Ischemic Stroke Study.

Introduction

Important decisions in stroke management currently rely on accurate estimation of the volume of acute ischemic brain injury.¹ However, manual delineation of stroke regions is expensive, time-consuming, and subject to inter-rater variability.² Furthermore, segmentation is a difficult task because lesion boundaries can be ill-defined and lesions vary in size and location. In this study, we develop a deep learning approach for volumetric segmentation of ischemic stroke, utilizing only DWI, validated on a large clinical dataset of 1,205 patients. Furthermore, we demonstrate that both manually and automatically segmented volumes can be used to predict clinical outcome at 90 days as measured by the modified Rankin Scale (mRS).³,⁴ This may serve as a useful tool for prognostication as well as for aiding clinical decision-making. In particular, patients with a predicted poor outcome may be considered for alternative management approaches.

Methods

The study was conducted following approval by the Massachusetts General Hospital Institutional Review Board. Our patient cohort included 1,205 patients with DWI (b-values of 0 s/mm², b0, and 1000 s/mm², b1000) from the NIH-funded Heart-Brain Interactions in Human Acute Ischemic Stroke Study, recruited between June 2009 and December 2011. We utilized the 3D U-Net architecture, a network designed for fast and precise segmentation (Fig. 1A), implemented within the DeepNeuro framework.⁵,⁶ Because lesion segmentation is a particularly challenging problem, we experimented with several modifications of the base 3D U-Net architecture that have been shown to offer improvements in conventional classification problems. We evaluated the performance after including residual connections, inception modules, dense connections, and squeeze-and-excitation modules (Fig. 1).⁷⁻¹⁰ We also assessed the performance of ensembles (average of the output probabilities of multiple models) of the best performing modified U-Nets. We have made the code for pre-processing, neural network architectures, and training publicly available: https://github.com/QTIM-Lab/DeepNeuro.⁶ Performance of models was evaluated using the Sørensen–Dice coefficient, and comparisons between models were made using a paired t-test.
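The three quantities named in the Methods (Dice coefficient, ensemble averaging of output probabilities, and the paired t-test) can be sketched as follows. This is an illustrative sketch only and is not taken from the DeepNeuro code: the function names, the representation of a volume as a flat list of voxels, the 0.5 binarization threshold, and the convention that two empty masks score 1.0 are all assumptions. The last function returns only the t statistic; a p-value would additionally require the t distribution with n - 1 degrees of freedom.

```python
import math
import statistics


def dice_coefficient(pred, truth):
    """Sorensen-Dice overlap between two binary masks (flat sequences of 0/1)."""
    if len(pred) != len(truth):
        raise ValueError("masks must have the same number of voxels")
    intersection = sum(1 for p, t in zip(pred, truth) if p and t)
    total = sum(1 for p in pred if p) + sum(1 for t in truth if t)
    if total == 0:
        # Assumption: two empty masks are treated as perfect agreement.
        return 1.0
    return 2.0 * intersection / total


def ensemble_mask(prob_maps, threshold=0.5):
    """Average per-voxel probabilities across models, then binarize.

    prob_maps holds one flat probability map per model.
    The threshold of 0.5 is an assumption, not a value stated in the abstract.
    """
    n_models = len(prob_maps)
    mean_probs = [sum(voxel) / n_models for voxel in zip(*prob_maps)]
    return [1 if p >= threshold else 0 for p in mean_probs]


def paired_t_statistic(scores_a, scores_b):
    """t statistic for paired per-patient scores from two models."""
    if len(scores_a) != len(scores_b):
        raise ValueError("paired samples must have equal length")
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    n = len(diffs)
    # mean difference divided by its standard error
    return statistics.mean(diffs) / (statistics.stdev(diffs) / math.sqrt(n))
```

In this sketch, averaging happens on probabilities before thresholding, which matches the description of an ensemble as the average of the output probabilities of multiple models.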

Results

On the testing set, the median Dice coefficient between automatic and manual expert segmentations for the U-Net, Residual U-Net, Dense U-Net, and Squeeze-and-Excitation U-Net was 0.68, 0.678, 0.696, and 0.65, respectively. The best performing individual model was the Inception U-Net, which had a median Dice coefficient of 0.72 on the testing set. Notably, this improvement in performance was statistically significant compared with the base U-Net (p < .05) on the testing set. Additionally, we assessed the performance of ensembling Inception U-Nets. The best performing ensemble was that of 4 Inception U-Nets. Its performance was significantly better than that of a single base U-Net (p < .005) but not significantly better than that of a single Inception U-Net (Fig. 2). In comparing manually and automatically derived infarct volumes (from the ensemble of 4 Inception U-Nets), the Spearman's rho was 0.940 (p < .0001) in the testing set (Fig. 3).
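For readers unfamiliar with the volume agreement measure, Spearman's rho is the Pearson correlation computed on ranks rather than raw values. The sketch below is a minimal standard-library illustration under assumed names (`average_ranks`, `spearman_rho`) and with made-up volumes in the usage note; it is not the analysis code used in the study and does not compute a p-value.

```python
import math


def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    if len(x) != len(y):
        raise ValueError("inputs must have equal length")
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mean_x, mean_y = sum(rx) / n, sum(ry) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    var_x = sum((a - mean_x) ** 2 for a in rx)
    var_y = sum((b - mean_y) ** 2 for b in ry)
    return cov / math.sqrt(var_x * var_y)
```

As a usage example with hypothetical volumes, `spearman_rho([1.0, 5.2, 12.0, 40.0], [1.5, 4.0, 30.0, 20.0])` compares the ranks 1, 2, 3, 4 against 1, 2, 4, 3. Because only ranks matter, the measure is insensitive to the heavy right skew typical of lesion volumes.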

For both the manually and automatically derived volumes, there was a statistically significant difference between patients who had good functional outcome (mRS score <= 2) and those who did not (mRS score > 2; AUC = 0.758, p < .001). Similarly, for both the manually and automatically derived volumes, there was a statistically significant difference between patients who had survived (mRS score <= 5) and those who had died (mRS score = 6) by 90 days (AUC = 0.778, p < .001) (Fig. 4).
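One common way to obtain an AUC from a single continuous predictor such as infarct volume is the rank-based (Mann-Whitney) formulation: the probability that a randomly chosen patient with the adverse outcome has a larger volume than a randomly chosen patient without it, counting ties as one half. The abstract does not state how its AUC values were computed, so the sketch below is an assumption, as are the function names and the volumes in the test data.

```python
def split_by_mrs(volumes, mrs_scores, cutoff=2):
    """Split volumes into (adverse, favorable) groups at an mRS cutoff.

    cutoff=2 separates mRS > 2 from mRS <= 2 (functional outcome);
    cutoff=5 separates mRS = 6 from mRS <= 5 (survival).
    """
    adverse = [v for v, m in zip(volumes, mrs_scores) if m > cutoff]
    favorable = [v for v, m in zip(volumes, mrs_scores) if m <= cutoff]
    return adverse, favorable


def auc_from_groups(adverse, favorable):
    """AUC as the fraction of (adverse, favorable) pairs in which the
    adverse-outcome patient has the larger volume; ties count one half."""
    if not adverse or not favorable:
        raise ValueError("both groups must be non-empty")
    wins = 0.0
    for a in adverse:
        for f in favorable:
            if a > f:
                wins += 1.0
            elif a == f:
                wins += 0.5
    return wins / (len(adverse) * len(favorable))
```

An AUC of 0.5 under this definition means volume carries no information about outcome, and 1.0 means every adverse-outcome patient has a larger infarct than every favorable-outcome patient.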

Discussion

In this study, we demonstrate the utility of a fully automated deep learning algorithm for calculation of stroke volumes as part of a larger effort to apply such techniques to the field of neurology. We improve upon the conventional 3D U-Net architecture with the addition of inception modules and model ensembling, which provides significantly improved performance over the base U-Net in terms of median Dice coefficient on the testing set. In addition to segmentation performance, we also evaluated lesion volumes derived from both manual segmentations and our automatic deep learning segmentations, which showed high agreement. Furthermore, we demonstrate that both manually and automatically segmented volumes can predict 90-day functional outcome and survival, further supporting the clinical applicability of the automated segmentation tool.

Conclusion

The fully automatic pipeline for stroke segmentation presented in this study demonstrates the potential for deep learning-based algorithms to aid clinical decision-making.

Acknowledgements

This project was supported by a training grant from the NIH Blueprint for Neuroscience Research (T90DA022759/R90DA023427) and the National Institute of Biomedical Imaging and Bioengineering (NIBIB) of the National Institutes of Health under award number 5T32EB1680 to K. Chang and J. Patel. The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institutes of Health. This publication was supported by the Martinos Scholars fund to K. Hoebel. Its contents are solely the responsibility of the authors and do not necessarily represent the official views of the Martinos Scholars fund. This study was supported by National Institutes of Health grants U01 CA154601, U24 CA180927, and U24 CA180918 to J. Kalpathy-Cramer. We would like to acknowledge the GPU computing resources provided by the MGH and BWH Center for Clinical Data Science. This research was carried out in whole or in part at the Athinoula A. Martinos Center for Biomedical Imaging at the Massachusetts General Hospital, using resources provided by the Center for Functional Neuroimaging Technologies, P41EB015896, a P41 Biotechnology Resource Grant supported by the National Institute of Biomedical Imaging and Bioengineering (NIBIB), National Institutes of Health.

References

1. Demaerschalk BM, Cheng NT, Kim AS. Intravenous Thrombolysis for Acute Ischemic Stroke Within 3 Hours Versus Between 3 and 4.5 Hours of Symptom Onset. The Neurohospitalist. 2015;5(3):101-109. doi:10.1177/1941874415583116.

2. Ay H, Arsava EM, Vangel M, et al. Interexaminer difference in infarct volume measurements on MRI: A source of variance in stroke research. Stroke. 2008;39(4):1171-1176. doi:10.1161/STROKEAHA.107.502104.

3. Arsava EM, Ballabio E, Benner T, et al. The Causative Classification of Stroke system: an international reliability and optimization study. Neurology. 2010;75(14):1277-1284. doi:10.1212/WNL.0b013e3181f612ce.

4. Wilson JTL, Hareendran A, Grant M, et al. Improving the assessment of outcomes in stroke: Use of a structured interview to assign grades on the modified Rankin Scale. Stroke. 2002;33(9):2243-2246. doi:10.1161/01.STR.0000027437.22450.BD.

5. Çiçek Ö, Abdulkadir A, Lienkamp SS, Brox T, Ronneberger O. 3D U-Net: Learning dense volumetric segmentation from sparse annotation. In: Lecture Notes in Computer Science (Including Subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics). Vol 9901 LNCS. 2016:424-432. doi:10.1007/978-3-319-46723-8_49.

6. Beers A, Brown J, Chang K, et al. DeepNeuro: an open-source deep learning toolbox for neuroimaging. August 2018. http://arxiv.org/abs/1808.04589. Accessed August 30, 2018.

7. He K, Zhang X, Ren S, Sun J. Deep Residual Learning for Image Recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). IEEE; 2016:770-778. doi:10.1109/CVPR.2016.90.

8. Szegedy C, Ioffe S, Vanhoucke V, Alemi A. Inception-v4, Inception-ResNet and the Impact of Residual Connections on Learning. February 2016. http://arxiv.org/abs/1602.07261. Accessed August 12, 2018.

9. Huang G, Liu Z, Weinberger KQ, van der Maaten L. Densely Connected Convolutional Networks. August 2016. http://arxiv.org/abs/1608.06993. Accessed March 1, 2017.

10. Hu J, Shen L, Sun G. Squeeze-and-Excitation Networks. September 2017. http://arxiv.org/abs/1709.01507. Accessed June 18, 2018.

Figures

Figure 1. (A) A 3D U-Net architecture was used for ischemic stroke segmentation. The input is a patch from the DWI image and the output is a probability map. The 3D U-Net architecture was modified with (B) residual connections, (C) inception modules, (D) dense connections, and (E) squeeze-and-excitation modules.

Figure 2. Median Dice similarity coefficient (95% confidence interval) of individual models and model ensembles of Inception U-Nets within the training, validation, and testing sets.

Figure 3. (A) Histogram of Dice Similarity Coefficients with the Testing Set for the Ensemble of 4 Inception U-Nets. (B) Scatter plot of manually vs automatically derived volumes. (C) Example of manual vs automatic segmentations.

Figure 4. Violin plots of manually derived and automatically derived volumes from the Ensemble of 4 Inception U-Nets for patients with mRS <= 2 vs > 2 (A, B) and for patients with mRS <= 5 vs >5 (C, D). ****p<.001

Proc. Intl. Soc. Mag. Reson. Med. 27 (2019)
