Rapid and accurate evaluation of stroke is imperative because currently available treatments are constrained by a narrow time window. Diffusion-weighted magnetic resonance imaging (DWI) is a key imaging modality in stroke evaluation, as it allows assessment of the extent of acute ischemic brain injury. Nonetheless, manual delineation of stroke regions is expensive, time-consuming, and subject to inter-rater variability. In this study, we sought to develop a deep learning approach for ischemic stroke volumetric segmentation utilizing only DWI in a large clinical dataset of 1,205 patients from the NIH-funded Heart-Brain Interactions in Human Acute Ischemic Stroke Study.
Introduction
Important decisions in stroke management currently rely on accurate estimation of the volume of acute ischemic brain injury.1 However, manual delineation of stroke regions is expensive, time-consuming, and subject to inter-rater variability.2 Furthermore, segmentation is difficult because lesion boundaries can be ill-defined and lesions vary in size and location. In this study, we develop a deep learning approach for ischemic stroke volumetric segmentation utilizing only DWI, validated on a large clinical dataset of 1,205 patients. We also demonstrate that both manually and automatically segmented volumes can be used to predict clinical outcome at 90 days as measured by the modified Rankin Scale (mRS).3,4 This may serve as a useful tool for prognostication and as an aid to clinical decision-making. In particular, patients with a predicted poor outcome may be considered for alternative approaches to patient management.
Methods
The study was conducted following approval by the Massachusetts General Hospital Institutional Review Board. Our cohort included 1,205 patients with DWI (b-values of 0 s/mm², b0, and 1000 s/mm², b1000) from the NIH-funded Heart-Brain Interactions in Human Acute Ischemic Stroke Study, recruited between June 2009 and December 2011. We utilized the 3D U-Net architecture, a network designed for fast and precise segmentation (Fig. 1A), implemented within the DeepNeuro framework.5,6 Because lesion segmentation is a particularly challenging problem, we experimented with several modifications of the base 3D U-Net architecture that have been shown to offer improvements in conventional classification problems. We evaluated performance after including residual connections, inception modules, dense connections, and squeeze-and-excitation modules (Fig. 1).7–10 We also assessed the performance of ensembles (the average of the output probabilities of multiple models) of the best performing modified U-Nets. We have made the code for pre-processing, neural network architectures, and training publicly available: https://github.com/QTIM-Lab/DeepNeuro.6 Model performance was evaluated using the Sørensen–Dice coefficient, and comparisons between models were made using a paired t-test.
Results
In the testing set, the median Dice coefficient between automatic and manual expert segmentations for the U-Net, Residual U-Net, Dense U-Net, and Squeeze-and-Excitation U-Net was 0.68, 0.678, 0.696, and 0.65, respectively. The best performing individual model was the Inception U-Net, which had a median Dice coefficient of 0.72 in the testing set. Notably, this improvement in performance was statistically significant compared with the base U-Net (p < .05) in the testing set. Additionally, we assessed the performance of ensembles of Inception U-Nets. The best performing ensemble was that of 4 Inception U-Nets. Its performance was significantly better than that of a single base U-Net (p < .005) but not significantly better than that of a single Inception U-Net (Fig. 2). In comparing manually and automatically derived infarct volumes (from the ensemble of 4 Inception U-Nets), Spearman's rho was 0.940 (p < .0001) in the testing set (Fig. 3).
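The two quantities behind these comparisons, the Sørensen–Dice coefficient and the averaging of output probabilities across models, can be illustrated with a minimal sketch. It is not the DeepNeuro implementation: the function names, the 0.5 binarization threshold, the convention for two empty masks, and the toy four-voxel data are all assumptions made for illustration, and flattened Python lists stand in for 3D volumes to keep the example dependency-free.

```python
def ensemble_average(prob_maps):
    """Average voxel-wise probabilities from several models.

    prob_maps: list of equal-length flat lists, one per model (hypothetical layout).
    """
    n_models = len(prob_maps)
    return [sum(voxel) / n_models for voxel in zip(*prob_maps)]


def binarize(probs, threshold=0.5):
    """Turn probabilities into a binary mask; the 0.5 threshold is an assumption."""
    return [1 if p >= threshold else 0 for p in probs]


def dice(pred, truth):
    """Sørensen–Dice coefficient, 2|A∩B| / (|A| + |B|), for binary masks."""
    intersection = sum(p and t for p, t in zip(pred, truth))
    total = sum(pred) + sum(truth)
    # Two empty masks are treated as perfect agreement (assumed convention).
    return 1.0 if total == 0 else 2.0 * intersection / total


# Toy data: three hypothetical models scoring four voxels, plus a manual mask.
model_outputs = [
    [0.9, 0.8, 0.2, 0.1],
    [0.7, 0.4, 0.6, 0.1],
    [0.8, 0.6, 0.1, 0.3],
]
manual_mask = [1, 1, 0, 0]

ensemble_mask = binarize(ensemble_average(model_outputs))
ensemble_dice = dice(ensemble_mask, manual_mask)
single_dice = dice(binarize(model_outputs[1]), manual_mask)

print(ensemble_mask, ensemble_dice, single_dice)
```

In this toy case the second model alone mislabels two voxels, while the averaged probabilities recover the manual mask, which is the intuition behind ensembling. In a study of this kind the Dice coefficient would be computed per patient and then summarized, for example by the median.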
For both the manually and automatically derived volumes, there was a statistically significant difference between patients who had a good functional outcome (mRS score <= 2) and those who did not (mRS score > 2) (AUC = 0.758, p < .001). Similarly, for both the manually and automatically derived volumes, there was a statistically significant difference between patients who survived (mRS score <= 5) and those who had died (mRS score = 6) by 90 days (AUC = 0.778, p < .001) (Fig. 4).
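As an illustration of how an AUC relates infarct volume to a dichotomized mRS outcome, the sketch below uses the rank-based definition: the probability that a randomly chosen patient with the adverse outcome has a larger volume than a randomly chosen patient without it, with ties counted as one half. The source does not state how the AUC was computed, so this is one standard formulation rather than the study's procedure. The function name and all volumes and mRS scores are hypothetical, and the sketch assumes that larger volumes indicate worse outcome.

```python
def auc_from_scores(scores, labels):
    """Rank-based AUC: P(score of a positive case > score of a negative case).

    Ties contribute one half. labels are 1 (adverse outcome) or 0.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one case in each outcome group")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical infarct volumes (mL) and 90-day mRS scores, made up for illustration.
volumes_ml = [2.0, 5.5, 12.0, 30.0, 15.0, 60.0]
mrs_90day = [0, 1, 3, 4, 2, 6]

# Dichotomizations used in the text: mRS > 2 (poor outcome) and mRS = 6 (death).
poor_outcome = [1 if m > 2 else 0 for m in mrs_90day]
died = [1 if m == 6 else 0 for m in mrs_90day]

auc_poor = auc_from_scores(volumes_ml, poor_outcome)
auc_death = auc_from_scores(volumes_ml, died)

print(auc_poor, auc_death)
```

An AUC of 0.5 corresponds to no discrimination and 1.0 to perfect separation of the two outcome groups by volume alone.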
Discussion
In this study, we demonstrate the utility of a fully automated deep learning algorithm for calculation of stroke volumes as part of a larger effort to apply such techniques to the field of neurology. We improve upon the conventional 3D U-Net architecture with the addition of inception modules and model ensembling, which together provide significantly improved performance in terms of median Dice coefficient on the testing set. In addition to segmentation performance, we evaluated lesion volumes derived from both manual segmentations and our automatic deep learning segmentations, which showed high agreement. Furthermore, we demonstrate that both manually and automatically segmented volumes can predict 90-day functional outcome and survival, further supporting the clinical applicability of the automated segmentation tool.
Conclusion
The fully automatic pipeline for stroke segmentation presented in this study demonstrates the potential for deep learning-based algorithms to aid clinical decision-making.
1. Demaerschalk BM, Cheng NT, Kim AS. Intravenous Thrombolysis for Acute Ischemic Stroke Within 3 Hours Versus Between 3 and 4.5 Hours of Symptom Onset. The Neurohospitalist. 2015;5(3):101-109. doi:10.1177/1941874415583116.
2. Ay H, Arsava EM, Vangel M, et al. Interexaminer difference in infarct volume measurements on MRI: A source of variance in stroke research. Stroke. 2008;39(4):1171-1176. doi:10.1161/STROKEAHA.107.502104.
3. Arsava EM, Ballabio E, Benner T, et al. The Causative Classification of Stroke system: an international reliability and optimization study. Neurology. 2010;75(14):1277-1284. doi:10.1212/WNL.0b013e3181f612ce.
4. Wilson JTL, Hareendran A, Grant M, et al. Improving the assessment of outcomes in stroke: Use of a structured interview to assign grades on the modified Rankin Scale. Stroke. 2002;33(9):2243-2246. doi:10.1161/01.STR.0000027437.22450.BD.
5. Çiçek Ö, Abdulkadir A, Lienkamp SS, Brox T, Ronneberger O. 3D U-Net: Learning dense volumetric segmentation from sparse annotation. In: Lecture Notes in Computer Science (Including Subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics). Vol 9901 LNCS. 2016:424-432. doi:10.1007/978-3-319-46723-8_49.
6. Beers A, Brown J, Chang K, et al. DeepNeuro: an open-source deep learning toolbox for neuroimaging. August 2018. http://arxiv.org/abs/1808.04589. Accessed August 30, 2018.
7. He K, Zhang X, Ren S, Sun J. Deep Residual Learning for Image Recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). IEEE; 2016:770-778. doi:10.1109/CVPR.2016.90.
8. Szegedy C, Ioffe S, Vanhoucke V, Alemi A. Inception-v4, Inception-ResNet and the Impact of Residual Connections on Learning. February 2016. http://arxiv.org/abs/1602.07261. Accessed August 12, 2018.
9. Huang G, Liu Z, Weinberger KQ, van der Maaten L. Densely Connected Convolutional Networks. August 2016. http://arxiv.org/abs/1608.06993. Accessed March 1, 2017.
10. Hu J, Shen L, Sun G. Squeeze-and-Excitation Networks. September 2017. http://arxiv.org/abs/1709.01507. Accessed June 18, 2018.