Esra Zihni1, Bryony L. McGarry1,2, Jen Guo3, Rani Gupta Sah3, George Tadros3, Philip A. Barber3, and John D. Kelleher1,4
1PRECISE4Q Predictive Modelling in Stroke, Technological University Dublin, Dublin, Ireland
2School of Psychological Science, University of Bristol, Bristol, United Kingdom
3Calgary Stroke Program, Department of Clinical Neurosciences, University of Calgary, Calgary, AB, Canada
4ADAPT Research Centre, ICE Research Institute, Technological University Dublin, Dublin, Ireland
Synopsis
Applying deep learning models to MRI scans of acute stroke patients to extract features indicative of functional outcome could assist clinicians’ treatment decisions. Here, we trained convolutional neural network (CNN) models on ADC maps from hyperacute ischemic stroke patients to predict the 3-month modified Rankin Scale (mRS) score, and used an interpretability technique to highlight the regions in the ADC maps that were most important for predicting good and poor outcomes. Although the models had poor predictive power, the visual explanations supported our previous findings that the predictions might be based not on ischemic regions but on other relevant information inherent in the image.
Introduction
Multiparametric MRI is informative of stroke pathophysiology and can therefore aid treatment decisions.1 One way to exploit this is to build deep learning models that use whole MRI volumes acquired in the emergency setting to predict the 3-month functional outcomes of stroke patients. ADC maps are especially informative in the hyperacute stage because they reveal ischemic tissue within minutes of onset. Because ADC is a quantitative measure of diffusion, it is free from ‘T2 shine-through’, so the vasogenic contribution evident in DWI scans is eliminated,2 exposing only cytotoxic oedema. We previously suggested that the quantitative nature and rich pathophysiological information of ADC maps, combined with convolutional neural networks (CNNs), a deep learning method specialized in processing multidimensional image data, would yield strong predictive power for functional outcome, and that the source of the prediction would be the ischemic region.3 When applied to the ISLES-2017 challenge data,4,5 the CNN trained on ADC maps showed high predictive power for poor outcomes (3-month mRS 3–6). However, contrary to expectations, the class activation maps (CAMs), which highlight the image regions most important for predicting poor outcomes, showed that the models focused on brain boundaries instead of ischemic regions. This suggests that the predictions could be based on MRI artifacts or on other relevant factors such as age, indicated by atrophied brains. In this study, we trained CNN models on ADC maps from a different patient cohort and generated CAMs for them, to elucidate what information in ADC maps is predictive of stroke outcome.
Methods
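The image preparation in this study consists of thresholding ADC at 550 × 10⁻⁶ mm²/s and keeping the largest cluster as the ischemic VOI, normalizing intensities to zero mean and unit standard deviation, and dichotomizing the 3-month mRS. The Python/NumPy sketch below illustrates these steps; it is not the authors’ code. It assumes brain-extracted maps in mm²/s with background voxels set to 0 (excluded from the threshold), takes “cluster” to mean 6-connected (face-adjacent) voxels, and normalizes over the whole volume; the function names are hypothetical. Resampling to 256 × 256 × 27 is omitted; one option for cubic spline interpolation is `scipy.ndimage.zoom` with `order=3`.

```python
from collections import deque

import numpy as np

ADC_THRESHOLD = 550e-6  # mm^2/s, the ischemic threshold stated in the Methods

# Assumption: a "cluster" is a set of 6-connected (face-adjacent) voxels
NEIGHBOURS = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]


def ischemic_voi(adc: np.ndarray, threshold: float = ADC_THRESHOLD) -> np.ndarray:
    """Boolean mask of the largest 6-connected cluster of voxels with ADC below threshold.

    Assumes a brain-extracted 3D map in mm^2/s whose background voxels are 0;
    those are excluded so the empty background is not mistaken for a lesion.
    """
    candidates = (adc > 0) & (adc < threshold)
    visited = np.zeros(adc.shape, dtype=bool)
    largest: list = []
    for seed in zip(*np.nonzero(candidates)):
        if visited[seed]:
            continue
        # Breadth-first flood fill collects every candidate voxel connected to this seed
        visited[seed] = True
        cluster, queue = [seed], deque([seed])
        while queue:
            x, y, z = queue.popleft()
            for dx, dy, dz in NEIGHBOURS:
                nb = (x + dx, y + dy, z + dz)
                # Bounds check first, so negative indices never wrap around
                inside = all(0 <= c < s for c, s in zip(nb, adc.shape))
                if inside and candidates[nb] and not visited[nb]:
                    visited[nb] = True
                    cluster.append(nb)
                    queue.append(nb)
        if len(cluster) > len(largest):
            largest = cluster
    mask = np.zeros(adc.shape, dtype=bool)
    for voxel in largest:
        mask[voxel] = True
    return mask


def zscore(volume: np.ndarray) -> np.ndarray:
    """Rescale intensities to zero mean and unit standard deviation over the whole volume."""
    centred = volume - volume.mean()
    std = volume.std()
    return centred / std if std > 0 else centred


def dichotomize_mrs(mrs: int) -> int:
    """mRS 0-2 -> 0 (good outcome, negative class); mRS 3-6 -> 1 (poor outcome, positive class)."""
    if not 0 <= mrs <= 6:
        raise ValueError(f"mRS must be an integer from 0 to 6, got {mrs}")
    return int(mrs >= 3)


if __name__ == "__main__":
    toy = np.full((6, 6, 3), 900e-6)  # tissue above the threshold
    toy[0, 0, 0] = 0.0                # background voxel, ignored
    toy[1:3, 1:3, 1] = 400e-6         # 4-voxel low-ADC cluster
    toy[5, 5, 2] = 400e-6             # isolated low-ADC voxel
    print(int(ischemic_voi(toy).sum()))  # 4
```

Only the larger low-ADC cluster survives, mirroring the “retain the largest cluster” rule; the isolated voxel is discarded.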
The analysis included MRI scans and 3-month mRS scores from 18 ischemic stroke patients (< 5 hours to treatment: standard care, tPA, EVT, or tPA+EVT).6 ADC maps were computed with ANTONIA software7 from DTI acquired at 3T (GE Discovery 750, SE-EPI sequence, 15 directions) and were then brain-extracted. Ischemic VOIs were created by thresholding the ADC maps at 550 × 10⁻⁶ mm²/s and retaining the largest cluster of voxels below this value. To standardize the image size, ADC maps were downscaled to 256 × 256 × 27 voxels using cubic spline interpolation, and their voxel intensities were normalized to zero mean and unit standard deviation. The 3-month mRS score was dichotomized so that 0–2 indicates a good outcome (negative class, n = 10) and 3–6 a poor outcome (positive class, n = 8). We modeled whole 3D ADC volumes to predict this binary 3-month outcome, using a 3D-CNN with two convolutional and max-pooling layers followed by a dense layer and a softmax output. For regularization, we applied L2 regularization to each layer and dropout to the dense layer. To assess models trained with this architecture and the hyper-parameter set given in Figure 1, we conducted 3-fold cross-validation with stratified sampling to preserve the outcome class ratio in each fold, and measured performance with the area under the receiver operating characteristic curve (AUROC) and the F1 score. A final model was trained on the full dataset using the same hyper-parameters. Gradient-weighted class activation mapping (grad-CAM)8 provided visual explanations of the final model’s decisions. Grad-CAM localizes the regions of each input image that lead to the prediction of a target class; here, the target class was each patient’s ground-truth label.
Results
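The stratified 3-fold cross-validation behind the mean AUROC and F1 scores in Figure 2 can be illustrated with a small, dependency-free Python sketch. It is not the authors’ pipeline: `fit_predict` is a hypothetical callback standing in for training the 3D-CNN on the training indices and returning the predicted probability of poor outcome for each validation index; the round-robin fold assignment and the 0.5 decision threshold for F1 are assumptions. AUROC is computed as the probability that a randomly chosen poor-outcome case is scored above a randomly chosen good-outcome case. In practice, scikit-learn’s `StratifiedKFold`, `roc_auc_score`, and `f1_score` provide the same functionality.

```python
import random
from statistics import mean


def stratified_folds(labels, n_splits=3, seed=0):
    """Split sample indices into n_splits folds that keep each class's share roughly constant.

    Indices of each class are shuffled and dealt round-robin across the folds.
    """
    rng = random.Random(seed)
    folds = [[] for _ in range(n_splits)]
    for cls in sorted(set(labels)):
        members = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(members)
        for k, i in enumerate(members):
            folds[k % n_splits].append(i)
    return folds


def auroc(y_true, scores):
    """P(score of a random positive > score of a random negative), counting ties as 1/2."""
    pos = [s for s, y in zip(scores, y_true) if y == 1]
    neg = [s for s, y in zip(scores, y_true) if y == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs both classes in the fold")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def f1(y_true, y_pred):
    """F1 = 2TP / (2TP + FP + FN) for the positive (poor-outcome) class; 0.0 if TP = 0."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return 2 * tp / (2 * tp + fp + fn) if tp else 0.0


def cross_validate(labels, fit_predict, n_splits=3, threshold=0.5):
    """Mean validation AUROC and F1 over stratified folds.

    fit_predict(train_idx, val_idx) is a hypothetical callback that trains a model on
    train_idx and returns P(poor outcome) for each index in val_idx.
    """
    folds = stratified_folds(labels, n_splits)
    aucs, f1s = [], []
    for k, val_idx in enumerate(folds):
        train_idx = [i for j, fold in enumerate(folds) if j != k for i in fold]
        scores = fit_predict(train_idx, val_idx)
        y_val = [labels[i] for i in val_idx]
        aucs.append(auroc(y_val, scores))
        f1s.append(f1(y_val, [int(s >= threshold) for s in scores]))
    return mean(aucs), mean(f1s)


if __name__ == "__main__":
    labels = [0] * 10 + [1] * 8  # class counts of this cohort: 10 good, 8 poor

    def oracle(train_idx, val_idx):
        # Returns the true labels, so both metrics are perfect by construction
        return [float(labels[i]) for i in val_idx]

    print(cross_validate(labels, oracle))  # (1.0, 1.0)
```

With the cohort’s class counts (10 good, 8 poor), round-robin dealing gives every fold both classes, so AUROC is defined in each fold; with so few validation cases per fold, however, the fold-level metrics are coarse.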
The mean training and validation AUROC and F1 scores across the three cross-validation splits are given in Figure 2. The models had perfect predictive power on the training data but performed very poorly on the validation data, suggesting that they did not generalize to unseen examples. The regions of interest that drive model decisions for each patient are presented as heatmaps in Figures 3 and 4.
Discussion
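The heatmaps in Figures 3 and 4 are grad-CAM maps. Given the feature maps A^k of a convolutional layer and the gradients of the target-class score y^c with respect to them, grad-CAM weights each feature map by its spatially averaged gradient α_k^c, sums the weighted maps, and applies a ReLU. The NumPy sketch below shows this computation for a 3D network. It assumes the activations and gradients have already been extracted from the trained model (for example with `tf.GradientTape` in TensorFlow or hooks in PyTorch); the function name is hypothetical, and the final rescaling to [0, 1] and the omission of upsampling to the input resolution are presentation choices, not details from the study.

```python
import numpy as np


def grad_cam_3d(activations: np.ndarray, gradients: np.ndarray) -> np.ndarray:
    """Grad-CAM heatmap for one convolutional layer of a 3D CNN.

    activations: feature maps A^k, shape (K, D, H, W)
    gradients:   d y^c / d A^k for the target class c, same shape
    Returns a (D, H, W) map rescaled to [0, 1].
    """
    if activations.ndim != 4 or activations.shape != gradients.shape:
        raise ValueError("expected two arrays of identical shape (K, D, H, W)")
    # alpha_k^c: global average of the gradients over the three spatial axes
    alphas = gradients.mean(axis=(1, 2, 3))
    # Weighted sum of the K feature maps; ReLU keeps regions that raise the class score
    cam = np.maximum(np.tensordot(alphas, activations, axes=1), 0.0)
    # Rescale for display; an all-zero map is returned unchanged
    peak = cam.max()
    return cam / peak if peak > 0 else cam


if __name__ == "__main__":
    acts = np.zeros((2, 2, 2, 2))
    acts[0, 0, 0, 0], acts[1, 1, 1, 1] = 2.0, 3.0
    grads = np.zeros_like(acts)
    grads[0], grads[1] = 1.0, -1.0  # map 0 supports the class, map 1 opposes it
    print(grad_cam_3d(acts, grads)[0, 0, 0])  # 1.0
```

Because of the ReLU, regions whose features lower the target-class score are zeroed rather than shown, so a grad-CAM map indicates what drives a prediction, not what argues against it.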
The poor performance of the models on unseen data suggests that the regions of interest discovered through grad-CAM cannot be interpreted as indicative of good or poor outcomes, but only as affecting, or driving, the predicted outcome. Consistent with our previous study,3 the CAMs showed that the model’s decisions were not driven by ischemic regions. However, unlike our previous results, where the model consistently focused on the brain boundaries, here the model also focused on the CSF and ventricles. This may be due to differences in the brain extraction methods applied to the two datasets. Nevertheless, these results support our previous hypothesis that the predictions could have been based on age, indicated by atrophied brains and/or enlarged ventricles.
Conclusion
Here we used a small cohort of patients to build on our previous findings about which properties of ADC maps drive a model’s decision when predicting 3-month functional outcome. Although our models did not achieve high predictive power, possibly because of the small cohort size and attenuated ADC values in the reperfusion phase, we showed that the predictions might be based on other relevant information inherent in the image. Further research will investigate the effects of age, ventricle size, and brain volume on the prediction task, and their interactions with one another. Additionally, we will test the same CNN algorithm on other MRI scans from the same patients, including T2-weighted FLAIR and T2 relaxation time maps, to determine whether the pattern observed in the CAMs is specific to ADC.
Acknowledgements
This work was supported by the Heart and Stroke Foundation of Canada Grant in Aid 2015/18; the PRECISE4Q Predictive Modelling in Stroke project (https://precise4q.eu), funded by the European Union’s Horizon 2020 research and innovation programme (grant agreement No. 777107); and the ADAPT Research Centre, funded under the SFI Research Centres Programme (Grant 13/RC/2106) and co-funded under the European Regional Development Fund.
References
1. Kauppinen, R. A. Multiparametric magnetic resonance imaging of acute experimental brain ischaemia. Prog. Nucl. Magn. Reson. Spectrosc. 80, 12–25 (2014).
2. Le Bihan, D., Turner, R., Douek, P. & Patronas, N. Diffusion MR imaging: clinical applications. Am. J. Roentgenol. 159, 591–599 (1992).
3. Zihni, E., McGarry, B. & Kelleher, J. D. An Analysis of the Interpretability of Neural Networks trained on Magnetic Resonance Imaging for Stroke Outcome Prediction. (2021) doi:10.21427/DHBT-Q252.
4. Maier, O. et al. ISLES 2015 - A public evaluation benchmark for ischemic stroke lesion segmentation from multispectral MRI. Med. Image Anal. 35, 250–269 (2017).
5. Kistler, M., Bonaretti, S., Pfahrer, M., Niklaus, R. & Büchler, P. The Virtual Skeleton Database: An Open Access Repository for Biomedical Research and Collaboration. J. Med. Internet Res. 15, e245 (2013).
6. Sah, R. G. et al. Temporal evolution and spatial distribution of quantitative T2 MRI following acute ischemia reperfusion injury. Int. J. Stroke 15, 495–506 (2020).
7. Forkert, N. D., Cheng, B., Kemmling, A., Thomalla, G. & Fiehler, J. ANTONIA Perfusion and Stroke: A Software Tool for the Multi-purpose Analysis of MR Perfusion-weighted Datasets and Quantitative Ischemic Stroke Assessment. Methods Inf. Med. 53, 469–481 (2014).
8. Selvaraju, R. R. et al. Grad-CAM: Visual Explanations from Deep Networks via Gradient-Based Localization. in 2017 IEEE International Conference on Computer Vision (ICCV) 618–626 (IEEE, 2017). doi:10.1109/ICCV.2017.74.