4794

A pilot study on the application of explainable deep learning to ADC maps for predicting functional outcome of ischemic stroke patients
Esra Zihni1, Bryony L. McGarry1,2, Jen Guo3, Rani Gupta Sah3, George Tadros3, Philip A. Barber3, and John D. Kelleher1,4
1PRECISE4Q Predictive Modelling in Stroke, Technological University Dublin, Dublin, Ireland, 2School of Psychological Science, University of Bristol, Bristol, United Kingdom, 3Calgary Stroke Program, Department of Clinical Neurosciences, University of Calgary, Calgary, AB, Canada, 4ADAPT Research Centre, ICE Research Institute, Technological University Dublin, Dublin, Ireland

Synopsis

Applying deep learning models to MRI scans of acute stroke patients to extract features indicative of functional outcome could assist a clinician’s treatment decisions. Here, we trained convolutional neural network models on ADC maps from hyper-acute ischemic stroke patients to predict 3-month mRS and used an interpretability technique to highlight regions in the ADC maps that were most important in the prediction of good and poor outcomes. Although the models had poor predictive power, the visual explanations supported our previous findings that predictions might be based not on ischemic regions, but on other relevant information inherent in the image.

Introduction

Multiparametric MRI is informative of stroke pathophysiology and can therefore aid treatment decisions.1 One way to exploit this is to build deep learning models that use whole MRI volumes acquired in the emergency setting to predict the 3-month functional outcomes of stroke patients. ADC maps are especially informative in the hyperacute stage because they reveal ischemic tissue within minutes of onset. As ADC is a quantitative measure of diffusion, it is free from ‘T2 shine-through’, and so the vasogenic contribution evident in DWI scans is eliminated,2 thus exposing only cytotoxic oedema. We previously suggested that the quantitative nature and rich pathophysiological information in ADC maps, when combined with convolutional neural networks (CNNs), a powerful deep learning method specialized in processing multidimensional image data, would yield strong predictive power of functional outcome,3 and that the source of the prediction would be the ischemic region. When applied to the ISLES-2017 challenge data,4,5 the CNN trained on ADC maps showed high predictive power for poor outcomes (3-month mRS 3–6). However, contrary to expectations, the class activation maps (CAMs), which highlight the image regions most important for predicting poor outcomes, showed that the models focused on brain boundaries instead of ischemic regions. This suggests the predictions could be based on MRI artifacts or other relevant factors such as age, indicated by atrophied brains. In this study, we built CNN models, together with CAMs, on ADC maps from a different patient cohort to elucidate what information in ADC maps is predictive of stroke outcome.

Methods

The analysis included MRI scans and 3-month mRS scores from 18 ischemic stroke patients (< 5 hours to treatment: standard care, tPA, EVT, or tPA+EVT).6 ADC maps were computed using ANTONIA software7 from DTI acquired at 3T (Discovery 750, GE, SE-EPI sequence, 15 directions) and were brain extracted. Ischemic VOIs were created by applying a threshold of 550 × 10⁻⁶ mm²/s to the ADC maps and retaining the largest cluster of voxels. To standardize the image size, ADC maps were downscaled to 256 × 256 × 27 using cubic spline interpolation and normalized so that voxel intensities had zero mean and a standard deviation of one. The 3-month mRS score was dichotomized so that 0–2 indicates good outcome (negative class, n = 10) and 3–6 indicates poor outcome (positive class, n = 8). We modeled whole 3D volumes of the ADC maps to predict good or poor outcome at 3 months, using a 3D-CNN with two convolutional and max pooling layers followed by a dense layer and a softmax output. For regularization, we used L2 normalization on each layer and dropout on the dense layer. To assess the performance of models trained with this architecture and the hyper-parameter set given in Figure 1, we conducted 3-fold cross-validation, using stratified sampling to preserve outcome class ratios across folds, and evaluated each split with the area under the receiver operating characteristic curve (AUROC) and the F1 score. A final model was trained on the full dataset using the same set of hyper-parameters. Gradient-based class activation mapping (grad-CAM)8 provided visual explanations of the final model’s decisions. Grad-CAM localizes regions of interest on each input image that lead to the prediction of a target class, which here is the ground truth label of each patient.
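The preprocessing steps described above (thresholding the ADC map and keeping the largest cluster, intensity normalization, and mRS dichotomization) can be sketched as follows. This is a minimal illustration and not the pipeline used in the study: the function names are hypothetical, the ischemic candidate rule (0 < ADC ≤ threshold), the 6-connectivity, and the normalization over all voxels of a volume are assumptions, and the resampling step is omitted.

```python
from collections import deque

import numpy as np

ADC_THRESHOLD = 550e-6  # mm^2/s, the threshold stated in the text


def largest_cluster(mask):
    """Return a boolean mask of the largest 6-connected cluster (assumed connectivity)."""
    mask = np.asarray(mask, dtype=bool)
    visited = np.zeros(mask.shape, dtype=bool)
    steps = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]
    best = []
    for seed in zip(*np.nonzero(mask)):
        if visited[seed]:
            continue
        visited[seed] = True
        queue, cluster = deque([seed]), []
        while queue:  # breadth-first flood fill from the seed voxel
            voxel = queue.popleft()
            cluster.append(voxel)
            for step in steps:
                nxt = tuple(v + s for v, s in zip(voxel, step))
                inside = all(0 <= n < size for n, size in zip(nxt, mask.shape))
                if inside and mask[nxt] and not visited[nxt]:
                    visited[nxt] = True
                    queue.append(nxt)
        if len(cluster) > len(best):
            best = cluster
    out = np.zeros(mask.shape, dtype=bool)
    if best:
        out[tuple(np.array(best).T)] = True
    return out


def ischemic_voi(adc):
    """Assumption: candidate voxels satisfy 0 < ADC <= threshold (background is 0)."""
    candidate = (adc > 0) & (adc <= ADC_THRESHOLD)
    return largest_cluster(candidate)


def zscore(volume):
    """Rescale a volume to zero mean and unit standard deviation (over all voxels)."""
    return (volume - volume.mean()) / volume.std()


def dichotomize_mrs(mrs):
    """mRS 0-2 -> 0 (good outcome), mRS 3-6 -> 1 (poor outcome)."""
    return int(mrs >= 3)


# Toy volume: one isolated low-ADC voxel and one 4-voxel low-ADC cluster.
adc = np.full((4, 4, 3), 800e-6)
adc[0, 0, 0] = 400e-6
adc[2:4, 2:4, 1] = 450e-6
voi = ischemic_voi(adc)
normalized = zscore(adc)
```

In the toy volume only the 4-voxel cluster survives as the VOI, while the isolated voxel is discarded.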

Results

The training and validation performances in terms of mean AUROC and F1 score over the splits are given in Figure 2. The models had perfect predictive power on training data; however, they performed very poorly on validation data, suggesting that they did not generalize well to unseen examples. The regions of interest that drive model decisions for each patient are presented as heatmaps in Figures 3 and 4.
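To make the evaluation protocol concrete, the sketch below shows one way to form stratified folds and to compute AUROC and F1 per split before averaging. It is an illustration under assumptions, not the study's code: the helper names are hypothetical, the fold assignment is a deterministic round-robin without shuffling, and the standard deviation is the population form. No study results are reproduced here.

```python
def stratified_folds(labels, n_folds=3):
    """Round-robin assignment per class so each fold keeps similar class ratios."""
    folds = [[] for _ in range(n_folds)]
    counter = 0  # shared across classes to keep fold sizes balanced
    for cls in sorted(set(labels)):
        for index, label in enumerate(labels):
            if label == cls:
                folds[counter % n_folds].append(index)
                counter += 1
    return folds


def auroc(y_true, scores):
    """Probability that a positive case scores above a negative one (ties count 0.5)."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def f1_score(y_true, y_pred):
    """F1 for the positive class: 2*TP / (2*TP + FP + FN)."""
    tp = sum(1 for y, p in zip(y_true, y_pred) if y == 1 and p == 1)
    fp = sum(1 for y, p in zip(y_true, y_pred) if y == 0 and p == 1)
    fn = sum(1 for y, p in zip(y_true, y_pred) if y == 1 and p == 0)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def mean_std(values):
    """Mean and population standard deviation over cross-validation splits."""
    mean = sum(values) / len(values)
    variance = sum((v - mean) ** 2 for v in values) / len(values)
    return mean, variance ** 0.5


# Class sizes from the text: 10 good outcomes (0) and 8 poor outcomes (1).
labels = [0] * 10 + [1] * 8
folds = stratified_folds(labels)
positives_per_fold = [sum(labels[i] for i in fold) for fold in folds]
```

With 18 patients, each validation fold holds only six cases (two or three of them positive), so per-split AUROC and F1 values are necessarily coarse and their spread across splits can be large.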

Discussion

The poor performance of the models on unseen data suggests that the regions of interest discovered through grad-CAM cannot be interpreted as indicative of good or poor outcomes but only as affecting, or driving, the predicted outcome. The CAMs showed that, consistent with our previous study,3 the model decisions were not driven by ischemic regions. However, unlike our previous results, where the model consistently focused on the brain boundaries, here the model also focused on the CSF and ventricles. This may be due to the difference in brain extraction methods between the two datasets. Nevertheless, these results support our previous hypothesis that the predictions could have been based on age, indicated by atrophied brains and/or enlarged ventricles.
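To illustrate why a grad-CAM heatmap reflects what drives a prediction rather than what is clinically indicative, the core computation from the grad-CAM paper8 can be sketched as below. This is a minimal sketch under assumptions: the feature maps of the last convolutional layer and the gradients of the target class score with respect to them are assumed to have been extracted from the network already, the function name is hypothetical, the rescaling to [0, 1] is a display convention, and upsampling to the input resolution is omitted.

```python
import numpy as np


def grad_cam(activations, gradients):
    """Grad-CAM map from feature maps and class-score gradients.

    Both inputs have shape (channels, depth, height, width).
    """
    # Channel weights: gradients averaged over all spatial positions.
    weights = gradients.mean(axis=(1, 2, 3))
    # Weighted sum of the feature maps over the channel axis.
    cam = np.tensordot(weights, activations, axes=(0, 0))
    # ReLU keeps only regions that push the target class score up.
    cam = np.maximum(cam, 0.0)
    peak = cam.max()
    # Rescale to [0, 1] for display (assumed convention).
    return cam / peak if peak > 0 else cam


# Toy example: two channels on a 1 x 2 x 2 grid.
activations = np.array([
    [[[1.0, 0.0], [0.0, 2.0]]],   # channel 0
    [[[0.0, 3.0], [0.0, 0.0]]],   # channel 1
])
gradients = np.stack([
    np.ones((1, 2, 2)),           # channel 0 raises the class score
    -np.ones((1, 2, 2)),          # channel 1 lowers it
])
heatmap = grad_cam(activations, gradients)
```

The map is built entirely from the model's own gradients, so it highlights whatever the trained model relies on, whether or not that region carries genuine prognostic information.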

Conclusion

Here we used a small cohort of patients to build upon our previous findings on what properties of ADC maps drive a model’s decision when predicting 3-month functional outcomes. Although our models did not achieve high predictive power, possibly due to limitations in cohort size and attenuated ADC values in the re-perfusion phase, we highlighted that the predictions might be based on other relevant information inherent in the image. Further research will focus on discovering the effects of age, ventricle size, and brain volume on the prediction task and their interactions with one another. Additionally, further analysis will involve testing the same CNN algorithm on other MRI scans from the same patients, including T2-weighted FLAIR and T2 relaxation time maps, to determine if the observed pattern in the CAMs is specific to ADC.

Acknowledgements

This work was supported by the Heart and Stroke Foundation of Canada Grant in Aid 2015/18, the PRECISE4Q Predictive Modelling in Stroke project (https://precise4q.eu) funded by the European Union’s Horizon 2020 research and innovation programme (grant agreement No. 777107) and the ADAPT Research Centre, funded under the SFI Research Centres Programme (Grant 13/RC/2106), co-funded under the European Regional Development Funds.

References

1. Kauppinen, R. A. Multiparametric magnetic resonance imaging of acute experimental brain ischaemia. Prog. Nucl. Magn. Reson. Spectrosc. 80, 12–25 (2014).

2. Le Bihan, D., Turner, R., Douek, P. & Patronas, N. Diffusion MR imaging: clinical applications. Am. J. Roentgenol. 159, 591–599 (1992).

3. Zihni, E., McGarry, B. & Kelleher, J. D. An Analysis of the Interpretability of Neural Networks trained on Magnetic Resonance Imaging for Stroke Outcome Prediction. (2021) doi:10.21427/DHBT-Q252.

4. Maier, O. et al. ISLES 2015 - A public evaluation benchmark for ischemic stroke lesion segmentation from multispectral MRI. Med. Image Anal. 35, 250–269 (2017).

5. Kistler, M., Bonaretti, S., Pfahrer, M., Niklaus, R. & Büchler, P. The Virtual Skeleton Database: An Open Access Repository for Biomedical Research and Collaboration. J. Med. Internet Res. 15, e245 (2013).

6. Sah, R. G. et al. Temporal evolution and spatial distribution of quantitative T2 MRI following acute ischemia reperfusion injury. Int. J. Stroke 15, 495–506 (2020).

7. Forkert, N. D., Cheng, B., Kemmling, A., Thomalla, G. & Fiehler, J. ANTONIA Perfusion and Stroke: A Software Tool for the Multi-purpose Analysis of MR Perfusion-weighted Datasets and Quantitative Ischemic Stroke Assessment. Methods Inf. Med. 53, 469–481 (2014).

8. Selvaraju, R. R. et al. Grad-CAM: Visual Explanations from Deep Networks via Gradient-Based Localization. in 2017 IEEE International Conference on Computer Vision (ICCV) 618–626 (IEEE, 2017). doi:10.1109/ICCV.2017.74.

Figures

Figure 1. Model hyper-parameters used during training. The table shows the selected hyper-parameters for the trained models.

Figure 2. Performance scores. The overall predictive performance is presented by the mean and standard deviation of AUROC and F1 scores calculated over the 3 cross-validation splits.

Figure 3. Illustration of class activation maps for correctly predicted patients that had a poor outcome (3-month mRS 3–6). The figure shows, for each patient, a slice from the ADC map where the lesion is most visible, with the lesion mask overlaid. The generated heatmap corresponding to the same slice is shown beside the original image. On the heatmaps, red areas indicate high interest, followed by yellow areas. The mRS score of each patient is given above their maps.

Figure 4. Illustration of class activation maps for correctly predicted patients that had a good outcome (3-month mRS 0–2). The figure shows, for each patient, a slice from the ADC map where the lesion is most visible, with the lesion mask overlaid. The generated heatmap corresponding to the same slice is shown beside the original image. On the heatmaps, red areas indicate high interest, followed by yellow areas. The mRS score of each patient is given above their maps.

Proc. Intl. Soc. Mag. Reson. Med. 30 (2022), 4794
DOI: https://doi.org/10.58530/2022/4794