0507

Explainable concept mappings underlying deep learning brain disease classification

Christian Tinauer¹, Maximilian Sackl¹, Anna Damulina¹, Reduan Achtibat², Maximilian Dreyer², Frederik Pahde², Sebastian Lapuschkin², Reinhold Schmidt¹, Stefan Ropele¹, Wojciech Samek²,³,⁴, and Christian Langkammer¹
¹Medical University of Graz, Graz, Austria, ²Fraunhofer Heinrich Hertz Institute, Berlin, Germany, ³Technische Universität Berlin, Berlin, Germany, ⁴BIFOLD – Berlin Institute for the Foundations of Learning and Data, Berlin, Germany

Synopsis

Keywords: Alzheimer's Disease, Relaxometry, xAI, Explainable, Deep Learning

Motivation: While recent studies show high accuracy in the classification of Alzheimer’s disease using deep neural networks, the underlying learned concepts have not been investigated.

Goal(s): To systematically identify the concepts learned by the deep neural network for model validation.

Approach: We trained a deep neural network on R2* maps to separate Alzheimer's patients (n=117) from healthy controls (n=219) and systematically investigated the learned concepts using Concept Relevance Propagation (CRP).

Results: In line with established histological findings, highly relevant concepts were primarily found in and adjacent to the basal ganglia.

Impact: The identification of concepts learned by deep neural networks for disease classification enables validation of the models and improves reliability.

Introduction

Deep neural networks (DNNs) can learn features from MRI for the classification of Alzheimer's disease (AD) ¹ but are generally seen as black boxes ². In medical applications in particular, it must be verified that the accuracy of these models does not result from exploiting data artifacts. However, even with conventional heat mapping approaches it remains unclear whether atrophy, signal intensity changes or other contributors are relevant for the T1-based separation of patients with AD from normal controls (NC) ³.
In this work, we utilized the effective relaxation rate R2* for AD classification by a relevance-regularized DNN. R2* is highly correlated with iron concentration in gray matter ⁴ and increased iron levels in the basal ganglia are a frequent finding in AD ⁵. Using Concept Relevance Propagation (CRP) ⁶ we investigated the concepts learned by the DNN and their contributions to the classification results.

Methodology

Subjects. We retrospectively selected 226 MRI datasets from 117 patients with probable AD (mean age=71.1±8.2 years, male/female=93/133) from our outpatient clinic and 226 MRIs from 219 propensity-logit-matched (covariates age, sex) ⁷,⁸ healthy controls (mean age=69.6±9.3 years, m/f=101/125) from an ongoing community-dwelling study. Patients and controls were scanned using a consistent MRI protocol at 3 Tesla (Siemens TimTrio) including a T1-weighted MPRAGE sequence (1 mm isotropic resolution) and a spoiled FLASH sequence (0.9×0.9×2 mm³, TR/TE=35/4.92 ms, 6 echoes, 4.92 ms echo spacing). R2* maps were calculated voxelwise using a monoexponential model, and brain masks and image registrations were obtained using FSL tools ⁹.
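The voxelwise monoexponential model S(TE) = S0·exp(−R2*·TE) can be illustrated with a minimal sketch. The abstract does not state which fitting procedure was used, so the log-linear least-squares fit below is an assumption, and the function name `fit_r2star` and the synthetic voxel are hypothetical. Only the echo times (first echo 4.92 ms, 6 echoes, 4.92 ms spacing) are taken from the protocol.

```python
import numpy as np

def fit_r2star(signal, echo_times_s):
    """Log-linear least-squares fit of S(TE) = S0 * exp(-R2* * TE).

    signal: magnitude data of shape (..., n_echoes), values > 0.
    echo_times_s: echo times in seconds, shape (n_echoes,).
    Returns R2* in 1/s with shape signal.shape[:-1].
    """
    te = np.asarray(echo_times_s, dtype=float)
    # Clip to avoid log(0) in background voxels
    log_s = np.log(np.clip(signal, 1e-12, None))
    # Linear model: ln S = ln S0 - R2* * TE
    design = np.stack([np.ones_like(te), -te], axis=1)
    flat = log_s.reshape(-1, te.size).T  # (n_echoes, n_voxels)
    coeffs, *_ = np.linalg.lstsq(design, flat, rcond=None)
    return coeffs[1].reshape(signal.shape[:-1])

# Echo times from the protocol: 4.92 ms, 9.84 ms, ..., 29.52 ms
echo_times = 4.92e-3 * np.arange(1, 7)

# Synthetic noise-free voxel with an assumed R2* of 30 1/s
true_r2star = 30.0
voxel = 1000.0 * np.exp(-true_r2star * echo_times)
r2star_map = fit_r2star(voxel[np.newaxis, :], echo_times)
```

On noisy data a nonlinear fit or noise-floor correction may behave differently; the sketch only shows the model itself.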

Relevance-guided classification network. A classifier network with z⁺-rule ¹⁰ as relevance-guided extension, described in ³, was utilized. To ensure identical starting points across training runs, we created three sets of initial network weights. For each data sampling, network training was repeated with every weight initialization.
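As an illustration of the z⁺-rule, the sketch below redistributes relevance through a single dense layer using only the positive weights. It is a generic textbook form of the rule under the assumption of non-negative input activations, not the network or the relevance-guided extension from ³; the function name `zplus_relevance` and the toy numbers are hypothetical.

```python
import numpy as np

def zplus_relevance(activations, weights, relevance_out, eps=1e-9):
    """Redistribute relevance through one dense layer with the z+ rule.

    activations: (n_in,) non-negative inputs to the layer.
    weights: (n_in, n_out) layer weights.
    relevance_out: (n_out,) relevance arriving at the layer output.
    Returns relevance per input unit, shape (n_in,).
    """
    w_pos = np.maximum(weights, 0.0)   # keep only positive weights
    z = activations @ w_pos + eps      # positive contributions per output unit
    s = relevance_out / z              # relevance per unit of contribution
    return activations * (w_pos @ s)   # share attributed to each input

# Toy layer with three inputs and two outputs (hypothetical values)
a = np.array([1.0, 2.0, 0.5])
W = np.array([[0.5, -1.0],
              [-0.3, 0.8],
              [1.0, 0.2]])
R_out = np.array([0.6, 0.4])
R_in = zplus_relevance(a, W, R_out)
```

With non-negative activations the redistributed relevance is non-negative, and its sum matches the incoming relevance up to the small stabilizer `eps`.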

Training. We trained models on R2* maps in native space using Adam ¹¹ for 60 epochs with a batch size of 6. AD and NC data were separately sampled into training, validation, and test sets (ratio 70:15:15) while maintaining all scans from one person in the same set. To ensure the same class distribution in all setups, final sets were created by combining one set from each cohort. The sampling procedure was repeated 10 times to enable a bootstrapping analysis.
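The sampling scheme can be sketched as follows: each cohort is split at the subject level so that all scans of one person stay in the same set, and the per-cohort sets are then combined. Whether the 70:15:15 ratio was applied to subjects or to scans is not stated, so the subject-level ratio is an assumption. The helper `split_by_subject` and the way scans are assigned to subject IDs are hypothetical; only the cohort sizes come from the text.

```python
import random
from collections import defaultdict

def split_by_subject(scan_subject_ids, ratios=(0.70, 0.15, 0.15), seed=0):
    """Assign scan indices to train/val/test; each subject lands in one set."""
    by_subject = defaultdict(list)
    for scan_idx, subject in enumerate(scan_subject_ids):
        by_subject[subject].append(scan_idx)
    subjects = sorted(by_subject)
    random.Random(seed).shuffle(subjects)
    n_train = round(ratios[0] * len(subjects))
    n_val = round(ratios[1] * len(subjects))
    groups = {
        "train": subjects[:n_train],
        "val": subjects[n_train:n_train + n_val],
        "test": subjects[n_train + n_val:],
    }
    return {name: sorted(i for s in subs for i in by_subject[s])
            for name, subs in groups.items()}

# Hypothetical scan-to-subject assignment matching the cohort sizes:
# 226 scans from 117 AD patients, 226 scans from 219 controls
ad_ids = [f"AD{i % 117}" for i in range(226)]
nc_ids = [f"NC{i % 219}" for i in range(226)]

# Split each cohort separately, then combine one set from each cohort
ad_split = split_by_subject(ad_ids, seed=1)
nc_split = split_by_subject(nc_ids, seed=1)
combined = {
    name: [("AD", i) for i in ad_split[name]] + [("NC", i) for i in nc_split[name]]
    for name in ("train", "val", "test")
}
```

Repeating the call with ten different seeds would give ten data samplings in the spirit of the procedure described above.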

Bootstrapping analysis. For each of the 10 data samplings and each of the 3 weight initializations, we repeated the training of the network, yielding 30 training sessions in total with the same input image configuration ¹². For concept identification, we selected the best-performing run in terms of validation accuracy from the 30 training sessions.
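The structure of this analysis can be sketched as a loop over the 10 × 3 grid, followed by selection on validation accuracy. The callback `train_and_validate` and the placeholder `fake_training` are hypothetical stand-ins for a full training session; the returned accuracies are synthetic and carry no meaning.

```python
from itertools import product

def select_best_run(train_and_validate, n_samplings=10, n_inits=3):
    """Run every (data sampling, weight initialization) pair, pick the best."""
    runs = []
    for sampling, init in product(range(n_samplings), range(n_inits)):
        val_acc = train_and_validate(sampling, init)
        runs.append({"sampling": sampling, "init": init, "val_acc": val_acc})
    # Selection uses validation accuracy only, never test performance
    best = max(runs, key=lambda run: run["val_acc"])
    return runs, best

def fake_training(sampling, init):
    """Placeholder for one training session; returns a synthetic accuracy."""
    return 0.70 + 0.01 * ((7 * sampling + 3 * init) % 10)

runs, best = select_best_run(fake_training)
```

Selecting on validation accuracy keeps the test set untouched until the final evaluation and the concept analysis.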

Concept identification. Extending backpropagation-based heat mapping methods, CRP ⁶ enables conditioning on a concept encoded by a hidden-layer channel. For this analysis, we therefore computed concept-conditional explanations for all eight channels of the last down-convolutional layer before the fully connected layers (cf. Figure 1 for details). By applying RelMax ⁶, whose objective is to maximize the relevance criterion, we created a concept map for each image in the test data set and for each concept, sorted in descending order of relevance. Mean concept maps were computed from the ten most relevant images per concept and ranked by their overall relative contribution to the classification. All maps were computed w.r.t. both output classes.
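The aggregation step can be sketched with plain NumPy, assuming the concept-conditional heat maps have already been computed by a CRP implementation. The sketch does not call any CRP library. Taking the summed heat map as the per-image concept relevance and normalizing absolute relevance to obtain the relative contribution are assumptions, and `mean_concept_maps` and the random input are hypothetical.

```python
import numpy as np

def mean_concept_maps(concept_maps, top_k=10):
    """Rank concepts and average the maps of the most relevant images.

    concept_maps: (n_images, n_concepts, *spatial) precomputed
        concept-conditional heat maps.
    Returns (order, contribution, mean_maps), where order lists concept
    indices from most to least relevant.
    """
    n_images, n_concepts = concept_maps.shape[:2]
    # Relevance of each concept in each image: sum over all voxels
    per_image = concept_maps.reshape(n_images, n_concepts, -1).sum(axis=2)
    # Relative contribution of each concept across the data set
    total = np.abs(per_image).sum(axis=0)
    contribution = total / total.sum()
    order = np.argsort(contribution)[::-1]
    # Mean map over the top_k most relevant images of each concept
    mean_maps = np.stack([
        concept_maps[np.argsort(per_image[:, c])[::-1][:top_k], c].mean(axis=0)
        for c in range(n_concepts)
    ])
    return order, contribution, mean_maps

# Hypothetical input: 20 test images, 8 channels, small 3D volumes
rng = np.random.default_rng(0)
maps = rng.random((20, 8, 4, 4, 4))
order, contribution, mean_maps = mean_concept_maps(maps)
```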

Results

The bootstrapping setup yielded a mean balanced accuracy and standard deviation of 75.64%±5.16%, sensitivity of 69.67%±9.55%, specificity of 81.61%±5.45% and an AUC of 0.76±0.05. The mean heat map from all test data and the mean of the ten most relevant concept maps of each of the four most relevant concepts for the best-performing bootstrapping run are shown in Figure 2.
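For reference, balanced accuracy is the mean of sensitivity and specificity. The sketch below states the definitions for a single confusion matrix and checks that the reported means are mutually consistent. The function name and the example counts are hypothetical; the per-run confusion matrices are not given in the abstract.

```python
def classification_metrics(tp, fn, tn, fp):
    """Sensitivity, specificity and balanced accuracy from confusion counts."""
    sensitivity = tp / (tp + fn)   # true-positive rate (AD detected as AD)
    specificity = tn / (tn + fp)   # true-negative rate (NC detected as NC)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "balanced_accuracy": (sensitivity + specificity) / 2,
    }

# Hypothetical counts for one test set
example = classification_metrics(tp=7, fn=3, tn=8, fp=2)

# Consistency check on the reported means (in percent)
reported_sensitivity = 69.67
reported_specificity = 81.61
implied_balanced_accuracy = (reported_sensitivity + reported_specificity) / 2
```

Because averaging is linear, the mean balanced accuracy over the runs equals the mean of the mean sensitivity and mean specificity, which gives 75.64% and agrees with the reported value. The standard deviations do not combine this way.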

Discussion and Conclusion

This study identified the concepts underlying the DNN-based classification with R2* maps. Various studies have shown that brain iron increasingly accumulates in the deep gray matter of AD patients ¹³,¹⁴. Although the highest relevances in and adjacent to the basal ganglia appear consistently in all concepts, concept mapping identified complementary spatial patterns (e.g. concept 1 and concept 3, Figure 2).

In conclusion, this study used quantitative MRI data (R2*) for deep learning classification and CRP in a clinical cohort of AD patients. Confirming histological and in-vivo iron mapping studies, this underlines that heat and concept mapping can serve as an exploratory means to identify areas of pathological tissue changes and further reveal internal mechanisms of deep learning classification networks.

Acknowledgements

This study was funded by the Austrian Science Fund (FWF grant numbers: P30134, P35887). This research was supported by NVIDIA GPU hardware grants.

References

1. Wen J, Thibeau-Sutre E, Diaz-Melo M, et al. Convolutional neural networks for classification of Alzheimer’s disease: Overview and reproducible evaluation. Med Image Anal. 2020;63:101694. doi:10.1016/j.media.2020.101694

2. Davatzikos C. Machine learning in neuroimaging: Progress and challenges. NeuroImage. 2019;197:652-656. doi:10.1016/j.neuroimage.2018.10.003

3. Tinauer C, Heber S, Pirpamer L, et al. Interpretable brain disease classification and relevance-guided deep learning. Sci Rep. 2022;12(1):20254. doi:10.1038/s41598-022-24541-7

4. Langkammer C, Krebs N, Goessler W, et al. Quantitative MR imaging of brain iron: a postmortem validation study. Radiology. 2010;257(2):455-462. doi:10.1148/radiol.10100495

5. Drayer BP. Imaging of the aging brain. Part II. Pathologic conditions. Radiology. 1988;166(3):797-806. doi:10.1148/radiology.166.3.3277248

6. Achtibat R, Dreyer M, Eisenbraun I, et al. From attribution maps to human-understandable explanations through Concept Relevance Propagation. Nat Mach Intell. 2023;5(9):1006-1019. doi:10.1038/s42256-023-00711-8

7. Kline A, Luo Y. PsmPy: A Package for Retrospective Cohort Matching in Python. Annu Int Conf IEEE Eng Med Biol Soc. 2022;2022:1354-1357. doi:10.1109/EMBC48229.2022.9871333

8. Rosenbaum PR, Rubin DB. The central role of the propensity score in observational studies for causal effects. Biometrika. 1983;70(1):41-55. doi:10.1093/biomet/70.1.41

9. Smith SM, Zhang Y, Jenkinson M, et al. Accurate, robust, and automated longitudinal and cross-sectional brain change analysis. Neuroimage. 2002;17(1):479-489. doi:10.1006/nimg.2002.1040

10. Montavon G, Lapuschkin S, Binder A, Samek W, Müller KR. Explaining nonlinear classification decisions with deep Taylor decomposition. Pattern Recognition. 2017;65:211-222. doi:10.1016/j.patcog.2016.11.008

11. Kingma DP, Ba J. Adam: A Method for Stochastic Optimization. In: Bengio Y, LeCun Y, eds. 3rd International Conference on Learning Representations, ICLR 2015, San Diego, CA, USA, May 7-9, 2015, Conference Track Proceedings. 2015. Accessed November 10, 2021. http://arxiv.org/abs/1412.6980

12. Bouthillier X, Delaunay P, Bronzi M, et al. Accounting for Variance in Machine Learning Benchmarks. arXiv:2103.03098 [cs, stat]. Published online March 1, 2021. Accessed December 1, 2021. http://arxiv.org/abs/2103.03098

13. Lane DJR, Ayton S, Bush AI. Iron and Alzheimer’s Disease: An Update on Emerging Mechanisms. J Alzheimers Dis. 2018;64(s1):S379-S395. doi:10.3233/JAD-179944

14. Damulina A, Pirpamer L, Soellradl M, et al. Cross-sectional and Longitudinal Assessment of Brain Iron Level in Alzheimer Disease Using 3-T MRI. Radiology. 2020;296(3):619-626. doi:10.1148/radiol.2020192541

Figures

Figure 1: Schematic overview of conventional back-propagation heat mapping and concept-specific heat mapping. By conditioning on a concept encoded by a hidden-layer channel, Concept Relevance Propagation and RelMax allow computing concept-conditional explanations and provide semantic meaning for latent model structures, disentangling the learned and identified image features.

Figure 2: (1) Mean global heat map created using z⁺-rule overlaid on MNI152 template, windowed to the top 50% of relevance within. The same slices are shown for the mean of the most important concept maps (contribution in percentage) in rows (2) to (5), calculated from the ten most relevant example images for each concept.

Proc. Intl. Soc. Mag. Reson. Med. 32 (2024)

DOI: https://doi.org/10.58530/2024/0507