Lavanya Umapathy1, Gloria Guzman Perez-Carillo2, Blair Winegar3, Srinivasan Vedantham4, Maria Altbach4, and Ali Bilgin1,4,5
1Electrical and Computer Engineering, University of Arizona, Tucson, AZ, United States, 2Mallinckrodt Institute of Radiology, St Louis, MO, United States, 3Radiology and Imaging Sciences, University of Utah, Salt Lake, UT, United States, 4Medical Imaging, University of Arizona, Tucson, AZ, United States, 5Biomedical Engineering, University of Arizona, Tucson, AZ, United States
Synopsis
The observed relationship between cognition and white matter hyperintensity (WMH) volume often depends on the accuracy of the lesion segmentation algorithm used. As such, accurate detection and quantification of WMH is of great interest. Here, we use a deep learning-based WMH segmentation algorithm, StackGen-Net, to detect and quantify WMH on 3D-FLAIR images from ADNI. We used a subset of subjects (n=20) and obtained manual WMH segmentations by an experienced neuro-radiologist to demonstrate the accuracy of our algorithm. In a larger cohort of subjects (n=290), we observed that larger WMH volumes correlated with worse performance on executive function (P=.004), memory (P=.01), and language (P=.005).
Purpose
White matter hyperintensities (WMH) are brain white matter lesions that appear bright in Fluid Attenuated Inversion Recovery (FLAIR) MR images1. The extent of WMH lesion burden is associated with degeneration of axons and myelin and is of clinical relevance in aging and age-related neurological disorders2,3. Increased WMH burden has been associated with a decline in cognitive domains such as executive function, memory, and language. Accurate detection and quantification of WMH, and the study of the temporal correlation between lesion burden and disease progression, are of interest in the neuroimaging community.
In this work, we use StackGen-Net4,5, a fast and automated deep learning-based WMH segmentation algorithm, to detect and quantify WMH on isotropic resolution 3D-FLAIR images. We use 3D-FLAIR images from the Alzheimer’s Disease Neuroimaging Initiative (ADNI) repository to demonstrate the accuracy of our segmentation algorithm compared to WMH volumes made available by ADNI (n=20). We also evaluate the clinical value of our algorithm by assessing the impact of WMH volumes on executive function, memory, and language on a group of ADNI subjects (n=290) diagnosed as cognitively normal (CN), mild cognitive impairment (MCI), and Alzheimer’s Disease (AD).
Methods
We recently proposed StackGen-Net, a stacked generalization ensemble of 3D Convolutional Neural Networks (CNNs) that improved segmentation of WMH in 3D-FLAIR images compared to some state-of-the-art CNN segmentation frameworks4,5. StackGen-Net (Figure 1) consists of three orthogonal 3D CNNs, trained on axial, sagittal, and coronal reformattings of the 3D-FLAIR volumes, respectively. Stacked generalization maximizes the overall generalization accuracy by deducing the bias rate of the individual CNNs in the ensemble. The Meta-CNN learns a new functional mapping from the WMH predictions of the orthogonal CNNs to the final WMH prediction. Trained on a cohort of subjects with a history of vascular disease (n=30), StackGen-Net segments WMHs in a time-efficient manner with performance comparable to human inter-observer variability5.
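The idea of mapping the three orthogonal predictions to a single final prediction can be illustrated with a deliberately simplified stand-in. The sketch below is not the Meta-CNN: it replaces the learned CNN with a per-voxel logistic combination, and the function name, weights, and bias are hypothetical values chosen only for illustration. It also assumes the three probability maps have already been resampled to a common orientation and flattened.

```python
import math


def meta_combine(p_axial, p_sagittal, p_coronal, weights, bias):
    """Map three per-voxel WMH probabilities to one fused probability.

    Stand-in for a learned meta-model: a logistic combination with
    hypothetical weights, applied independently at each voxel.
    """
    fused = []
    for probs in zip(p_axial, p_sagittal, p_coronal):
        z = bias + sum(w * p for w, p in zip(weights, probs))
        fused.append(1.0 / (1.0 + math.exp(-z)))
    return fused


# Toy probabilities for three voxels (not real data).
p_ax = [0.9, 0.9, 0.1]
p_sag = [0.8, 0.1, 0.2]
p_cor = [0.1, 0.1, 0.1]

# Hypothetical weights: a voxel is kept only if at least two views agree.
fused = meta_combine(p_ax, p_sag, p_cor, weights=(4.0, 4.0, 4.0), bias=-6.0)
final_mask = [1 if p > 0.5 else 0 for p in fused]
print(final_mask)
```

In a real stacked ensemble the combination is learned from held-out predictions rather than set by hand, which is what lets the meta-model correct systematic errors of the individual networks.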
Baseline data from 290 subjects imaged with the ADNI-3 protocol (with 1 mm isotropic 3D FLAIR volumes) were downloaded from the ADNI website (www.adni.loni.usc.edu). These data included demographics, diagnosis, education, WMH volume estimates, and composite scores for executive function (ADNI-EF), memory (ADNI-MEM), and language (ADNI-LAN) from the neuropsychological test battery6. The WMH lesion volumes from ADNI (ADNI-WMH) were quantified using a histogram-based technique7.
The 3D FLAIR volumes from 290 subjects (193 CN, 73 MCI, and 24 AD) were brain-extracted and bias-corrected, followed by WMH detection and volume estimation by StackGen-Net (StackGen-Net-WMH). The total prediction time for a pre-processed 3D-FLAIR volume (256x256x240) was 40 seconds. An experienced neuro-radiologist manually annotated WMH on 3D-FLAIR volumes of 20 non-demented participants selected randomly from this ADNI cohort. These annotations were used to evaluate the segmentation performance of our algorithm using metrics such as Dice score (pixel and lesion), precision (lesion), recall (lesion), average volume difference (%), and area under the precision-recall curve.
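The voxel-level overlap and volume metrics can be sketched as follows. This is a minimal illustration, assuming binary masks flattened to sequences of 0/1, the usual definition Dice = 2|A∩B| / (|A| + |B|), and a volume difference expressed relative to the manual volume. The function names are hypothetical, and the lesion-level metrics, which require connected-component labeling, are omitted.

```python
from typing import Sequence


def dice_score(pred: Sequence[int], truth: Sequence[int]) -> float:
    """Voxel-wise Dice overlap between two binary masks of equal length."""
    if len(pred) != len(truth):
        raise ValueError("masks must have the same number of voxels")
    overlap = sum(1 for p, t in zip(pred, truth) if p and t)
    total = sum(1 for p in pred if p) + sum(1 for t in truth if t)
    # Convention (assumption): two empty masks agree perfectly.
    return 1.0 if total == 0 else 2.0 * overlap / total


def abs_volume_difference_pct(pred: Sequence[int], truth: Sequence[int]) -> float:
    """Absolute volume difference as a percentage of the reference volume."""
    v_pred = sum(1 for p in pred if p)
    v_true = sum(1 for t in truth if t)
    if v_true == 0:
        raise ValueError("reference mask is empty")
    return 100.0 * abs(v_pred - v_true) / v_true


# Toy example with 10 voxels (not real data).
manual = [0, 1, 1, 1, 0, 0, 1, 0, 0, 0]
predicted = [0, 1, 1, 0, 0, 0, 1, 1, 0, 0]
print(dice_score(predicted, manual))                 # 0.75
print(abs_volume_difference_pct(predicted, manual))  # 0.0
```

With 1 mm isotropic voxels, each voxel corresponds to 1 mm³, so voxel counts can be read directly as volumes. The toy case also shows why both metrics are reported: the two masks have identical volume but only partial overlap.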
We also used Bland-Altman (BA) analysis to compare the manually annotated WMH volumes with StackGen-Net-WMH and ADNI-WMH. The agreement was evaluated using R², Coefficient of Variation (CV), and repeatability coefficient (RPC) statistics. Two-sided paired t-tests were used to assess whether there were any significant WMH volume differences between ground truth annotations and the two segmentation algorithms.
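A minimal sketch of these agreement statistics is given below. The definitions used are one common convention and are assumptions here, since the abstract does not spell them out: RPC = 1.96 × SD of the paired differences, and CV = SD of the differences divided by the mean of the pairwise means. The function names and the toy volumes are hypothetical.

```python
import math
import statistics


def bland_altman(reference, estimate):
    """Return bias, limits of agreement, RPC and CV for paired measurements."""
    if len(reference) != len(estimate) or len(reference) < 2:
        raise ValueError("need two paired samples of equal length >= 2")
    diffs = [e - r for r, e in zip(reference, estimate)]
    means = [(e + r) / 2.0 for r, e in zip(reference, estimate)]
    bias = statistics.mean(diffs)
    sd = statistics.stdev(diffs)      # sample SD of the differences
    rpc = 1.96 * sd                   # repeatability coefficient (assumed definition)
    grand_mean = statistics.mean(means)
    return {
        "bias": bias,
        "loa": (bias - rpc, bias + rpc),      # 95% limits of agreement
        "rpc": rpc,
        "rpc_pct": 100.0 * rpc / grand_mean,  # RPC as % of the mean volume
        "cv_pct": 100.0 * sd / grand_mean,    # CV in % (assumed definition)
    }


def paired_t_statistic(reference, estimate):
    """t statistic and degrees of freedom for a paired t-test."""
    diffs = [e - r for r, e in zip(reference, estimate)]
    n = len(diffs)
    t = statistics.mean(diffs) / (statistics.stdev(diffs) / math.sqrt(n))
    return t, n - 1


# Toy volumes in mL (not real data).
manual_vol = [10.0, 20.0, 30.0, 40.0]
auto_vol = [12.0, 19.0, 33.0, 38.0]
print(bland_altman(manual_vol, auto_vol))
print(paired_t_statistic(manual_vol, auto_vol))
```

Converting the t statistic to a two-sided P value requires the Student t distribution, which the standard library does not provide; a statistics package would normally be used for that step.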
The associations of WMH volumes with ADNI-EF, ADNI-MEM, and ADNI-LAN were explored using multiple linear regression models after adjusting for age, intracranial volume, sex, education level, APOE4 allele genotype, and diagnosis.
Results
Compared to manual annotations, StackGen-Net achieved average Dice score (pixel), Dice score (lesion), absolute volume difference, and area under precision-recall curve of 0.76 ± 0.09, 0.73 ± 0.11, 13.7% ± 9.7%, and 0.84 ± 0.10, respectively (n=20). Figure 2 shows WMH volume predictions on multi-planar images from a test ADNI 3D-FLAIR volume. We see that the WMH predictions from StackGen-Net agree well with manual annotations by an expert neuro-radiologist.
BA analysis (Figure 3) showed excellent agreement between WMH volumes estimated from ground-truth annotations and StackGen-Net-WMH (R²=0.98, CV=17%, RPC=33%), with tighter limits of agreement than ADNI-WMH. There were no significant differences between the WMH volumes from StackGen-Net and ground truth (P=.47, n=20, two-sided paired t-test). In contrast, ADNI-WMH volumes differed significantly from ground truth (P=.01; R²=0.91, CV=59%, RPC=100%).
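The cognitive associations reported in this section come from the covariate-adjusted multiple linear regression described in Methods. A minimal sketch of such a fit is shown here, assuming ordinary least squares solved through the normal equations. The function names are hypothetical, the toy data are synthetic and noise-free, and only one covariate (age) is included for brevity instead of the full adjustment set.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("design matrix is rank deficient")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):  # back substitution
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def ols_fit(rows, y):
    """Ordinary least squares with an intercept.

    rows: one list of predictors per subject, e.g. [wmh_volume, age].
    Returns [intercept, coefficient_1, coefficient_2, ...].
    """
    design = [[1.0] + list(r) for r in rows]
    k = len(design[0])
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(k)] for i in range(k)]
    xty = [sum(d[i] * yi for d, yi in zip(design, y)) for i in range(k)]
    return solve_linear_system(xtx, xty)


# Synthetic subjects: [WMH volume, age]; scores built from known coefficients.
rows = [[1.0, 60.0], [2.0, 65.0], [4.0, 70.0], [3.0, 80.0], [6.0, 75.0], [5.0, 62.0]]
scores = [1.0 - 0.5 * wmh - 0.02 * age for wmh, age in rows]

intercept, beta_wmh, beta_age = ols_fit(rows, scores)
print(intercept, beta_wmh, beta_age)
```

The coefficient on WMH volume is the quantity of interest: it estimates the change in composite score per unit of WMH volume with the other predictors held fixed. Testing whether it differs from zero additionally needs standard errors and a t distribution, for which a statistics package is the practical choice.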
Larger volumes of WMH (StackGen-Net-WMH) correlated with worse performance on ADNI-EF (P=.004), ADNI-MEM (P=.01), and ADNI-LAN (P=.005). Similar significant, but less pronounced, effects were also observed using ADNI-WMH for ADNI-EF (P=.01), ADNI-MEM (P=.016), and ADNI-LAN (P=.016).
Discussion
Many studies using WMH volume quantification on 2D-FLAIR images from ADNI have explored the relationship between WMH volumes and cognitive functions. The extent of the relationship (or lack thereof) often depends on the accuracy of the segmentation algorithm8. In this work, we used a subset of 3D-FLAIR volumes (n=20) from ADNI and obtained manual segmentations of WMHs by an experienced neuro-radiologist to demonstrate the accuracy of our WMH segmentation algorithm. We also demonstrated, using a larger cohort of 3D-FLAIR images (n=290), that larger WMH volumes correlate with significantly worse performance on executive function, memory, and language tasks, suggesting that WMH burden affects cognition. The analyses using WMH volumes from ADNI also agreed with the associations observed in this work.
Conclusion
The use of a stacked generalization of CNN models can provide fast and accurate quantitative evaluation of WMHs for studying the association between WMH volumes and cognitive decline.
Acknowledgements
- Arizona Health Sciences Center Translational Imaging Program Project Stimulus
- BIO5 Team Scholar’s Program
- Arizona Alzheimer’s Consortium
- Alzheimer’s Disease Neuroimaging Initiative
References
1. Wardlaw JM, Pantoni L. Sporadic small vessel disease: pathogenic aspects. In: Pantoni L, Gorelick PB, eds. Cerebral Small Vessel Disease. Cambridge: Cambridge University Press; 2014:52-63. doi:10.1017/CBO9781139382694.007
2. Maniega SM, Valdés Hernández MC, Clayden JD, et al. White matter hyperintensities and normal-appearing white matter integrity in the aging brain. Neurobiol Aging. 2015;36(2):909-918. doi:10.1016/j.neurobiolaging.2014.07.048
3. Brickman AM, Meier IB, Korgaonkar MS, et al. Testing the white matter retrogenesis hypothesis of cognitive aging. Neurobiol Aging. 2012;33(8):1699-1715. doi:10.1016/j.neurobiolaging.2011.06.001
4. Umapathy L, Guzman Perez-Carillo G, Keerthivasan MB, et al. A Stacked Generalization of 3D Orthogonal Deep Learning Convolutional Neural Networks for Improved Detection of White Matter Hyperintensities in 3D FLAIR Images. AJ Neurol. 2021; in press.
5. Umapathy L, Guzman Perez-Carillo G, Keerthivasan MB, et al. StackGen-Net: A Stacked Generalization of 3D Orthogonal Convolutional Neural Networks for Improved Detection of White Matter Hyperintensities. Proc. ISMRM. 2020.
6. Gibbons LE, Carle AC, Mackin RS, et al. A composite score for executive functioning, validated in Alzheimer's Disease Neuroimaging Initiative (ADNI) participants with baseline mild cognitive impairment. Brain Imaging Behav. 2012;6(4):517-527. doi:10.1007/s11682-012-9176-1
7. DeCarli C, Murphy DG, Tranh M, et al. The effect of white matter hyperintensity volume on brain structure, cognitive performance, and cerebral metabolism of glucose in 51 healthy adults. Neurology. 1995;45(11):2077-2084.
8. Tubi MA, Feingold FW, Kothapalli D, et al. White matter hyperintensities and their relationship to cognition: Effects of segmentation algorithm. NeuroImage. 2020;206:116327.