3485

White matter hyperintensity volumes and cognition: Assessment of a deep learning-based lesion detection and quantification algorithm on ADNI

Lavanya Umapathy¹, Gloria Guzman Perez-Carillo², Blair Winegar³, Srinivasan Vedantham⁴, Maria Altbach⁴, and Ali Bilgin¹,⁴,⁵
¹Electrical and Computer Engineering, University of Arizona, Tucson, AZ, United States, ²Mallinckrodt Institute of Radiology, St Louis, MO, United States, ³Radiology and Imaging Sciences, University of Utah, Salt Lake, UT, United States, ⁴Medical Imaging, University of Arizona, Tucson, AZ, United States, ⁵Biomedical Engineering, University of Arizona, Tucson, AZ, United States

Synopsis

The relationship between cognition and white matter hyperintensity (WMH) volumes often depends on the accuracy of the lesion segmentation algorithm used. As such, accurate detection and quantification of WMH is of great interest. Here, we use a deep learning-based WMH segmentation algorithm, StackGen-Net, to detect and quantify WMH on 3D-FLAIR images from ADNI. We used a subset of subjects (n=20) with manual WMH segmentations by an experienced neuro-radiologist to demonstrate the accuracy of our algorithm. In a larger cohort of subjects (n=290), we observed that larger WMH volumes correlated with worse performance on executive function (P=.004), memory (P=.01), and language (P=.005).

Purpose

White matter hyperintensities (WMH) are brain white matter lesions that appear bright in Fluid Attenuated Inversion Recovery (FLAIR) MR images¹. The extent of WMH lesion burden is associated with degeneration of axons and myelin and is of clinical relevance in aging and age-related neurological disorders²,³. Increased WMH burden has been associated with a decline in cognitive domains such as executive function, memory, and language. Accurate detection and quantification of WMH, and the study of the temporal correlation between lesion burden and disease progression, are therefore of interest in the neuroimaging community.
In this work, we use StackGen-Net⁴,⁵, a fast and automated deep learning-based WMH segmentation algorithm, to detect and quantify WMH on isotropic resolution 3D-FLAIR images. We use 3D-FLAIR images from the Alzheimer’s Disease Neuroimaging Initiative (ADNI) repository to demonstrate the accuracy of our segmentation algorithm compared to WMH volumes made available by ADNI (n=20). We also evaluate the clinical value of our algorithm by assessing the impact of WMH volumes on executive function, memory, and language in a group of ADNI subjects (n=290) diagnosed as cognitively normal (CN) or with mild cognitive impairment (MCI) or Alzheimer’s Disease (AD).

Methods

We recently proposed StackGen-Net, a stacked generalization ensemble of 3D Convolutional Neural Networks (CNNs), which improved segmentation of WMH in 3D-FLAIR images compared to some state-of-the-art CNN segmentation frameworks⁴,⁵. StackGen-Net (Figure 1) consists of three orthogonal 3D CNNs, trained on the axial, sagittal, and coronal reformatting of the 3D-FLAIRs, respectively. Stacked generalization aims to maximize overall generalization accuracy by deducing the biases of the individual CNNs in the ensemble. The Meta-CNN learns a new functional mapping from the WMH predictions of the orthogonal CNNs to the final WMH prediction. Trained on a cohort of subjects with a history of vascular disease (n=30), StackGen-Net segments WMHs in a time-efficient manner with performance comparable to human inter-observer variability⁵.
Baseline data from 290 subjects imaged with the ADNI-3 protocol (with 1 mm isotropic 3D FLAIR volumes) were downloaded from the ADNI website (www.adni.loni.usc.edu). These data included demographics, diagnosis, education, WMH volume estimates, and composite scores for executive function (ADNI-EF), memory (ADNI-MEM), and language (ADNI-LAN) from the neuropsychological test battery⁶. The WMH lesion volumes from ADNI (ADNI-WMH) were quantified using a histogram-based technique⁷.
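The stacking idea can be illustrated with a minimal Python sketch. It is not the StackGen-Net implementation: the Meta-CNN is replaced here by a hypothetical voxel-wise logistic combiner (one weight per base model plus a bias), the three base posteriors are synthetic noisy copies of a random mask, and the function names `fit_meta_learner` and `stack_posteriors` are introduced only for illustration.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def fit_meta_learner(posteriors, target, lr=0.5, steps=500):
    """Fit a voxel-wise logistic combiner by gradient descent on cross-entropy.

    posteriors: array of shape (X, Y, Z, n_models) with base-model WMH posteriors.
    target:     boolean array of shape (X, Y, Z) with the reference WMH mask.
    This is a simplified, assumed stand-in for a learned Meta-CNN.
    """
    x = posteriors.reshape(-1, posteriors.shape[-1])
    y = target.reshape(-1).astype(float)
    w = np.zeros(x.shape[1])
    b = 0.0
    for _ in range(steps):
        grad = sigmoid(x @ w + b) - y          # d(loss)/d(logit) per voxel
        w -= lr * (x.T @ grad) / y.size
        b -= lr * grad.mean()
    return w, b

def stack_posteriors(p_axial, p_sagittal, p_coronal, w, b):
    """Map three co-registered posteriors to one final posterior."""
    stacked = np.stack([p_axial, p_sagittal, p_coronal], axis=-1)
    return sigmoid(stacked @ w + b)

# Synthetic demonstration data (not ADNI data).
rng = np.random.default_rng(0)
truth = rng.random((8, 8, 8)) < 0.2
base = [np.clip(truth + rng.normal(0.0, 0.4, truth.shape), 0.0, 1.0) for _ in range(3)]

w, b = fit_meta_learner(np.stack(base, axis=-1), truth)
final = stack_posteriors(base[0], base[1], base[2], w, b)
wmh_mask = final > 0.5                         # assumed decision threshold
```

In a real stacked ensemble the combiner would be fit on predictions for data the base models did not see during training, so that it learns their biases rather than their training fit.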
The 3D FLAIR volumes from 290 subjects (193 CN, 73 MCI, and 24 AD) were brain-extracted and bias-corrected, followed by WMH detection and volume estimation by StackGen-Net (StackGen-Net-WMH). The total prediction time for a pre-processed 3D-FLAIR volume (256x256x240) was 40 seconds. An experienced neuro-radiologist manually annotated WMH on 3D-FLAIR volumes of 20 non-demented participants selected randomly from this ADNI cohort. These annotations were used to evaluate the segmentation performance of our algorithm using metrics such as Dice score (pixel and lesion), precision (lesion), recall (lesion), average volume difference (%), and area under the precision-recall curve.
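As a rough illustration of the voxel-level metrics, the Python sketch below computes the Dice score, precision, recall, WMH volume, and absolute volume difference for a pair of binary masks. It is a simplified sketch under stated assumptions: the masks are toy cubes, the voxel size is taken as 1 mm isotropic as in the ADNI-3 FLAIR volumes, and precision and recall are computed per voxel. The lesion-wise versions used in this work would additionally require connected-component labelling of the masks (for example with `scipy.ndimage.label`), which is omitted here.

```python
import numpy as np

def dice_score(pred, truth):
    """Voxel-wise Dice overlap between two binary masks."""
    pred = np.asarray(pred, dtype=bool)
    truth = np.asarray(truth, dtype=bool)
    denom = pred.sum() + truth.sum()
    if denom == 0:
        return 1.0                               # both masks empty: perfect agreement
    return 2.0 * np.logical_and(pred, truth).sum() / denom

def precision_recall(pred, truth):
    """Voxel-wise precision and recall (the abstract reports lesion-wise values)."""
    pred = np.asarray(pred, dtype=bool)
    truth = np.asarray(truth, dtype=bool)
    tp = np.logical_and(pred, truth).sum()
    fp = np.logical_and(pred, ~truth).sum()
    fn = np.logical_and(~pred, truth).sum()
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall

def volume_ml(mask, voxel_mm3=1.0):
    """Lesion volume in millilitres; 1 mm isotropic voxels are assumed."""
    return np.asarray(mask, dtype=bool).sum() * voxel_mm3 / 1000.0

def abs_volume_difference_pct(pred, truth):
    """Absolute volume difference as a percentage of the reference volume."""
    v_pred = np.asarray(pred, dtype=bool).sum()
    v_true = np.asarray(truth, dtype=bool).sum()
    return 100.0 * abs(int(v_pred) - int(v_true)) / v_true

# Toy example: two 4x4x4 cubes shifted by one voxel along the first axis.
truth = np.zeros((10, 10, 10), dtype=bool)
pred = np.zeros((10, 10, 10), dtype=bool)
truth[2:6, 2:6, 2:6] = True
pred[3:7, 2:6, 2:6] = True

dice = dice_score(pred, truth)
precision, recall = precision_recall(pred, truth)
avd = abs_volume_difference_pct(pred, truth)
```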
We also used Bland-Altman (BA) analysis to compare manual WMH volume estimates with StackGen-Net-WMH and ADNI-WMH. The agreement was evaluated using R², Coefficient of Variation (CV), and repeatability coefficient (RPC) statistics. Two-sided paired t-tests were used to assess whether there were any significant WMH volume differences between ground truth annotations and the two segmentation algorithms.
The associations of WMH volumes with ADNI-EF, ADNI-MEM, and ADNI-LAN were explored using multiple linear regression models after adjusting for age, intracranial volume, sex, education level, APOE4 allele genotype, and diagnosis.
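A Bland-Altman comparison of this kind might be sketched in Python as follows. The abstract does not give formulas for CV and RPC, so the definitions used here are assumptions: RPC is taken as 1.96 times the standard deviation of the paired differences, CV as that standard deviation divided by the grand mean, and both are expressed as percentages of the grand mean. The volumes are made-up numbers, not study data.

```python
import numpy as np

def bland_altman(reference, estimate):
    """Agreement statistics between a reference and an estimated measurement."""
    reference = np.asarray(reference, dtype=float)
    estimate = np.asarray(estimate, dtype=float)
    diff = estimate - reference                  # paired differences
    pair_mean = (estimate + reference) / 2.0     # x-axis of the BA plot
    n = diff.size
    bias = diff.mean()
    sd = diff.std(ddof=1)                        # sample SD of the differences
    grand_mean = pair_mean.mean()
    return {
        "bias": bias,
        "loa": (bias - 1.96 * sd, bias + 1.96 * sd),    # 95% limits of agreement
        "rpc_pct": 100.0 * 1.96 * sd / grand_mean,      # assumed RPC definition
        "cv_pct": 100.0 * sd / grand_mean,              # assumed CV definition
        "r2": np.corrcoef(reference, estimate)[0, 1] ** 2,
        "paired_t": bias / (sd / np.sqrt(n)),           # paired t statistic, n-1 dof
    }

# Made-up WMH volumes in mL for five subjects.
manual = [1.0, 2.0, 4.0, 8.0, 16.0]
automated = [1.1, 1.9, 4.2, 7.8, 16.5]
stats = bland_altman(manual, automated)
```

The two-sided P value follows from the t statistic and a t distribution with n-1 degrees of freedom; `scipy.stats.ttest_rel` performs the same paired test directly.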
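The covariate-adjusted model can be sketched with ordinary least squares in Python. Everything below is illustrative: the data are synthetic, the variable names and effect sizes are invented, and the abstract does not state the software, variable coding, or any transformation of WMH volume used in the actual analysis. Diagnosis is coded with two dummy columns (MCI and AD, with CN as the reference level), which is one common choice.

```python
import numpy as np

def ols(design, y):
    """Ordinary least squares: coefficients, standard errors, and t statistics."""
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ beta
    dof = design.shape[0] - design.shape[1]
    sigma2 = resid @ resid / dof                     # residual variance estimate
    cov = sigma2 * np.linalg.inv(design.T @ design)  # coefficient covariance
    se = np.sqrt(np.diag(cov))
    return beta, se, beta / se

# Synthetic cohort (not ADNI data); continuous covariates are standardized.
rng = np.random.default_rng(0)
n = 290
wmh = rng.normal(size=n)              # WMH volume
age = rng.normal(size=n)
icv = rng.normal(size=n)              # intracranial volume
sex = rng.integers(0, 2, n)
education = rng.normal(size=n)
apoe4 = rng.integers(0, 2, n)         # carrier status
dx = rng.integers(0, 3, n)            # 0 = CN, 1 = MCI, 2 = AD
dx_dummies = np.eye(3)[dx][:, 1:]     # drop CN as the reference level

# Simulated composite score with an invented WMH effect of -0.3.
score = (-0.3 * wmh - 0.2 * age + 0.1 * education
         - 0.5 * dx_dummies[:, 0] - 1.0 * dx_dummies[:, 1]
         + rng.normal(0.0, 0.5, n))

design = np.column_stack(
    [np.ones(n), wmh, age, icv, sex, education, apoe4, dx_dummies]
)
beta, se, t = ols(design, score)
wmh_effect, wmh_t = beta[1], t[1]     # adjusted association of WMH with the score
```

A negative `wmh_effect` corresponds to larger WMH volumes being associated with lower composite scores after adjustment for the other covariates.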

Results

Compared to manual annotations, StackGen-Net achieved average Dice score (pixel), Dice score (lesion), absolute volume difference, and area under precision-recall curve of 0.76 ± 0.09, 0.73 ± 0.11, 13.7% ± 9.7%, and 0.84 ± 0.10, respectively (n=20). Figure 2 shows WMH volume predictions on multi-planar images from a test ADNI 3D-FLAIR volume. We see that the WMH predictions from StackGen-Net agree well with manual annotations by an expert neuro-radiologist.
BA analysis (Figure 3) showed excellent agreement between WMH volumes estimated from ground-truth annotations and StackGen-Net-WMH (R²=0.98, CV=17%, RPC=33%), with tighter limits of agreement than ADNI-WMH. There were no significant differences between the WMH volumes from StackGen-Net and ground truth (P=.47, n=20, two-sided paired t-test). In contrast, ADNI-WMH showed a significant difference (P=.01) in WMH volumes (R²=0.91, CV=59%, RPC=100%) when compared to ground truth.
Larger WMH volumes (StackGen-Net-WMH) correlated with worse performance on ADNI-EF (P=.004), ADNI-MEM (P=.01), and ADNI-LAN (P=.005). Similar significant, but less pronounced, effects were also observed using ADNI-WMH for ADNI-EF (P=.01), ADNI-MEM (P=.016), and ADNI-LAN (P=.016).

Discussion

Many works using WMH volume quantification on 2D-FLAIR images from ADNI have explored the relationship between WMH volumes and cognitive functions. The extent of the relationship (or lack thereof) often depends on the accuracy of the segmentation algorithm⁸. In this work, we used a subset of 3D-FLAIR volumes (n=20) from ADNI and obtained manual segmentations of WMHs by an experienced neuro-radiologist to demonstrate the accuracy of our WMH segmentation algorithm. We also demonstrated, using a larger cohort of 3D-FLAIR images (n=290), that larger WMH volumes correlate with significantly worse performance on executive function, memory, and language tasks. The analyses using WMH volumes from ADNI also agreed with the associations observed in this work.

Conclusion

The use of a stacked generalization of CNN models can provide fast and accurate quantitative evaluation of WMHs to study the association between WMH volumes and cognitive decline.

Acknowledgements

Arizona Health Sciences Center Translational Imaging Program Project Stimulus
BIO5 Team Scholar’s Program
Arizona Alzheimer’s Consortium
Alzheimer’s Disease Neuroimaging Initiative

References

1. Wardlaw JM, Pantoni L, Gorelick PB. Sporadic small vessel disease: pathogenic aspects. In: Cerebral Small Vessel Disease. Cambridge: Cambridge University Press; 2014:52-63. doi:10.1017/CBO9781139382694.007

2. Maniega SM, Valdés Hernández MC, Clayden JD, et al. White matter hyperintensities and normal-appearing white matter integrity in the aging brain. Neurobiol Aging. 2015;36(2):909-918. doi:10.1016/j.neurobiolaging.2014.07.048

3. Brickman AM, Meier IB, Korgaonkar MS, et al. Testing the white matter retrogenesis hypothesis of cognitive aging. Neurobiol Aging. 2012;33(8):1699-1715. doi:10.1016/j.neurobiolaging.2011.06.001

4. Umapathy L, Guzman Perez-Carillo G, Keerthivasan MB, et al. A Stacked Generalization of 3D Orthogonal Deep Learning Convolutional Neural Networks for Improved Detection of White Matter Hyperintensities in 3D FLAIR Images. AJNR Am J Neuroradiol. 2021; in press.

5. Umapathy L, Guzman Perez-Carillo G, Keerthivasan MB, et al. StackGen-Net: A Stacked Generalization of 3D Orthogonal Convolutional Neural Networks for Improved Detection of White Matter Hyperintensities. Proc. ISMRM; 2020.

6. Gibbons LE, Carle AC, Mackin RS, et al. A composite score for executive functioning, validated in Alzheimer's Disease Neuroimaging Initiative (ADNI) participants with baseline mild cognitive impairment. Brain Imaging Behav. 2012;6(4):517-527. doi:10.1007/s11682-012-9176-1

7. DeCarli C, Murphy DG, Tranh M, et al. The effect of white matter hyperintensity volume on brain structure, cognitive performance, and cerebral metabolism of glucose in 51 healthy adults. Neurology. 1995;45(11):2077-2084.

8. Tubi MA, Feingold FW, Kothapalli D, et al. White matter hyperintensities and their relationship to cognition: Effects of segmentation algorithm. Neuroimage. 2020;206:116327.

Figures

Figure 1: Framework for StackGen-Net, a stacked generalization ensemble of Orthogonal Convolutional Neural Networks (CNNs). Each CNN predicted white matter hyperintensity (WMH) posterior probabilities on multi-planar orientations (axial, sagittal, and coronal) of a 3D-FLAIR volume. The final WMH posterior is obtained using a Meta CNN that learns to combine posteriors from individual CNNs in the ensemble. The total prediction time for a brain-extracted and bias-corrected 3D-FLAIR volume (256x256x240) was 40 seconds.

Figure 2: Axial and coronal cross-sections of a test 3D-FLAIR volume from ADNI. The manual annotations are overlaid in the middle row for reference. The predictions from StackGen-Net agree well with manual annotations. Table 1 shows mean and standard deviation values for the different evaluation metrics on the ADNI test subset (n=20 volumes).

Figure 3: Bland-Altman analysis to show agreement between WMH volumes from manual annotations and two different segmentation algorithms. Left: The WMH volumes downloaded from ADNI (ADNI-WMH) were estimated using an intensity histogram-based technique. Right: The WMH volumes were estimated by our deep learning-based framework (StackGen-Net-WMH). Notice that the limits of agreement are tighter for StackGen-Net-WMH, with a smaller coefficient of variation (CV).

Figure 4: ADNI cohort demographics (Table 2) and results of multiple linear regression models for association between cognition and WMH volumes (Table 3) are shown.

Proc. Intl. Soc. Mag. Reson. Med. 29 (2021)
