0965

L2C-FNN: Longitudinal to Cross-sectional Feedforward Neural Network for generalizable AD-dementia progression prediction

Chen Zhang^1,2,3, Lijun An^1,2,3, Naren Wulan^1,2,3, Kim-Ngan Nguyen¹, Csaba Orban^1,2,3, Pansheng Chen^1,2,3, Christopher Chen⁴, Juan Helen Zhou^1,2,5, and B. T. Thomas Yeo^1,2,3,5,6
¹Centre for Sleep & Cognition & Centre for Translational Magnetic Resonance Research, Yong Loo Lin School of Medicine, National University of Singapore, Singapore, Singapore, ²Department of Electrical and Computer Engineering, National University of Singapore, Singapore, Singapore, ³N.1 Institute for Health & Institute for Digital Medicine, National University of Singapore, Singapore, Singapore, ⁴Department of Pharmacology, Yong Loo Lin School of Medicine, National University of Singapore, Singapore, Singapore, ⁵Integrative Sciences and Engineering Programme (ISEP), National University of Singapore, Singapore, Singapore, ⁶Martinos Center for Biomedical Imaging, Massachusetts General Hospital, Charlestown, MA, United States

Synopsis

Keywords: Diagnosis/Prediction, Multimodal, Generalization; Generalizable; Longitudinal; Disease progression modeling

Motivation: Current longitudinal AD-dementia progression prediction studies lack cross-cohort evaluation, raising concerns about the clinical applicability of prediction models.

Goal(s): Our goal was to develop a generalizable ML algorithm, L2C-FNN, and assess its generalizability across entirely distinct test cohorts.

Approach: L2C-FNN and baseline models were trained solely on ADNI and subsequently evaluated on AIBL, MACC, and OASIS. Multimodal biomarkers were leveraged for forecasting future clinical diagnosis, cognition, and ventricle volume.

Results: Our algorithm compares favorably against strong baseline models across all test datasets, confirming its superior generalizability.

Impact: The demonstrated potential for improved generalizability in L2C-FNN signifies progress toward enhancing AI prediction models for clinical application. This underscores the continued need for cross-cohort evaluation in future AD-dementia progression modeling studies.

Introduction

Alzheimer’s disease dementia (AD-dementia) is a neurodegenerative disorder with a prolonged prodromal phase and limited therapeutic options after dementia onset, which makes early detection important for timely and effective intervention¹. Hence, predicting the longitudinal disease progression of individuals is of substantial interest^2–4. However, the absence of cross-cohort assessments in previous studies has raised concerns about clinical applicability⁵, because cohorts differ in scanners and protocols⁶ and in cortical structure across populations⁷. In this study, we introduce the Longitudinal to Cross-sectional Feedforward Neural Network (L2C-FNN), a model designed to be robust to cohort differences, and demonstrate its superior generalizability against strong machine learning baseline models across three separate unseen cohorts.

Methods

L2C-FNN and baseline models underwent training on the ADNI⁸ dataset (N=2421) followed by evaluation of generalizability on external test cohorts: AIBL⁹ (N=862) from Australia, MACC¹⁰ (N=700) from Singapore, and OASIS¹¹ (N=1378) from North America. ADNI participants were randomly divided into training, validation, and test sets (ratio of 18:1:1) for model fitting, hyperparameter tuning and within-cohort evaluation. The trained models were adapted to AIBL, MACC, and OASIS for cross-cohort evaluation, with 20 repetitions to ensure result stability (Figure 1A). Care was taken to ensure non-overlapping test sets, covering the entirety of the ADNI cohort across the 20 data splits.
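The splitting scheme described above can be sketched as follows. This is only an illustration under stated assumptions: the function name `make_splits`, the random seed, and the choice of the next fold as the validation set are hypothetical and are not taken from the study.

```python
import numpy as np

def make_splits(n_participants, n_folds=20, seed=0):
    """Return a list of (train, val, test) participant-index arrays.

    Participants are shuffled once and cut into n_folds folds. Split k uses
    fold k as the test set, the next fold as the validation set (an
    assumption made for this sketch), and the remaining 18 folds for
    training, which gives the 18:1:1 ratio.
    """
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(n_participants), n_folds)
    splits = []
    for k in range(n_folds):
        val_k = (k + 1) % n_folds
        train = np.concatenate(
            [f for j, f in enumerate(folds) if j not in (k, val_k)]
        )
        splits.append((train, folds[val_k], folds[k]))
    return splits

# ADNI has N=2421 participants in this study.
splits = make_splits(2421)
```

Because each fold serves as the test set exactly once, the 20 test sets do not overlap and together cover the whole cohort, while the training sets of different splits share most of their participants.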
Utilizing multimodal inputs (e.g., cognitive state measurements, cortical and/or subcortical ROI volumes, clinical diagnosis) from the first 50% of timepoints of each participant, we predicted clinical diagnosis, ventricular volume, and cognitive state (measured by MMSE) for the second 50% of timepoints, projecting up to 10 years into the future. All continuous variables (e.g., cognitive test scores, ROI volumes) underwent normalization through GaussRank transformation, a special form of quantile normalization¹², with a Gaussian reference distribution. The transformation was estimated on the ADNI training set and applied to normalize the ADNI validation and test sets, as well as the external test cohorts.
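A minimal sketch of a GaussRank-style normalization is given below, assuming the transformation maps empirical quantiles of the training data onto a standard Gaussian and that new values are placed on the training quantile curve by linear interpolation. The class name `GaussRank`, the mid-rank quantile convention, and the handling of ties and out-of-range values are assumptions of this sketch; the implementation used in the study may differ. Missing values are assumed to have been removed beforehand.

```python
import numpy as np
from statistics import NormalDist

class GaussRank:
    """Quantile normalization of one variable to a Gaussian reference."""

    def fit(self, x_train):
        x = np.sort(np.asarray(x_train, dtype=float))
        n = x.size
        # Mid-rank quantiles lie strictly inside (0, 1), so the inverse
        # Gaussian CDF stays finite.
        q = (np.arange(n) + 0.5) / n
        # Tied values share the average of their quantiles.
        self.values_, inverse = np.unique(x, return_inverse=True)
        self.quantiles_ = np.bincount(inverse, weights=q) / np.bincount(inverse)
        return self

    def transform(self, x):
        x = np.asarray(x, dtype=float)
        # np.interp clamps values outside the training range to the
        # smallest or largest training quantile.
        q = np.interp(x, self.values_, self.quantiles_)
        inv_cdf = NormalDist().inv_cdf
        return np.array([inv_cdf(v) for v in q.ravel()]).reshape(x.shape)

# Fit on training data only, then reuse the same mapping elsewhere.
train_scores = np.arange(1, 102)            # placeholder training values
external_scores = np.array([-5.0, 51.0, 400.0])  # placeholder external values
normalizer = GaussRank().fit(train_scores)
train_z = normalizer.transform(train_scores)
external_z = normalizer.transform(external_scores)
```

Fitting on the training set alone and reusing the fitted mapping for validation, test, and external data keeps information from the evaluation cohorts out of the normalization step.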
L2C-FNN is a deep feedforward neural network (Figure 2) featuring a specialized longitudinal-to-cross-sectional format transformation (Figure 1B). This transformation involves computing summary statistics such as the rate of change, maximum, and minimum of each input modality from historical timeseries data.
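The idea of the longitudinal-to-cross-sectional transformation can be illustrated with a short sketch for a single participant and a single variable. The function name `l2c_features`, the endpoint-based definition of the rate of change, and the inclusion of the most recent value are assumptions made for illustration; the abstract names only the rate of change, maximum, and minimum as example statistics.

```python
import numpy as np

def l2c_features(times, values):
    """Summarize a variable's history into a fixed-length feature set.

    times  : visit times in years
    values : measurements at those visits (NaN marks a missing visit)
    """
    t = np.asarray(times, dtype=float)
    v = np.asarray(values, dtype=float)
    keep = ~np.isnan(v)
    t, v = t[keep], v[keep]
    if v.size == 0:
        return {"latest": np.nan, "min": np.nan, "max": np.nan, "rate": np.nan}
    order = np.argsort(t)
    t, v = t[order], v[order]
    span = t[-1] - t[0]
    # Rate of change per year between the first and last observed visits
    # (an assumed definition); undefined with a single observation.
    rate = (v[-1] - v[0]) / span if span > 0 else np.nan
    return {"latest": v[-1], "min": v.min(), "max": v.max(), "rate": rate}

# Example: three MMSE scores observed over one year.
features = l2c_features([0.0, 0.5, 1.0], [28.0, 27.0, 25.0])
```

Whatever the number of past visits, the output has the same length, which is what lets a feedforward network consume histories of varying length without recursion.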
Baseline approaches included Frog, an XGBoost-based model, and MinimalRNN³, an RNN-based model, which were the first- and second-place winners in the TADPOLE international challenge for longitudinal AD-dementia progression prediction^13,14.
For within-cohort evaluation, the final performance of each algorithm was computed by averaging the results across the 20 test sets. Although the test sets do not overlap, the subjects used for training do overlap across the data splits. To account for this non-independence, we used the corrected resampled t-test¹⁵ to assess performance differences between algorithms. For cross-cohort evaluation, performance was averaged across the 20 trained copies of each algorithm, and paired-sample t-tests were used to compare performance between each pair of algorithms on each test cohort. Multiple comparisons were corrected with a false discovery rate of q < 0.05¹⁶.
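The two statistical steps can be sketched as below. The sketch assumes the common form of the corrected resampled t-test, in which the variance of the per-split performance differences is inflated by the test-to-training size ratio, and the Benjamini-Hochberg step-up procedure for FDR control. The function names and the placeholder numbers are hypothetical and are not results from the study.

```python
import numpy as np

def corrected_resampled_t(diffs, test_train_ratio):
    """t statistic and degrees of freedom for k overlapping-training splits.

    diffs            : per-split performance differences between two models
    test_train_ratio : n_test / n_train (1/18 for an 18:1:1 split)
    """
    d = np.asarray(diffs, dtype=float)
    k = d.size
    var = d.var(ddof=1)
    # The naive paired t-test uses var / k; the correction adds the
    # test/train ratio to account for overlapping training sets.
    t_stat = d.mean() / np.sqrt((1.0 / k + test_train_ratio) * var)
    return t_stat, k - 1

def benjamini_hochberg(pvals, q=0.05):
    """Boolean mask of hypotheses rejected at false discovery rate q."""
    p = np.asarray(pvals, dtype=float)
    m = p.size
    order = np.argsort(p)
    below = p[order] <= q * np.arange(1, m + 1) / m
    reject = np.zeros(m, dtype=bool)
    if below.any():
        last = np.max(np.nonzero(below)[0])
        reject[order[: last + 1]] = True
    return reject

# Placeholder differences for 20 splits (illustrative values only).
rng = np.random.default_rng(0)
diffs = rng.normal(loc=0.1, scale=0.2, size=20)
t_stat, dof = corrected_resampled_t(diffs, test_train_ratio=1 / 18)
rejected = benjamini_hochberg([0.001, 0.02, 0.04, 0.5], q=0.05)
```

A two-sided p-value follows from comparing the statistic with a Student t distribution with `dof` degrees of freedom. With 20 splits and an 18:1:1 ratio, the corrected statistic is roughly 0.69 times the naive paired statistic, so the corrected test is more conservative.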

Results and discussion

Figure 3 shows that L2C-FNN performed comparably to strong baseline methods (Frog and MinimalRNN) for within-cohort (ADNI) prediction of clinical diagnosis, cognitive state (measured with MMSE), and ventricular volume. Notably, L2C-FNN numerically outperformed all baseline models in clinical diagnosis and MMSE prediction.
Figure 4 shows cross-cohort evaluation in three external cohorts (AIBL, MACC, and OASIS), highlighting the superior performance of L2C-FNN over all baseline models, underscoring its robust generalizability. Particularly noteworthy is L2C-FNN's consistent achievement of significantly lower MMSE prediction errors across all test cohorts compared to the baseline methods.
Figure 5 depicts the yearly breakdown of the MMSE prediction performance from Figure 4 up to year 6. As anticipated, the performance of all algorithms deteriorates for predictions further into the future. Nevertheless, L2C-FNN consistently matched or surpassed all baseline methods across all years and test cohorts. This could be attributed to the use of the L2C transformation, avoiding the reliance on recursive techniques like RNN and LSTM commonly used in disease progression modelling, which are sensitive to error accumulation¹⁷. Similar trends were observed for diagnosis and ventricle volume predictions.

Conclusions

In conclusion, our study highlights the superior performance of the L2C-FNN model over baseline algorithms in longitudinal clinical diagnosis, cognitive state, and brain atrophy prediction when trained and tested on the ADNI dataset. Crucially, this strong performance extended to previously unseen cohorts (AIBL, MACC, and OASIS) whose populations differ substantially from the training set, as confirmed by cross-cohort evaluation, emphasizing the model's superior generalizability. Furthermore, L2C-FNN maintained this high level of performance from year 1 to year 6, demonstrating its potential for early detection of AD-dementia.

Acknowledgements

This work was supported by the Singapore National Research Foundation (NRF) Fellowship (Class of 2017), the National University of Singapore Yong Loo Lin School of Medicine (NUHSRO/2020/124/TMR/LOA), the Singapore National Medical Research Council (NMRC) LCG (OFLCG19May-0035), NMRC STaR (STaR20nov-0003), and the United States National Institutes of Health (R01MH120080). Our computational work was partially performed on resources of the National Supercomputing Centre, Singapore (https://www.nscc.sg). Any opinions, findings and conclusions or recommendations expressed in this material are those of the authors and do not reflect the views of the Singapore NRF or the Singapore NMRC. Data collection and sharing for this project was funded by the Alzheimer's Disease Neuroimaging Initiative (ADNI) (National Institutes of Health Grant U01 AG024904) and DOD ADNI (Department of Defense award number W81XWH-12-2-0012). ADNI is funded by the National Institute on Aging, the National Institute of Biomedical Imaging and Bioengineering, and through generous contributions from the following: AbbVie, Alzheimer's Association; Alzheimer's Drug Discovery Foundation; Araclon Biotech; BioClinica, Inc.; Biogen; Bristol-Myers Squibb Company; CereSpir, Inc.; Cogstate; Eisai Inc.; Elan Pharmaceuticals, Inc.; Eli Lilly and Company; EuroImmun; F. Hoffmann-La Roche Ltd and its affiliated company Genentech, Inc.; Fujirebio; GE Healthcare; IXICO Ltd.; Janssen Alzheimer Immunotherapy Research & Development, LLC.; Johnson & Johnson Pharmaceutical Research & Development LLC.; Lumosity; Lundbeck; Merck & Co., Inc.; Meso Scale Diagnostics, LLC.; NeuroRx Research; Neurotrack Technologies; Novartis Pharmaceuticals Corporation; Pfizer Inc.; Piramal Imaging; Servier; Takeda Pharmaceutical Company; and Transition Therapeutics. The Canadian Institutes of Health Research is providing funds to support ADNI clinical sites in Canada.
Private sector contributions are facilitated by the Foundation for the National Institutes of Health (www.fnih.org). The grantee organization is the Northern California Institute for Research and Education, and the study is coordinated by the Alzheimer's Therapeutic Research Institute at the University of Southern California. ADNI data are disseminated by the Laboratory for Neuro Imaging at the University of Southern California. Data were provided [in part] by OASIS: Longitudinal: Principal Investigators: D. Marcus, R. Buckner, J. Csernansky, J. Morris; P50 AG05681, P01 AG03991, P01 AG026276, R01 AG021910, P20 MH071616, U24 RR021382; OASIS-3: Principal Investigators: T. Benzinger, D. Marcus, J. Morris; NIH P50AG00561, P30NS09857781, P01AG026276, P01AG003991, R01AG043434, UL1TR000448, R01EB009352. AV-45 doses were provided by Avid Radiopharmaceuticals, a wholly owned subsidiary of Eli Lilly.

References

1. Scheltens P, Blennow K, Breteler MMB, et al. Alzheimer’s disease. The Lancet. 2016;388(10043):505-517. doi:10.1016/S0140-6736(15)01124-1

2. Mehdipour Ghazi M, Nielsen M, Pai A, et al. Training recurrent neural networks robust to incomplete data: Application to Alzheimer’s disease progression modeling. Med Image Anal. 2019;53:39-46. doi:10.1016/j.media.2019.01.004

3. Nguyen M, He T, An L, Alexander DC, Feng J, Yeo BTT. Predicting Alzheimer’s disease progression using deep recurrent neural networks. NeuroImage. 2020;222:117203. doi:10.1016/j.neuroimage.2020.117203

4. Zhou J, Liu J, Narayan VA, Ye J. Modeling disease progression via multi-task learning. NeuroImage. 2013;78:233-248. doi:10.1016/j.neuroimage.2013.03.073

5. Wang C, Li Y, Tsuboshita Y, et al. A high-generalizability machine learning framework for predicting the progression of Alzheimer’s disease using limited data. Npj Digit Med. 2022;5(1):1-10. doi:10.1038/s41746-022-00577-x

6. Dewey BE, Zhao C, Reinhold JC, et al. DeepHarmony: A deep learning approach to contrast harmonization across scanner changes. Magn Reson Imaging. 2019;64:160-170. doi:10.1016/j.mri.2019.05.041

7. Kang DW, Wang SM, Na HR, et al. Differences in cortical structure between cognitively normal East Asian and Caucasian older adults: a surface-based morphometry study. Sci Rep. 2020;10(1):20905. doi:10.1038/s41598-020-77848-8

8. Jack CR, Bernstein MA, Borowski BJ, et al. Update on the magnetic resonance imaging core of the Alzheimer’s disease neuroimaging initiative. Alzheimers Dement J Alzheimers Assoc. 2010;6(3):212-220. doi:10.1016/j.jalz.2010.03.004

9. Ellis KA, Rowe CC, Villemagne VL, et al. Addressing population aging and Alzheimer’s disease through the Australian Imaging Biomarkers and Lifestyle study: Collaboration with the Alzheimer’s Disease Neuroimaging Initiative. Alzheimers Dement. 2010;6(3):291-296. doi:10.1016/j.jalz.2010.03.009

10. Hilal S, Tan CS, van Veluw SJ, et al. Cortical cerebral microinfarcts predict cognitive decline in memory clinic patients. J Cereb Blood Flow Metab Off J Int Soc Cereb Blood Flow Metab. 2020;40(1):44-53. doi:10.1177/0271678X19835565

11. LaMontagne PJ, Benzinger TLS, Morris JC, et al. OASIS-3: Longitudinal Neuroimaging, Clinical, and Cognitive Dataset for Normal Aging and Alzheimer Disease. medRxiv. Published online January 1, 2019:2019.12.13.19014902. doi:10.1101/2019.12.13.19014902

12. Zhao Y, Wong L, Goh WWB. How to do quantile normalization correctly for gene expression data analyses. Sci Rep. 2020;10(1):15534. doi:10.1038/s41598-020-72664-6

13. Marinescu RV, Oxtoby NP, Young AL, et al. TADPOLE Challenge: Prediction of Longitudinal Evolution in Alzheimer’s Disease. Published online August 30, 2018. doi:10.48550/arXiv.1805.03909

14. Marinescu RV, Oxtoby NP, Young AL, et al. The Alzheimer’s Disease Prediction Of Longitudinal Evolution (TADPOLE) Challenge: Results after 1 Year Follow-up. Published online December 27, 2021. doi:10.48550/arXiv.2002.03419

15. Bouckaert RR, Frank E. Evaluating the Replicability of Significance Tests for Comparing Learning Algorithms. In: Dai H, Srikant R, Zhang C, eds. Vol 3056. Lecture Notes in Computer Science. Springer Berlin Heidelberg; 2004:3-12. doi:10.1007/978-3-540-24775-3_3

16. Benjamini Y, Hochberg Y. Controlling the False Discovery Rate: A Practical and Powerful Approach to Multiple Testing. J R Stat Soc Ser B Methodol. 1995;57(1):289-300. doi:10.1111/j.2517-6161.1995.tb02031.x

17. Fan C, Wang J, Gang W, Li S. Assessment of deep recurrent neural network-based strategies for short-term building energy predictions. Appl Energy. 2019;236:700-710. doi:10.1016/j.apenergy.2018.12.004

Figures

Figure 1. Training and testing procedure and L2C-FNN model workflow. (A) Models were trained on ADNI and adapted to three unseen test cohorts. ADNI participants were divided into training, validation, and test sets for hyperparameter tuning and within-cohort evaluation. The models were then adapted to test cohorts for cross-cohort evaluation, with 20 repetitions to ensure result stability. (B) Multimodal longitudinal inputs were transformed into a cross-sectional format. A deep feedforward neural network takes in the transformed data and generates multimodal forecasts.

Figure 2. Architecture of the Feedforward Neural Network (FNN) and the range of the hyperparameter search. (A) FNN incorporates leaky rectified linear units (LeakyReLU) between layers. The final layer simultaneously outputs all target variables to enable multi-task learning. (B) The model’s structure, such as the number of layers and hidden layer size, and training configurations, including learning rate and weight regularization, are hyperparameters tuned on the validation sets.

Figure 3. L2C-FNN compares favorably against baseline methods for within-cohort evaluation. (A) Boxplots represent variability across 20 test sets. (B) Statistical difference between models. “***” indicates p < 0.00001 and statistical significance after multiple comparison correction (FDR q < 0.05). “n.s.” indicates no statistical significance (p ≥ 0.05) or did not survive FDR correction. Green color indicates that L2C-FNN significantly outperforms baseline methods.

Figure 4. L2C-FNN outperforms baselines for cross-cohort evaluation. (A) Boxplots represent variability across 20 models on external cohorts. (B) Statistical difference between models. “*”, “**”, and “***” indicate p < 0.05, 0.001, and 0.00001, respectively, and statistical significance after multiple comparison correction (FDR q < 0.05). “n.s.” indicates no statistical significance (p ≥ 0.05) or did not survive FDR correction. Green color indicates that L2C-FNN significantly outperforms baseline methods.

Figure 5. Breakdown of MMSE prediction performance from Figure 4, detailing yearly intervals up to 6 years into the future. All algorithms exhibited degraded performance with increasing forecast horizon. L2C-FNN demonstrated comparable or superior performance relative to all baseline algorithms across all years and test cohorts.

Proc. Intl. Soc. Mag. Reson. Med. 32 (2024)

DOI: https://doi.org/10.58530/2024/0965