Lijun An1,2,3, Pansheng Chen1,2,3, Jianzhong Chen1,2,3, Christopher Chen4, Juan Helen Zhou2, and B.T. Thomas Yeo1,2,3,5,6
1Department of Electrical and Computer Engineering, National University of Singapore, Singapore, Singapore, 2Centre for Sleep and Cognition (CSC) & Centre for Translational Magnetic Resonance Research (TMR), National University of Singapore, Singapore, Singapore, 3N.1 Institute for Health & Institute for Digital Medicine (WisDM), National University of Singapore, Singapore, Singapore, 4Department of Pharmacology, National University of Singapore, Singapore, Singapore, 5NUS Graduate School for Integrative Sciences and Engineering, National University of Singapore, Singapore, Singapore, 6Martinos Center for Biomedical Imaging, Massachusetts General Hospital, Charlestown, MA, United States
Synopsis
We
propose a flexible application-specific harmonization framework utilizing
downstream application performance to regularize the harmonization procedure.
Our approach can be integrated with various deep learning models. Here, we
apply our approach to the recently proposed conditional variational autoencoder
(cVAE) harmonization model. Three
datasets (ADNI, N=1735; AIBL, N=495; MACC, N=557) collected from three
different continents were used for evaluation. Our results suggest our approach (AppcVAE)
compares favorably with ComBat (named for “combating batch effects when
combining batches”) and cVAE for improving downstream application performance.
Purpose
There
is significant interest in pooling magnetic resonance imaging (MRI) data from
multiple sites to support large sample size studies (Miller et al., 2016;
Thompson et al., 2017; Volkow et al., 2018). MRI harmonization is typically
performed to reduce heterogeneity when pooling MRI data from multiple sites.
However, most MRI harmonization algorithms (Fortin et al., 2017; Fortin et al.,
2018; Yu et al., 2018; Zhao et al., 2019; Garcia-Dias et
al., 2020; Moyer et al., 2020; Pomponio et al., 2020; Wachinger et al.,
2021; Bashyam et al., 2021; Zuo et al., 2021) do not explicitly consider
downstream application performance during harmonization, which might
limit their performance in downstream applications. We propose a
flexible application-specific harmonization framework utilizing downstream application
performance to regularize the harmonization procedure. In this study, we focus on downstream applications relevant
to Alzheimer’s Disease (AD), specifically prediction of AD diagnosis category
and Mini-Mental State Examination (MMSE) score. We evaluated three AD datasets
(ADNI, N=1735; AIBL, N=495; MACC, N=557) collected from three different continents.
Our results suggest that our approach (AppcVAE) compares favorably with ComBat and
cVAE for improving downstream application performance.
Methods
We conducted two sets of experiments in this study. The first set harmonized data from ADNI (http://adni.loni.usc.edu/) and AIBL (data used in the preparation of this abstract were obtained from the Australian Imaging Biomarkers and Lifestyle flagship study of ageing (AIBL); see www.aibl.csiro.au for further details), and the second set harmonized data from ADNI and MACC (http://macc.sg/). Note that MACC differs from ADNI more than AIBL does, for example in dementia level, educational level, and racial groups. In each experiment, subjects were matched across the two datasets; the matching step controlled for differences in demographics, disease severity, and cognitive impairment. The matched subjects were used as the test set. The remaining unmatched subjects were used to train and tune the downstream application model and the harmonization approaches. We processed structural MRI images using the FreeSurfer T1 processing pipeline and used the volumes of 108 brain regions for subsequent analysis.
An XGBoost model (Chen & Guestrin, 2016) was trained to predict the site from which each subject's data came. Higher site prediction accuracy indicates larger site differences. If there are large site differences between unharmonized datasets, site prediction accuracy should be very high. If harmonization is perfect, site prediction accuracy should be at chance level (0.5).
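The logic of this evaluation can be illustrated with a small simulation. The sketch below is not the XGBoost classifier used in the study: it swaps in a nearest-centroid classifier on synthetic data, and it uses per-site mean removal as a toy stand-in for harmonization. All function names, sample sizes, and the site offset are hypothetical.

```python
import random

def centroid(rows):
    """Feature-wise mean of a list of feature vectors."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]

def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def site_accuracy(train, test):
    """Nearest-centroid site classifier (a stand-in for XGBoost).
    train/test are lists of (features, site) with site in {0, 1}."""
    cents = {s: centroid([f for f, site in train if site == s]) for s in (0, 1)}
    hits = sum(
        (0 if sq_dist(f, cents[0]) <= sq_dist(f, cents[1]) else 1) == site
        for f, site in test
    )
    return hits / len(test)

def simulate(n_per_site, n_features, offset, rng):
    """Synthetic data: site 1 is shifted by `offset` in every feature."""
    data = []
    for site in (0, 1):
        for _ in range(n_per_site):
            feats = [rng.gauss(0.0, 1.0) + offset * site for _ in range(n_features)]
            data.append((feats, site))
    return data

def remove_site_means(train, test):
    """Toy 'harmonization': subtract each site's training mean.
    Assumes the site label of every sample is known."""
    means = {s: centroid([f for f, site in train if site == s]) for s in (0, 1)}
    def center(data):
        return [([x - m for x, m in zip(f, means[s])], s) for f, s in data]
    return center(train), center(test)

rng = random.Random(0)
train = simulate(200, 5, offset=3.0, rng=rng)
test = simulate(200, 5, offset=3.0, rng=rng)

acc_raw = site_accuracy(train, test)           # large site effect
h_train, h_test = remove_site_means(train, test)
acc_harm = site_accuracy(h_train, h_test)      # site effect removed
print(f"unharmonized: {acc_raw:.2f}, harmonized: {acc_harm:.2f}")
```

With these settings, site prediction accuracy on the raw synthetic data should be close to 1 and should fall to roughly chance level once the site means are removed, which mirrors how the metric is interpreted above.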
We trained and tuned a deep neural network (AppDNN; Figure 1A) using unmatched ADNI subjects. The AppDNN model predicts AD diagnosis category and MMSE, which are the downstream applications in this study. A performance drop is expected when AppDNN is applied to matched unharmonized AIBL (MACC) subjects instead of matched unharmonized ADNI subjects. If the harmonization is perfect, there would be no performance drop when applying AppDNN to matched harmonized AIBL (MACC) subjects.
The proposed model structure is shown in Figure 1B. We used the conditional variational autoencoder (cVAE) proposed by Moyer et al. (2020) as our base harmonization model. The data harmonized by cVAE are passed as input to the AppDNN model, which then computes the loss between the ground truth and its prediction. The loss function of the AppcVAE model is as follows:
$$L_{\mathrm{Taskcvae}} = \lambda_{\mathrm{MMSE}} \cdot \mathrm{MAE} + \lambda_{\mathrm{Diagnosis}} \cdot \mathrm{CrossEntropy}$$
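As a minimal sketch, this weighted sum of an MMSE error term and a diagnosis cross-entropy term could be computed as below. The function names and the plain-Python implementation are illustrative assumptions, not the authors' code; in practice the loss would be computed on AppDNN outputs inside a deep learning framework so that gradients can flow back into the cVAE.

```python
import math

def mean_absolute_error(pred, true):
    """Mean absolute error between predicted and true MMSE scores."""
    return sum(abs(p - t) for p, t in zip(pred, true)) / len(true)

def cross_entropy(probs, labels):
    """Mean negative log-probability assigned to the true diagnosis class.
    probs: per-subject lists of class probabilities; labels: class indices."""
    eps = 1e-12  # guards against log(0)
    total = sum(math.log(max(p[y], eps)) for p, y in zip(probs, labels))
    return -total / len(labels)

def task_loss(mmse_pred, mmse_true, diag_probs, diag_true, lam_mmse, lam_diag):
    """Weighted sum: lam_mmse * MAE + lam_diag * CrossEntropy."""
    mae = mean_absolute_error(mmse_pred, mmse_true)
    ce = cross_entropy(diag_probs, diag_true)
    return lam_mmse * mae + lam_diag * ce

# Hypothetical example with two subjects and three diagnosis classes.
loss = task_loss(
    mmse_pred=[28.0, 24.0],
    mmse_true=[30.0, 24.0],
    diag_probs=[[0.5, 0.25, 0.25], [0.25, 0.5, 0.25]],
    diag_true=[0, 1],
    lam_mmse=1.0,
    lam_diag=2.0,
)
print(loss)
```

The two weights control the trade-off between the regression and classification terms; their values here are arbitrary.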
MAE is the mean absolute error between the predicted and true MMSE scores, and CrossEntropy is the cross-entropy loss between the predicted and true AD diagnosis categories. The loss from the AppDNN model guides the cVAE model toward harmonization that improves downstream application performance. The weights of the AppDNN model are frozen during AppcVAE training, whereas the weights of the cVAE model are updated during the finetuning process. During finetuning, we tuned only the learning rate, λMMSE and λDiagnosis. Because cVAE is trained to remove site differences, we chose a small learning rate so that finetuning would not change cVAE too much and sacrifice its ability to remove site differences. Since there were only three hyperparameters to tune, we performed a grid search over them.
Results
Figure 2 shows that the unharmonized data had almost 100% site prediction accuracy in both the ADNI-AIBL and ADNI-MACC experiments, indicating large site differences. AppcVAE removed more site differences than ComBat and a similar amount to cVAE, which suggests that the fine-tuning process in AppcVAE training did not harm the original cVAE's ability to remove site differences.
In this study, we focus on downstream applications relevant to Alzheimer’s Disease. Because MACC differs from ADNI more than AIBL does, performance after harmonization is expected to differ between the two sets of experiments. Figure 3 and Figure 4 show the AD diagnosis prediction performance and MMSE prediction performance for both the ADNI-AIBL and ADNI-MACC experiments. The results of the two sets of experiments show that AppcVAE compares favorably with ComBat and cVAE for improving AD diagnosis prediction accuracy and MMSE prediction performance.
Acknowledgements
Our research is currently supported by the Singapore National Research
Foundation (NRF) Fellowship (Class of 2017), the NUS Yong Loo Lin School of
Medicine (NUHSRO/2020/124/TMR/LOA), the Singapore National Medical Research
Council (NMRC) LCG (OFLCG19May-0035) and the USA NIH (R01MH120080). Our
computational work was partially performed on resources of the National
Supercomputing Centre, Singapore (https://www.nscc.sg). Any opinions, findings and
conclusions or recommendations expressed in this material are those of the
authors and do not reflect the views of the Singapore NRF or the Singapore
NMRC. Data collection and sharing for this
project was funded by the Alzheimer's Disease Neuroimaging Initiative (ADNI)
(National Institutes of Health Grant U01 AG024904) and DOD ADNI (Department of
Defense award number W81XWH-12-2-0012). ADNI is funded by the National
Institute on Aging, the National Institute of Biomedical Imaging and
Bioengineering, and through generous contributions from the following: AbbVie,
Alzheimer's Association; Alzheimer's Drug Discovery Foundation; Araclon
Biotech; BioClinica, Inc.; Biogen; Bristol-Myers Squibb Company; CereSpir,
Inc.; Cogstate; Eisai Inc.; Elan Pharmaceuticals, Inc.; Eli Lilly and Company;
EuroImmun; F. Hoffmann-La Roche Ltd and its affiliated company Genentech, Inc.;
Fujirebio; GE Healthcare; IXICO Ltd.; Janssen Alzheimer Immunotherapy Research
& Development, LLC.; Johnson & Johnson Pharmaceutical Research &
Development LLC.; Lumosity; Lundbeck; Merck & Co., Inc.; Meso Scale
Diagnostics, LLC.; NeuroRx Research; Neurotrack Technologies; Novartis
Pharmaceuticals Corporation; Pfizer Inc.; Piramal Imaging; Servier; Takeda
Pharmaceutical Company; and Transition Therapeutics. The Canadian Institutes of
Health Research is providing funds to support ADNI clinical sites in Canada.
Private sector contributions are facilitated by the Foundation for the National
Institutes of Health (www.fnih.org). The grantee organization is the
Northern California Institute for Research and Education, and the study is
coordinated by the Alzheimer's Therapeutic Research Institute at the University
of Southern California. ADNI data are disseminated by the Laboratory for Neuro
Imaging at the University of Southern California.
References
1. Miller, Karla L., et al. "Multimodal population brain imaging in the UK Biobank prospective epidemiological study." Nature neuroscience 19.11 (2016): 1523-1536.
2. Thompson, Paul M., et al. "ENIGMA and the individual: Predicting factors that affect the brain in 35 countries worldwide." NeuroImage 145 (2017): 389-408.
3. Volkow, Nora D., et al. "The conception of the ABCD study: From substance use to a broad NIH collaboration." Developmental cognitive neuroscience 32 (2018): 4-7.
4. Fortin, Jean-Philippe, et al. "Harmonization of multi-site diffusion tensor imaging data." NeuroImage 161 (2017): 149-170.
5. Fortin, Jean-Philippe, et al. "Harmonization of cortical thickness measurements across scanners and sites." NeuroImage 167 (2018): 104-120.
6. Yu, Meichen, et al. "Statistical harmonization corrects site effects in functional connectivity measurements from multi‐site fMRI data." Human brain mapping 39.11 (2018): 4213-4227.
7. Zhao, Fenqiang, et al. "Harmonization of infant cortical thickness using surface-to-surface cycle-consistent adversarial networks." International Conference on Medical Image Computing and Computer-Assisted Intervention. Springer, Cham, 2019.
8. Garcia-Dias, Rafael, et al. "Neuroharmony: A new tool for harmonizing volumetric MRI data from unseen scanners." NeuroImage 220 (2020).
9. Moyer, Daniel, et al. "Scanner invariant representations for diffusion MRI harmonization." Magnetic resonance in medicine 84.4 (2020): 2174-2189.
10. Pomponio, Raymond, et al. "Harmonization of large MRI datasets for the analysis of brain imaging patterns throughout the lifespan." NeuroImage 208 (2020): 116450.
11. Wachinger, Christian, et al. "Detect and correct bias in multi-site neuroimaging datasets." Medical Image Analysis 67 (2021): 101879.
12. Bashyam, Vishnu M., et al. "Deep Generative Medical Image Harmonization for Improving Cross‐Site Generalization in Deep Learning Predictors." Journal of Magnetic Resonance Imaging (2021).
13. Zuo, Lianrui, et al. "Unsupervised MR harmonization by learning disentangled representations using information bottleneck theory." NeuroImage 243 (2021): 118569.
14. Chen, Tianqi, and Carlos Guestrin. "XGBoost: A scalable tree boosting system." Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. 2016.