4352

Application-specific structural brain MRI harmonization

Lijun An¹,²,³, Pansheng Chen¹,²,³, Jianzhong Chen¹,²,³, Christopher Chen⁴, Juan Helen Zhou², and B.T. Thomas Yeo¹,²,³,⁵,⁶
¹Department of Electrical and Computer Engineering, National University of Singapore, Singapore, Singapore, ²Centre for Sleep and Cognition (CSC) & Centre for Translational Magnetic Resonance Research (TMR), National University of Singapore, Singapore, Singapore, ³N.1 Institute for Health & Institute for Digital Medicine (WisDM), National University of Singapore, Singapore, Singapore, ⁴Department of Pharmacology, National University of Singapore, Singapore, Singapore, ⁵NUS Graduate School for Integrative Sciences and Engineering, National University of Singapore, Singapore, Singapore, ⁶Martinos Center for Biomedical Imaging, Massachusetts General Hospital, Charlestown, MA, United States

Synopsis

We propose a flexible application-specific harmonization framework utilizing downstream application performance to regularize the harmonization procedure. Our approach can be integrated with various deep learning models. Here, we apply our approach to the recently proposed conditional variational autoencoder (cVAE) harmonization model. Three datasets (ADNI, N=1735; AIBL, N=495; MACC, N=557) collected from three different continents were used for evaluation. Our results suggest our approach (AppcVAE) compares favorably with ComBat (named for “combating batch effects when combining batches”) and cVAE for improving downstream application performance.

Purpose

There is significant interest in pooling magnetic resonance imaging (MRI) data from multiple sites to support large-sample studies (Miller et al., 2016; Thompson et al., 2017; Volkow et al., 2018). MRI harmonization is typically performed to reduce heterogeneity in the pooled data. However, most MRI harmonization algorithms (Fortin et al., 2017; Fortin et al., 2018; Yu et al., 2018; Zhao et al., 2019; Garcia-Dias et al., 2020; Moyer et al., 2020; Pomponio et al., 2020; Wachinger et al., 2021; Bashyam et al., 2021; Zuo et al., 2021) do not explicitly consider downstream application performance during harmonization, which might limit their performance in downstream applications. We propose a flexible application-specific harmonization framework that uses downstream application performance to regularize the harmonization procedure. In this study, we focus on downstream applications relevant to Alzheimer’s disease (AD), specifically prediction of AD diagnosis category and of Mini-Mental State Examination (MMSE) score. We evaluated our approach on three AD datasets (ADNI, N=1735; AIBL, N=495; MACC, N=557) collected from three different continents. Our results suggest that our approach (AppcVAE) compares favorably with ComBat and cVAE for improving downstream application performance.

Methods

We conducted two sets of experiments in this study. The first set harmonized data from ADNI (http://adni.loni.usc.edu/) and AIBL (data used in the preparation of this abstract were obtained from the Australian Imaging Biomarkers and Lifestyle flagship study of ageing (AIBL); see www.aibl.csiro.au for further details), and the second set harmonized data from ADNI and MACC (http://macc.sg/). Note that MACC differs from ADNI more than AIBL does, for example in dementia level, educational level, and racial groups. In each experiment, subjects were matched across the two datasets. The matching step controlled for differences in demographics, disease severity, and cognitive impairment. The matched subjects were used as the test set. The remaining unmatched subjects were used to train and tune the downstream application model and the harmonization approaches. We processed the structural MRI images using the FreeSurfer T1 processing pipeline and used the volumes of 108 brain regions for subsequent analysis.

An XGBoost model (Chen & Guestrin, 2016) was trained to predict the site from which each subject's data came. Higher site prediction accuracy indicates larger site differences. If there are large site differences between unharmonized datasets, the site prediction accuracy should be very high. If the harmonization is perfect, the site prediction accuracy should be at chance level, i.e., 0.5.
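
The logic of this check can be illustrated with a minimal sketch on synthetic data. This is not the pipeline used in the study: a simple nearest-centroid classifier stands in for XGBoost, a single Gaussian feature stands in for the 108 regional volumes, and all function names and the size of the site offset are assumptions made for illustration only.

```python
import random

def site_accuracy(train, test):
    """Accuracy of a nearest-centroid site classifier on one feature.

    train, test: lists of (feature_value, site_label), site_label in {0, 1}.
    """
    # Per-site mean of the training feature values.
    centroids = {}
    for site in (0, 1):
        values = [x for x, s in train if s == site]
        centroids[site] = sum(values) / len(values)
    # Assign each test sample to the site with the nearer centroid.
    correct = 0
    for x, s in test:
        predicted = 0 if abs(x - centroids[0]) <= abs(x - centroids[1]) else 1
        correct += predicted == s
    return correct / len(test)

def simulate(site_offset, n_per_site, rng):
    """Draw one synthetic feature for two equally sized sites."""
    data = [(rng.gauss(0.0, 1.0), 0) for _ in range(n_per_site)]
    data += [(rng.gauss(site_offset, 1.0), 1) for _ in range(n_per_site)]
    return data

rng = random.Random(0)
# Large site offset: stands in for unharmonized data.
acc_unharmonized = site_accuracy(simulate(6.0, 500, rng), simulate(6.0, 500, rng))
# Zero offset: stands in for perfectly harmonized data.
acc_harmonized = site_accuracy(simulate(0.0, 500, rng), simulate(0.0, 500, rng))
print(f"unharmonized: {acc_unharmonized:.3f}, harmonized: {acc_harmonized:.3f}")
```

With two equally sized sites, the first accuracy should be close to 1 and the second close to 0.5, which is why 0.5 serves as the reference value for perfect harmonization.
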
We trained and tuned a deep neural network, AppDNN (Figure 1A), using unmatched ADNI subjects. AppDNN predicts AD diagnosis category and MMSE score, which are the downstream applications in this study. A performance drop is expected when AppDNN is applied to matched unharmonized AIBL (or MACC) subjects rather than to matched unharmonized ADNI subjects. If the harmonization were perfect, there would be no performance drop when applying AppDNN to matched harmonized AIBL (or MACC) subjects.

The proposed model structure is shown in Figure 1B. We used the conditional variational autoencoder (cVAE) proposed by Moyer et al. (2020) as our base harmonization model. The data harmonized by the cVAE are the input to the AppDNN model, which then computes the loss between the ground truth and its prediction. The loss function of the AppcVAE model is as follows:

$$L_{\mathrm{AppcVAE}} = \lambda_{\mathrm{MMSE}} \cdot \mathrm{MAE} + \lambda_{\mathrm{Diagnosis}} \cdot \mathrm{CrossEntropy}$$

Here, MAE is the mean absolute error between the predicted and true MMSE scores, and CrossEntropy is the cross-entropy loss between the predicted and true AD diagnosis categories. The loss from the AppDNN model guides the cVAE toward harmonization that improves downstream application performance. The weights of the AppDNN model are fixed and do not change during AppcVAE training, whereas the weights of the cVAE are updated during fine-tuning. In the fine-tuning process, we tuned only the learning rate, λ_MMSE, and λ_Diagnosis. Since the cVAE is trained to remove site differences, we chose a small learning rate for fine-tuning so that the cVAE was not updated too much; otherwise we might sacrifice its ability to remove differences between sites. Because there were only three hyperparameters to tune, we performed a grid search over them.
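
As a rough illustration of the weighted loss and the three-parameter grid search described above, the following sketch uses only the Python standard library. It is not the authors' implementation: the function names, the toy inputs, the grid values, and the rule of selecting the combination with the lowest validation error are all assumptions, and `toy_evaluate` merely stands in for "fine-tune the cVAE with these settings and measure validation error".

```python
import math
from itertools import product

def mean_absolute_error(predicted, true):
    """MAE between predicted and true MMSE scores."""
    return sum(abs(p - t) for p, t in zip(predicted, true)) / len(true)

def cross_entropy(class_probs, labels):
    """Mean cross-entropy; class_probs[i][k] is the predicted probability
    of diagnosis class k for subject i, labels[i] is the true class index."""
    eps = 1e-12  # guards against log(0)
    total = sum(math.log(max(p[y], eps)) for p, y in zip(class_probs, labels))
    return -total / len(labels)

def app_loss(mmse_pred, mmse_true, diag_probs, diag_true, lam_mmse, lam_diag):
    """Weighted sum: lam_mmse * MAE + lam_diag * CrossEntropy."""
    return (lam_mmse * mean_absolute_error(mmse_pred, mmse_true)
            + lam_diag * cross_entropy(diag_probs, diag_true))

def grid_search(evaluate, learning_rates, lam_mmse_values, lam_diag_values):
    """Try every combination and return (score, lr, lam_mmse, lam_diag)
    with the lowest score. `evaluate` is a hypothetical callable that
    returns a validation error for one combination."""
    best = None
    for lr, lam_mmse, lam_diag in product(
            learning_rates, lam_mmse_values, lam_diag_values):
        score = evaluate(lr, lam_mmse, lam_diag)
        if best is None or score < best[0]:
            best = (score, lr, lam_mmse, lam_diag)
    return best

# Toy check of the loss on two made-up subjects.
loss = app_loss(
    mmse_pred=[28.0, 22.0], mmse_true=[30.0, 20.0],
    diag_probs=[[0.5, 0.5], [0.25, 0.75]], diag_true=[0, 1],
    lam_mmse=1.0, lam_diag=2.0,
)

# Toy stand-in for "fine-tune, then measure validation error" (made up).
def toy_evaluate(lr, lam_mmse, lam_diag):
    return abs(lr - 1e-4) * 1e4 + abs(lam_mmse - 1.0) + abs(lam_diag - 2.0)

best = grid_search(toy_evaluate, [1e-5, 1e-4, 1e-3], [0.5, 1.0, 2.0], [1.0, 2.0, 4.0])
print(round(loss, 4), best)
```

An exhaustive search is practical here because three hyperparameters with a handful of candidate values each give only a few dozen combinations.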

Results

Figure 2 shows that the unharmonized data have almost 100% site prediction accuracy in both the ADNI-AIBL and ADNI-MACC experiments, which indicates large site differences. AppcVAE harmonization removed more site differences than ComBat. AppcVAE removed site differences to a similar degree as cVAE, which suggests that the fine-tuning process in AppcVAE training did not harm the original cVAE's ability to remove site differences.

In this study, we focus on AD-relevant downstream applications. Because MACC differs from ADNI more than AIBL does, some difference in post-harmonization performance between the two sets of experiments is expected. Figure 3 and Figure 4 show the AD diagnosis prediction performance and MMSE prediction performance for both the ADNI-AIBL and ADNI-MACC experiments. The results of the two sets of experiments show that AppcVAE compares favorably with ComBat and cVAE for improving AD diagnosis prediction accuracy and MMSE prediction performance.

Acknowledgements

Our research is currently supported by the Singapore National Research Foundation (NRF) Fellowship (Class of 2017), the NUS Yong Loo Lin School of Medicine (NUHSRO/2020/124/TMR/LOA), the Singapore National Medical Research Council (NMRC) LCG (OFLCG19May-0035) and the USA NIH (R01MH120080). Our computational work was partially performed on resources of the National Supercomputing Centre, Singapore (https://www.nscc.sg). Any opinions, findings and conclusions or recommendations expressed in this material are those of the authors and do not reflect the views of the Singapore NRF or the Singapore NMRC. Data collection and sharing for this project was funded by the Alzheimer's Disease Neuroimaging Initiative (ADNI) (National Institutes of Health Grant U01 AG024904) and DOD ADNI (Department of Defense award number W81XWH-12-2-0012). ADNI is funded by the National Institute on Aging, the National Institute of Biomedical Imaging and Bioengineering, and through generous contributions from the following: AbbVie; Alzheimer's Association; Alzheimer's Drug Discovery Foundation; Araclon Biotech; BioClinica, Inc.; Biogen; Bristol-Myers Squibb Company; CereSpir, Inc.; Cogstate; Eisai Inc.; Elan Pharmaceuticals, Inc.; Eli Lilly and Company; EuroImmun; F. Hoffmann-La Roche Ltd and its affiliated company Genentech, Inc.; Fujirebio; GE Healthcare; IXICO Ltd.; Janssen Alzheimer Immunotherapy Research & Development, LLC.; Johnson & Johnson Pharmaceutical Research & Development LLC.; Lumosity; Lundbeck; Merck & Co., Inc.; Meso Scale Diagnostics, LLC.; NeuroRx Research; Neurotrack Technologies; Novartis Pharmaceuticals Corporation; Pfizer Inc.; Piramal Imaging; Servier; Takeda Pharmaceutical Company; and Transition Therapeutics. The Canadian Institutes of Health Research is providing funds to support ADNI clinical sites in Canada. Private sector contributions are facilitated by the Foundation for the National Institutes of Health (www.fnih.org). The grantee organization is the Northern California Institute for Research and Education, and the study is coordinated by the Alzheimer's Therapeutic Research Institute at the University of Southern California. ADNI data are disseminated by the Laboratory for Neuro Imaging at the University of Southern California.

References

1. Miller, Karla L., et al. "Multimodal population brain imaging in the UK Biobank prospective epidemiological study." Nature Neuroscience 19.11 (2016): 1523-1536.

2. Thompson, Paul M., et al. "ENIGMA and the individual: Predicting factors that affect the brain in 35 countries worldwide." NeuroImage 145 (2017): 389-408.

3. Volkow, Nora D., et al. "The conception of the ABCD study: From substance use to a broad NIH collaboration." Developmental Cognitive Neuroscience 32 (2018): 4-7.

4. Fortin, Jean-Philippe, et al. "Harmonization of multi-site diffusion tensor imaging data." NeuroImage 161 (2017): 149-170.

5. Fortin, Jean-Philippe, et al. "Harmonization of cortical thickness measurements across scanners and sites." NeuroImage 167 (2018): 104-120.

6. Yu, Meichen, et al. "Statistical harmonization corrects site effects in functional connectivity measurements from multi-site fMRI data." Human Brain Mapping 39.11 (2018): 4213-4227.

7. Zhao, Fenqiang, et al. "Harmonization of infant cortical thickness using surface-to-surface cycle-consistent adversarial networks." International Conference on Medical Image Computing and Computer-Assisted Intervention. Springer, Cham, 2019.

8. Garcia-Dias, Rafael, et al. "Neuroharmony: A new tool for harmonizing volumetric MRI data from unseen scanners." NeuroImage 220 (2020).

9. Moyer, Daniel, et al. "Scanner invariant representations for diffusion MRI harmonization." Magnetic Resonance in Medicine 84.4 (2020): 2174-2189.

10. Pomponio, Raymond, et al. "Harmonization of large MRI datasets for the analysis of brain imaging patterns throughout the lifespan." NeuroImage 208 (2020): 116450.

11. Wachinger, Christian, et al. "Detect and correct bias in multi-site neuroimaging datasets." Medical Image Analysis 67 (2021): 101879.

12. Bashyam, Vishnu M., et al. "Deep generative medical image harmonization for improving cross-site generalization in deep learning predictors." Journal of Magnetic Resonance Imaging (2021).

13. Zuo, Lianrui, et al. "Unsupervised MR harmonization by learning disentangled representations using information bottleneck theory." NeuroImage 243 (2021): 118569.

14. Chen, Tianqi, and Carlos Guestrin. "XGBoost: A scalable tree boosting system." Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. 2016.

Figures

Figure 1. AppDNN and AppcVAE model structure. (A) Top panel: model structure of AppDNN, a fully connected deep neural network. (B) Bottom panel: model structure of AppcVAE. The encoder, decoder, and discriminator are all fully connected deep neural networks, and s is the site to which the data are mapped. The fixed AppDNN model guides the harmonization process to improve downstream model performance.

Figure 2. Site prediction accuracy. (A) Site prediction accuracy for ADNI-AIBL; (B) site prediction accuracy for ADNI-MACC. “n.s.” indicates no statistical significance. “*” indicates statistical significance that survived FDR correction.

Figure 3. AD diagnosis prediction accuracy. (A) AD diagnosis prediction accuracy for ADNI-AIBL; (B) AD diagnosis prediction accuracy for ADNI-MACC. “n.s.” indicates no statistical significance. “*” indicates statistical significance that survived FDR correction.

Figure 4. MMSE prediction error. (A) MMSE prediction error for ADNI-AIBL; (B) MMSE prediction error for ADNI-MACC. “n.s.” indicates no statistical significance. “*” indicates statistical significance that survived FDR correction.

Proc. Intl. Soc. Mag. Reson. Med. 30 (2022)

DOI: https://doi.org/10.58530/2022/4352