Amir Sadikov1,2, Xineli Pan3, Hannah Choi2, Lanya Cai2, and Pratik Mukherjee1,2
1Graduate Group in Bioengineering, University of California, San Francisco, San Francisco, CA, United States, 2Radiology and Biomedical Imaging, University of California, San Francisco, San Francisco, CA, United States, 3University of California, Berkeley, Berkeley, CA, United States
Synopsis
Keywords: Diffusion Reconstruction, Data Processing
Motivation: Long scan times limit the clinical usage of diffusion MRI (dMRI).
Goal(s): We aim to perform rapid dMRI with high accuracy and reproducibility.
Approach: We employ a Swin UNEt Transformers (Swin) model, trained on Human Connectome Project data and conditioned on registered T1 scans, to perform generalized dMRI denoising and super-resolution, requiring only 90 seconds of scan time.
Results: Compared with state-of-the-art self-supervised methods, the fully-supervised Swin UNETR achieved higher accuracy on external out-of-domain (OOD) datasets and exhibited close to 50% lower coefficient of variation (CoV) for intracellular volume fraction and free water fraction measurements. Fine-tuning on even a single example scan improved performance.
Impact: Our approach achieves unprecedented accuracy and reproducibility in dMRI datasets acquired in different patient populations using different scanner models and pulse sequences and will enable much shorter dMRI scan times for patients unable to cooperate with lengthy imaging protocols.
Introduction
Diffusion MRI (dMRI) can provide valuable clinical information; however, due to its low signal-to-noise ratio (SNR), dMRI requires either low angular and spatial resolution or a long scan time, which limits its usage1. Previous supervised methods have limited generalizability to different b-values, diffusion-encoding directions, scanners, and/or patient populations2–5. Unsupervised and self-supervised methods are therefore often preferred, even though fully-supervised methods outperform them within their training domain.
We employ a Shifted Windows UNet Transformers (Swin) model6,7, trained on Human Connectome Project (HCP) data8 and conditioned on registered T1 scans, to perform dMRI denoising. In addition to evaluating accuracy on out-of-domain (OOD) datasets, we measure test-retest reliability of diffusion tensor imaging (DTI)9 and neurite orientation dispersion and density imaging (NODDI)10 and their structural covariance networks (SCNs)11,12. We emphasize removing heavy-tail noise, which can bias metrics. We also compare Swin to a UNet to determine the effect of network architecture, demonstrate super-resolution, and show that fine-tuning, on even one subject, improves performance.
Methods
We used four datasets: HCP8: young normal adults with a 1021-subject training dataset and a 44-subject held-out test-retest dataset; TBI11,13: 45 adult traumatic brain injury patients; SPIN14: 45 children with neurodevelopmental disorders; and AHA: 8 adolescents with intracerebral hemorrhage due to arteriovenous malformations (AVMs) before and after resection. We compare Swin and UNet with three unsupervised/self-supervised methods: block-matching and 4D filtering (BM4D)15, Marchenko-Pastur PCA (MPPCA)16, and Patch2Self (P2S)17.
We fit the ground truth using all diffusion-encoding directions and subsample the data to six diffusion-encoding directions. We measure the mean absolute error (MAE) between the ground truth and the denoised subsampled data for DTI. We assess the coefficient of variation (CoV) across the HCP test-retest sessions for DTI (fully-sampled and subsampled) and NODDI (fully-sampled). For SCN repeatability, we measure the mean absolute difference between the test-retest sessions for DTI (subsampled) and NODDI (fully-sampled). We also compute the MAE between the subsampled DTI SCNs and the ground truth DTI SCN.
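The two evaluation metrics above can be illustrated with a minimal NumPy sketch. This is not the authors' code: the function names are hypothetical, and the exact CoV definition (sample standard deviation over the mean of the two sessions, averaged across subjects for each region) is an assumption, since the text does not spell it out.

```python
import numpy as np


def mean_absolute_error(reference, estimate, mask=None):
    """MAE between two metric maps (e.g. FA), optionally inside a brain mask."""
    reference = np.asarray(reference, dtype=float)
    estimate = np.asarray(estimate, dtype=float)
    diff = np.abs(reference - estimate)
    if mask is not None:
        diff = diff[np.asarray(mask, dtype=bool)]
    return float(diff.mean())


def retest_cov(session_a, session_b):
    """Per-region test-retest CoV, averaged over subjects.

    Both inputs have shape (n_subjects, n_regions) and hold regional means
    of one metric. Assumed definition: sample SD across the two sessions
    divided by their mean.
    """
    stacked = np.stack([session_a, session_b], axis=0).astype(float)
    mean = stacked.mean(axis=0)
    sd = stacked.std(axis=0, ddof=1)  # ddof=1: sample SD over two sessions
    return (sd / mean).mean(axis=0)   # average over subjects, one value per region
```

A lower value from `retest_cov` means the metric is more repeatable between the two sessions; a lower `mean_absolute_error` means the denoised subsampled fit is closer to the fully-sampled ground truth.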
We train the Swin model with AdamW optimization (learning rate of 1e-5), minimizing the mean-squared error between the model output and the ground truth (6th-order spherical harmonic projection), with random cropping, rotation, flipping, and k-space downsampling as data augmentation. Fine-tuning was performed on one held-out subject with a learning rate of 1e-6 for three epochs. Evaluations are done at native dMRI resolution.
Results
Swin achieves the lowest MAE for DTI metrics for all datasets (Table 1) and can transform the heavy-tailed Rician dMRI signal into a more Gaussian distribution (Fig. 1B). Swin achieves the lowest CoV for DTI metrics using the subsampled data and fully-sampled data (Table 2). For NODDI repeatability, Swin achieves close to 50% lower CoV than the next best method and has lower regional gray matter (GM) and white matter (WM) CoV (Fig. 1A). Swin generates the most accurate DTI SCNs and has the lowest DTI and NODDI SCN repeatability error.
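As background for why the Rician signal distribution matters, the following sketch simulates magnitude data under the standard Rician model (the magnitude of a complex signal with independent Gaussian noise in the real and imaginary channels). It is an illustration only, not the authors' analysis; the signal levels, noise level, and function name are assumptions chosen for the example.

```python
import numpy as np


def rician_magnitude(true_signal, sigma, n_samples, rng):
    """Magnitude of a complex signal with Gaussian noise in both channels."""
    real = true_signal + rng.normal(0.0, sigma, n_samples)
    imag = rng.normal(0.0, sigma, n_samples)
    return np.hypot(real, imag)


rng = np.random.default_rng(0)

# Same noise level, two illustrative signal levels (SNR = 1 and SNR = 10).
low_snr = rician_magnitude(1.0, 1.0, 100_000, rng)
high_snr = rician_magnitude(10.0, 1.0, 100_000, rng)

# The magnitude operation biases the measured signal upward,
# and the bias is much larger when the SNR is low.
bias_low = low_snr.mean() - 1.0
bias_high = high_snr.mean() - 10.0
print(f"bias at SNR 1: {bias_low:.3f}, bias at SNR 10: {bias_high:.3f}")
```

Because heavily diffusion-weighted signals sit in the low-SNR regime, this positive bias propagates into fitted metrics, which is consistent with the emphasis above on removing non-Gaussian noise.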
Swin denoising with only 6 directions approaches the image quality of all 55 directions, resulting in a 9-fold speedup of scan time, even in the lowest quality scan of the AHA dataset (Fig. 2). With 55 directions, Swin removes the noise from the AVM and its hemorrhage. Swin captures fine anatomic details in posterior periventricular WM and avoids excessive blurring in super-resolution (Fig. 3).
Discussion
Swin is the first fully-supervised dMRI denoising method that can be applied across widely varying scanners, patient populations, and acquisition parameters. It yields more accurate DTI on three external OOD datasets and superior test-retest reliability, especially for NODDI, possibly because of its more Gaussian output distribution.
Most protocols require 30 diffusion-encoding directions for DTI, taking about ten minutes18. With Swin’s five-fold scan-time speedup, accurate high-resolution DTI is achievable in 90 seconds for HCP, 100 seconds for TBI or AHA, and only 20 seconds for SPIN datasets, enabling usage in uncooperative populations by mitigating motion artifacts.
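The speedup factors quoted in this abstract follow from the ratio of diffusion-encoding directions. The short sketch below reproduces that arithmetic under a simplifying assumption that is ours, not the authors': scan time scales linearly with the number of diffusion-weighted volumes, ignoring b=0 volumes and fixed sequence overhead. The function name is hypothetical.

```python
def direction_speedup(n_full_directions, n_subsampled_directions):
    """Scan-time speedup, assuming time is proportional to direction count."""
    return n_full_directions / n_subsampled_directions


# A typical 30-direction DTI protocol reduced to 6 directions.
print(direction_speedup(30, 6))            # 5.0
# The 55-direction acquisition reduced to 6 directions.
print(round(direction_speedup(55, 6), 1))  # 9.2
```

The per-dataset times quoted above also depend on each protocol's repetition time and slice coverage, which this ratio does not capture.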
The UNet, while outperforming self-supervised methods, trailed Swin, which captures long-range dependencies better19. Further hyperparameter optimization is needed to determine the optimal neural net architecture. We observed grokking20 during Swin training, which could be due to AdamW optimization, our large dataset, and data augmentation. A better understanding of grokking could be instrumental in designing generative AI models that generalize well at scale.
Fine-tuning on even one subject improved dMRI denoising of OOD scans, and fine-tuning on more subjects yielded no significant additional benefit, which is further evidence of the ability of Swin UNet Transformers to generalize rapidly to new data distributions.
Acknowledgements
HCP data were provided by the Human Connectome Project, WU-Minn Consortium (Principal Investigators: David Van Essen and Kamil Ugurbil; U54 MH091657) funded by the 16 NIH Institutes and Centers that support the NIH Blueprint for Neuroscience Research; and by the McDonnell Center for Systems Neuroscience at Washington University. TBI data were acquired as part of a research project funded by NIH R01NS060886 (Principal Investigator: Pratik Mukherjee). SPIN data were acquired as part of a research project funded by NIH R01 MH116950 (Principal Investigators: Pratik Mukherjee and Elysa J. Marco). AHA data were acquired with funding from the American Heart Association (AHA) Bugher Foundation (Principal Investigators: Heather Fullerton, Christine Fox, Helen Kim, and Pratik Mukherjee).
References
1. Diffusion MRI. vol. 1 (Oxford University Press, 2012).
2. Jurek, J. et al. Supervised denoising of diffusion-weighted magnetic resonance images using a convolutional neural network and transfer learning. Biocybern Biomed Eng 43, 206–232 (2023).
3. Karimi, D. & Gholipour, A. Diffusion Tensor Estimation with Transformer Neural Networks. Artif Intell Med 130, (2022).
4. Tian, Q. et al. SDnDTI: Self-supervised deep learning-based denoising for diffusion tensor MRI. Neuroimage 253, (2022).
5. Tian, Q. et al. DeepDTI: High-fidelity six-direction diffusion tensor imaging using deep learning. Neuroimage 219, 117017 (2020).
6. Hatamizadeh, A. et al. Swin UNETR: Swin Transformers for Semantic Segmentation of Brain Tumors in MRI Images. Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) 12962 LNCS, 272–284 (2022).
7. Liu, Z. et al. Swin Transformer: Hierarchical Vision Transformer using Shifted Windows. Proceedings of the IEEE International Conference on Computer Vision 9992–10002 (2021) doi:10.48550/arxiv.2103.14030.
8. Van Essen, D. C. et al. The WU-Minn Human Connectome Project: An Overview. Neuroimage 80, 62 (2013).
9. Mukherjee, P., Chung, S. W., Berman, J. I., Hess, C. P. & Henry, R. G. Diffusion Tensor MR Imaging and Fiber Tractography: Technical Considerations. American Journal of Neuroradiology 29, 843–852 (2008).
10. Zhang, H., Schneider, T., Wheeler-Kingshott, C. A. & Alexander, D. C. NODDI: Practical in vivo neurite orientation dispersion and density imaging of the human brain. Neuroimage 61, 1000–1016 (2012).
11. Wahl, M. et al. Microstructural correlations of white matter tracts in the human brain. Neuroimage 51, 531–541 (2010).
12. Li, Y.-O. et al. Independent component analysis of DTI reveals multivariate microstructural correlations of white matter in the human brain. Hum Brain Mapp 33, 1431–1451 (2012).
13. Kuceyeski, A. F., Jamison, K. W., Owen, J. P., Raj, A. & Mukherjee, P. Longitudinal increases in structural connectome segregation and functional connectome integration are associated with better recovery after mild TBI. Hum Brain Mapp 40, 4441–4456 (2019).
14. Mark, I. T. et al. Neurite orientation dispersion and density imaging of white matter microstructure in sensory processing dysfunction with versus without comorbid ADHD. Front Neurosci 17, (2023).
15. Maggioni, M., Katkovnik, V., Egiazarian, K. & Foi, A. Nonlocal transform-domain filter for volumetric data denoising and reconstruction. IEEE Transactions on Image Processing 22, 119–133 (2013).
16. Veraart, J. et al. Denoising of diffusion MRI using random matrix theory. Neuroimage 142, 394 (2016).
17. Fadnavis, S., Batson, J. & Garyfallidis, E. Patch2Self: Denoising Diffusion MRI with Self-Supervised Learning. Adv Neural Inf Process Syst 2020-Decem, (2020).
18. Jones, D. K. The effect of gradient sampling schemes on measures derived from diffusion tensor MRI: A Monte Carlo study. Magn Reson Med 51, 807–815 (2004).
19. Dosovitskiy, A. et al. An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale. (2020) doi:10.48550/arxiv.2010.11929.
20. Power, A., Burda, Y., Edwards, H., Babuschkin, I. & Misra, V. Grokking: Generalization Beyond Overfitting on Small Algorithmic Datasets. (2022).