2415

Generative AI for Rapid Diffusion MRI with Improved Image Quality, Reliability and Generalizability

Amir Sadikov^1,2, Xineli Pan³, Hannah Choi², Lanya Cai², and Pratik Mukherjee^1,2
¹Graduate Group in Bioengineering, University of California, San Francisco, San Francisco, CA, United States, ²Radiology and Biomedical Imaging, University of California, San Francisco, San Francisco, CA, United States, ³University of California, Berkeley, Berkeley, CA, United States

Synopsis

Keywords: Diffusion Reconstruction, Data Processing

Motivation: Long scan times limit the clinical usage of diffusion MRI (dMRI).

Goal(s): We aim to perform rapid dMRI with high accuracy and reproducibility.

Approach: We employ a Swin UNETR (Swin UNEt TRansformers; hereafter Swin) model, trained on Human Connectome Project data and conditioned on registered T1 scans, to perform generalized dMRI denoising and super-resolution, requiring only 90 seconds of scan time.

Results: Compared with state-of-the-art self-supervised methods, the fully-supervised Swin UNETR achieved higher accuracy on external out-of-domain (OOD) datasets and exhibited approximately 50% lower coefficient of variation (CoV) for intracellular volume fraction and free water fraction measurements. Fine-tuning on even a single example scan improved performance.

Impact: Our approach achieves unprecedented accuracy and reproducibility in dMRI datasets acquired in different patient populations using different scanner models and pulse sequences and will enable much shorter dMRI scan times for patients unable to cooperate with lengthy imaging protocols.

Introduction

Diffusion MRI (dMRI) can provide valuable clinical information; however, due to its low signal-to-noise ratio (SNR), dMRI requires low angular and spatial resolution or a long scan time, which limits usage¹. Previous supervised methods have limited generalizability to different b-values, diffusion-encoding directions, scanners, and/or patient populations^2–5. Therefore, unsupervised/self-supervised methods are often preferred, even though fully-supervised methods outperform them within their training domain.
We employ a Swin UNETR (Shifted Windows UNEt TRansformers; Swin) model^6,7, trained on Human Connectome Project (HCP) data⁸ and conditioned on registered T1 scans, to perform dMRI denoising. In addition to evaluating accuracy on out-of-domain (OOD) datasets, we measure test-retest reliability of diffusion tensor imaging (DTI)⁹ and neurite orientation dispersion and density imaging (NODDI)¹⁰ and their structural covariance networks (SCNs)^11,12. We emphasize removing heavy-tail noise, which can bias metrics. We also compare Swin to a UNet to determine the effect of network architecture, demonstrate super-resolution, and show that fine-tuning on even one subject improves performance.

Methods

We used four datasets. HCP⁸ comprises young normal adults, with a 1,021-subject training dataset and a 44-subject held-out test-retest dataset; TBI^11,13 comprises 45 adult traumatic brain injury patients; SPIN¹⁴ comprises 45 children with neurodevelopmental disorders; and AHA comprises 8 adolescents with intracerebral hemorrhage due to arteriovenous malformations (AVMs), imaged before and after resection. We compare Swin and UNet with three unsupervised/self-supervised methods: block-matching and 4D filtering (BM4D)¹⁵, Marchenko-Pastur PCA (MPPCA)¹⁶, and Patch2Self (P2S)¹⁷.
We fit the ground truth using all diffusion-encoding directions and restrict the subsampled data to six diffusion-encoding directions. We measure the mean absolute error (MAE) between the ground truth and the denoised subsampled data for DTI. We assess the coefficient of variation (CoV) across the HCP test-retest sessions for DTI (fully-sampled and subsampled) and NODDI (fully-sampled). For SCN repeatability, we measure the mean absolute difference between the test-retest sessions for DTI (subsampled) and NODDI (fully-sampled). We also compute the MAE between the subsampled DTI SCNs and the ground truth DTI SCN.
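The evaluation described above can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' pipeline: it assumes a single-voxel log-linear least-squares tensor fit with a known b=0 signal, and it assumes CoV is computed as the across-session standard deviation divided by the across-session mean, averaged over subjects. The abstract does not specify either choice, and all function names here are hypothetical.

```python
import numpy as np

def design_matrix(bvecs):
    """Rows map the six unique tensor elements to g^T D g for each direction."""
    g = np.asarray(bvecs, dtype=float)
    g = g / np.linalg.norm(g, axis=1, keepdims=True)  # enforce unit vectors
    gx, gy, gz = g[:, 0], g[:, 1], g[:, 2]
    return np.stack([gx * gx, gy * gy, gz * gz,
                     2 * gx * gy, 2 * gx * gz, 2 * gy * gz], axis=1)

def fit_tensor(signal, s0, bval, bvecs):
    """Log-linear least-squares fit of S = S0 * exp(-b g^T D g) for one voxel."""
    y = -np.log(np.asarray(signal, dtype=float) / s0) / bval
    d, *_ = np.linalg.lstsq(design_matrix(bvecs), y, rcond=None)
    dxx, dyy, dzz, dxy, dxz, dyz = d
    return np.array([[dxx, dxy, dxz],
                     [dxy, dyy, dyz],
                     [dxz, dyz, dzz]])

def fa_md(tensor):
    """Fractional anisotropy and mean diffusivity from the tensor eigenvalues."""
    ev = np.linalg.eigvalsh(tensor)
    md = ev.mean()
    fa = np.sqrt(1.5 * np.sum((ev - md) ** 2) / np.sum(ev ** 2))
    return fa, md

def mean_absolute_error(estimate, reference):
    """MAE between a metric map from subsampled data and the ground truth map."""
    diff = np.asarray(estimate, dtype=float) - np.asarray(reference, dtype=float)
    return float(np.mean(np.abs(diff)))

def session_cov_percent(session_a, session_b):
    """Per-region CoV (%) across two sessions, averaged over subjects.

    Inputs have shape (subjects, regions). The SD/mean definition is an
    assumption; the abstract does not give the exact formula.
    """
    pair = np.stack([np.asarray(session_a, dtype=float),
                     np.asarray(session_b, dtype=float)])
    cov = pair.std(axis=0, ddof=1) / pair.mean(axis=0)
    return 100.0 * cov.mean(axis=0)
```

With exactly six non-degenerate directions and a known b=0 signal, the design matrix is square and the fit has no redundancy, which is why noise in any single volume propagates directly into the tensor and why denoising matters most in this regime.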
We train the Swin model with AdamW optimization (learning rate of 1e-5) using a mean-squared error loss between the model output and the ground truth (6th-order spherical harmonic projection), with random cropping, rotation, flipping, and k-space downsampling as data augmentation. Fine-tuning is performed on one held-out subject with a learning rate of 1e-6 for three epochs. Evaluations are done at native dMRI resolution.
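One plausible reading of the k-space downsampling augmentation is to keep only the central portion of k-space along each axis, which emulates a lower-resolution acquisition. The sketch below is an assumption on our part: the abstract does not describe the implementation, and the function name `kspace_downsample` and the magnitude output are illustrative choices.

```python
import numpy as np

def kspace_downsample(volume, factor=2):
    """Emulate a low-resolution scan by cropping the centre of k-space.

    Returns a magnitude volume whose size is reduced by `factor` per axis.
    """
    volume = np.asarray(volume, dtype=float)
    # Centre the zero-frequency (DC) component before cropping.
    k = np.fft.fftshift(np.fft.fftn(volume))
    slices = []
    for n in volume.shape:
        keep = n // factor
        # Place the crop so DC lands at index keep // 2, as ifftshift expects.
        start = n // 2 - keep // 2
        slices.append(slice(start, start + keep))
    cropped = k[tuple(slices)]
    low_res = np.fft.ifftn(np.fft.ifftshift(cropped))
    # Undo the change in FFT normalisation so mean intensity is preserved.
    low_res = low_res * (cropped.size / volume.size)
    # Magnitude image, as is conventional for MRI data.
    return np.abs(low_res)
```

For example, `kspace_downsample(vol, factor=2)` applied to a 1.25 mm volume yields a volume on a grid with half as many voxels per axis; interpolating it back to the original grid gives a blurred input that can be paired with the full-resolution target.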

Results

Swin achieves the lowest MAE for DTI metrics for all datasets (Table 1) and can transform the heavy-tailed Rician dMRI signal into a more Gaussian distribution (Fig. 1B). Swin achieves the lowest CoV for DTI metrics using the subsampled data and fully-sampled data (Table 2). For NODDI repeatability, Swin achieves close to 50% lower CoV than the next best method and has lower regional gray matter (GM) and white matter (WM) CoV (Fig. 1A). Swin generates the most accurate DTI SCNs and has the lowest DTI and NODDI SCN repeatability error.
Swin denoising with only 6 directions approaches the image quality of all 55 directions, resulting in a 9-fold speedup of scan time, even in the lowest quality scan of the AHA dataset (Fig. 2). With 55 directions, Swin removes the noise from the AVM and its hemorrhage. Swin captures fine anatomic details in posterior periventricular WM and avoids excessive blurring in super-resolution (Fig. 3).
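The comparison between a heavy-tailed and a more Gaussian signal distribution reported in these Results can be quantified with sample moments. The sketch below is illustrative only: the abstract does not say how variance, skew, and kurtosis were computed, so the biased (population) moment estimators and the function name `distribution_shape` are assumptions.

```python
import numpy as np

def distribution_shape(samples):
    """Variance, skewness, and excess kurtosis of a set of signal intensities."""
    x = np.asarray(samples, dtype=float).ravel()
    centered = x - x.mean()
    m2 = np.mean(centered ** 2)  # variance (biased estimator)
    m3 = np.mean(centered ** 3)
    m4 = np.mean(centered ** 4)
    return {
        "variance": float(m2),
        "skew": float(m3 / m2 ** 1.5),
        # A Gaussian has excess kurtosis 0; heavier tails give positive values.
        "excess_kurtosis": float(m4 / m2 ** 2 - 3.0),
    }
```

Applied to the voxel intensities of a region such as the lateral ventricles before and after denoising, these three numbers give a compact summary of how the distribution changed.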

Discussion

Swin is the first fully-supervised dMRI denoising method that can be applied to widely varying scanners, patient populations, and acquisition parameters. It yields more accurate DTI on three external OOD datasets and superior test-retest reliability, especially for NODDI, possibly due to its more Gaussian output distribution.
Most protocols require 30 diffusion-encoding directions for DTI, taking about ten minutes¹⁸. With Swin’s five-fold scan-time speedup (six directions instead of 30), accurate high-resolution DTI is achievable in 90 seconds for HCP, 100 seconds for TBI or AHA, and only 20 seconds for SPIN datasets, enabling usage in uncooperative populations by mitigating motion artifacts.
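The speedup factors quoted in this abstract are consistent with the ratio of diffusion-encoding directions. The following arithmetic rests on a simplifying assumption of ours, not stated in the abstract: scan time scales linearly with the number of diffusion-weighted volumes, and b=0 volumes and preparation time are ignored.

```latex
\[
\text{speedup} \approx \frac{N_{\text{full}}}{N_{\text{sub}}}, \qquad
\frac{30}{6} = 5, \qquad
\frac{55}{6} \approx 9.2
\]
```

The first ratio corresponds to the five-fold figure relative to a typical 30-direction DTI protocol, and the second to the roughly 9-fold figure relative to the 55-direction shell.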
The UNet, while outperforming self-supervised methods, trailed Swin, which captures long-range dependencies better¹⁹. Further hyperparameter optimization is needed to determine the optimal neural net architecture. We observed grokking²⁰ during Swin training, which could be due to AdamW optimization, our large dataset, and data augmentation. A better understanding of grokking could be instrumental in designing generative AI models that generalize well at scale.
Fine-tuning on even one subject improved dMRI denoising of OOD scans, and fine-tuning on more subjects yielded no significant additional benefit. This is further evidence that Swin UNETR can generalize rapidly to new data distributions.

Acknowledgements

HCP data were provided by the Human Connectome Project, WU-Minn Consortium (Principal Investigators: David Van Essen and Kamil Ugurbil; U54 MH091657) funded by the 16 NIH Institutes and Centers that support the NIH Blueprint for Neuroscience Research; and by the McDonnell Center for Systems Neuroscience at Washington University. TBI data were acquired as part of a research project funded by NIH R01NS060886 (Principal Investigator: Pratik Mukherjee). SPIN data were acquired as part of a research project funded by NIH R01 MH116950 (Principal Investigators: Pratik Mukherjee and Elysa J. Marco). AHA data were acquired with funding from the American Heart Association (AHA) Bugher Foundation (Principal Investigators: Heather Fullerton, Christine Fox, Helen Kim, and Pratik Mukherjee).

References

1. Diffusion MRI. vol. 1 (Oxford University Press, 2012).

2. Jurek, J. et al. Supervised denoising of diffusion-weighted magnetic resonance images using a convolutional neural network and transfer learning. Biocybern Biomed Eng 43, 206–232 (2023).

3. Karimi, D. & Gholipour, A. Diffusion Tensor Estimation with Transformer Neural Networks. Artif Intell Med 130, (2022).

4. Tian, Q. et al. SDnDTI: Self-supervised deep learning-based denoising for diffusion tensor MRI. Neuroimage 253, (2022).

5. Tian, Q. et al. DeepDTI: High-fidelity six-direction diffusion tensor imaging using deep learning. Neuroimage 219, 117017 (2020).

6. Hatamizadeh, A. et al. Swin UNETR: Swin Transformers for Semantic Segmentation of Brain Tumors in MRI Images. Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) 12962 LNCS, 272–284 (2022).

7. Liu, Z. et al. Swin Transformer: Hierarchical Vision Transformer using Shifted Windows. Proceedings of the IEEE International Conference on Computer Vision 9992–10002 (2021) doi:10.48550/arxiv.2103.14030.

8. Van Essen, D. C. et al. The WU-Minn Human Connectome Project: An Overview. Neuroimage 80, 62 (2013).

9. Mukherjee, P., Chung, S. W., Berman, J. I., Hess, C. P. & Henry, R. G. Diffusion Tensor MR Imaging and Fiber Tractography: Technical Considerations. American Journal of Neuroradiology 29, 843–852 (2008).

10. Zhang, H., Schneider, T., Wheeler-Kingshott, C. A. & Alexander, D. C. NODDI: Practical in vivo neurite orientation dispersion and density imaging of the human brain. Neuroimage 61, 1000–1016 (2012).

11. Wahl, M. et al. Microstructural correlations of white matter tracts in the human brain. Neuroimage 51, 531–541 (2010).

12. Li, Y.-O. et al. Independent component analysis of DTI reveals multivariate microstructural correlations of white matter in the human brain. Hum Brain Mapp 33, 1431–1451 (2012).

13. Kuceyeski, A. F., Jamison, K. W., Owen, J. P., Raj, A. & Mukherjee, P. Longitudinal increases in structural connectome segregation and functional connectome integration are associated with better recovery after mild TBI. Hum Brain Mapp 40, 4441–4456 (2019).

14. Mark, I. T. et al. Neurite orientation dispersion and density imaging of white matter microstructure in sensory processing dysfunction with versus without comorbid ADHD. Front Neurosci 17, (2023).

15. Maggioni, M., Katkovnik, V., Egiazarian, K. & Foi, A. Nonlocal transform-domain filter for volumetric data denoising and reconstruction. IEEE Transactions on Image Processing 22, 119–133 (2013).

16. Veraart, J. et al. Denoising of diffusion MRI using random matrix theory. Neuroimage 142, 394 (2016).

17. Fadnavis, S., Batson, J. & Garyfallidis, E. Patch2Self: Denoising Diffusion MRI with Self-Supervised Learning. Adv Neural Inf Process Syst 2020-Decem, (2020).

18. Jones, D. K. The effect of gradient sampling schemes on measures derived from diffusion tensor MRI: A Monte Carlo study. Magn Reson Med 51, 807–815 (2004).

19. Dosovitskiy, A. et al. An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale. (2020) doi:10.48550/arxiv.2010.11929.

20. Power, A., Burda, Y., Edwards, H., Babuschkin, I. & Misra, V. Grokking: Generalization Beyond Overfitting on Small Algorithmic Datasets. (2022).

Figures

Figure 1: (A) ICVF and ISOVF CoV (%) in select WM and GM regions and (B) histogram of signal intensity in the b=2000 s/mm² shell in the lateral ventricles of an AHA patient. Results using no denoising (RAW), P2S, BM4D, MPPCA, SWIN, and UNet denoising are displayed. Swin achieves dramatically lower CoV on regional GM and WM measurements. Unlike other denoising algorithms, Swin is able to transform the original data from a heavy-tailed Rician-like distribution into a more Gaussian-like distribution with significantly smaller variance and skew but greater kurtosis.

Figure 2: Effect of denoising on a 6-direction subset and a 55-direction full shell for poor quality data (CNR = 1.2375 measured by FSL Eddy). DTI metrics derived from no denoising (RAW), Patch2Self (P2S), MPPCA, BM4D, UNet without fine-tuning (UNET), Swin without fine-tuning (SWIN), UNet with fine-tuning (UNET-F1), and Swin with fine-tuning (SWIN-F1) are displayed along with the T1. The subject has a left temporal hemorrhage due to an arteriovenous malformation. Swin is better at removing the lesional and perilesional noise and is more consistent with the underlying anatomy.

Figure 3: Visual comparison between the ground truth (GT), no post-processing (RAW), MPPCA, BM4D, UNet, and Swin without fine-tuning (SWIN) for super-resolution in the posterior periventricular WM of an HCP subject. Data were k-space downsampled by a factor of two and then upsampled with 5th-order spline interpolation back to 1.25 mm to emulate a low-resolution acquisition.

Table 1: The Mean Absolute Error (MAE) of FA, MD, RD, AD, and V1 estimation using six-direction HCP, SPIN, TBI, and AHA data in white matter (WM) and gray matter (GM) via no denoising (RAW), P2S, MPPCA, BM4D, UNET, SWIN with no fine-tuning (SWIN), UNET with fine-tuning on one subject (UNET-F1), and SWIN with fine-tuning on one subject (SWIN-F1). We include the maximum p-value from subject-wise, paired, one-tailed t-tests between the other models and the SWIN model for the HCP dataset and the SWIN-F1 model for the OOD datasets. Best MAE results and significant p-values (p < 0.05) are bolded.

Table 2: Average CoV (%) across all WM and GM regions for DTI using 6- and 90-direction b=1000 s/mm² shell subsets and for NODDI estimation using all acquired data. We include the maximum p-value from subject-wise, paired, one-tailed t-tests between the other models and SWIN for DTI and NODDI CoV. We also report the Mean Absolute Error (MAE) for DTI SCN estimation using 6-direction subsets and the repeatability error for DTI and NODDI SCNs using all acquired data. Results using no denoising (RAW), P2S, BM4D, MPPCA, and SWIN are displayed. Best results and significant p-values (p < 0.05) are bolded.

Proc. Intl. Soc. Mag. Reson. Med. 32 (2024)

DOI: https://doi.org/10.58530/2024/2415