0806

Diffusion Modeling with Unrolled Transformers for Self-Supervised MRI Reconstruction

Yilmaz Korkmaz¹,²,³, Vishal M. Patel¹, and Tolga Cukur²,³
¹Dept. of Electrical and Computer Engineering, Johns Hopkins University, Baltimore, MD, United States, ²Dept. of Electrical and Electronics Engineering, Bilkent University, Ankara, Turkey, ³National Magnetic Resonance Research Center (UMRAM), Bilkent University, Ankara, Turkey

Synopsis

Keywords: AI/ML Image Reconstruction, Machine Learning/Artificial Intelligence, Image reconstruction, diffusion models, deep learning

Motivation: Diffusion models can reconstruct high-quality MR images, but their training neglects physical constraints and requires supervision via ground-truth images derived from fully-sampled acquisitions.

Goal(s): Our goal was to devise a diffusion-based method that incorporates physical constraints and that can be trained using undersampled acquisitions.

Approach: We introduced a novel diffusion model (SSDiffRecon) based on a physics-driven unrolled transformer architecture; and self-supervised training was achieved by predicting held-out subsets of acquired k-space data from remaining subsets.

Results: SSDiffRecon achieved superior reconstructions to alternative self-supervised methods, and performed on par with a supervised benchmark trained on fully-sampled acquisitions.

Impact: The improvement in image quality and acquisition speed through SSDiffRecon, combined with the ability to train on undersampled acquisitions, may facilitate adoption of AI-based reconstruction for comprehensive MRI exams in many applications, particularly in pediatric and elderly populations.

Introduction

In recent years, powerful deep-learning models have been proposed for accelerated MRI reconstruction with performance leaps over traditional methods^1-16. Diffusion models have produced particularly promising results by employing a task-agnostic prior that transforms Gaussian noise into images over multiple steps^17-22. Yet, common diffusion models are trained without regard to the physical constraints of accelerated MRI (i.e., sampling patterns, coil sensitivities), so reconstruction requires injection of data-consistency (DC) projections in between sampling steps²¹. Furthermore, previous diffusion methods require supervised training on high-quality reference images derived from either a linear reconstruction of fully-sampled acquisitions or a separate nonlinear reconstruction of undersampled acquisitions^22-24. These limitations hamper the performance and practical utility of diffusion-based MRI reconstruction.

To address these limitations, we propose a novel physics-driven diffusion model, SSDiffRecon, for self-supervised MRI reconstruction. SSDiffRecon leverages an unrolled architecture that interleaves cross-attention transformer blocks for denoising with data-fidelity blocks for integrating physical constraints. For self-supervised learning on undersampled acquisitions, SSDiffRecon is trained to predict randomly held-out subsets of k-space samples within undersampled data from the remaining subsets²³. These advances enable SSDiffRecon to outperform previous self-supervised baselines in image quality, while performing on par with a supervised benchmark trained on fully-sampled data.

Methods

SSDiffRecon is a self-supervised diffusion model for MRI reconstruction trained on a set of data acquired at undersampling factor R. Reference images are taken as the zero-filled Fourier reconstruction $$${x}^u=\mathcal{C}^*\mathcal{F}^{-1}\{y\}$$$, where $$$\mathcal{C}^*$$$ denotes the adjoint of the coil sensitivities $$$\mathcal{C}$$$, $$$\mathcal{F}^{-1}$$$ denotes the inverse Fourier transform, and $$$y$$$ denotes the undersampled multi-coil k-space data.
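As an illustration of the zero-filled reconstruction defined above, the following NumPy sketch applies an inverse FFT per coil and then a conjugate-weighted coil combination. The helper names (`fft2c`, `ifft2c`, `zero_filled_recon`), the centered orthonormal FFT convention, and the random toy data are assumptions made for this example and are not taken from the abstract.

```python
import numpy as np

def fft2c(img):
    """Centered, orthonormal 2D FFT over the last two axes (assumed convention)."""
    shifted = np.fft.ifftshift(img, axes=(-2, -1))
    return np.fft.fftshift(np.fft.fft2(shifted, norm="ortho"), axes=(-2, -1))

def ifft2c(ksp):
    """Centered, orthonormal 2D inverse FFT over the last two axes."""
    shifted = np.fft.ifftshift(ksp, axes=(-2, -1))
    return np.fft.fftshift(np.fft.ifft2(shifted, norm="ortho"), axes=(-2, -1))

def zero_filled_recon(y, coil_maps):
    """x^u = C^* F^{-1}{y}: per-coil inverse FFT, then conjugate-weighted coil sum."""
    coil_images = ifft2c(y)                      # shape (coils, H, W)
    return np.sum(np.conj(coil_maps) * coil_images, axis=0)

# Toy data (hypothetical): random complex image and unit-norm coil maps.
rng = np.random.default_rng(0)
n_coils, h, w = 4, 32, 32
image = rng.standard_normal((h, w)) + 1j * rng.standard_normal((h, w))
coil_maps = rng.standard_normal((n_coils, h, w)) + 1j * rng.standard_normal((n_coils, h, w))
coil_maps /= np.sqrt(np.sum(np.abs(coil_maps) ** 2, axis=0, keepdims=True))

mask = rng.random((h, w)) < 0.25                 # keep roughly 1 in 4 k-space samples
full_kspace = fft2c(coil_maps * image)           # fully-sampled multi-coil k-space
y = mask * full_kspace                           # undersampled acquisition

x_u = zero_filled_recon(y, coil_maps)            # aliased zero-filled image
x_full = zero_filled_recon(full_kspace, coil_maps)  # recovers `image` when nothing is masked
```

With unit-norm coil maps and an orthonormal FFT, the adjoint applied to fully-sampled data returns the original image, which is a convenient sanity check for the operator pair.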

Diffusion process: In the forward direction, the diffusion process adds noise onto reference images to draw noisy samples according to the forward transition probability: $$q\left({x}^u_t\mid{x}^u_{t-1}\right)=CN\left({x}^u_t;\sqrt{1-\beta_t}{x}^u_{t-1},\beta_t{I}\right)$$where $$$CN(\cdot;\mu,\Sigma)$$$ denotes a complex Gaussian distribution with mean vector $$$\mu$$$ and covariance matrix $$$\Sigma$$$, $$$t\in[0,T]$$$ denotes the time step with $$$T=1000$$$, $$$\beta_t$$$ determines the noise scale, and $$$I$$$ is the identity matrix. In the reverse direction, a denoising network $$$D_{\theta}$$$ parametrizes the gradual mapping from random noise samples onto reference images (Fig.1). In SSDiffRecon, this network is used to estimate the 'clean' sample, i.e., $$${\hat{x}}^u_0 =D_{\theta}({x}^u_t,t,\mathcal{M},\mathcal{C},y)$$$, given the sample at step $$$t$$$, along with the physical constraints: sampling pattern $$$\mathcal{M}$$$, coil sensitivities $$$\mathcal{C}$$$, and acquired data $$$y$$$. The denoised sample at $$$t-1$$$ can then be drawn from the reverse transition probability²⁰: $$p(x^u_{t-1}\mid x^u_{t})=CN\left(x^u_{t-1};\frac{\sqrt{\overline{\alpha}_{t-1}}\beta_{t}}{1-\overline{\alpha}_{t}}\hat{x}^u_{0}+\frac{\sqrt{\alpha_{t}}\left(1-\overline{\alpha}_{t-1}\right)}{1-\overline{\alpha}_{t}}x^u_{t},\frac{1-\overline{\alpha}_{t-1}}{1-\overline{\alpha}_{t}}\beta_{t}I\right)$$where $$$\alpha_{t}:=1-\beta_{t}$$$ and $$$\overline{\alpha}_{t}:=\prod_{\tau=0}^{t}\alpha_{\tau}$$$.
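The forward and reverse transitions can be sketched in NumPy as follows. This is a minimal illustration under stated assumptions: the linear `betas` schedule is hypothetical (the abstract does not specify one), steps are indexed 1..T with `alpha_bars[0] = 1`, the closed-form forward sample is the standard result of composing single-step Gaussian transitions, and the posterior mean uses the standard DDPM coefficients with $$$\beta_t$$$ in the first term. The denoiser output `x0_hat` is supplied by the caller.

```python
import numpy as np

T = 1000
betas = np.linspace(1e-4, 2e-2, T)        # hypothetical linear noise schedule
alphas = 1.0 - betas
# alpha_bars[t] is the cumulative product up to step t; alpha_bars[0] = 1 (no noise).
alpha_bars = np.concatenate(([1.0], np.cumprod(alphas)))

def complex_normal(rng, shape, var=1.0):
    """Draw from CN(0, var*I): real and imaginary parts each have variance var/2."""
    s = np.sqrt(var / 2.0)
    return s * (rng.standard_normal(shape) + 1j * rng.standard_normal(shape))

def q_sample(x0, t, rng):
    """Closed-form forward sample x_t given x_0 after t noising steps."""
    eps = complex_normal(rng, x0.shape)
    return np.sqrt(alpha_bars[t]) * x0 + np.sqrt(1.0 - alpha_bars[t]) * eps

def posterior_coeffs(t):
    """Mean weights on x0_hat and x_t, and the variance, of p(x_{t-1} | x_t)."""
    beta_t, alpha_t = betas[t - 1], alphas[t - 1]
    abar_t, abar_prev = alpha_bars[t], alpha_bars[t - 1]
    c0 = np.sqrt(abar_prev) * beta_t / (1.0 - abar_t)
    ct = np.sqrt(alpha_t) * (1.0 - abar_prev) / (1.0 - abar_t)
    var = (1.0 - abar_prev) / (1.0 - abar_t) * beta_t
    return c0, ct, var

def p_sample(x_t, x0_hat, t, rng):
    """Draw x_{t-1} from the reverse transition given the denoiser estimate x0_hat."""
    c0, ct, var = posterior_coeffs(t)
    return c0 * x0_hat + ct * x_t + complex_normal(rng, x_t.shape, var)

rng = np.random.default_rng(0)
x0 = rng.standard_normal((8, 8)) + 1j * rng.standard_normal((8, 8))
x_t = q_sample(x0, 500, rng)
x_prev = p_sample(x_t, x0, 500, rng)      # uses the true x0 in place of a network estimate
```

A useful check on the coefficients is that a noise-free sample is mapped consistently: the weight on the clean estimate plus the weight on the noisy sample times $$$\sqrt{\overline{\alpha}_t}$$$ equals $$$\sqrt{\overline{\alpha}_{t-1}}$$$.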

Network architecture: $$$D_{\theta}$$$ uses a novel unrolled architecture cascading cross-attention transformer blocks²⁵ with data-fidelity blocks to enforce physical constraints (Fig.1). An MLP maps $$$t$$$ onto a time embedding²¹, and transformer blocks perform time-dependent filtering of $$${x}^u_t$$$. Data-fidelity blocks enforce strict consistency to $$$y$$$ given $$$\mathcal{M}$$$ and $$$\mathcal{C}$$$ estimated via ESPIRiT²⁶. A 6-cascade architecture is used.
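A data-fidelity block of the kind described above can be sketched as a hard replacement of predicted k-space samples with acquired samples at the sampled locations. The abstract does not give the exact form of the block, so the hard-replacement rule, the function names, and the uncentered orthonormal FFT used here are assumptions for illustration.

```python
import numpy as np

def forward_op(x, coil_maps):
    """F{C x}: weight the image by each coil map, then apply an orthonormal 2D FFT."""
    return np.fft.fft2(coil_maps * x, norm="ortho")

def adjoint_op(k, coil_maps):
    """C^* F^{-1}{k}: inverse FFT per coil, then conjugate-weighted coil sum."""
    return np.sum(np.conj(coil_maps) * np.fft.ifft2(k, norm="ortho"), axis=0)

def data_fidelity(x, y, mask, coil_maps):
    """Replace predicted k-space with acquired data wherever mask is True."""
    k_pred = forward_op(x, coil_maps)
    k_dc = np.where(mask, y, k_pred)       # mask broadcasts over the coil axis
    return adjoint_op(k_dc, coil_maps)

# Toy data (hypothetical): random image, unit-norm coil maps, random sampling mask.
rng = np.random.default_rng(0)
n_coils, h, w = 4, 16, 16
image = rng.standard_normal((h, w)) + 1j * rng.standard_normal((h, w))
coil_maps = rng.standard_normal((n_coils, h, w)) + 1j * rng.standard_normal((n_coils, h, w))
coil_maps /= np.sqrt(np.sum(np.abs(coil_maps) ** 2, axis=0, keepdims=True))
mask = rng.random((h, w)) < 0.25
y = mask * forward_op(image, coil_maps)

x_in = rng.standard_normal((h, w)) + 1j * rng.standard_normal((h, w))
x_dc = data_fidelity(x_in, y, mask, coil_maps)      # projected estimate
x_fixed = data_fidelity(image, y, mask, coil_maps)  # the true image is left unchanged
```

In the single-coil case the projected image matches the acquired data exactly at sampled locations; with multiple coils the coil combination makes the match approximate unless the input is already consistent.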

Self-supervised training: Undersampled data $$$y$$$ acquired with $$$\mathcal{M}$$$ are split into two non-overlapping subsets, $$$y_p$$$ with $$$\mathcal{M}_p$$$ used to compute forward passes through the model and $$$y_r$$$ with $$$\mathcal{M}_r$$$ used to evaluate the loss function²³: $$L_{SSDiffRecon}=\mathbb{E}_{t}\left[\|y_r-\mathcal{M}_r\mathcal{F}\{\mathcal{C}\hat{x}^u_0\}\|_1\right],\quad\hat{x}^u_0=D_{\theta}({x}^{u,p}_t,t,\mathcal{M}_p,\mathcal{C},y_p)$$where $$${x}^{u,p}_t=\mathcal{C}^*\mathcal{F}^{-1}\{\mathcal{M}_p\mathcal{F}\{\mathcal{C}x^{u}_t\}\}$$$ (Fig.2).
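The mask split and the held-out k-space loss can be sketched as below. The uniform random selection of held-out samples, the function names, and the toy data are assumptions for illustration; the loss is evaluated here for a single estimate rather than as an expectation over time steps, and the network itself is omitted.

```python
import numpy as np

def split_mask(mask, holdout_frac, rng):
    """Split acquired locations into disjoint masks (M_p, M_r) by uniform random choice."""
    idx = np.flatnonzero(mask)
    n_r = int(round(holdout_frac * idx.size))
    held = rng.choice(idx, size=n_r, replace=False)
    mask_r = np.zeros(mask.size, dtype=bool)
    mask_r[held] = True
    mask_r = mask_r.reshape(mask.shape)
    mask_p = mask & ~mask_r
    return mask_p, mask_r

def forward_op(x, coil_maps):
    """F{C x} with an orthonormal 2D FFT (assumed convention)."""
    return np.fft.fft2(coil_maps * x, norm="ortho")

def self_supervised_loss(x0_hat, y, mask_r, coil_maps):
    """L1 norm of y_r - M_r F{C x0_hat}, evaluated only on held-out samples."""
    k_pred = forward_op(x0_hat, coil_maps)
    return np.sum(np.abs((y - k_pred) * mask_r))

# Toy data (hypothetical).
rng = np.random.default_rng(0)
n_coils, h, w = 4, 32, 32
image = rng.standard_normal((h, w)) + 1j * rng.standard_normal((h, w))
coil_maps = rng.standard_normal((n_coils, h, w)) + 1j * rng.standard_normal((n_coils, h, w))
coil_maps /= np.sqrt(np.sum(np.abs(coil_maps) ** 2, axis=0, keepdims=True))
mask = rng.random((h, w)) < 0.25
y = mask * forward_op(image, coil_maps)

mask_p, mask_r = split_mask(mask, 0.10, rng)   # 90% for the forward pass, 10% for the loss
y_p = mask_p * y                               # data that would be passed to the network
loss_true = self_supervised_loss(image, y, mask_r, coil_maps)
loss_zero = self_supervised_loss(np.zeros_like(image), y, mask_r, coil_maps)
```

Because the loss only touches locations in `mask_r`, an estimate that explains the held-out samples gives a loss near zero even though those samples were never shown to the model.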

Reconstruction: Due to its unrolled architecture, SSDiffRecon does not need additional injection of data-consistency projections. To further speed up reconstruction, we initiated sampling with a zero-filled reconstruction of the undersampled acquisition, $$$x_{ts}=\mathcal{C}^*\mathcal{F}^{-1}\{y\}$$$; $$$ts=5$$$ reverse steps were observed to approach convergence. In each step, sampling via $$$p(x_{t-1}|x_{t})$$$ was performed after a network forward-pass to estimate $$${\hat{x}}_0=D_{\theta}({x}_{t},t,\mathcal{M},\mathcal{C},\sqrt{\bar{\alpha}_t}y+\sqrt{1-\bar{\alpha}_t}\epsilon)$$$, where random noise $$$\epsilon\sim CN(\epsilon;0,0.1I)$$$ was added on a descending schedule for gradual enforcement of data fidelity.
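The sampling loop described above can be sketched as follows. Several details are assumptions made only for this example: the linear `betas` schedule, running `t` from `ts` down to 1 on the `T`=1000 schedule, the standard DDPM posterior coefficients, and the `placeholder_denoiser`, which is a hypothetical stand-in (a hard data-fidelity projection) for the trained network $$$D_{\theta}$$$.

```python
import numpy as np

T = 1000
betas = np.linspace(1e-4, 2e-2, T)               # hypothetical linear schedule
alphas = 1.0 - betas
alpha_bars = np.concatenate(([1.0], np.cumprod(alphas)))  # alpha_bars[0] = 1

def complex_normal(rng, shape, var):
    """Draw from CN(0, var*I)."""
    s = np.sqrt(var / 2.0)
    return s * (rng.standard_normal(shape) + 1j * rng.standard_normal(shape))

def forward_op(x, coil_maps):
    return np.fft.fft2(coil_maps * x, norm="ortho")

def adjoint_op(k, coil_maps):
    return np.sum(np.conj(coil_maps) * np.fft.ifft2(k, norm="ortho"), axis=0)

def placeholder_denoiser(x_t, t, mask, coil_maps, y_noisy):
    """Hypothetical stand-in for D_theta: a hard data-fidelity projection only."""
    k = np.where(mask, y_noisy, forward_op(x_t, coil_maps))
    return adjoint_op(k, coil_maps)

def reconstruct(y, mask, coil_maps, denoiser, ts=5, seed=0):
    rng = np.random.default_rng(seed)
    x_t = adjoint_op(y, coil_maps)               # x_ts: zero-filled initialization
    for t in range(ts, 0, -1):
        # Noisy copy of the acquired data; the noise weight shrinks as t decreases.
        eps = complex_normal(rng, y.shape, 0.1)
        y_noisy = np.sqrt(alpha_bars[t]) * y + np.sqrt(1.0 - alpha_bars[t]) * eps
        x0_hat = denoiser(x_t, t, mask, coil_maps, y_noisy)
        # Reverse transition p(x_{t-1} | x_t) with standard DDPM posterior weights.
        beta_t, alpha_t = betas[t - 1], alphas[t - 1]
        abar_t, abar_prev = alpha_bars[t], alpha_bars[t - 1]
        mean = (np.sqrt(abar_prev) * beta_t * x0_hat
                + np.sqrt(alpha_t) * (1.0 - abar_prev) * x_t) / (1.0 - abar_t)
        var = (1.0 - abar_prev) / (1.0 - abar_t) * beta_t
        x_t = mean + complex_normal(rng, x_t.shape, var)
    return x_t

# Toy data (hypothetical).
rng = np.random.default_rng(1)
n_coils, h, w = 4, 16, 16
image = rng.standard_normal((h, w)) + 1j * rng.standard_normal((h, w))
coil_maps = rng.standard_normal((n_coils, h, w)) + 1j * rng.standard_normal((n_coils, h, w))
coil_maps /= np.sqrt(np.sum(np.abs(coil_maps) ** 2, axis=0, keepdims=True))
mask = rng.random((h, w)) < 0.25
y = mask * forward_op(image, coil_maps)

x_rec = reconstruct(y, mask, coil_maps, placeholder_denoiser)
```

Under this indexing the final step has zero posterior variance and full weight on the denoiser estimate, so the returned image equals the last network output.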

Analyses: Demonstrations were performed on single-coil T₁, T₂, PD data from IXI (https://brain-development.org/ixi-dataset/) and multi-coil T₁, T₂, FLAIR data from fastMRI²⁷. Subjects were split as (100,10,20) for (training,validation,testing). Acquisitions were undersampled at R=4 using 2D variable-density patterns¹. Data undersampled at R were further split into (90%,10%) subsets with ($$$\mathcal{M}_p,\mathcal{M}_r$$$) for self-supervision. Training was performed with the Adam optimizer at a learning rate of 0.002 for 100 epochs.
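For illustration, a 2D variable-density sampling mask at a target acceleration can be generated as sketched below. The Gaussian sampling density, its width `sigma_frac`, the centered k-space layout, and the function name are assumptions for this example; the abstract does not state how its masks were generated beyond citing variable-density undersampling.

```python
import numpy as np

def variable_density_mask(shape, accel, rng, sigma_frac=0.25):
    """Draw exactly H*W/accel k-space locations, favoring the k-space center."""
    h, w = shape
    ky = np.linspace(-1.0, 1.0, h)[:, None]
    kx = np.linspace(-1.0, 1.0, w)[None, :]
    # Gaussian sampling density over normalized k-space coordinates (assumed form).
    density = np.exp(-(ky ** 2 + kx ** 2) / (2.0 * sigma_frac ** 2))
    prob = (density / density.sum()).ravel()
    n_samples = (h * w) // accel
    idx = rng.choice(h * w, size=n_samples, replace=False, p=prob)
    mask = np.zeros(h * w, dtype=bool)
    mask[idx] = True
    return mask.reshape(h, w)

rng = np.random.default_rng(0)
h, w, R = 64, 64, 4
mask = variable_density_mask((h, w), R, rng)
center_rate = mask[h // 4: 3 * h // 4, w // 4: 3 * w // 4].mean()  # sampling rate near center
overall_rate = mask.mean()                                         # equals 1/R by construction
```

Sampling without replacement fixes the number of acquired locations, so the acceleration factor is met exactly rather than only on average.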

Results

Fig.3 lists performance metrics for SSDiffRecon and several baselines (self-DDPM²¹, self-D5C5³, self-rGAN¹⁵) trained under the same self-supervision approach²³. On average across reconstruction tasks, SSDiffRecon outperforms competing methods by 3.49 dB in PSNR and 3.75% in SSIM. The improved spatial acuity and reduced artifacts and noise with SSDiffRecon are visible in the representative reconstructions in Fig.4. Ablation studies in Fig.5 demonstrate the importance of the unrolled architecture, data-fidelity blocks, and transformer blocks, and show that SSDiffRecon performs on par with the supervised benchmark obtained by training the same architecture on fully-sampled acquisitions.
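For reference, PSNR between a reconstruction and a reference image can be computed as in the sketch below. Using magnitude images and taking the peak as the maximum of the reference are assumptions for this example; the abstract does not specify the normalization used for its reported values.

```python
import numpy as np

def psnr(reference, recon):
    """PSNR in dB on magnitude images, with the peak taken from the reference."""
    ref = np.abs(reference)
    rec = np.abs(recon)
    mse = np.mean((ref - rec) ** 2)
    return 10.0 * np.log10(ref.max() ** 2 / mse)

# Toy check: a uniform error of 0.1 against a reference with peak 1.0 gives MSE = 0.01.
ref = np.linspace(0.0, 1.0, 100).reshape(10, 10)
rec = ref + 0.1
value = psnr(ref, rec)
```

Here the ratio of squared peak to MSE is 100, which corresponds to 20 dB.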

Discussion

Here we introduced a novel self-supervised MRI reconstruction method, SSDiffRecon, based on a diffusion model comprising an unrolled transformer architecture trained to predict masked-out k-space data. The proposed method integrates physical constraints to improve performance and efficiency by avoiding the need for injection of DC projections during inference, while enabling training on undersampled data. Therefore, SSDiffRecon holds great promise for improving the utility of accelerated MRI reconstruction.

Acknowledgements

This work was supported in part by TUBITAK 1001 Grant No. 121E488 and in part by NIH R01 Grant CA276221.

References

1. Lustig, M., Donoho, D., Pauly, J.M., Sparse MRI: The application of compressed sensing for rapid MR imaging. Magnetic Resonance in Medicine 58(6), 1182–1195 (2007).

2. Haldar, J.P., Hernando, D., Liang, Z.P., Compressed-sensing MRI with random encoding. IEEE Transactions on Medical Imaging 30(4), 893–903 (2010).

3. Qin, C., Schlemper, J., Caballero, J., Price, A.N., Hajnal, J.V., Rueckert, D., Convolutional recurrent neural networks for dynamic MR image reconstruction. IEEE Transactions on Medical Imaging 38(1), 280–290 (2018).

4. Wang, S., Su, Z., Ying, L., Peng, X., Zhu, S., Liang, F., Feng, D., Liang, D., Accelerating magnetic resonance imaging via deep learning. In IEEE 13th International Symposium on Biomedical Imaging (ISBI). pp. 514–517 (2016).

5. Hammernik, K., Klatzer, T., Kobler, E., Recht, M.P., Sodickson, D.K., Pock, T., Knoll, F., Learning a variational network for reconstruction of accelerated MRI data. Magnetic Resonance in Medicine 79(6), 3055–3071 (2018).

6. Mardani, M., Gong, E., Cheng, J.Y., Vasanawala, S., Zaharchuk, G., Xing, L., Pauly, J.M., Deep generative adversarial neural networks for compressive sensing MRI. IEEE Transactions on Medical Imaging 38(1), 167–179 (2019).

7. Zhu, B., Liu, J.Z., Rosen, B.R., Rosen, M.S., Image reconstruction by domain transform manifold learning. Nature 555(7697), 487–492 (2018).

8. Akçakaya, M., Moeller, S., Weingärtner, S., Uğurbil, K., Scan-specific robust artificial-neural-networks for k-space interpolation (RAKI) reconstruction: Database-free deep learning for fast imaging. Magnetic Resonance in Medicine 81, 439–453 (2019).

9. Aggarwal, H.K., Mani, M.P., Jacob, M., MoDL: Model-Based deep learning architecture for inverse problems. IEEE Transactions on Medical Imaging 38(2), 394–405 (2019).

10. Küstner, T., Fuin, N., Hammernik, K., Bustin, A., Qi, H., Hajhosseiny, R., Masci, P. G., Neji, R., Rueckert, D., Botnar, R. M., Prieto, C., CINENet: deep learning-based 3D cardiac CINE MRI reconstruction with multi-coil complex-valued 4D spatio-temporal convolutions. Scientific Reports, 10(1) (2020).

11. Peng, X., Sutton, B.P., Lam, F., Liang, Z.P., DeepSENSE: Learning coil sensitivity functions for SENSE reconstruction using deep learning. Magnetic Resonance in Medicine, 87(4), 1894–1902 (2020).

12. Polak, D., Cauley, S., Bilgic, B., Gong, E., Bachert, P., Adalsteinsson, E., Setsompop, K., Joint multi-contrast variational network reconstruction (jVN) with application to rapid 2D and 3D imaging. Magnetic Resonance in Medicine, 84(3), 1456–1469 (2020).

13. Kwon, K., Kim, D., Park, H., A parallel MR imaging method using multilayer perceptron. Medical Physics 44(12), 6209–6224 (2017).

14. Eo, T., Jun, Y., Kim, T., Jang, J., Lee, H. J., Hwang, D., KIKI-net: cross-domain convolutional neural networks for reconstructing undersampled magnetic resonance images. Magnetic Resonance in Medicine, 80(5), 2188–2201 (2018).

15. Dar, S.U., Yurt, M., Shahdloo, M., Ildız, M.E., Tınaz, B., Cukur, T., Prior-guided image reconstruction for accelerated multi-contrast MRI via generative adversarial networks. IEEE Journal of Selected Topics in Signal Processing 14(6), 1072–1087 (2020).

16. Liu, F., Feng, L., Kijowski, R., MANTIS: Model-Augmented Neural neTwork with Incoherent k-space Sampling for efficient MR parameter mapping. Magnetic Resonance in Medicine, 82(1), 174–188 (2019).

17. Jalal, A., Arvinte, M., Daras, G., Price, E., Dimakis, A. G., Tamir, J., Robust Compressed Sensing MRI with Deep Generative Priors. Advances in Neural Information Processing Systems, 34, 14938–14954 (2021).

18. Chung, H., Ye, J. C., Score-based diffusion models for accelerated MRI. Medical Image Analysis, 80, 102479 (2022).

19. Xie, Y., Li, Q., Measurement-conditioned denoising diffusion probabilistic model for under-sampled medical image reconstruction. In: Medical Image Computing and Computer Assisted Intervention–MICCAI 2022: 25th International Conference, Singapore, September 18–22, 2022, Proceedings, Part VI. pp. 655–664.

20. Luo, G., Blumenthal, M., Heide, M., Uecker, M., Bayesian MRI reconstruction with joint uncertainty estimation using diffusion models. Magnetic Resonance in Medicine, 90(1), 295–311 (2023).

21. Gungor, A., Dar, S.U.H., Ozturk, S., Korkmaz, Y., Bedel, H.A., Elmas, G., Ozbey, M., Cukur, T., Adaptive diffusion priors for accelerated MRI reconstruction. Medical Image Analysis 88, 102872 (2023).

22. Cui, Z.X., Cao, C., Liu, S., Zhu, Q., Cheng, J., Wang, H., Zhu, Y., Liang, D., Self-score: Self-supervised learning on score-based models for MRI reconstruction. arXiv:2209.00835 (2022).

23. Yaman, B., Hosseini, S.A.H., Moeller, S., Ellermann, J., Ugurbil, K., Akcakaya, M. Self-supervised learning of physics-guided reconstruction neural networks without fully sampled reference data. Magnetic Resonance in Medicine 84(6), 3172–3191 (2020).

24. Desai, A. D., Ozturkler, B. M., Sandino, C. M., Boutin, R., Willis, M., Vasanawala, S., Hargreaves, B. A., Ré, C., Pauly, J. M., Chaudhari, A. S., Noise2Recon: Enabling SNR-robust MRI reconstruction with semi-supervised and self-supervised learning. Magnetic Resonance in Medicine, 90(5), 2052–2070 (2023).

25. Korkmaz, Y., Dar, S.U.H., Yurt, M., Ozbey, M., Cukur, T., Unsupervised MRI Reconstruction via Zero-Shot Learned Adversarial Transformers. IEEE Transactions on Medical Imaging, 41(7), 1747–1763 (2022).

26. Uecker, M., Lai, P., Murphy, M. J., Virtue, P., Elad, M., Pauly, J. M., Vasanawala, S. S., Lustig, M., ESPIRiT--an eigenvalue approach to autocalibrating parallel MRI: where SENSE meets GRAPPA. Magnetic Resonance in Medicine, 71(3), 990–1001 (2014).

27. Knoll, F., Zbontar, J., Sriram, A., Muckley, M.J., Bruno, M., Defazio, A., Parente, M., Geras, K.J., Katsnelson, J., Chandarana, H., Zhang, Z., Drozdzal, M., Romero, A., Rabbat, M., Vincent, P., Pinkerton, J., Wang, D., Yakubova, N., Owens, E., Zitnick, C.L., Recht, M.P., Sodickson, D.K., Lui, Y.W., fastMRI: A publicly available raw k-space and DICOM dataset of knee images for accelerated MR image reconstruction using machine learning. Radiology: Artificial Intelligence 2(1), e190007 (2020).

Figures

Figure 1. SSDiffRecon is a self-supervised diffusion model based on an unrolled transformer architecture. In forward steps, noise is added onto the zero-filled (ZF) reconstruction $$$x^{u,p}$$$ of a random subset of undersampled data $$$y_p$$$ selected via the mask $$$\mathcal{M}_p$$$. Given the noisy sample $$$x_t^{u,p}$$$, the network predicts a 'clean' image $$$\hat{x}_0^u$$$ whose k-space data selected via $$$\mathcal{M}_r$$$ is enforced to be consistent with the respective k-space data of the reference image. The reference is taken as the ZF reconstruction $$$x^u$$$ of the undersampled acquisition $$$y$$$.

Figure 2. Given undersampled acquisition $$$y$$$, SSDiffRecon initiates sampling with the ZF reconstruction $$$x_{ts}=\mathcal{C}^*\mathcal{F}^{-1}\{y\}$$$, and performs $$$ts=5$$$ reverse diffusion steps. The denoised image $$$x_{t-1}$$$ is sampled from the transition probability $$$p(x_{t-1}|x_{t})$$$ following a network forward-pass that estimates the 'clean' image $$$\hat{x}_0$$$ given $$$x_{t}$$$. For gradual enforcement of data fidelity, data-fidelity blocks receive acquired data with added noise ($$$\epsilon$$$) on a descending schedule towards lower $$$t$$$.

Figure 3. Reconstruction performance in (a) IXI and (b) fastMRI datasets for undersampled acquisitions at R=4. Average peak signal-to-noise ratio (PSNR, dB) and structural similarity (SSIM, %) metrics are given across the test sets. Results are listed for SSDiffRecon along with competing self-supervised baselines (self-DDPM, self-D5C5, self-rGAN). The top-performing method is marked in bold font for each reconstruction task and each metric.

Figure 4. Representative images for (a) a T₂-weighted acquisition in IXI and (b) a T₁-weighted acquisition in fastMRI. Reconstructions from competing methods (SSDiffRecon, self-DDPM, self-D5C5, self-rGAN) are shown along with the zero-filled reconstructions of undersampled data at R=4 (ZF), and reference images derived via Fourier reconstruction of fully-sampled data (REF). Images (top) and error maps (bottom; see colorbar) are given, and sample regions with notable differences among methods are annotated with circles.

Figure 5. Performance in ablation studies conducted on fastMRI at R=4. Average PSNR (dB) and SSIM (%) over tissue contrasts are listed across the test set. To assess self-supervised training, a supervised benchmark was formed by training SSDiffRecon on fully-sampled acquisitions. To assess the overall architecture, a UNet variant was formed. To assess data-fidelity blocks, a transformer variant without data-fidelity (DF) blocks was formed. To assess transformer blocks, a convolutional variant was formed by replacing self-attention with convolution layers. The top-performing variant is marked in bold.

Proc. Intl. Soc. Mag. Reson. Med. 32 (2024)

DOI: https://doi.org/10.58530/2024/0806