0379

Using a Video Diffusion Model-prior for reconstructing undersampled dynamic MR-data – An application to real-time cardiac MRI

Oliver Schad¹, Julius Frederik Heidenreich¹, Nils-Christian Petri², Bernhard Petritsch¹, and Tobias Wech¹,³
¹Department of Diagnostic and Interventional Radiology, University Hospital Würzburg, Würzburg, Germany, ²Department of Internal Medicine 1, University Hospital Würzburg, Würzburg, Germany, ³Comprehensive Heart Failure Center, University Hospital Würzburg, Würzburg, Germany

Synopsis

Keywords: AI Diffusion Models, Machine Learning/Artificial Intelligence

Motivation: MR-based “real-time” imaging of dynamic processes, such as the beating heart, often depends on fast (undersampled) scans, which are subsequently reconstructed by algorithms exploiting prior knowledge. Spatio-temporal models that describe the data suboptimally can lead to residual artifacts.

Goal(s): A high-quality model to regularize the reconstruction of real-time cardiac MRI based on undersampled spiral data acquisitions.

Approach: A video diffusion model was trained using cine videos in magnitude reconstruction and subsequently applied as a prior in a plug-and-play FISTA approach.

Results: Reconstructions of undersampled real-time frames with higher image quality than a low-rank plus sparse approach.

Impact: We show the potential of probabilistic video diffusion models as a prior in iterative reconstructions of undersampled dynamic MR data. In our example, the approach enabled high-quality real-time cardiac functional MRI in patients with arrhythmia.

Introduction

Recently, diffusion probabilistic priors were proposed to accurately model MR data, thereby allowing excellent regularization within reconstructions of undersampled scans$$$\,$$$[1,$$$\,$$$2]. While mostly 2D applications in the spatial domain have been proposed so far, we investigate the capability of 3D video diffusion models to support the reconstruction of undersampled dynamic spatio-temporal MR scans. The approach is tested in real-time cardiac imaging with spiral readouts.

Methods

Video Diffusion Model
A video diffusion model according to$$$\,$$$[3,$$$\,$$$4] was trained to estimate the data distribution of fully sampled cardiac cine series. To this end, a 3D-UNet with temporal attention was fitted to denoise cine sequences perturbed by Gaussian noise for a random diffusion timestep $$$t\,∈\,[1,1000]$$$. The training data consisted of 470 cine series in magnitude reconstruction, with FLASH and bSSFP contrast, spiral and Cartesian acquisitions, and field strengths of 0.55T, 1.5T and 3T$$$\,$$$(own data and data from$$$\,$$$[5]). Due to memory limitations (NVIDIA$$$\,$$$RTX$$$\,$$$A6000,$$$\,$$$48$$$\,$$$GB), training data were resized to dimensions of 224$$$\,$$$x$$$\,$$$224$$$\,$$$x$$$\,$$$20 frames. The obtained diffusion model was then capable of generating artificial samples from the trained data distribution$$$\,$$$(see Figs.$$$\,$$$1$$$\,$$$&$$$\,$$$2).
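The perturbation of a cine series at a random diffusion timestep can be illustrated with a minimal NumPy sketch. The linear variance schedule, its endpoints, and the names `linear_beta_schedule` and `perturb` are assumptions for illustration only; the abstract does not state the schedule, and the cited implementation may use a different one.

```python
import numpy as np

def linear_beta_schedule(num_steps=1000, beta_start=1e-4, beta_end=2e-2):
    """Linear variance schedule (assumed; not specified in the abstract)."""
    return np.linspace(beta_start, beta_end, num_steps)

def perturb(x0, t, alphas_cumprod, rng):
    """Draw x_t ~ q(x_t | x_0) for a 1-based diffusion timestep t."""
    a_bar = alphas_cumprod[t - 1]
    noise = rng.standard_normal(x0.shape)
    x_t = np.sqrt(a_bar) * x0 + np.sqrt(1.0 - a_bar) * noise
    return x_t, noise

rng = np.random.default_rng(0)
betas = linear_beta_schedule()
alphas_cumprod = np.cumprod(1.0 - betas)

# Hypothetical stand-in for one magnitude cine series (frames, height, width).
x0 = rng.random((20, 224, 224))
t = int(rng.integers(1, 1001))   # random timestep in [1, 1000]
x_t, noise = perturb(x0, t, alphas_cumprod, rng)

# A noise-predicting network would receive (x_t, t) and be fitted to `noise`,
# typically with a mean-squared-error loss.
```

For small t the perturbed series stays close to the input, while for t near 1000 it approaches pure Gaussian noise, which is why sampling can start from noise after training.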

MRI reconstruction
Undersampled dynamic MR data were first subjected to a fixed number of 100 iterative SENSE$$$\,$$$[6] steps with no further regularization. Subsequently, 25 additional iterations were performed, each enforcing the physical measurement model and applying the video diffusion model with $$$t\,=\,25→\,0$$$. The forward operator $$$A$$$ maps the spatio-temporal cine series $$$\tilde{x}$$$$$$\,$$$(2D$$$\,$$$x$$$\,$$$cardiac$$$\,$$$phases) to the corresponding temporal series of undersampled multi-coil k-spaces $$$\tilde{y}$$$$$$\,$$$(2D$$$\,$$$x$$$\,$$$coils$$$\,$$$x$$$\,$$$cardiac$$$\,$$$phases). The operator comprises the coil sensitivity maps, which were estimated from a temporal average image across all cardiac frames using ESPIRiT [7], the 2D Fourier transform, and the undersampling mask of each frame. $$$D_t$$$ represents the diffusion step, producing a slightly denoised version of the real-valued input spatio-temporal series. The initial phase of each image of the series, $$$P_{\text{T}}$$$, was estimated from the phase of a temporally averaged reconstruction. The reconstruction process$$$\,$$$(schematically shown in$$$\,$$$Fig.$$$\,$$$1) was implemented in a FISTA-like fashion$$$\,$$$[8] and is outlined by the following pseudo-code:

Input:$$$\,\tilde{x}_{\text{T}}=s_{\text{T+1}}=recon_{\text{SENSE}},\,\,\lambda_{t}=\sigma_{t}=\frac{t-1}{T-1}\,,\,\,P_{\text{T}}=P_{\text{avg}},\,\,\gamma=1$$$
For$$$\,t=\text{T}...1:$$$
$$$\,\,\,\,\,\,\,\,$$$$$$s_{\text{t}}=D_{t}\,\tilde{x}_t$$$$$$\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,$$$(diffusion$$$\,$$$step)
$$$\,\,\,\,\,\,\,\,$$$$$$\tilde{x}_{t-1}\,=\,\lambda_t\,s_t\,\circ\,e^{i\,P_{\text{avg}}}\,+\,(1-\lambda_t)\,s_t\,\circ\,e^{i\,P_{t}}$$$$$$\,\,\,\,\,$$$(transforming$$$\,$$$real$$$\,$$$to$$$\,$$$complex)
$$$\,\,\,\,\,\,\,\,$$$$$$z_t\,=\,\tilde{x}_{t-1}-\gamma\,A^{*}\,(A\,\tilde{x}_{t-1}-\tilde{y})$$$$$$\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,$$$(data$$$\,$$$consistency)
$$$\,\,\,\,\,\,\,\,$$$$$$P_{t-1}\,=\,angle(z_t)$$$$$$\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,$$$(extracting$$$\,$$$phase$$$\,$$$information)
$$$\,\,\,\,\,\,\,\,$$$$$$\tilde{x}_{t-1}\,=\,real(z_{t}\,\circ\,e^{-i\,P_{t-1}})$$$$$$\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,$$$(mapping$$$\,$$$complex to$$$\,$$$real)
$$$\,\,\,\,\,\,\,\,$$$$$$\tilde{x}'_{t-1}\,=\,(1-\sigma_{t})\,s_{t}\,+\,\sigma_{t}\,s_{t+1}$$$$$$\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,$$$(FISTA-like$$$\,$$$weighting)
end
return$$$\,\tilde{x}'_0$$$
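
The loop can be sketched in NumPy on a toy Cartesian problem. This is one literal reading of the pseudo-code under stated assumptions, not the authors' implementation: the data-consistency update is taken as a gradient-descent step (subtracting $$$\gamma\,A^{*}(A\,\tilde{x}-\tilde{y})$$$), the denoiser $$$D_t$$$ is replaced by a hypothetical identity placeholder, and the function names, array shapes, random sensitivity maps and random mask are illustrative only.

```python
import numpy as np

def forward(x, smaps, mask):
    """A: image series (frames, H, W) -> masked multi-coil k-space (frames, coils, H, W)."""
    coil_images = smaps[None] * x[:, None]
    return mask[:, None] * np.fft.fft2(coil_images, norm="ortho")

def adjoint(y, smaps, mask):
    """A^*: masked multi-coil k-space -> coil-combined image series (frames, H, W)."""
    coil_images = np.fft.ifft2(mask[:, None] * y, norm="ortho")
    return np.sum(np.conj(smaps)[None] * coil_images, axis=1)

def pnp_reconstruct(y, smaps, mask, x_init, p_avg, denoise, T=25, gamma=1.0):
    """Plug-and-play loop following the pseudo-code step by step (requires T > 1)."""
    x = x_init        # real-valued iterate, starts at the SENSE reconstruction
    phase = p_avg     # P_T, phase of the temporally averaged reconstruction
    s_prev = x_init   # s_{T+1}
    out = x_init
    for t in range(T, 0, -1):
        lam = sigma = (t - 1) / (T - 1)
        s = denoise(x, t)                                    # diffusion step
        xc = (lam * s * np.exp(1j * p_avg)
              + (1 - lam) * s * np.exp(1j * phase))          # real -> complex
        residual = forward(xc, smaps, mask) - y
        z = xc - gamma * adjoint(residual, smaps, mask)      # data consistency
        phase = np.angle(z)                                  # extract phase
        x = np.real(z * np.exp(-1j * phase))                 # complex -> real
        out = (1 - sigma) * s + sigma * s_prev               # FISTA-like weighting
        s_prev = s
    return out

# Toy data: all sizes and values below are hypothetical.
rng = np.random.default_rng(0)
frames, coils, n = 4, 3, 16
smaps = rng.standard_normal((coils, n, n)) + 1j * rng.standard_normal((coils, n, n))
smaps /= np.sqrt(np.sum(np.abs(smaps) ** 2, axis=0, keepdims=True))  # sum |S|^2 = 1
mask = (rng.random((frames, n, n)) < 0.4).astype(float)              # random sampling
x_true = rng.random((frames, n, n))
y = forward(x_true, smaps, mask)

x_init = np.abs(adjoint(y, smaps, mask))   # stand-in for the iterative SENSE input
p_avg = np.zeros((frames, n, n))           # toy object has zero phase
identity_denoiser = lambda x, t: x         # placeholder for the trained D_t
recon = pnp_reconstruct(y, smaps, mask, x_init, p_avg, identity_denoiser)
```

With sensitivity maps normalized to unit sum of squares and an orthonormal Fourier transform, the operator norm of A does not exceed 1, so a step size of 1 is a stable choice in this toy setting. The pseudo-code leaves open how the weighted iterate feeds back into the next diffusion step; the sketch only returns it.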

Evaluation

The proposed method was tested in real-time cardiac MRI at 1.5T$$$\,$$$(Siemens Magnetom Avanto^fit) in 3$$$\,$$$healthy volunteers and 5$$$\,$$$patients with intermittent atrial fibrillation. Thirteen consecutive arms$$$\,$$$(R≈5) were acquired per frame (temporal resolution$$$\,$$$=$$$\,$$$48ms) using an in-house developed spiral bSSFP pulse sequence. For healthy volunteers, data were acquired for the duration of a breath-hold of about 10s for 10-15 short-axis slices. Trajectory patterns were rotated such that both the binning of real-time frames and the segmentation of fully sampled cine frames across several RR-intervals were possible. This allowed the simulation of matching pairs of fully sampled and undersampled data of the same cardiac phase, which were used to calculate error maps for different reconstruction approaches$$$\,$$$(see Fig.$$$\,$$$3). For patients, the breath-hold was shortened to 4.5s, delivering real-time depictions only. The non-Cartesian spiral data were gridded using GROG$$$\,$$$[9] and cropped to a matrix size of 224$$$\,$$$x$$$\,$$$224 in k-space (see section “Video Diffusion Model”) before being subjected to the proposed reconstruction method. The corresponding spatial resolution was 3.2mm$$$\,$$$x$$$\,$$$3.2mm. For comparison, real-time frames were additionally reconstructed using iterative SENSE$$$\,$$$[6] only, as well as a low-rank plus sparse method$$$\,$$$[10]. Using a 5-point Likert scale, an expert reader rated the image quality of reconstructions of real-time acquisitions and corresponding ground truth cine series based on segmented Cartesian scans for all participants of the study.
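
The binning of consecutive spiral arms into real-time frames can be sketched as follows. The helper `bin_arms`, the non-overlapping grouping, and the assumption that the acquisition fills the whole breath-hold are illustrative; only the 13 arms per frame, the 48 ms temporal resolution, and the 4.5 s breath-hold are taken from the text.

```python
import numpy as np

ARMS_PER_FRAME = 13        # stated in the text
FRAME_DURATION_MS = 48.0   # stated temporal resolution

def bin_arms(num_arms, arms_per_frame=ARMS_PER_FRAME):
    """Group consecutive arm indices into non-overlapping real-time frames.

    Arms that do not fill a complete frame at the end are discarded.
    """
    num_frames = num_arms // arms_per_frame
    idx = np.arange(num_frames * arms_per_frame)
    return idx.reshape(num_frames, arms_per_frame)

# Time per arm implied by the stated numbers (about 3.7 ms).
tr_ms = FRAME_DURATION_MS / ARMS_PER_FRAME

# Upper bound on complete frames within a 4.5 s patient breath-hold,
# assuming continuous acquisition over the entire breath-hold.
frames_patient = int(4500 // FRAME_DURATION_MS)

frame_indices = bin_arms(100)   # hypothetical run of 100 arms -> 7 full frames
```

Each row of `frame_indices` lists the arms that form one undersampled frame. Because the trajectory is rotated between arms, the same arm stream could alternatively be sorted by cardiac phase to build the fully sampled segmented reference.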

Results

Fig.$$$\,$$$3 shows reconstructions of an undersampled spiral real-time frame using different methods in comparison with a corresponding fully sampled reference frame (segmented spiral cine in healthy volunteer). The proposed method based on the video diffusion model appears to result in the highest image quality with lowest artifact level and best SNR. The remaining artifacts seen in the images and error maps show noise-like behavior in the parallel imaging (PI, iterative SENSE) and video diffusion reconstructions, more pronounced for PI. Differences for the low-rank plus sparse (LRS) reconstruction appear more structural overall. Fig.$$$\,$$$4 shows sample reconstructions of real-time frames using the proposed model for one healthy participant and three patients. The real-time frame reconstructions correctly depict the movement of the heart, despite arrhythmia. The mean values from the expert reader study (Tab.$$$\,$$$1) confirm a robust performance for the proposed model. The reconstruction of 40 real-time frames using the new method took about 1 min.

Discussion & Conclusion

We proposed a video diffusion model as a promising prior for high-quality reconstructions of undersampled dynamic MR data. For real-time cardiac MRI, the method showed superior performance in comparison with a low-rank plus sparse model. Transferring the method to spatial resolutions typical for cardiac MR exams represents one important aim of future work.

Acknowledgements

The presented work was partially funded by the Interdisciplinary Center for Clinical Research in Würzburg under Research Grant F-437. We thank Phil Wang for providing the implementation of “Video Diffusion Models” by J. Ho et al. (https://github.com/lucidrains/video-diffusion-pytorch) as well as the OCMR-Database community for openly sharing cardiac-MR data.

References

[1] H. Chung and J. C. Ye: Score-based diffusion models for accelerated MRI. arXiv, 16 July 2022. http://arxiv.org/abs/2110.05243

[2] G. Luo et al.: Bayesian MRI reconstruction with joint uncertainty estimation using diffusion models. Magn. Reson. Med. 2023, 90(1):295-311.

[3] J. Ho et al.: Video Diffusion Models. arXiv, 7 April 2022. https://arxiv.org/abs/2204.03458

[4] P. Wang (lucidrains): video-diffusion-pytorch. GitHub, 13 April 2022. https://github.com/lucidrains/video-diffusion-pytorch

[5] OCMR Dataset from www.ocmr.info

[6] O. Maier et al.: CG-SENSE revisited: Results from the first ISMRM reproducibility challenge. Magn. Reson. Med. 2021, 85(4):1821-1839.

[7] BART Toolbox for Computational Magnetic Resonance Imaging. Version v0.7.00. https://mrirecon.github.io/bart/.

[8] U. S. Kamilov et al.: Plug-and-Play Methods for Integrating Physical and Learned Models in Computational Imaging: Theory, algorithms, and applications. IEEE Signal Processing Magazine. 2023, 40(1):85-97.

[9] N. Seiberlich et al.: Non-Cartesian data reconstruction using GRAPPA operator gridding (GROG). Magn. Reson. Med. 2007, 58(6):1257-1265.

[10] R. Otazo et al.: Low-rank plus sparse matrix decomposition for accelerated dynamic MRI with separation of background and dynamic components. Magn. Reson. Med. 2015, 73:1125–1136.

Figures

Figure 1: During training, a total of 470 magnitude cine series were perturbed with Gaussian noise to fit a 3D-UNet, representing the denoising prior of each timestep t in the diffusion chain. After training for 60,000 steps, artificial cine series can be sampled from pure noise. To reconstruct undersampled real-time data, PI reconstructions were used as an input for the PnP-FISTA model. Running the reverse diffusion process for the last 25 timesteps, while including data consistency terms, allows the reconstruction of cine depictions with high quality.

Figure 2: Comparison between a real and an artificial cine series. The left cine was part of the training set during fitting of the diffusion model, whereas the right cine was sampled from pure Gaussian noise after training the model for 60,000 diffusion steps. The structural and temporal similarities indicate that the model represents a suitable prior for cardiac MR.

Figure 3: Fully sampled cine frame and corresponding undersampled (R≈5) reconstructions of one healthy volunteer. The difference images with respect to the reference were scaled by a factor of 5 and highlight remaining artifacts.

Figure 4: Reconstructions of 40 real-time frames of one healthy volunteer (top left) and three patients using the proposed video diffusion model. The real-time acquisitions are able to correctly depict abnormal RR cycles.

Table 1: Mean values of image quality ratings on a Likert-scale of 1 (worst) to 5 (best) performed by an expert reader for the introduced reconstructions as well as Cartesian cine references (downsampled to the same resolution).

Proc. Intl. Soc. Mag. Reson. Med. 32 (2024) 0379

DOI: https://doi.org/10.58530/2024/0379