2811

MEDL: Unsupervised Multi-Stage Ensemble Deep Learning with Diffusion Models for Denoising MRI Scans

Sahil Vora¹, Riti Paul¹, Pak Lun Kevin Ding¹, Ameet C. Patel², Leland S. Hu², Yuxiang Zhou², and Baoxin Li¹
¹School of Computing and Augmented Intelligence (SCAI), Arizona State University, Tempe, AZ, United States, ²Department of Radiology, Mayo Clinic, Phoenix, AZ, United States

Synopsis

Keywords: Machine Learning/Artificial Intelligence, Denoising, Unsupervised Learning

Motivation: Traditional MRI scans, necessary for high SNR and clear images, are time-consuming and uncomfortable for patients. Shorter scans, meant to improve the patient experience, often compromise image quality and SNR. New deep learning techniques provide a solution to denoise MRI scans, even with limited data availability.

Goal(s): We aim to create an unsupervised MRI denoising method for real-world clinical settings that eliminates the need for clean or paired noisy images, ensuring versatility and practicality.

Approach: We use an unsupervised diffusion-based denoising approach to denoise MRI scans.

Results: We achieve unsupervised denoising for MRI scans, outperforming previous methods and reducing denoising time to 6 seconds per 2D scan.

Impact: Our approach denoises general MRI scans without extra clean or noisy data. It's suitable for real-world clinics, reducing patient MRI time. It enhances imaging quality, ensuring accurate diagnoses and faster clinical practices for patients and doctors.

Introduction

Deep learning techniques find extensive application in computer vision and medical fields, showing promise in solving inverse problems like denoising. However, existing methods face limitations in denoising MRI scans. Our unsupervised Multi-Stage Ensemble Deep Learning (MEDL) model for general-purpose MRI scans overcomes these limitations by:

1. Eliminating Data Dependency: Our approach simplifies denoising by requiring neither clean data (supervised learning) nor paired noisy samples (Noise2Noise)¹. It trains on a single noisy scan and applies to various clinical MRI types rather than only to diffusion MRI (as with DDM²)², enhancing real-world usability.

2. Reducing Hallucinations in MRI Denoising: We prevent model hallucinations by generating additional noisy copies from the original image and updating the diffusion method's loss function. This is crucial with limited datasets, addressing challenges faced by previous methods (DDM²).

3. Statistical Integration for Multi-Stage Ensemble: The method integrates multi-stage results using effective statistical techniques. Preserving high-frequency details and avoiding smoothened or blurred outcomes, it achieves superior denoising in MRI image enhancement.

Methods

We design a multi-stage denoising network, depicted in Figure 1, in which each stage targets different features to denoise. In the Stage 1 denoising step, we create multiple noisier copies of the noisy image (V₁). These copies are used for Noise2Noise (N2N) training, which removes noise from low-frequency details and learns the denoising function $$$ Φ $$$.
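The first step of Stage 1, creating several noisier copies of V₁, can be sketched as follows. This is only an illustration under stated assumptions: the abstract does not say how many copies are drawn or how they are paired for N2N training, so `make_noisier_copies`, `n_copies`, and the toy image are hypothetical, and the noise is assumed to be additive, zero-mean Gaussian with standard deviation `sigma_g`.

```python
import numpy as np

def make_noisier_copies(v1, sigma_g, n_copies=4, seed=0):
    """Return n_copies noisier versions of v1, stacked along a new first axis.

    Each copy is v1 plus an independent draw of zero-mean Gaussian noise
    with standard deviation sigma_g (assumed noise model).
    """
    rng = np.random.default_rng(seed)
    noise = rng.normal(0.0, sigma_g, size=(n_copies,) + v1.shape)
    return v1[None, ...] + noise

# Toy stand-in for a single noisy 2D slice (not real MRI data).
v1 = np.zeros((8, 8), dtype=np.float64)
copies = make_noisier_copies(v1, sigma_g=0.1, n_copies=3)
```

Because every copy carries an independent noise draw on top of the same underlying scan, any two copies can serve as an input/target pair in an N2N-style training loop.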

This denoising function is then used to predict the additive noise $$$ 𝜺 $$$. It leads to Stage 2, which finds an intermediate state in the Markov chain by state matching and Gaussian fitting through a statistical approach.

This matched state initiates Stage 3, which denoises the MRI scan using a denoising diffusion probabilistic model (DDPM).³
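One possible reading of the Stage 2 state matching is sketched below. The abstract does not give the matching rule or the noise schedule, so everything here is an assumption: a linear DDPM beta schedule, a Gaussian fit by sample mean and standard deviation of the predicted noise, and a match on the noise-to-signal ratio of the standard DDPM forward process $$$ x_t = \sqrt{\bar\alpha_t}\,x_0 + \sqrt{1-\bar\alpha_t}\,\epsilon $$$. The names `linear_beta_schedule` and `match_state` are hypothetical.

```python
import numpy as np

def linear_beta_schedule(num_steps=1000, beta_start=1e-4, beta_end=2e-2):
    """Linear variance schedule (assumed; the abstract does not specify one)."""
    return np.linspace(beta_start, beta_end, num_steps)

def match_state(predicted_noise, betas):
    """Fit a Gaussian to the predicted noise and pick the closest diffusion step.

    Returns (t, mu_hat, sigma_hat), where t indexes the state whose
    noise-to-signal ratio sqrt((1 - alpha_bar_t) / alpha_bar_t) is closest
    to the fitted standard deviation sigma_hat.
    """
    mu_hat = float(predicted_noise.mean())    # Gaussian fit: sample mean
    sigma_hat = float(predicted_noise.std())  # Gaussian fit: sample std
    alpha_bar = np.cumprod(1.0 - betas)
    ratio = np.sqrt((1.0 - alpha_bar) / alpha_bar)
    t = int(np.argmin(np.abs(ratio - sigma_hat)))
    return t, mu_hat, sigma_hat

# Toy example: synthetic "predicted noise" with standard deviation 0.1.
rng = np.random.default_rng(0)
predicted_noise = rng.normal(0.0, 0.1, size=(64, 64))
betas = linear_beta_schedule()
t, mu_hat, sigma_hat = match_state(predicted_noise, betas)
```

Starting the reverse diffusion at the matched step `t` rather than at the final step is what would let Stage 3 treat the noisy scan as an intermediate state of the chain instead of sampling from pure noise.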

Adapting the Diffusion MRI process to 2D/3D MRI posed challenges due to the need for multiple copies for comparison. Existing methods like DDM² relied on paired samples, making loss calculation easier with clean and noisy images. Attempts to use the noisier copy (X) for the loss function, as done in methods like Noisier2Noise⁴,⁵, resulted in increased hallucinations and loss of details. We introduced a strategic change: using the original noisy input (V₁) as the target in the J-invariance loss function. This alteration effectively constrained hallucinations and preserved high-frequency details, significantly improving the results.

Updated Loss Equation: $$$ \mathcal{L}(\phi(\mathcal{X}),\mathcal{V}_1) = ||\phi(\mathcal{X})-\mathcal{V}_1||^2 \thickapprox ||\phi(\mathcal{X})-\mathcal{V}_0||^2 + const $$$
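To make the approximation in the loss explicit, the squared error can be expanded term by term. The decomposition $$$ \mathcal{V}_1 = \mathcal{V}_0 + n $$$, with $$$ n $$$ the noise in the acquired scan, is notation introduced here for illustration and is not taken from the abstract.

```latex
\begin{aligned}
\|\phi(\mathcal{X})-\mathcal{V}_1\|^2
  &= \|\phi(\mathcal{X})-\mathcal{V}_0-n\|^2 \\
  &= \|\phi(\mathcal{X})-\mathcal{V}_0\|^2
     - 2\,\langle \phi(\mathcal{X})-\mathcal{V}_0,\; n\rangle
     + \|n\|^2 .
\end{aligned}
```

The last term does not depend on $$$ \phi $$$ and plays the role of the constant. The approximation therefore holds to the extent that the cross term is negligible on average, which is an assumption rather than an identity, since $$$ \mathcal{X} $$$ is itself built from $$$ \mathcal{V}_1 $$$.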

In our final Stage 4, we combined the strengths of two stages: Stage 1, which uses a U-Net, smoothed noise and preserved constant structures, while Stage 3 enhanced high-frequency details and complex patterns. Inspired by the Multi-Objective theorem⁶, which involves making optimal decisions amid trade-offs between conflicting objectives, and recognizing the complementary nature of these strengths, we integrated the outputs of both stages. This approach allowed us to achieve a balanced denoising outcome, retaining essential structural information and highlighting fine details.
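A minimal sketch of the Stage 4 merge is given below. The Figure 1 caption mentions a static averaging parameter, but the abstract does not state the exact combination rule, so the convex combination used here, the function name `ensemble`, and the default weight are assumptions for illustration only.

```python
import numpy as np

def ensemble(stage1, stage3, w_avg=0.5):
    """Blend two denoised images with a fixed weight (assumed convex combination).

    w_avg weights the Stage 1 output; (1 - w_avg) weights the Stage 3 output.
    """
    if not 0.0 <= w_avg <= 1.0:
        raise ValueError("w_avg must lie in [0, 1]")
    if stage1.shape != stage3.shape:
        raise ValueError("stage outputs must have the same shape")
    return w_avg * stage1 + (1.0 - w_avg) * stage3

# Toy stand-ins for the two stage outputs (not real results).
stage1_out = np.zeros((4, 4))
stage3_out = np.ones((4, 4))
merged = ensemble(stage1_out, stage3_out, w_avg=0.25)
```

With a single static weight, the trade-off between the smoother Stage 1 output and the more detailed Stage 3 output is fixed for all pixels.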

Results

Dataset(s) used:

NYU fastMRI⁷
1. Anatomy - Brain
2. Volumes Used: 205
3. Total Number of Images: 1635
4. Resolution: 320x320
5. Sequence: T1
6. Ground Truth Images (V₀): Reserved for Quantitative Analysis, Not Used in Training
7. V₁ Samples: Generated with Gaussian Noise ($$$\sigma$$$ = 1e^-5)
8. Input Prior Samples: Generated with Gaussian Noise ($$$\sigma_g$$$ = 2e^-5)

Evaluation:

Quantitative Results: Table 1 presents average PSNR and SSIM⁸ results for 100 random samples from the fastMRI dataset, compared with N2N and DDM², leveraging ground truth data for evaluation.
Qualitative Results: We showcase progressive denoising in each stage of our approach, illustrating the qualitative improvements (Figure 3).
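For readers who want to reproduce this kind of evaluation, the sketch below computes PSNR and a simplified SSIM against a ground truth image. It is not the evaluation code behind Table 1: `global_ssim` uses a single window over the whole image, whereas standard SSIM implementations average over local windows and the cited reference describes a multiscale variant. The constants `k1` and `k2` follow the usual SSIM defaults, and `data_range=1.0` assumes intensities normalised to [0, 1].

```python
import numpy as np

def psnr(reference, estimate, data_range=1.0):
    """Peak signal-to-noise ratio in dB between a reference and an estimate."""
    mse = np.mean((reference - estimate) ** 2)
    if mse == 0:
        return float("inf")
    return 10.0 * np.log10(data_range ** 2 / mse)

def global_ssim(x, y, data_range=1.0, k1=0.01, k2=0.03):
    """Single-window SSIM over the whole image (a simplification)."""
    c1 = (k1 * data_range) ** 2
    c2 = (k2 * data_range) ** 2
    mx, my = x.mean(), y.mean()
    vx, vy = x.var(), y.var()
    cov = np.mean((x - mx) * (y - my))
    return ((2 * mx * my + c1) * (2 * cov + c2)) / (
        (mx ** 2 + my ** 2 + c1) * (vx + vy + c2)
    )

# Toy check: a constant offset of 0.1 on a [0, 1] image gives about 20 dB.
reference = np.full((4, 4), 0.5)
estimate = reference + 0.1
score_db = psnr(reference, estimate)
```

Averaging these scores over a fixed set of test slices, with the ground truth V₀ used only at evaluation time, mirrors the protocol described above.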

Discussion

Our model significantly outperforms others on the brain dataset (Figure 2). Compared to the Noisy Input (V₁), it exhibits a 2.1% increase in PSNR and a 5% improvement in SSIM. Our approach also shows a 23% rise in PSNR and a 34% rise in SSIM over the Noisier Input Priors (X).
As seen in Figures 2 & 3, the qualitative results show that our method preserves details that previous models failed to retain. We also observe that the combined results accomplish the multi-objective task of denoising both high- and low-frequency details, making the approach extensible to real-time clinical data without requiring any ground truth information.
Additionally, our method significantly improves denoising efficiency, processing a single 2D MRI scan in just 6 seconds, which is crucial for time-sensitive clinical practice.

Conclusion

In this study, we introduced an unsupervised model that eliminates the need for clean or paired noisy MRI scans when denoising with deep learning. MEDL outperforms previous approaches on real-world clinical scans, preserving both high- and low-frequency data with a multi-stage ensemble. Future work involves optimizing ensemble model parameters for automation, evaluating results across different anatomies, and exploring transfer learning possibilities.

References

1. Lehtinen, J., Munkberg, J., Hasselgren, J., Laine, S., Karras, T., Aittala, M., & Aila, T. (2018). Noise2Noise: Learning image restoration without clean data. arXiv preprint arXiv:1803.04189.
2. Xiang, T., Yurt, M., Syed, A. B., Setsompop, K., & Chaudhari, A. (2023). DDM²: Self-supervised diffusion MRI denoising with generative diffusion models. arXiv preprint arXiv:2302.03018.
3. Ho, J., Jain, A., & Abbeel, P. (2020). Denoising diffusion probabilistic models. Advances in Neural Information Processing Systems, 33, 6840-6851.
4. Moran, N., Schmidt, D., Zhong, Y., & Coady, P. (2020). Noisier2Noise: Learning to denoise from unpaired noisy data. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (pp. 12064-12072).
5. Xu, J., Huang, Y., Cheng, M. M., Liu, L., Zhu, F., Xu, Z., & Shao, L. (2020). Noisy-as-clean: Learning self-supervised denoising from corrupted image. IEEE Transactions on Image Processing, 29, 9316-9329.
6. Miettinen, K. (1999). Nonlinear multiobjective optimization (Vol. 12). Springer Science & Business Media.
7. Zbontar, J., Knoll, F., Sriram, A., Murrell, T., Huang, Z., Muckley, M. J., ... & Lui, Y. W. (2018). fastMRI: An open dataset and benchmarks for accelerated MRI. arXiv preprint arXiv:1811.08839.
8. Wang, Z., Simoncelli, E. P., & Bovik, A. C. (2003, November). Multiscale structural similarity for image quality assessment. In The Thirty-Seventh Asilomar Conference on Signals, Systems & Computers, 2003 (Vol. 2, pp. 1398-1402). IEEE.

Figures

Figure 1: Schematic representation of the architecture of the proposed Multi-Stage Ensemble Deep Learning (MEDL) method. A multi-stage denoising model is devised, where each stage targets different features for denoising. These stages are then merged statistically using the static parameter $$$ w_{avg} $$$. Importantly, the proposed model does not require any clean images during training and can start the experiment directly from the Noisy Input Sample V₁.

Figure 2: Results of previous methods and the current approach on the NYU fastMRI dataset, evaluated using PSNR and SSIM with respect to the available ground truth clean images. N2N removes noise but misses high-frequency details, while DDM² smooths the background extensively. The proposed MEDL result preserves these details and is closest to the available ground truth scan. Top: Highlighted models for comparison. Bottom: Magnified view of the target area.

Figure 3: Stage-wise results of the proposed method on the fastMRI brain dataset. Stage 1 enhances low-frequency details, boosting quantitative scores, though some visible noise persists. Stage 3 removes noise entirely but smooths the background. The combined ensemble result preserves crucial details from both stages, producing an image closest to ground truth. Top: Highlighted stages leading to final results. Bottom: Magnified view of the target area.

Table 1: Quantitative performance results on the fastMRI brain (T1) dataset for 100 2D MRI scans. Red and blue cells represent the best and second-best performing models, respectively. PSNR and SSIM are calculated with respect to the ground truth.

Proc. Intl. Soc. Mag. Reson. Med. 32 (2024)

DOI: https://doi.org/10.58530/2024/2811