1368

Noise-Robust Reconstruction for Accelerated MRI using Contrastive Learning

Seonghyuk Kim¹, Sung-Hong Park¹, and HyunWook Park¹
¹KAIST, Daejeon, Korea, Republic of

Synopsis

Keywords: Image Reconstruction, Noise-robust method

Motivation: Deep learning-based accelerated MRI reconstruction methods have shown outstanding performance but do not account for noise. Noise corruption may lead to misdiagnosis in clinical practice.

Goal(s): Propose a noise-robust reconstruction method that reconstructs noise-free, fully sampled images from noisy undersampled data.

Approach: A noise-robust reconstruction method is proposed using a two-stage contrastive learning framework. The first stage extracts feature representations related to the noise level, which are used in the second stage to reconstruct an alias-free image.

Results: Experimental results show that the proposed method provides robust reconstruction with limited training data, yielding superior image reconstruction compared to other reconstruction methods.

Impact: The encoder trained in the first stage extracts representation features that contain content-invariant noise level information. Therefore, the trained encoder can be applied to other downstream tasks with a limited amount of training data.

Introduction

Accelerated MRI methods reduce scan time by sparsely sampling multichannel k-space data. In the past few years, deep learning-based accelerated MRI reconstruction methods¹⁻³ have shown outstanding performance in terms of computation time as well as image quality. However, conventional deep learning methods are not robust to noise in the acquired data. To address this issue, a noise-robust reconstruction method is proposed, which uses a contrastive learning framework⁴. The encoder in the first stage is trained to extract representation features containing information about the noise level in the input image. The reconstruction network in the second stage then takes the noisy undersampled image and the extracted features from the first stage as inputs to reconstruct an alias-free image. The proposed method not only reconstructs higher-quality images with better-preserved details than the baseline models, but also achieves superior quantitative results with a small amount of training data.

Method

A noise-informed reconstruction network is proposed, which reconstructs a fully sampled image from undersampled multi-channel images corrupted by noise. The framework consists of two stages, as illustrated in Fig.1.
For contrastive learning in the first stage, the training data are divided into three groups according to noise level. During training, an image from one group is selected and randomly cropped, generating the query and the positive key. Then, an image from another group is selected and randomly cropped, generating two negative keys. The query patch goes through the query encoder, whereas the key patches are fed into the key encoder. Each encoder is followed by a projection head, which embeds the extracted features into representation vectors. The overall training process is illustrated in Fig.2.
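The patch sampling described above can be sketched as follows. This is a minimal illustration rather than the authors' implementation: the function names are hypothetical, and it assumes that the query and positive key are two independent random crops of the same image, which the text does not state explicitly.

```python
import numpy as np

def random_crop(image, size, rng):
    """Return a random size x size patch from the last two axes of an image."""
    h, w = image.shape[-2:]
    top = rng.integers(0, h - size + 1)
    left = rng.integers(0, w - size + 1)
    return image[..., top:top + size, left:left + size]

def sample_contrastive_patches(groups, size, rng):
    """Draw a query, a positive key, and two negative keys.

    groups: list of lists of images, one inner list per noise level.
    Assumption: query and positive key are independent crops of one image.
    """
    q_level = rng.integers(len(groups))
    # The negative image must come from a different noise-level group.
    other_levels = [g for g in range(len(groups)) if g != q_level]
    n_level = rng.choice(other_levels)

    pos_image = groups[q_level][rng.integers(len(groups[q_level]))]
    neg_image = groups[n_level][rng.integers(len(groups[n_level]))]

    query = random_crop(pos_image, size, rng)
    positive = random_crop(pos_image, size, rng)
    negatives = [random_crop(neg_image, size, rng) for _ in range(2)]
    return query, positive, negatives, q_level, n_level
```

Because the query and positive key share a noise level but the negatives do not, the encoder is pushed to separate patches by noise level rather than by anatomy.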
The encoders are trained with a contrastive loss, expressed as follows:
$$L_{c}=\sum^{N_{batch}}_{i=1}-\log\frac{\text{exp}\left(\textbf{F}^{q}_{i}\cdot\textbf{F}^{k+}_{i}/\tau\right)}{\sum^{2\times N_{neg}}_{j=1}\text{exp}\left(\textbf{F}^{q}_{i}\cdot\textbf{F}^{k-}_{j}/\tau\right)}\text{,}\quad\quad[1]$$
where $$$N_{batch}$$$ is the batch size, $$$N_{neg}$$$ is the number of negative images (each contributing two negative keys), $$$\textbf{F}^{q}_{i}$$$ is the query representation vector, $$$\textbf{F}^{k+}_{i}$$$ is the positive key representation vector, $$$\textbf{F}^{k-}_{j}$$$ is the negative key representation vector, and $$$\tau$$$ is the temperature parameter.
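For illustration, Eq.[1] can be transcribed into NumPy as below. This sketch follows the equation as written, with the denominator summing over negative keys only and a negative-key pool shared across the batch. The function name and the default temperature of 0.07 are assumptions, and no normalization of the representation vectors is applied because none is specified.

```python
import numpy as np

def contrastive_loss(f_q, f_kpos, f_kneg, tau=0.07):
    """Contrastive loss of Eq.[1].

    f_q:    (N_batch, D) query representation vectors
    f_kpos: (N_batch, D) positive key representation vectors
    f_kneg: (2 * N_neg, D) negative key representation vectors
    tau:    temperature (0.07 is an assumed default, not from the source)
    """
    pos_logit = np.sum(f_q * f_kpos, axis=1) / tau   # (N_batch,)
    neg_logits = f_q @ f_kneg.T / tau                # (N_batch, 2 * N_neg)
    # Log-sum-exp over negatives, shifted by the row maximum for stability.
    m = neg_logits.max(axis=1, keepdims=True)
    log_denom = m[:, 0] + np.log(np.exp(neg_logits - m).sum(axis=1))
    # -log(exp(pos) / sum_j exp(neg_j)) = log_denom - pos, summed over the batch.
    return float(np.sum(log_denom - pos_logit))
```

The loss decreases as the query aligns with its positive key and moves away from the negative keys, which is the behaviour the first stage relies on.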
In the second stage, the trained query encoder from the first stage is employed to produce representation features that carry information about the noise level. The reconstruction network combines the features from the two distinct networks to reconstruct fully sampled images. To reduce the feature domain gap between the two networks, the encoder in the reconstruction network applies combined pixel and channel attention (CPCA) blocks⁴ after the feature concatenation. Moreover, deformable convolution⁵ is applied to focus on regions of interest and to provide larger receptive fields. The overall process of the second stage and the reconstruction network architecture are depicted in Fig.3.
The second stage is trained in a supervised manner with an L1 loss between the network output and the ground truth:
$$L_{R}=\frac{1}{N_{recon}}\sum^{N_{recon}}_{i=1}\left|rSOS\left(f\left({\bf{y}}_{i}\right)\right)-x_{i}\right|\text{,}\quad\quad[2]$$
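where $$$N_{recon}$$$ is the number of training samples, $$${\bf{y}}_{i}$$$ is the noisy undersampled multi-channel input, $$$x_{i}$$$ is the corresponding ground-truth image, $$$rSOS\left(\cdot\right)$$$ is the root-sum-of-squares operation, and $$$f\left(\cdot\right)$$$ is the reconstruction network.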
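A minimal sketch of Eq.[2] is given below. The function names and array layout are hypothetical. The sketch assumes the network outputs complex multi-coil images and that the absolute difference is averaged over pixels as well as over samples, a detail Eq.[2] leaves open.

```python
import numpy as np

def rsos(coil_images, coil_axis=0):
    """Root-sum-of-squares combination across the coil axis."""
    return np.sqrt(np.sum(np.abs(coil_images) ** 2, axis=coil_axis))

def l1_recon_loss(outputs, targets):
    """L1 loss of Eq.[2] between rSOS-combined outputs and ground truth.

    outputs: (N, C, H, W) complex network outputs f(y_i), one image per coil
    targets: (N, H, W) real ground-truth images x_i
    Assumption: the absolute error is averaged over pixels and samples.
    """
    recon = rsos(outputs, coil_axis=1)
    return float(np.mean(np.abs(recon - targets)))
```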
We used the multi-coil MR data from the fastMRI dataset⁶. The images were undersampled with a 1D Cartesian uniform mask with 16 ACS lines and an acceleration rate of 4. Random Gaussian noise was added to the real and imaginary channels of each coil, with a standard deviation $$$\sigma$$$ expressed as follows:
$$\sigma=\left[\text{max}\left(x\right)-\text{min}\left(x\right)\right]\times\alpha\text{,}\quad\quad[3]$$
where $$$x$$$ is the fully sampled image reconstructed with the root-sum-of-squares operation. The datasets in this work were generated at three noise levels, corresponding to $$$\alpha$$$ values of 0, 0.01, and 0.02 in Eq.[3].
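The data generation can be sketched as follows. The helper names are hypothetical. The sketch assumes that the noise is added to the complex coil images in the image domain, that the ACS block is centred in k-space, and that the uniform sampling pattern starts at line 0. None of these details are specified in the text.

```python
import numpy as np

def noise_sigma(x_rsos, alpha):
    """Standard deviation of Eq.[3] from the fully sampled rSOS image."""
    return (x_rsos.max() - x_rsos.min()) * alpha

def add_complex_noise(coil_images, alpha, rng):
    """Add Gaussian noise to the real and imaginary channels of each coil.

    coil_images: (C, H, W) complex fully sampled coil images.
    Assumption: noise is added in the image domain.
    """
    x = np.sqrt(np.sum(np.abs(coil_images) ** 2, axis=0))
    sigma = noise_sigma(x, alpha)
    noise = (rng.normal(0.0, sigma, coil_images.shape)
             + 1j * rng.normal(0.0, sigma, coil_images.shape))
    return coil_images + noise, sigma

def uniform_mask_1d(n_lines, accel=4, n_acs=16):
    """1D Cartesian uniform mask with a centred ACS block (assumed layout)."""
    mask = np.zeros(n_lines, dtype=bool)
    mask[::accel] = True                 # every accel-th phase-encoding line
    start = (n_lines - n_acs) // 2
    mask[start:start + n_acs] = True     # fully sampled ACS lines
    return mask
```

With `alpha = 0` the noise term vanishes, which corresponds to the noise-free group used in the experiments.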

Results

The proposed method provided superior reconstruction from noisy undersampled images compared to GRAPPA⁷, OT-cycleGAN², and the residual U-net, as shown in Fig.4.
In the case of $$$\alpha=0$$$, OT-cycleGAN, which was better optimized for reconstructing images without added noise, produced larger errors in the cerebrospinal fluid area than the proposed method despite its higher PSNR and SSIM values. At the higher noise levels $$$\left(\alpha=0.01, 0.02\right)$$$, all comparison methods failed to reconstruct clean images, whereas the proposed method was more robust to noise.
The results for reconstruction of in vivo images are shown in Fig.5. As the slices become thinner, the signal-to-noise ratio decreases, and the proposed method provides more robust reconstruction than the residual U-net.

Discussion and Conclusion

In this work, we proposed a noise-robust reconstruction method using a contrastive learning framework. Through contrastive learning, the query encoder in the noise level feature extraction stage learns features that represent the amount of noise corruption independently of the image content. These features are used in the reconstruction stage, allowing the noise-informed network to provide noise-robust reconstruction with a limited number of paired data. Furthermore, because the representation features extracted by the trained encoder contain content-invariant noise level information, they can be applied to MR image reconstruction tasks on various datasets.

Acknowledgements

This work was supported by the Korea Medical Device Development Fund grant funded by the Korea government (Project Number: 1711138003).

References

[1] Sriram, Anuroop, et al. "End-to-end variational networks for accelerated MRI reconstruction." International Conference on Medical Image Computing and Computer-Assisted Intervention. Springer, Cham, 2020.

[2] Oh, Gyutaek, et al. "Unpaired deep learning for accelerated MRI using optimal transport driven CycleGAN." IEEE Transactions on Computational Imaging 6 (2020): 1285-1296.

[3] Yaman, Burhaneddin, et al. "Self-supervised physics-based deep learning MRI reconstruction without fully-sampled data." 2020 IEEE 17th International Symposium on Biomedical Imaging (ISBI). IEEE, 2020.

[4] Jiang, Bo, et al. "Multilevel Noise Contrastive Network for Few-Shot Image Denoising." IEEE Transactions on Instrumentation and Measurement 71 (2022): 1-13.

[5] Dai, Jifeng, et al. "Deformable convolutional networks." Proceedings of the IEEE international conference on computer vision. 2017.

[6] Zbontar, Jure, et al. "fastMRI: An open dataset and benchmarks for accelerated MRI." arXiv preprint arXiv:1811.08839, 2018.

[7] Griswold, Mark A., et al. "Generalized autocalibrating partially parallel acquisitions (GRAPPA)." Magnetic Resonance in Medicine: An Official Journal of the International Society for Magnetic Resonance in Medicine 47.6 (2002): 1202-1210.

Figures

Figure 1: Overall framework of the noise-informed reconstruction network. The framework consists of two stages: an encoder stage that extracts features containing noise corruption information, and a reconstruction stage that uses these representation features to reconstruct noise-free images from noisy undersampled images.

Figure 2: Training process of the first stage. An image from one group is selected and randomly cropped to a size of $$$256\times 256$$$, generating the query and the positive key. An image from another group is selected and randomly cropped, generating two negative keys. The query patch goes through the query encoder, whereas the key patches go through the key encoder. Each encoder is followed by a projection head, which embeds the extracted features into representation vectors.

Figure 3: (a) Overall process of noise-informed reconstruction. The trained query encoder produces representation features that carry information about the noise level. The reconstruction network then combines the features from the two encoders to reconstruct images. (b) Network architecture of the query encoder and the reconstruction network. The reconstruction network contains CPCA blocks and deformable convolution implemented in a residual U-net structure to boost reconstruction performance.

Figure 4: Noisy undersampled input with a reduction factor of 4 (column 1), reconstructed images from GRAPPA (column 2), OT-cycleGAN (column 3), residual U-net (column 4), and the proposed method (column 5), and the ground truth image (column 6) when the noise level is $$$\alpha=0$$$ (row 1), $$$\alpha=0.01$$$ (row 3), and $$$\alpha=0.02$$$ (row 5), with the corresponding difference maps (row 2, row 4, and row 6). The numbers in the column headings indicate the number of training data for each model, and the numbers in each image are PSNR and SSIM.

Figure 5: Noisy undersampled in vivo images with a reduction factor of 4 (column 1), reconstructed images from the residual U-net (column 2), and from the proposed method (column 3) at different slice thicknesses (row 1 and row 2). The proposed method reconstructs fully sampled images with higher quality than the residual U-net, especially in the low-SNR case caused by the smaller slice thickness.

Proc. Intl. Soc. Mag. Reson. Med. 32 (2024)

1368

DOI: https://doi.org/10.58530/2024/1368