4781

Deep Attention Unfolding CNN Architecture for Parallel MRI

Muhammad Shafique^1,2, Sohaib Ayyaz Qazi³, Faisal Najeeb¹, and Hammad Omer¹
¹Electrical and Computer Engineering, COMSATS University Islamabad, Islamabad, Pakistan, ²Electrical Engineering, University of Poonch Rawalakot, Rawalakot AJK, Pakistan, ³Center for Medical Image Science and Visualization, Linköping University, Linköping, Sweden

Synopsis

Keywords: Image Reconstruction, Artifacts, Parallel MRI, Deep Learning, Attention Mechanism

Deep learning has made significant progress in recent years, however the local receptive field in CNN raises questions about signal synthesis and artefact compensation. This paper proposes a deep learning Spatial-Channel Attention U-Net (SCA-U-Net) to solve the folding problems arising in MR images because of undersampling. The main idea is to enhance local features in the images and restrain the irrelevant features at the spatial and channel levels. The output from SCA-U-Net is further refined by adding a small number of originally acquired low-frequency k-space data. Experimental results show a better performance of the SCA-U-Net model than classical U-Net model.

Introduction

Magnetic Resonance Imaging (MRI) is a demanded medical imaging technique because it provides high spatial resolution and fine contrast images of human organs. The prolonged scan time of MRI is one of its primary drawbacks. Undersampled measurements in Parallel MRI decrease scan time; however folding artefacts appear as a result.
Several deep-learning methods have been proposed in the literature to solve the folding problems in MR images. For instance, it is suggested to estimate the folding artefacts in aliased images using residual convolutional neural networks (CNN)¹. Also, CNN followed by the addition of a small number of low-frequency k-space data (originally acquired) has been proposed to solve the folding problems in MR images². Further, end-to-end fine-tuning has been used for CNN to solve the folding problems and data scarcity issues in MR images³.
U-Net is a CNN encoder and decoder network⁴that offers good learning capability with a small training data set. U-Net attains high performance in learning because of the efficient extraction of the local image features⁴. However, the convolution operators in U-Net have local receptive fields, so the processing of long-range dependencies goes through multiple convolutional layers, which may prevent learning about long-term dependencies⁵; and can hinder in integrating signals from a broad range of inputs.
One way to extend the receptive field of convolution operators is by employing a hierarchical network architecture⁶. In fact, integrating CNN with the attention mechanism⁷, which would detect long-range dependencies throughout the image region, has become popular. For example, the self-attention mechanism in CNN has been proposed for a reliable MR image reconstruction⁸. Sanghyun et.al⁹ introduced Convolutional-Block-Attention-Module (CBAM) to detect the long-range dependencies across the image region.
In this research, we propose additional channel and spatial attention mechanisms integrated into CNN (U-Net) for improved MR image reconstruction.

Method

In the proposed method i.e. SCA-U-Net, Spatial and Channel Attention mechanisms⁹ have been integrated into the U-Net architecture as shown in Fig.1.
The acquired undersampled data is initially preprocessed to produce zero-padded data by first adding zeros in the unmeasured areas. Next, the folded image is obtained by taking the inverse Fourier Transform (FT) (absolute value); and it is fed into the trained SCA-U-net to generate the output.
In SCA-U-Net, the encoder part encodes the input image into features at multiple different levels. The role of the decoder part is to gradually restore the details and spatial dimensions of the image according to the image features; as well as the fusion of high-level and low-level features. In addition, spatial attention and channel attention modules are integrated into U-Net as shown in red blocks in Fig.1.
The spatial attention module is added to the low-level feature maps since it mainly extracts the spatial features such as contours and edges. Similarly, the channel attention module is added as the last layer of the encoder, since the high-level feature maps mainly express complex features with a large receptive field and more channels. This mechanism allows the network to perform feature recalibration, through learning to exploit global information to selectively enhance useful features and restrain useless features. The structures of the channel attention module and spatial attention module as used in this paper are shown in Fig. 2.
The final output from SCA-U-Net is converted into k-space data by applying FT; a small number of low-frequency k-space data points are added from the originally measured k-space data to solve the localization uncertainty due to image folding. Finally, the solution image is obtained by applying inverse FT.

Results and Discussion

The details of training and test datasets are given in Table 1.
Training of the proposed SCA-U-Net involves the input and output images that are pairs of the FT of the subsampled and fully-sampled k-space data. The uniformly undersampled image domain human brain data set at Acceleration Factor (AF) = 2 is used as an input and the corresponding fully-sampled human brain data set is used as an output for training the network in our experiments.
In this paper, the training of the proposed SCA-U-Net is performed on Intel(R) core (TM) i7-4790 CPU, clock frequency 3.6GHz, 16 GB RAM, and GPU NVIDIA GeForce GTX 780 for approximately 3 hours. The libraries and hyper-parameters tuned for the network are shown in Table 2.
The reconstruction results obtained from the proposed SCA-U-Net are compared with the classic U-Net². The results are compared visually and in terms of different quantifying parameters i.e. Root-Mean-Square-Error (RMSE), Structured-Similarity-Index (SSI), Artefact-Power (AP)³ and image sharpness estimation¹¹ as shown in Fig.3.
The results show that the proposed SCA-U-Net provides significantly better reconstruction results as compared to the classic U-Net².

Conclusion

The proposed method i.e. SCA-U-Net improves the reconstruction outcome by taking the advantage of long-range dependencies across the image region. The improved image fidelity for uniformly undersampled MR images comes by incorporating the attention mechanism into U-Net. In future, the generalization capability of the trained SCA-U-Net can be analyzed for different anatomies and acceleration factors.

Acknowledgements

I am extremely obliged to my Ph.D. research supervisor Dr. Hammad Omer (Medical Image Processing Research Group (MIPRG) leader) for his constant guidance, positive denunciation, and cordial assistance at each stage of research. I would like to thank MIPRG, COMSATS University Islamabad, Pakistan, for providing me a superb environment and computational resources for my research work. I am also thankful to ISMRM for providing excellent platform for young researchers doing research in the field of Magnetic Resonance Imaging (MRI).

References

1. Z. Li et al., "Triple-D network for efficient undersampled magnetic resonance images reconstruction," Magnetic resonance imaging, vol. 77, pp. 44-56, 2021.

2. C. M. Hyun, H. P. Kim, S. M. Lee, S. Lee, and J. K. Seo, "Deep learning for undersampled MRI reconstruction," Physics in Medicine & Biology, vol. 63, no. 13, p. 135007, 2018.

3. M. Arshad, M. Qureshi, O. Inam, and H. Omer, "Transfer learning in deep neural network based under-sampled MR image reconstruction," Magnetic Resonance Imaging, vol. 76, pp. 96-107, 2021.

4. O. Ronneberger, P. Fischer, and T. Brox, "U-net: Convolutional networks for biomedical image segmentation," in International Conference on Medical image computing and computer-assisted intervention, 2015: Springer, pp. 234-241.

5. Y. Lu, Y. Zhao, X. Chen, and X. Guo, "A Novel U-Net Based Deep Learning Method for 3D Cardiovascular MRI Segmentation," Computational Intelligence and Neuroscience, vol. 2022, 2022.

6. M. Z. Alom, M. Hasan, C. Yakopcic, T. M. Taha, and V. K. Asari, "Recurrent residual convolutional neural network based on u-net (r2u-net) for medical image segmentation," arXiv preprint arXiv:1802.06955, 2018.

7. J.-B. Cordonnier, A. Loukas, and M. Jaggi, "On the relationship between self-attention and convolutional layers," arXiv preprint arXiv:1911.03584, 2019.

8. Y. Wu, Y. Ma, J. Liu, J. Du, and L. Xing, "Self-attention convolutional neural network for improved MR image reconstruction," Information sciences, vol. 490, pp. 317-328, 2019.

9. S. Woo, J. Park, J.-Y. Lee, and I. S. Kweon, "Cbam: Convolutional block attention module," in Proceedings of the European conference on computer vision (ECCV), 2018, pp. 3-19.

10. A. Andreopoulos and J. K. Tsotsos, "Efficient and generalizable statistical models of shape and appearance for analysis of cardiac MRI," Medical image analysis, vol. 12, no. 3, pp. 335-357, 2008.

11. C. Feichtenhofer, H. Fassold, and P. Schallauer, "A perceptual image sharpness metric based on local edge gradient analysis," IEEE Signal Processing Letters, vol. 20, no. 4, pp. 379-382, 2013.

Figures

Fig. 1. The proposed Deep Attention Unfolding CNN Architecture for parallel MRI. The proposed SCA-U-Net is divided into four parts: encoder, decoder, spatial attention module, and channel attention module. Given an input feature map of size D×H×W, the output size of Block(x) is x×H×W where W is the width, H is the height of the feature map, D is the input channel number, x is the output channel number.

Fig. 2. Diagram of (a) The channel attention module utilizes both the max-pooling outputs and average-pooling outputs with a shared network; (b) The spatial attention module utilizes both the max-pooling outputs and average-pooling outputs that are pooled along the channel axis and forwards them to a convolution layer. F is the intermediate feature map, M_c is the 1D channel attention map, M_s is the 2D spatial attention map and F’ is the final refined output.

Table 1. The details of data sets for training and testing as used in the proposed SCA-U-Net

Table 2. The hyper-parameters and libraries used for training the proposed SCA-U-Net

Fig. 3. Reconstruction (unfolding) results of uniformly undersampled human brain data at AF=2 using the proposed SCA-U-Net Model and the Classic U-Net² Model.

Proc. Intl. Soc. Mag. Reson. Med. 31 (2023)

4781

DOI: https://doi.org/10.58530/2023/4781