Keywords: Machine Learning/Artificial Intelligence, Image Reconstruction
Vision transformers (ViTs) are increasingly used in computer vision and have been shown to outperform convolutional neural networks (CNNs) on many tasks. In this work, we explore the use of Shifted Window (Swin) transformers for accelerated MRI reconstruction. Our proposed SwinV2-MRI architecture enables the use of multi-coil data and k-space consistency constraints with Swin transformers. Experimental results show that the proposed architecture outperforms CNNs even when trained on a limited dataset and without any pre-training.
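To make the k-space consistency constraint mentioned above concrete, the sketch below shows a hard data-consistency step for a single coil image, assuming a Cartesian sampling mask and a centered FFT convention. The function name and the hard-replacement form are illustrative assumptions, not the exact constraint used in SwinV2-MRI.

```python
import numpy as np

def kspace_data_consistency(recon_image, acquired_kspace, mask):
    """Hard k-space data consistency for a single coil image (illustrative sketch).

    recon_image     : complex 2-D array, network output
    acquired_kspace : complex 2-D array, undersampled measured k-space (centered)
    mask            : boolean 2-D array, True where k-space was sampled
    """
    # Forward FFT of the network reconstruction (centered convention)
    recon_kspace = np.fft.fftshift(np.fft.fft2(np.fft.ifftshift(recon_image)))
    # Keep the measured samples; fill unsampled locations from the network output
    dc_kspace = np.where(mask, acquired_kspace, recon_kspace)
    # Back to the image domain
    return np.fft.fftshift(np.fft.ifft2(np.fft.ifftshift(dc_kspace)))
```

For multi-coil data, the same replacement would be applied to each coil's k-space independently.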
Figure 1: Proposed SwinV2-MRI framework for multi-channel MRI reconstruction with k-space data consistency using SwinV2IR. Multi-channel undersampled k-space data are converted to complex coil images and reconstructed patch-wise by a SwinV2IR denoiser. The output patches are reassembled into an image, which then undergoes k-space data consistency (see Figure 2). During training, the multi-channel output image is compared with the ground-truth image to compute the training loss.
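A minimal PyTorch sketch of the Figure 1 data flow follows, assuming a real-valued denoiser that takes stacked real/imaginary coil channels, non-overlapping square patches whose size divides the image dimensions, and hard data consistency. The function name and these details are assumptions for illustration, not the authors' exact implementation.

```python
import torch
import torch.fft as tfft

def swinv2_mri_forward(undersampled_kspace, mask, denoiser, patch=64):
    """Schematic Figure 1 data flow for one multi-coil slice (illustrative sketch).

    undersampled_kspace : (coils, H, W) complex tensor of measured k-space
    mask                : (H, W) boolean tensor, True at sampled locations
    denoiser            : module mapping (N, 2*coils, patch, patch) -> same shape
    Assumes H and W are divisible by `patch`.
    """
    # 1. Inverse FFT per coil: undersampled k-space -> aliased complex coil images
    imgs = tfft.fftshift(tfft.ifft2(tfft.ifftshift(undersampled_kspace, dim=(-2, -1))),
                         dim=(-2, -1))
    # 2. Stack real/imaginary parts as channels for the real-valued network
    x = torch.cat([imgs.real, imgs.imag], dim=0)              # (2*coils, H, W)
    C, H, W = x.shape
    # 3. Split into non-overlapping patches, denoise, and reassemble
    p = x.unfold(1, patch, patch).unfold(2, patch, patch)     # (C, H/p, W/p, p, p)
    p = p.reshape(C, -1, patch, patch).permute(1, 0, 2, 3)    # (N, C, p, p)
    p = denoiser(p).permute(1, 0, 2, 3)
    p = p.reshape(C, H // patch, W // patch, patch, patch)
    x = p.permute(0, 1, 3, 2, 4).reshape(C, H, W)
    # 4. Back to complex coil images, then enforce k-space data consistency
    recon = torch.complex(x[: C // 2], x[C // 2 :])
    recon_k = tfft.fftshift(tfft.fft2(tfft.ifftshift(recon, dim=(-2, -1))), dim=(-2, -1))
    dc_k = torch.where(mask, undersampled_kspace, recon_k)
    return tfft.fftshift(tfft.ifft2(tfft.ifftshift(dc_k, dim=(-2, -1))), dim=(-2, -1))
```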
Figure 3: SwinV2IR, Residual Swin Transformer Block (RSTB), and SwinV2 Transformer Layer (STL). The SwinV2IR network consists of initial and final convolutional stages enclosing a series of RSTBs. Each RSTB contains several STLs followed by a convolutional layer and a residual skip connection. Each STL, in turn, consists of a scaled cosine attention layer and a Multi-Layer Perceptron (MLP), each followed by a layer normalization stage and a residual skip connection.
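The STL structure described in Figure 3 can be sketched as below, assuming SwinV2-style post-normalization and a per-head clamped log temperature for the scaled cosine attention. Window partitioning, cyclic shifting, and the relative position bias are omitted for brevity, so this is a simplified sketch rather than the full layer.

```python
import math
import torch
import torch.nn as nn
import torch.nn.functional as F

class ScaledCosineAttention(nn.Module):
    """Multi-head attention with cosine-similarity logits (SwinV2-style)."""
    def __init__(self, dim, heads):
        super().__init__()
        assert dim % heads == 0
        self.heads = heads
        self.qkv = nn.Linear(dim, dim * 3)
        self.proj = nn.Linear(dim, dim)
        # Learnable per-head log temperature, clamped below log(100) as in SwinV2
        self.log_tau = nn.Parameter(torch.log(10.0 * torch.ones(heads, 1, 1)))

    def forward(self, x):                                  # x: (B, N, dim)
        B, N, _ = x.shape
        qkv = self.qkv(x).reshape(B, N, 3, self.heads, -1).permute(2, 0, 3, 1, 4)
        q, k, v = qkv[0], qkv[1], qkv[2]                   # each (B, heads, N, d)
        # Cosine attention: normalized q.k^T scaled by a clamped temperature
        attn = F.normalize(q, dim=-1) @ F.normalize(k, dim=-1).transpose(-2, -1)
        attn = attn * torch.clamp(self.log_tau, max=math.log(100.0)).exp()
        out = (attn.softmax(dim=-1) @ v).transpose(1, 2).reshape(B, N, -1)
        return self.proj(out)

class SwinV2Layer(nn.Module):
    """STL from Figure 3: attention and MLP, each post-normalized with a residual."""
    def __init__(self, dim, heads, mlp_ratio=4):
        super().__init__()
        self.attn = ScaledCosineAttention(dim, heads)
        self.norm1 = nn.LayerNorm(dim)
        self.mlp = nn.Sequential(nn.Linear(dim, dim * mlp_ratio), nn.GELU(),
                                 nn.Linear(dim * mlp_ratio, dim))
        self.norm2 = nn.LayerNorm(dim)

    def forward(self, x):
        x = x + self.norm1(self.attn(x))                   # post-norm residual (SwinV2)
        return x + self.norm2(self.mlp(x))
```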
Figure 5: Boxplots comparing PSNR over the 2247 slices of reconstructed test data obtained using the different methods. The red line denotes the median and the green circle the mean of each distribution. The mean test PSNRs and standard deviations for the zero-filled, U-Net, and SwinV2-MRI reconstructions are 26.53 ± 1.709 dB, 32.25 ± 1.777 dB, and 32.79 ± 1.634 dB, respectively. A paired t-test showed a statistically significant difference ($$$p$$$ < 0.01) between the SwinV2-MRI and U-Net reconstruction PSNRs.
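For reference, statistics of the kind reported in Figure 5 could be computed along these lines. The PSNR convention (per-slice peak taken as the reference maximum) and the helper names are assumptions, not the evaluation code used here.

```python
import numpy as np
from scipy import stats

def psnr(reference, recon):
    """PSNR in dB for one slice; peak taken as the reference maximum (assumed convention)."""
    mse = np.mean(np.abs(reference - recon) ** 2)
    return 10.0 * np.log10(reference.max() ** 2 / mse)

def compare_methods(references, recons_a, recons_b):
    """Per-slice PSNRs for two methods plus a paired t-test over the same slices."""
    psnr_a = np.array([psnr(gt, r) for gt, r in zip(references, recons_a)])
    psnr_b = np.array([psnr(gt, r) for gt, r in zip(references, recons_b)])
    t_stat, p_value = stats.ttest_rel(psnr_a, psnr_b)
    return psnr_a.mean(), psnr_a.std(), psnr_b.mean(), psnr_b.std(), p_value
```

A paired test is appropriate here because both methods reconstruct the same set of test slices, so the per-slice PSNR differences are the quantity of interest.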