Translating acquired sequences to missing ones in multi-contrast MRI protocols can dramatically reduce scan costs. Neural network models devised for this purpose are typically trained on paired datasets, which can be difficult to compile. Moreover, these models rely exclusively on convolutional operators, which carry undesirable biases towards feature locality and spatial invariance. Here, we present a cycle-consistent translation model, ResViT, to enable training on unpaired datasets. ResViT combines the localization power of convolution operators with the contextual sensitivity of transformers. Demonstrations on multi-contrast MRI datasets indicate the superiority of ResViT over state-of-the-art translation models.
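The following is a minimal sketch of the hybrid design idea stated above: a bottleneck block that complements local convolutional features with global context from self-attention through a residual connection. The module name, channel sizes, and fusion scheme are illustrative assumptions, not the published ResViT architecture.

```python
import torch
import torch.nn as nn


class HybridConvTransformerBlock(nn.Module):
    """Illustrative residual block mixing convolution (locality) with self-attention (context)."""

    def __init__(self, channels: int = 64, num_heads: int = 4):
        super().__init__()
        # Local feature extraction with convolutions.
        self.conv = nn.Sequential(
            nn.Conv2d(channels, channels, kernel_size=3, padding=1),
            nn.InstanceNorm2d(channels),
            nn.ReLU(inplace=True),
        )
        # Global context via multi-head self-attention over spatial positions.
        self.norm = nn.LayerNorm(channels)
        self.attn = nn.MultiheadAttention(channels, num_heads, batch_first=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, h, w = x.shape
        local = self.conv(x)
        # Flatten the spatial grid into a token sequence: (B, H*W, C).
        tokens = self.norm(local.flatten(2).transpose(1, 2))
        context, _ = self.attn(tokens, tokens, tokens)
        context = context.transpose(1, 2).reshape(b, c, h, w)
        # Residual fusion of input, local, and contextual features.
        return x + local + context


if __name__ == "__main__":
    block = HybridConvTransformerBlock(channels=64, num_heads=4)
    feats = torch.randn(1, 64, 32, 32)   # encoder feature map
    print(block(feats).shape)            # torch.Size([1, 64, 32, 32])
```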
The unpaired translation method is based on two ResViT generators, $$$G_1$$$ and $$$G_2$$$, and two PatchGAN discriminators, $$$D_1$$$ and $$$D_2$$$. $$$G_2$$$ learns to translate a T1-weighted image into a T2-weighted image of the same cross-section that is indistinguishable from actual T2-weighted images acquired in separate subjects, while $$$D_2$$$ learns to distinguish synthesized from actual T2-weighted images. Likewise, $$$G_1$$$ learns to generate a T1-weighted image from a T2-weighted image, whereas $$$D_1$$$ learns to distinguish between synthetic and real T1-weighted images; a sketch of the resulting training objective is given below.
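The sketch below illustrates the unpaired training objective implied by this setup, assuming PyTorch, least-squares adversarial losses, and an L1 cycle-consistency penalty; the generator and discriminator architectures are passed in as placeholders, and the loss weight is an illustrative value rather than the exact configuration used in the study.

```python
import torch
import torch.nn as nn

adv_loss = nn.MSELoss()    # LSGAN-style adversarial loss (assumption)
cyc_loss = nn.L1Loss()     # cycle-consistency penalty
lambda_cyc = 10.0          # illustrative weight on the cycle term


def generator_step(G1, G2, D1, D2, t1, t2):
    """One generator update on unpaired T1 (t1) and T2 (t2) batches."""
    fake_t2 = G2(t1)        # T1 -> T2
    fake_t1 = G1(t2)        # T2 -> T1
    rec_t1 = G1(fake_t2)    # T1 -> T2 -> T1
    rec_t2 = G2(fake_t1)    # T2 -> T1 -> T2

    # Fool the discriminators: synthesized images should be scored as real (1).
    pred_t2, pred_t1 = D2(fake_t2), D1(fake_t1)
    loss_adv = adv_loss(pred_t2, torch.ones_like(pred_t2)) + adv_loss(
        pred_t1, torch.ones_like(pred_t1)
    )
    # Cycle consistency: translating forth and back should recover the input.
    loss_cyc = cyc_loss(rec_t1, t1) + cyc_loss(rec_t2, t2)
    return loss_adv + lambda_cyc * loss_cyc


def discriminator_step(D, real, fake):
    """One discriminator update: real patches scored as 1, synthesized patches as 0."""
    pred_real, pred_fake = D(real), D(fake.detach())
    return adv_loss(pred_real, torch.ones_like(pred_real)) + adv_loss(
        pred_fake, torch.zeros_like(pred_fake)
    )
```

In this scheme the two cycle terms couple the otherwise independent adversarial objectives, which is what allows training on unpaired T1- and T2-weighted images.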