Keywords: Machine Learning/Artificial Intelligence, Image Reconstruction, Transformer, Multiple Instance Learning
Despite its proven clinical value, diffusion-weighted imaging (DWI) suffers from several technical limitations associated with the prolonged echo trains of single-shot sequences. Parallel imaging at sufficiently high under-sampling factors, enabled by deep-learning-based reconstruction, may mitigate these problems. Recently emerged transformer-based architectures have demonstrated high performance in this context. This work develops a transformer-based reconstruction method tailored to DWI that takes advantage of the availability of multiple image instances for a given slice: redundancies are exploited by reconstructing these images jointly, with attention mechanisms applied across the set of instances. Benefits over reconstructing each image separately are demonstrated.
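The core idea, attention performed across the set of instances rather than within a single image, can be illustrated with a minimal NumPy sketch. This is purely illustrative and not the authors' implementation (the abstract does not specify one): it applies scaled dot-product self-attention along the instance axis, independently at each pixel location, so that each reconstructed instance can draw on the redundant information shared by all others.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_instance_attention(features):
    """Scaled dot-product self-attention over the instance axis.

    features: array of shape (n_instances, n_pixels, d) -- one feature
    vector per pixel per image instance (e.g. per diffusion direction
    or repetition). Attention is computed across the set of instances
    independently at each pixel location.
    """
    n, p, d = features.shape
    # For illustration, the features serve directly as queries, keys,
    # and values; a trained model would apply learned projections first.
    q = k = v = features.transpose(1, 0, 2)          # (p, n, d)
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(d)   # (p, n, n)
    weights = softmax(scores, axis=-1)               # rows sum to 1
    out = weights @ v                                # (p, n, d)
    return out.transpose(1, 0, 2)                    # back to (n, p, d)

# Toy input: 8 instances of a 16-pixel slice with 32 feature channels.
x = np.random.default_rng(0).normal(size=(8, 16, 32))
y = cross_instance_attention(x)
assert y.shape == x.shape
```

A useful property of this construction is permutation equivariance: reordering the instances merely reorders the outputs, as one would expect for set-structured inputs.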
Figure 1: Overview of the regularizer architecture. It differs from a U-Net in that the encoding path employs Swin Transformer Blocks (STBs) instead of convolutions. Patch Merging is used for down-sampling by a factor of 2 while doubling the number of feature channels. Following the first convolution layer of the network, feature maps are sub-divided into patches of size $$$4×4$$$ for compatibility with the STBs; after the last STB, the patch transform is reversed. In the decoder path, Pixel Shuffle modules up-sample the feature maps by a factor of 2.
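The two resampling operations named in the caption can be sketched in a few lines of NumPy. This is an illustration under assumed channels-last conventions, not the network's actual layers: Patch Merging stacks each 2×2 spatial neighborhood into the channel dimension (in the network, a learned linear layer then reduces 4C to 2C, i.e. the channel doubling described above), while Pixel Shuffle performs the inverse channel-to-space rearrangement for up-sampling.

```python
import numpy as np

def patch_merge(x):
    """2x down-sampling by merging each 2x2 spatial neighborhood.

    x: (H, W, C) feature map with even H and W. Returns (H/2, W/2, 4C);
    a learned linear layer would subsequently map 4C -> 2C channels.
    """
    h, w, c = x.shape
    x = x.reshape(h // 2, 2, w // 2, 2, c)
    return x.transpose(0, 2, 1, 3, 4).reshape(h // 2, w // 2, 4 * c)

def pixel_shuffle(x, r=2):
    """Up-sampling by factor r: move channel groups into space
    (sub-pixel convolution rearrangement).

    x: (H, W, C*r*r) -> (H*r, W*r, C).
    """
    h, w, crr = x.shape
    c = crr // (r * r)
    x = x.reshape(h, w, r, r, c)
    return x.transpose(0, 2, 1, 3, 4).reshape(h * r, w * r, c)

# Toy feature map: 4x4 spatial grid with 3 channels.
f = np.arange(4 * 4 * 3, dtype=float).reshape(4, 4, 3)
m = patch_merge(f)
assert m.shape == (2, 2, 12)
```

With the channel ordering chosen here, the two rearrangements are exact inverses of each other, which makes the encoder/decoder pairing in the figure easy to verify numerically.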