Keywords: AI/ML Image Reconstruction, Machine Learning/Artificial Intelligence
Motivation: Investigate the utility of self-attention deep learning for exploiting global temporal information in motion-resolved 4D MR imaging.
Goal(s): Design a novel hybrid convolutional-attention network to reconstruct motion-resolved 4D images without explicit k-space data consistency.
Approach: A hybrid Unet-style 4D reconstruction network was developed to incorporate windowed multiscale spatiotemporal multihead self-attention. Training and testing were performed on free-breathing data acquired in patients with abdominal tumors.
Results: Spatiotemporal attention successfully captured motion in multiple dimensions with improved image quality relative to state-of-the-art XD-GRASP reconstruction.
Impact: Self-attention mechanisms can combine long-range spatial learning with global temporal learning to augment the capabilities of convolutional networks for improved motion-resolved 4D MRI of mobile tumors.
[1] Coppo S, Piccini D, Bonanno G, et al. Free-running 4D whole-heart self-navigated golden angle MRI: initial results. Magn Reson Med. 2015;74:1306-1316. doi:10.1002/mrm.25523
[2] Feng L, Delacoste J, Smith D, et al. Simultaneous evaluation of lung anatomy and ventilation using 4D respiratory-motion-resolved ultrashort echo time sparse MRI. J Magn Reson Imaging. 2019;49:411-422. doi:10.1002/jmri.26245
[3] Stemkens B, Paulson ES, Tijssen RHN. Nuts and bolts of 4D-MRI for radiotherapy. Phys Med Biol. 2018;63:21TR01. doi:10.1088/1361-6560/aae56d
[4] Otazo R, Lambin P, Pignol JP, et al. MRI-guided radiation therapy: an emerging paradigm in adaptive radiation oncology. Radiology. 2021;298:248-260. doi:10.1148/radiol.2020202747
[5] Küstner T, Fuin N, Hammernik K, et al. CINENet: deep learning-based 3D cardiac CINE MRI reconstruction with multi-coil complex-valued 4D spatio-temporal convolutions. Sci Rep. 2020;10:13710. doi:10.1038/s41598-020-70551-8
[6] Freedman JN, Gurney-Champion OJ, Nill S, et al. Rapid 4D-MRI reconstruction using a deep radial convolutional neural network: Dracula. Radiother Oncol. 2021;159:209-217. doi:10.1016/j.radonc.2021.03.034
[7] Murray V, Siddiq S, Crane C, et al. Movienet: Deep space–time-coil reconstruction network without k-space data consistency for fast motion-resolved 4D MRI. Magn Reson Med. 2023;1-15. doi:10.1002/mrm.29892
[8] Dosovitskiy A, Beyer L, Kolesnikov A, et al. An image is worth 16x16 words: transformers for image recognition at scale. arXiv:2010.11929 [cs]
[9] Liu Z, Lin Y, Cao Y, et al. Swin Transformer: Hierarchical Vision Transformer using Shifted Windows. arXiv:2103.14030 [cs]
[10] Wang Z, Cun X, Bao J, Zhou W, Liu J, Li H. Uformer: a general U-shaped transformer for image restoration. arXiv:2106.03106 [cs]
[11] Feng L, Axel L, Chandarana H, Block KT, Sodickson DK, Otazo R. XD-GRASP: golden-angle radial MRI with reconstruction of extra motion state dimensions using compressed sensing. Magn Reson Med. 2016;75:775-788. doi:10.1002/mrm.25665
[12] Park N, Kim S. How Do Vision Transformers Work? arXiv:2202.06709 [cs]
Figure 1: Overall workflow. Continuous golden-angle radial stack-of-stars acquisitions are transformed into 10 aliased motion-resolved images, which are then reconstructed by the network. The coil and motion-state dimensions are combined so that the network can learn spatiotemporal features. Movieformer follows the Unet-style design of Movienet, with windowed transformer blocks added at every internal resolution. The network input uses 900 spokes and the training reference uses 1,800 spokes, resulting in 2-fold acceleration.
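The data preparation in Figure 1 can be illustrated with a minimal NumPy sketch. It assumes that spokes are sorted into motion states by respiratory position, with an equal number of spokes per state, in the manner of XD-GRASP; the abstract does not state the binning rule. The respiratory signal here is synthetic, the coil count and image size are placeholders, and the function name `bin_spokes_by_motion` is hypothetical. Stacking coils and motion states along one channel axis is one plausible reading of "combined", not a confirmed detail of the network.

```python
import numpy as np

def bin_spokes_by_motion(resp_signal, n_states=10):
    """Return one array of spoke indices per motion state (hypothetical helper).

    Spokes are ordered by respiratory position and split into groups of
    near-equal size, so every motion state gets the same amount of data.
    """
    order = np.argsort(resp_signal, kind="stable")
    return np.array_split(order, n_states)

rng = np.random.default_rng(0)

# Synthetic respiratory signal for the 900-spoke network input (assumed shape).
n_spokes = 900
resp = np.sin(np.linspace(0, 40 * np.pi, n_spokes))
resp += 0.05 * rng.standard_normal(n_spokes)

bins = bin_spokes_by_motion(resp, n_states=10)
spokes_per_state = [len(b) for b in bins]

# Acceleration relative to the 1,800-spoke training reference.
acceleration = 1800 / n_spokes

# Assumed layout: aliased images indexed by (coil, motion state, y, x).
# Merging coil and motion-state axes gives the network one channel axis
# that carries both coil and temporal information.
n_coils, n_states, ny, nx = 8, 10, 64, 64   # placeholder sizes
aliased = np.zeros((n_coils, n_states, ny, nx), dtype=np.complex64)
net_input = aliased.reshape(n_coils * n_states, ny, nx)
```

With 900 spokes and 10 states, each state receives 90 spokes, which is why the per-state images are heavily aliased before reconstruction.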
Figure 2: Movieformer architecture. (a) Modified Movienet Adapted Residual Block, which processes data in 2 concurrent streams whose combination serves as input to a series of LeWin transformer blocks. These blocks increase the spatial receptive field across all temporal frames. (b) Details of the LeWin transformer blocks, which perform multihead self-attention in small nonoverlapping windows. (c) Uformer's Locally-enhanced Feed-Forward Network (LeFF), which retains 2D locality in attention-based features.
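The windowed attention in panel (b) can be sketched in NumPy as follows. This is a simplified illustration, not the Movieformer implementation: it uses random projection weights, omits the relative position bias, normalization layers, and the LeFF stage, and treats the channel axis as the place where the combined coil and motion-state features live (an assumption). All function names and sizes are hypothetical.

```python
import numpy as np

def window_partition(x, win):
    """Split an (H, W, C) map into non-overlapping windows of win*win tokens."""
    H, W, C = x.shape
    x = x.reshape(H // win, win, W // win, win, C)
    return x.transpose(0, 2, 1, 3, 4).reshape(-1, win * win, C)

def window_reverse(windows, win, H, W):
    """Inverse of window_partition: reassemble windows into an (H, W, C) map."""
    C = windows.shape[-1]
    x = windows.reshape(H // win, W // win, win, win, C)
    return x.transpose(0, 2, 1, 3, 4).reshape(H, W, C)

def softmax(a, axis=-1):
    a = a - a.max(axis=axis, keepdims=True)   # subtract max for stability
    e = np.exp(a)
    return e / e.sum(axis=axis, keepdims=True)

def window_msa(x, win, n_heads, Wq, Wk, Wv, Wo):
    """Multihead self-attention computed independently inside each window."""
    H, W, C = x.shape
    d = C // n_heads                          # channels per head
    tokens = window_partition(x, win)         # (num_windows, N, C)

    def split_heads(t):                       # -> (num_windows, heads, N, d)
        return t.reshape(t.shape[0], t.shape[1], n_heads, d).transpose(0, 2, 1, 3)

    q = split_heads(tokens @ Wq)
    k = split_heads(tokens @ Wk)
    v = split_heads(tokens @ Wv)
    attn = softmax(q @ k.transpose(0, 1, 3, 2) / np.sqrt(d))   # (.., N, N)
    out = (attn @ v).transpose(0, 2, 1, 3).reshape(tokens.shape)
    return window_reverse(out @ Wo, win, H, W)

rng = np.random.default_rng(0)
H, W, C, win, n_heads = 16, 16, 8, 4, 2       # placeholder sizes
x = rng.standard_normal((H, W, C))
Wq, Wk, Wv, Wo = (rng.standard_normal((C, C)) / np.sqrt(C) for _ in range(4))
y = window_msa(x, win, n_heads, Wq, Wk, Wv, Wo)
```

Because attention is restricted to each window, the cost grows linearly with the number of windows rather than quadratically with image size. Applying such blocks at every resolution of a Unet-style network is what lets small windows cover progressively larger regions of the image.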
Figure 3: Comparison of Movieformer reconstruction against reference XD-GRASP reconstruction for a patient with a kidney cyst. The network is trained on 2D+motion axial slices and is still able to resolve 3D motion. Compared with XD-GRASP, Movieformer shows reduced streaking artifacts while preserving motion.
Figure 4: Comparison of Movieformer reconstruction against reference XD-GRASP reconstruction for a patient with liver metastasis. Compared with XD-GRASP, Movieformer shows improved motion visualization in the sagittal and coronal planes.