1957

Transformer residual cross (T-REX) networks for volumetric super-resolution.

James Grover¹,², Shanshan Shan¹,³, Paul Keall¹,², and David E.J. Waddington¹,²
¹Image X Institute, The University of Sydney, Sydney, Australia, ²Ingham Institute for Applied Medical Research, Sydney, Australia, ³State Key Laboratory of Radiation Medicine and Protection, Soochow University, Suzhou, China

Synopsis

Keywords: Other AI/ML, Machine Learning/Artificial Intelligence

Motivation: Higher temporal resolution is needed for many MRI-guidance applications. Reducing matrix sizes can increase temporal resolution at the cost of lower spatial resolution. Deep learning-based super-resolution could mitigate the trade-off in spatiotemporal resolution.

Goal(s): Develop and evaluate a unified deep learning-based algorithm that can up-sample thick single-slice low spatial resolution MRI to a thin multi-slice high spatial resolution MRI.

Approach: We developed a transformer residual cross (T-REX) neural network that simultaneously increases in-plane spatial resolution and decreases slice thickness, providing high spatial resolution multi-slice MRI.

Results: T-REX was successfully trained and evaluated, showing promising results at two field strengths (3.0 T and 1.5 T) and for two sequences (T1w and T2w).

Impact: The ability to acquire high spatial resolution volumetric MRI quickly has applications to low-field MRI and MRI-guided radiation therapy. Here, we present our initial findings of a unified neural network that applies volumetric super-resolution.

Introduction

MRI acquisition time can be reduced by sampling fewer points in k-space, which is typically achieved via under-sampling or reductions in matrix size. However, both approaches can lead to lower image quality, via reductions in signal-to-noise ratio and/or reduced spatial resolution. This trade-off is amplified in the growing fields of low-field MRI¹ (where low signal-to-noise per unit time causes prolonged scan times) and MRI-guided radiation therapy² (where real-time, low latency, cine-MRI is required for treatment adaptation).
Deep learning has been used in MRI image enhancement, including super-resolution – where low spatial resolution images are up-sampled to a high spatial resolution. Here, we present the transformer residual cross (T-REX) network, a unified transformer-based neural network that applies super-resolution to up-sample a low spatial resolution thick slice MRI to a high spatial resolution thin multi-slice MRI.

Methods

T-REX comprises a convolution-based encoding arm that extracts features and compresses the high-dimensional input to a low-dimensional representation. This representation is passed through a multi-layer transformer encoder, whose output is provided to a convolution-based decoding arm for further feature extraction and up-sampling to the desired dimensionality. Residual blocks between the encoding and decoding arms, comprising convolutions and skip connections, extract and retain high frequency features. A basic overview of T-REX is provided in Figure 1.
The UPENN-GBM³ dataset from The Cancer Imaging Archive⁴ was used to train and evaluate T-REX in this study. Forty unstripped volumetric brain MRIs (corresponding to 40 patients) acquired at 3.0 T were used as training data. Two test datasets (10 patients each) were also derived, one at 3.0 T and one at 1.5 T, to assess the generalisability of the model to lower field strengths. The normalised (min-max) root mean-square-error (NRMSE) was computed between each reconstructed patient volume and the reference image. This workflow was conducted twice, once for T1w and once for T2w datasets (i.e., separate T1w and T2w T-REX models were trained and each evaluated on 3.0 T and 1.5 T data).
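As a minimal sketch of the error metric, the Python snippet below computes a min-max NRMSE between a reference volume and a reconstruction. It assumes the normalisation uses the intensity range of the reference volume; the abstract states only that a min-max normalisation was used. The function name nrmse_min_max is hypothetical.

```python
import numpy as np


def nrmse_min_max(reference, prediction):
    """RMSE divided by the intensity range (max - min) of the reference.

    Assumption: the range is taken from the reference volume, which is
    not specified in the abstract.
    """
    reference = np.asarray(reference, dtype=np.float64)
    prediction = np.asarray(prediction, dtype=np.float64)
    if reference.shape != prediction.shape:
        raise ValueError("volumes must have the same shape")
    intensity_range = reference.max() - reference.min()
    if intensity_range == 0:
        raise ValueError("reference volume has no intensity range")
    rmse = np.sqrt(np.mean((reference - prediction) ** 2))
    return rmse / intensity_range
```

With this definition, a reconstruction identical to the reference scores 0, and a uniform offset of one intensity unit on a reference spanning 0 to 7 scores 1/7.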
Reference high spatial resolution volumetric brain MRIs were processed to produce synthetic inputs for T-REX. In-plane (axial) spatial resolution was decreased through central cropping of k-space. Gaussian noise was added to these low spatial resolution slices (only during model training) to simulate low-field acquisitions. The slice thickness was increased by averaging the down-sampled slices within a neighbourhood of adjacent slices. The in-plane up-sampling factor was 4× while the slice thickness decrease factor was 5×. For example, a 4×4×5 mm³ voxel was up-sampled to isotropic 1×1×1 mm³ voxels. During model training, the thick slices were overlapped to increase the amount of training data. This pipeline is shown in Figure 2. For model testing, each thick slice generated 5 thin slices that were stacked to encompass the total field-of-view of the reference MRI for a direct volumetric comparison.
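The degradation steps described above can be sketched in Python as follows. This is an illustrative sketch, not the authors' pipeline: the function names (crop_kspace, make_thick_slices), the intensity rescaling, the addition of noise to magnitude images before slice averaging, and the stride parameter used to overlap thick slices are all assumptions.

```python
import numpy as np


def crop_kspace(slice_2d, factor=4):
    """Lower in-plane resolution by keeping only the centre of k-space."""
    ny, nx = slice_2d.shape
    cy, cx = ny // factor, nx // factor
    kspace = np.fft.fftshift(np.fft.fft2(slice_2d))
    # After fftshift the DC term sits at index n // 2; keep it at the
    # centre (index c // 2) of the cropped matrix.
    y0, x0 = ny // 2 - cy // 2, nx // 2 - cx // 2
    cropped = kspace[y0:y0 + cy, x0:x0 + cx]
    low_res = np.fft.ifft2(np.fft.ifftshift(cropped))
    # Rescale so mean intensity is preserved at the smaller matrix size.
    return np.abs(low_res) * (cy * cx) / (ny * nx)


def make_thick_slices(volume, in_plane_factor=4, slice_factor=5,
                      stride=None, noise_std=0.0, rng=None):
    """Turn a (n_slices, ny, nx) volume into thick low-resolution slices.

    stride == slice_factor gives non-overlapping thick slices (testing);
    a smaller stride overlaps them (assumed training augmentation).
    """
    stride = slice_factor if stride is None else stride
    low_res = np.stack([crop_kspace(s, in_plane_factor) for s in volume])
    if noise_std > 0:
        # Assumption: noise is added to magnitude images before averaging.
        rng = np.random.default_rng() if rng is None else rng
        low_res = low_res + rng.normal(0.0, noise_std, size=low_res.shape)
    starts = range(0, volume.shape[0] - slice_factor + 1, stride)
    return np.stack([low_res[s:s + slice_factor].mean(axis=0)
                     for s in starts])
```

For a volume of 10 thin slices with a 16×16 matrix, the default arguments yield 2 non-overlapping thick slices with a 4×4 matrix, while stride=1 yields 6 overlapping thick slices. With the 4× in-plane and 5× through-plane factors, each input voxel corresponds to 4 × 4 × 5 = 80 output voxels.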

Results

T-REX was successfully trained on T1w and T2w images, creating separate T1w and T2w models. Qualitatively, T-REX recovered high frequency features, as shown in Figures 3 and 4. Additionally, T-REX generalised well to a lower field strength (1.5 T testing data vs 3.0 T training data; Figure 4). The NRMSE values across the 10 testing patients for both sequences and field strengths are presented in Figure 5.

Discussion

T-REX succeeded at two distinct, simultaneous tasks, decreasing slice thickness and increasing in-plane spatial resolution, within a unified neural network. Although trained on 3.0 T MRI, it generalised well to 1.5 T (for a variety of MRI scanners), indicating transferability to low-field MRI and MRI-guided radiation therapy. Further work will be conducted to improve model performance, especially in recovering thin slices from thick slices at 1.5 T, as noticeable artifacts were still present in the T-REX output (Figure 4, coronal view). Additionally, T-REX will be applied to other anatomical sites, especially sites with high motion (e.g., liver), for downstream use in MRI-guided radiation therapy.

Conclusion

We present T-REX, a transformer residual cross network for volumetric super-resolution. We trained T-REX on 3.0 T data and evaluated it on 1.5 T and 3.0 T data. Error metrics remained low and stable even when evaluating on MRI data acquired at half the field strength the model was trained on. These promising initial results suggest that T-REX could be applied to low-field MRI and MRI-guided radiation therapy.

Acknowledgements

This work is supported by an Australian Government Research Training Program (RTP) Scholarship and an NHMRC Investigator Grant.

References

1. Arnold TC, Freeman CW, Litt B, Stein JM. Low‐field MRI: Clinical promise and challenges. Journal of Magnetic Resonance Imaging. 2023;57(1):25-44.

2. Keall PJ, Brighi C, Glide-Hurst C, et al. Integrated MRI-guided radiotherapy—opportunities and challenges. Nature Reviews Clinical Oncology. 2022:1-13.

3. Bakas S, Sako C, Akbari H, et al. The University of Pennsylvania glioblastoma (UPenn-GBM) cohort: Advanced MRI, clinical, genomics, & radiomics. Scientific Data. 2022;9(1):453.

4. Clark K, Vendt B, Smith K, et al. The Cancer Imaging Archive (TCIA): maintaining and operating a public information repository. Journal of Digital Imaging. 2013;26(6):1045-1057.

Figures

Figure 1: T-REX architecture. In the encoding arm, feature extraction and compression encode a high-dimensional input tensor to a low-dimensional representation through convolution and down-sampling operations. A multi-layer transformer encoder operates on the latent low-dimensional tensor from the encoding arm. Further feature extraction and up-sampling take place in the decoding arm. Residual blocks extract and preserve (through skip connections) high frequency features between encoding and decoding arms. T-REX: transformer residual cross network.

Figure 2: Label and input data of T-REX. Brain MRIs are 4× down-sampled via central cropping of k-space to produce low spatial resolution single slices. These single slices are subsequently fused together to produce 5× thicker slices. a) Label data with overlaid down-sampling grid. b) Input data produced from the label data (Gaussian noise was added to simulate lower SNR in low-field and MRI-guided radiation therapy applications). The goal of T-REX is to recover a) given b). T-REX: transformer residual cross network. SNR: signal-to-noise ratio.

Figure 3: T-REX on T1w 3.0 T MRI. Low spatial resolution thick slices were input to T-REX, producing a high spatial resolution multi-slice prediction. These predictions were compared to the reference high spatial resolution thin slice MRI.

Figure 4: T-REX on T2w 1.5 T MRI. Low spatial resolution thick slices were input to T-REX, producing a high spatial resolution multi-slice prediction. These predictions were compared to the reference high spatial resolution thin slice MRI.

Figure 5: Quantitative metrics for T-REX. The normalised (min-max) root mean-square-error (NRMSE) metric is provided for both sequences (i.e., T1w and T2w models) for both field strengths. Results are the average and standard deviation for 10 testing patients from each category.

Proc. Intl. Soc. Mag. Reson. Med. 32 (2024)

DOI: https://doi.org/10.58530/2024/1957