Oscar van der Heide1,2, Stijn Heldens3, Alessio Sclocco3, Hongyan Liu1,2, Miha Fuderer1,2, Cornelis A.T. van den Berg1,2, and Alessandro Sbrizzi1,2
1Computational Imaging Group for MR Diagnostics & Therapy, Center for Image Sciences, University Medical Center Utrecht, Utrecht, Netherlands, 2Department of Radiology, Division of Imaging and Oncology, University Medical Center Utrecht, Utrecht, Netherlands, 3Netherlands eScience Center, Amsterdam, Netherlands
Synopsis
Keywords: Quantitative Imaging, Quantitative Imaging
Motivation: MR-STAT is a transient-state qMRI technique that can deliver T1, T2 and proton density maps from a single short scan, even when Cartesian trajectories are used. The non-linear model-based MR-STAT reconstruction procedure is challenging and current state-of-the-art reconstructions take multiple hours for a 3D dataset.
Goal(s): In this work we aim to accelerate the MR-STAT reconstructions to make the technique more valuable for clinical practice.
Approach: A highly-optimized CUDA EPG simulator was developed and an optimization for signal computations for Cartesian trajectories has been discovered.
Results: High-resolution 3D knee and brain MR-STAT datasets can now be reconstructed in 12.5 and 25 minutes, respectively.
Impact: The tools are part of the COMPAS Toolkit that is available online. These tools are relevant to other researchers or educators that require high-performance Bloch/MRI simulations.
Introduction
MR-STAT1 is a transient-state multi-parametric qMRI technique that can deliver T1, T2 and proton density maps from a single short acquisition. However, MR-STAT is known for its computationally intensive image reconstructions.2 Previous state-of-the-art MR-STAT techniques resulted in reconstruction times of 5 hours and 15 hours for 3D knee4 and full brain datasets, respectively. In this work present two acceleration strategies that are part of a highly optimized GPU library for qMRI (COMPAS Toolkit) to reduce MR-STAT reconstruction times to 12.5 minutes (knee) and 25 minutes (brain).Theory
In MR-STAT, quantitative parameter maps $$$\boldsymbol{\alpha}$$$ (e.g. $$$T_1, T_2, \rho$$$) are directly fitted to the measured MR signal using the volumetric signal model $$s(t; \boldsymbol{\alpha}, \mathbf{C}) = \sum_{j=1}^{N_v} \mathbf{C}_j m(\boldsymbol{\alpha}_j, t)e^{-2\pi i \mathbf{k}(t) \cdot \mathbf{r}_j}.$$ Here $$$N_v$$$ is the number of voxels, $$$\mathbf{C}_j$$$ the coil sensitivity in voxel $$$j$$$, $$$\boldsymbol{\alpha}_j$$$ the tissue parameters of interest in voxel $$$j$$$, $$$\mathbf{k}$$$ the k-space trajectory and $$$\mathbf{r}$$$ the spatial coordinates of voxel $$$j$$$. An inexact Gauss-Newton method has been proposed2 to solve the large-scale, non-linear inversion problem, see Fig. 1 for a high-level overview.
The evaluate the signal model, it was proposed in previous work3 to first compute the magnetization at echo times for all readouts in all voxels on GPU hardware using an Extended Phase Graph (``EPG'') simulator written in Julia. Secondly, for each timepoint $$$t$$$ the magnetization at sample time $$$t$$$ is computed per voxel and the discrete sum is evaluated using handwritten GPU kernels written in Julia.
In this work, the first step is accelerated with a highly optimized EPG simulator written in CUDA5. A key ingredient in this implementation is that the columns of the configuration state matrix $$$\Omega$$$ in each voxel is distributed over a warp6 (32 threads). Warp shuffle instructions are used to communicate across threads.
The second step is accelerated by recognizing that, for Cartesian trajectories, the k-space step $$$\Delta k$$$ in between samples is always the same. The MR signal can then be computed using a matrix-matrix multiplication, see Fig. 2b. Similar concepts can be applied to compute partial derivatives and matrix-vector products with the Jacobian matrix as required in the Gauss-Newton algorithm Fig. 1.
Both proposed acceleration strategies are part of the free and open-source Compas Toolkit for qMRI (https://github.com/NLeSC-COMPAS/compas-toolkit).Methods
The proposed EPG simulator is benchmarked against the previously proposed Julia simulator3. Simulations for a (transient-state) sequence with $$$1280$$$ readouts and $$$256^2$$$ voxels are performed for maximum configuration orders of $$$\Omega$$$ varying from 1 to 64.
The proposed matrix-matrix multiplication method for Cartesian signal computations is benchmarked against the handwritten kernel for a sequence of $$$5N$$$ readouts and image size of $$$N^2$$$, with $$$N \in [64,96,\ldots,512]$$$.
The above acceleration strategies have been incorporated into the MR-STAT reconstruction algorithm and are tested on 3D Cartesian in-vivo MR-STAT data acquired using a gradient-spoiled sequence with optimized7 flip angle trains. Brain data was acquired with isotropic 1 mm3 resolution, field-of-view of 224x224x134 mm3 and scantime 5m37s. Knee data was acquired with a 1x1x1.5 mm3 resolution, field-of-view of 184x184x150 mm3 and scantime 6m58s. See Liu et al4 for more acquisition details. SENSE reconstructions and 1D-FFT decoupling along the kz-direction were applied prior to MR-STAT reconstructions. Each resulting slice was reconstructed using 15 iterations of the accelerated Gauss-Newton MR-STAT algorithm.
All benchmark were performed on an NVIDIA RTX A5000 GPU.Results
The EPG and signal computation benchmark results are displayed in Fig. 3c-d. The proposed COMPAS Toolkit EPG simulator results in speedup factors between 1.7-14.6, with higher speedup factors for higher maximum configuration orders. The proposed Cartesian signal computation method is approximately 10 times faster than the handwritten kernel approach.
The MR-STAT reconstruction times for the 3D in-vivo brain and knee datasets are 12m23s and 25m22s (excluding the SENSE-based preprocessing), respectively, and are broken down into several components in Fig. 3. The average reconstruction times per slice were 7.43 (knee) and 11.4s (brain). Note that currently more than 70% of the reconstruction time is spent on Jacobian (transpose) matrix-vector products.
Sagittal, coronal and transverse slices of the in-vivo brain and the in-vivo knee reconstructions are displayed in Fig. 4 and 5, respectively.Discussion
We have greatly reduced MR-STAT reconstruction times to the point where offline reconstruction in clinical routines is feasible. We are an important step closer to our goal of one-minute, online 3D MR-STAT reconstructions to allow for direct feedback on the console and rely on existing pipelines for processing patient data. To address the current reconstruction bottlenecks, future research will focus on specialized multi-GPU implementations, mixed precision computations8 and surrogate models9.Acknowledgements
This work is supported by the Dutch Technology Foundation (grant #17986) and the Netherlands eScience Center (project NLESC.OEC.2022.001).References
1. Sbrizzi et al. Fast quantitative MRI as a nonlinear tomography problem. Magn Reson Imaging. 46.2 (2018): 56-632.
2 van der Heide, et al. High‐resolution in vivo MR‐STAT using a matrix‐free and parallelized reconstruction algorithm. NMR in Biomedicine 33.4 (2020): e4251.
3. van der Heide et al. Faster Bloch simulations and MR-STAT reconstructions on GPU using the Julia programming language, ISMRM 2021, p3063.
4. Liu et al. A three-dimensional Magnetic Resonance Spin Tomography in Time-domain protocol for high-resolution multiparametric quantitative magnetic resonance imaging. NMR in Biomedicine: e5050.
5. NVIDIA, Vingelmann & Fitzek 2020. CUDA, release 12. Available at: https://developer.nvidia.com/cuda-toolkit
6. Lin and Grover. Using CUDA Warp-Level Primitives. Available at: https://developer.nvidia.com/blog/using-cuda-warp-level-primitives/
7. Fuderer et al. Efficient performance analysis and optimization of transient-state sequences for multiparametric magnetic resonance imaging. NMR in Biomedicine 36.3 (2023): e4864.
8. Kelley. Newton's Method in Mixed Precision. SIAM Review, 64.1 (2022): p191-211.
9. Liu et al. Fast and accurate modeling of transient-state, gradient-spoiled sequences by recurrent neural networks. NMR in Biomedicine. 34.7 (2021): e4527.