2520

A Scalable High-Performance Magnetic Resonance Fingerprinting Search Engine using GPUs

Gabriel Zihlmann¹, Najat Salameh¹, and Mathieu Sarracanie¹
¹Department of Biomedical Engineering, University of Basel, Allschwil, Switzerland

Synopsis

Dictionary search for the reconstruction of MR Fingerprinting-based sequences can become a bottleneck due to the exponential growth of dictionary size with the number of parameters. Several approaches have been proposed to accelerate this search, yet the availability of efficient and scalable software implementations remains limited. Here, we extended an existing, highly optimized similarity search library to be compatible with the requirements of MR Fingerprinting dictionary search. The evaluation of our GPU implementation shows an acceleration by two orders of magnitude over the brute-force search yielding identical results for 99.9 % of the voxels.

Introduction

Magnetic Resonance Fingerprinting (MRF) allows to quantify multiple tissue parameters from a single heavily undersampled acquisition¹. As part of the voxel-wise MRF data reconstruction, a dictionary consisting of simulated signals is searched for the entry giving the highest inner product magnitude with the voxel signal. The original MRF implementation considered a three-parameter-model (T₁, T₂ and off-resonance), which can be extended with additional parameters for better accuracy (e.g., including B₁ imperfection²), or further enriched with encoded parameters^3–5. Inevitably, the number of entries in the dictionary and the search time grow exponentially with the number of parameters. Global dictionary compression in the time dimension by singular value decomposition (SVD)⁶, or grouping similar dictionary entries and computing the SVD per group⁷ before searching only the most promising groups, have been proposed to accelerate dictionary search. While these approaches showed promising results, efficient implementations harvesting the full potential of current graphics processing units (GPUs) are rare. In this work, we extended a highly optimized and freely available library for large scale similarity search, faiss⁸, written in C++ and CUDA, and applied it to MRF-based OPTIMUM⁹ for low-field multiparametric MRI.

Methods

Complex-vector capability was added to faiss’ Flat and IVFFlat index types. The Flat index corresponds to an exhaustive brute-force dictionary search based on matrix multiplication, whereas the IVFFlat performs

$k$ -means-based clustering on the dictionary atoms to form groups of similar entries of which only a subset is considered during a search. The IVFFlat-index search strategy is thus similar to the one described by Cauley et al.⁷, yet without employing SVD within groups.
The grouped search is controlled by two parameters: the number of groups

$k$ to form at training time, and the number of lists

$p$ to consider during search. The parameter

$k$ was chosen based on the number of entries in the dictionary n, as

$k = n / f$ with

$f$ ranging from 320 to 2800.
Optimal operating points were identified based on a thresholded recall metric R_T that was computed as the fraction of voxels for which the approximate search returned the same dictionary entry as the brute-force search, constrained to those voxels for which the brute-force search yielded an inner product magnitude of at least 0.85. This thresholding was performed to ignore background voxels that only contain noise. An operating point is considered optimal if there exists no faster one reaching an equal or higher value for R_T. Our evaluation dataset originates from a 3D bSSFP-based optimized MRF sequence (OPTIMUM) acquired at 100 mT in a calibrated 6-tube phantom with varying relaxation properties. Imaging parameters were: voxel resolution [1.5x1.5x6.5] mm³, matrix size 90x81x15, for which 18 time points corresponding to different pairs of flip angles and repetition times were acquired. Two dictionaries with different parameter resolutions were used, with roughly 6.7 million and 31 million entries, respectively. All results were obtained on a workstation equipped with an Intel i9-10920X 12-core CPU, 128 GiB of RAM, and one Nvidia RTX 3090 GPU.

Results

Figure 1 summarizes R_T for different operating points. A value of R_T > 0.999 was obtained at around 0.1 % (0.25 s vs. 24 s) of the brute-force time for the smaller dictionary and 0.05 % (0.49 s vs. 110 s) for the larger one. A steep decline of R_T for faster searches was observed. Training time ranged from 3 s to 59 s (small dictionary), and 33 s to 16 min (large dictionary), respectively, where smaller values of

$f$ took more time. Evaluation of the search engine without GPU showed similar trends, yet roughly 25 times slower.
Figure 2 shows the central slice of reconstructed MRF data together with the parameter error incurred by the accelerated search.

Discussion

The proposed search engine provides reconstructed maps virtually identical to the brute-force search, with up to two orders of magnitude reduction of the reconstruction time. Degradation of the resulting parameter maps becomes visible when using more aggressive acceleration (notably for Acc > 150x). Interestingly, it was possible to accelerate the search in the larger of the two dictionaries by about a factor of two more than for the smaller dictionary before degradation occurred. The performance for even larger dictionaries will be investigated in future experiments. Figure 1 indicates that there is an ideal value of f beyond which increasing the number of groups is no longer beneficial. This can be expected from the inherently increasing computational work when comparing group mean signals. The matching engine was evaluated on comparably short MR fingerprints as produced by our MRF implementation (OPTIMUM), and potential benefits for longer fingerprints will be subject to future research.

Conclusion

By building on a feature rich, efficiently programmed basis, our implementation can take full advantage of the computational capabilities offered by modern GPUs. The multi-GPU capability available in faiss paves the way towards the efficient use of dictionaries that do not fit in a single GPU’s memory, making the quantification of additional parameters in MRF more accessible.

Acknowledgements

This work was supported by the Swiss National Science Foundation, grant no. PCEFP2_186861.

References

1. Ma, D. et al. Magnetic resonance fingerprinting. Nature 495, 187–192 (2013).

2. Cloos, M. A. et al. Multiparametric imaging with heterogeneous radiofrequency fields. Nat. Commun. 7, 12445 (2016).

3. Wang, C. Y. et al. Magnetic resonance fingerprinting with quadratic RF phase for measurement of T2* simultaneously with δf, T1, and T2. Magn. Reson. Med. 81, 1849–1862 (2019).

4. Cohen, O., Huang, S., McMahon, M. T., Rosen, M. S. & Farrar, C. T. Rapid and quantitative chemical exchange saturation transfer (CEST) imaging with magnetic resonance fingerprinting (MRF). Magn. Reson. Med. 80, 2449–2463 (2018).

5. Su, P. et al. Multiparametric estimation of brain hemodynamics with MR fingerprinting ASL. Magn. Reson. Med. 78, 1812–1823 (2017).

6. McGivney, D. F. et al. SVD Compression for Magnetic Resonance Fingerprinting in the Time Domain. IEEE Trans. Med. Imaging 33, 2311–2322 (2014).

7. Cauley, S. F. et al. Fast group matching for MR fingerprinting reconstruction. Magn. Reson. Med. 74, 523–528 (2015).

8. Johnson, J., Douze, M. & Jégou, H. Billion-Scale Similarity Search with GPUs. IEEE Trans. Big Data 7, 535–547 (2021).

9. Sarracanie, M. Fast Quantitative Low-Field Magnetic Resonance Imaging With OPTIMUM—Optimized Magnetic Resonance Fingerprinting Using a Stationary Steady-State Cartesian Approach and Accelerated Acquisition Schedules. Invest. Radiol. (2021)

Figures

Survey of different operating points for the search in two dictionaries of different sizes using the GPU. Each curve corresponds to a different total number of groups k at training time. Individual points in each curve were obtained by changing the number of groups to consider at search time.

Accelerated search performance compared to ground truth brute-force. (a) Brute-force reconstruction from an OPTIMUM acquisition at 100 mT in a calibrated phantom (central slice displayed), and (b) Relative error corresponding to different optimal operating points for the accelerated search in the 6.7M-entry dictionary, normalized by the brute-force value. Only voxels with an inner product magnitude larger than 0.85 are shown.

Proc. Intl. Soc. Mag. Reson. Med. 30 (2022)

2520

DOI: https://doi.org/10.58530/2022/2520