
Making Quantitative Susceptibility Mapping (QSM) a clinical reality: a one-minute Morphology Enabled Dipole Inversion using GPU computing
Mengyuan Wan1, Zhe Liu2,3, Pascal Spincemaille2, and Yi Wang2,3

1Software Engineering, Wuhan University, Wuhan, China, 2Radiology, Weill Cornell Medical College, New York, NY, United States, 3Biomedical Engineering, Cornell University, Ithaca, NY, United States

Synopsis

In this work, we demonstrate the feasibility of using GPU computing to achieve a 15-fold acceleration of the most time-consuming parts of the Morphology Enabled Dipole Inversion (MEDI) method for Quantitative Susceptibility Mapping (QSM), leading to an overall 5-fold reduction in total processing time and allowing a one-minute susceptibility map reconstruction.

Introduction

Reconstruction of quantitative susceptibility maps (QSM) from multi-echo gradient echo (GRE) data requires the use of non-linear Bayesian formulations, such as Morphology Enabled Dipole Inversion (MEDI) [1], to properly take into account the noise weighting of the complex data and to allow the L1 regularization necessary to overcome the ill-conditioned dipole field inversion at the heart of QSM. Minimizing this energy functional requires time-consuming iterative algorithms such as Gauss-Newton. While speed-ups may be possible by making certain approximations, such as assuming uniform phase noise, relaxing to L2 regularization or limiting the number of iterations, these usually come at the cost of reduced image quality and reduced accuracy [2,3]. An optimized C++ implementation of the MEDI algorithm allows QSM reconstruction times of around 5 minutes. In this work, we analyzed the major bottlenecks in this implementation and introduced GPU computing to speed them up. An overall 5-fold reduction in total processing time was achieved, allowing a one-minute high-quality susceptibility map reconstruction.
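
For reference, the nonlinear MEDI cost function minimized here has the following general form (notation paraphrased from [1]; the precise weighting and edge mask construction are detailed in that reference):

$$\chi^{*}=\underset{\chi}{\operatorname{argmin}}\;\frac{1}{2}\left\|w\left(e^{i\,d\otimes\chi}-e^{i\,b}\right)\right\|_{2}^{2}+\lambda\left\|M_{G}\nabla\chi\right\|_{1}$$

where $$$b$$$ is the measured local field, $$$d$$$ the unit dipole kernel, $$$\otimes$$$ denotes convolution, $$$w$$$ the noise weighting derived from the magnitude data, $$$M_{G}$$$ the binary edge mask obtained from the magnitude image, $$$\nabla$$$ the discrete gradient and $$$\lambda$$$ the regularization parameter. The Gauss-Newton linearization of this problem leads to the conjugate gradient solves that dominate the run time.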

Methods

The CPU implementation, written in C++ using the Intel Math Kernel Library (MKL) [4], was profiled using the GNU profiling tool ‘gprof’. The most time-consuming parts were then reimplemented using GPU computing (based on CUDA), while the remaining sections of the code stayed on the CPU. Development and experiments were conducted on a system with an Intel Core i7-5930K 3.5 GHz CPU, 64 GB RAM and a GeForce GTX 1080 Ti GPU with 11 GB memory. The cuBLAS (CUDA Basic Linear Algebra Subroutines) and cuFFT (CUDA Fast Fourier Transform) libraries contained in the CUDA Toolkit [5] were used. Both the CPU and GPU versions of the MEDI implementation were tested on a single dataset: matrix size = $$$512\times512\times52$$$, voxel size = $$$0.47\times0.47\times3$$$ mm, echo spacing 4.8 ms, nTE = 11. The overall running time as well as the detailed time cost for the FFT, gradient and divergence operations were recorded.
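
As an illustration of how cuFFT is typically set up for such a volume (a minimal sketch, not the authors' code; dimension ordering, variable names and the in-place execution are assumptions), a single-precision 3D plan can be created once and reused across all conjugate gradient iterations:

#include <cuda_runtime.h>
#include <cufft.h>

int main() {
    // cufftPlan3d expects the slowest-varying dimension first; here we assume
    // the 52 slices are outermost for the 512x512x52 matrix used in this work.
    const int nx = 52, ny = 512, nz = 512;
    const int n = nx * ny * nz;

    cufftHandle plan;
    cufftPlan3d(&plan, nx, ny, nz, CUFFT_C2C);           // complex-to-complex, single precision

    cufftComplex *d_data;
    cudaMalloc(&d_data, sizeof(cufftComplex) * n);
    // ... copy the current susceptibility estimate (zero imaginary part) into d_data ...

    cufftExecC2C(plan, d_data, d_data, CUFFT_FORWARD);   // to k-space, in place
    // ... pointwise multiplication with the dipole kernel would go here ...
    cufftExecC2C(plan, d_data, d_data, CUFFT_INVERSE);   // back to image space (unnormalized; scale by 1/n)

    cufftDestroy(plan);
    cudaFree(d_data);
    return 0;
}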

Results

More than half of the running time was consumed by basic operations in the conjugate gradient solver, namely the Fourier transform and the discrete gradient and divergence operations. These operations were parallelized on the GPU. To avoid the time cost associated with data transfer between CPU and GPU, the conjugate gradient ran completely on the GPU, with copies to the GPU and back limited to before and after the solver, respectively. cuFFT was used for the Fourier transforms, while cublasSaxpy was used to implement the forward difference for the discrete gradient $$$\nabla$$$ and the backward difference for the discrete divergence $$$\nabla\cdot$$$. Since only complex data types are supported by the cuFFT library functions, dedicated kernel functions were written for conversion between real and complex data. The running time of MEDI was 4 minutes 10 seconds for the CPU implementation compared to 47 seconds for the GPU implementation. A breakdown of the running time for the basic operations (FFT, gradient and divergence) is shown in Table 1, with a 14- to 23-fold acceleration for the functions implemented on the GPU. Comparison of QSMs reconstructed by the CPU and GPU implementations (Figure 1) showed a negligible difference, with a root-mean-square error of 3.2%.
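
For illustration, the real/complex conversion kernels and a cublasSaxpy-based forward difference could look like the following (a minimal sketch under our own assumptions about data layout and boundary handling, not the authors' actual kernels; all names are illustrative):

#include <cuda_runtime.h>
#include <cublas_v2.h>
#include <cufft.h>

// Pack a real volume into a cufftComplex array (imaginary part set to zero),
// since the cuFFT C2C routines operate on complex data only.
__global__ void realToComplex(const float *in, cufftComplex *out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) { out[i].x = in[i]; out[i].y = 0.0f; }
}

// Keep only the real part after the inverse FFT.
__global__ void complexToReal(const cufftComplex *in, float *out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = in[i].x;
}

// Forward difference along the fastest-varying axis expressed with BLAS calls:
// g[i] = f[i+1] - f[i]. Boundary voxels need separate treatment (omitted here).
void forwardDiff(cublasHandle_t handle, const float *f, float *g, int n) {
    const float minusOne = -1.0f;
    cublasScopy(handle, n - 1, f + 1, 1, g, 1);          // g[i] = f[i+1]
    cublasSaxpy(handle, n - 1, &minusOne, f, 1, g, 1);   // g[i] -= f[i]
}

Analogous calls with different offsets and strides give the differences along the other axes, and the corresponding backward differences yield the divergence used as the adjoint of the gradient.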

Discussion

The basic operations in the conjugate gradient solver used in QSM were dramatically accelerated on the GPU, by a factor of 14 to 23 compared with the CPU implementation. The overall execution time of the QSM reconstruction was also cut by 80 percent. In addition, using a forward-difference gradient and a backward-difference divergence enabled fully parallel computation on the GPU, at the cost of a negligible difference in the reconstructed QSM. In the future, the entire MEDI program could be implemented on the GPU, eliminating the data transfer overhead between CPU and GPU and further reducing the reconstruction time. This work shows the feasibility of including an online high-accuracy iterative QSM reconstruction on a clinical scanner.

Conclusion

A GPU implementation of MEDI for QSM is shown to allow a one-minute susceptibility map reconstruction, paving the way for an online scanner implementation.

Acknowledgements

We acknowledge the support from NIH grants R01 NS072370, R01 NS090464, R01 NS095562, and R01 CA181566.

References

[1]. Liu T, et al. Nonlinear formulation of the magnetic field to source relationship for robust quantitative susceptibility mapping. Magn Reson Med 2013; 69:467–476.

[2]. Wang S, et al. Noise effects in various quantitative susceptibility mapping methods. IEEE Trans Biomed Eng 2013; 60(12): 3441-3448.

[3]. Wang S, et al. Structure prior effects in Bayesian approaches of quantitative susceptibility mapping. BioMed Research International 2016.

[4]. Intel Corporation, Intel® Math Kernel Library Developer Reference - C. Intel, 2017.

[5]. NVIDIA Corporation, CUDA Toolkit Documentation v8.0. NVIDIA, 2017.

Figures

Figure 1. Reconstructed QSM using the CPU (left) and GPU (middle) implementations, and the difference map (right).

Table 1. Time cost for the basic operations in the conjugate gradient solver. Left: FFT, discrete gradient and divergence costs in the CPU implementation; Right: time cost for the same operations in the GPU implementation. A speed-up by a factor of about 15 was observed on the GPU compared with its CPU counterpart.
