Alex Cerjanic1,2, Joseph L Holtrop1,2, Giang Chau Ngo1, Brent Leback3, Galen Arnold4, Mark Van Moer4, Genevieve LaBelle2,5, Jeffrey A Fessler6, and Bradley P Sutton1,2
1Bioengineering, University of Illinois at Urbana-Champaign, Urbana, IL, United States, 2Beckman Institute, University of Illinois at Urbana-Champaign, Urbana, IL, United States, 3PGI Compilers & tools; an NVIDIA brand, Portland, OR, United States, 4National Center for Supercomputing Applications, University of Illinois at Urbana-Champaign, Urbana, IL, United States, 5Electrical and Computer Engineering, University of Illinois at Urbana-Champaign, Urbana, IL, United States, 6Electrical Engineering and Computer Science, University of Michigan, Ann Arbor, MI, United States
Synopsis
PowerGrid is an accelerated, open source, freely
available toolkit for iterative reconstruction supporting non-Cartesian
trajectories. Using high level compiler directives, GPU accelerated Fourier
transform operators were implemented in a high level syntax designed to
correlate with the popular Image Reconstruction Toolbox (IRT). A speed-up of up to
8.96x over the unaccelerated IRT reconstruction was obtained using an NVIDIA Tesla K40c
accelerator. Introduction
Iterative
methods for image reconstruction problems in MRI are essential for advanced techniques
including: compressed sensing, low rank reconstruction, SENSE reconstruction1,
non-Cartesian trajectories2,3, field correction for off-resonance
distortion2, and joint-reconstruction based parameter estimation4. Iterative
reconstructions can enable high resolution imaging with high SNR, such as submillimeter
isotropic diffusion weighted acquisitions on a 3T clinical scanner5, by
taking advantage of parallel imaging and physics-based field correction. However,
all of these advanced reconstruction techniques typically have high computational
demands and may not be available to clinicians and researchers.
Fostering robust research and exchange in MR
reconstruction techniques requires sharing open and extensible code for others
to refine, extend, and improve. Previous work with graphics processing units
(GPUs) as hardware accelerators for image reconstruction, such as the IMPATIENT
toolkit6-8, produced very high speedup factors over CPU reconstructions for
non-Cartesian field corrected SENSE reconstructions. However, its
implementation is unfriendly for MR physicists to extend and adapt,
particularly to accommodate additional problem formulations such as motion
correction models or alternative regularization schemes. In this work, we
utilize compiler directives (available through OpenACC 2.0) to target GPU
accelerators with intuitive, high level C/C++ syntax to build extensible
reconstruction software and tools. The result, PowerGrid, is an open source
toolkit that makes accelerators and iterative reconstruction as accessible to
MR physicists as the commonly-adapted Image Reconstruction Toolbox (IRT)9.
Implementation:
Building upon a set of
hardware-accelerated operators such as the discrete Fourier transform (DFT) and
the nonuniform fast Fourier transform (NUFFT) via gridding, objects
implementing reconstruction techniques were coded in C++ to take advantage of
the accelerated transforms. Figure 1 shows an example of how these objects
maintain similar syntax and structure to those based in MATLAB from the IRT9 including a demonstration of a complete gradient descent solver in 6 lines of
PowerGrid/C++ code with intuitive syntax using freely available libraries10. Table 2 shows a list of features implemented as part of PowerGrid.
The NUFFT and the DFT were written in C++ for CPUs and
annotated for use with GPU accelerators (PGI/NVIDIA OpenACC Toolkit 15.7) as
shown in Figure 2. The OpenACC compiler directives enable the use of CPU based
C code on accelerators such as the GPU with suggestions to the compiler about
parallelism and data movement.
Methods:
Test
data sets were acquired on a Siemens 3 T Trio scanner using spiral readout acquisitions
and either a 32-channel or 12-channel head coil. The 12-channel data was compressed to 4 virtual
coils using Siemens hardware coil compression. Reconstructions were performed
on a quad-core Intel Xeon E5620 Linux workstation, 12 GB RAM, with a Tesla K40
GPU and Ubuntu Linux 14.04.
Results:
Table 2 compares the IRT/MATLAB reconstructions on CPU only versus GPU enabled
reconstructions utilizing PowerGrid. For the smaller 2D dataset (192x192x1x32 coils,
R=2), the speed up factor was much more modest (1.69x) due to the dominance of
CPU-GPU transfers and CPU-based linear algebra vs. the small Fourier transforms
transforms on GPU.
For
larger datasets, such as 3D datasets, the NUFFT on GPU via gridding performed
better than the IRT/MATLAB and the DFT on GPU.
For the smaller 3D test dataset
(64x64x16x4 coils, R=1), PowerGrid had a speed up factor of 8.96x via GPU
acceleration versus the fastest MATLAB implementation in the IRT. The larger 3D
test dataset (120x120x16x32 coils, R=1), Figure 3, had a slower speedup of
(2.23x) over the IRT implementation due to the increased number of coils as the
SENSE operator is not currently GPU accelerated and additional coils results in
additional CPU-GPU transfers and overhead.
Finally,
the modular nature of PowerGrid was demonstrated with a nonlinear motion
induced phase correction algorithm11 derived from the SENSE operator. The
phase corrected example sped up by 4.0x over the IRT.
Conclusion:
PowerGrid
provides access to GPU acceleration for iterative reconstructions that is both
adaptable and extensible for the implementation of additional reconstruction algorithms.
By maintaining an object and operator based syntax, implementing novel
algorithms in C++ are simplified in terms of programming time and required
expertise. In our initial observations, reasonably well-experienced IRT users
can perform an initial port of their MATLAB-based code to C++ using PowerGrid
in less than a day. The combination of an open-source, freely available
(http://mrfil.github.io/PowerGrid) toolkit with field inhomogeneity correction
and accelerated non-uniform Fourier transforms makes the practical use of longer
readout, non-Cartesian trajectories more accessible and more computationally
tractable.
Acknowledgements
We gratefully acknowledge the support of NVIDIA
Corporation with the donation of the Tesla K40 GPU used for this research. We
acknowledge support from the Blue Waters project by the National Science
Foundation (awards OCI-0725070 and ACI-1238993) and the state of Illinois. Blue
Waters is a joint effort of the University of Illinois at Urbana-Champaign and
its National Center for Supercomputing Applications. We also acknowledge the assistance
of Fernanda Foertter, Oak Ridge Leadership Computing Facility, and the National
Center for Supercomputing ApplicationsReferences
[1] K. P. Pruessmann, M.
Weiger, M. B. Scheidegger, and P. Boesiger, “SENSE: Sensitivity encoding for
fast MRI,” Magnetic Resonance in Medicine, vol. 42, no. 5, pp. 952–962, Nov.
1999.
[2] B. P. Sutton, D. C. Noll, and J. A.
Fessler, “Fast, iterative image reconstruction for MRI in the presence of field
inhomogeneities,” IEEE Transactions on Medical Imaging, vol. 22, no. 2, pp.
178–188, Feb. 2003.
[3] J. A. Fessler and
B. P. Sutton, “Nonuniform fast fourier transforms using min-max interpolation,”
IEEE Transactions on Signal Processing, vol. 51, no. 2, pp. 560–574, Feb. 2003.
[4] Giang Chau Ngo and
B. P. Sutton, “R2* mapping for robust brain function detection in the presence
of magnetic field inhomogeneity,” presented at the 2014 36th Annual International
Conference of the IEEE Engineering in Medicine and Biology Society (EMBC),
2014, pp. 1537–1540.
[5] Holtrop JL, Van AT & Sutton BP. Pushing
the Resolution of 3D Spin Echo Diffusion Acquisition. Proceedings of the
20th Annual Meeting of ISMRM, Melbourne 2012:pg 1881.
[6] J. Gai, N. Obeid, J. L. Holtrop, X.-L.
Wu, F. Lam, M. Fu, J. P. Haldar, W.-M. W. Hwu, Z.-P. Liang, and B. P. Sutton,
“More IMPATIENT: A gridding-accelerated Toeplitz-based strategy for
non-Cartesian high-resolution 3D MRI on GPUs,” Journal of Parallel and
Distributed Computing, vol. 73, no. 5, pp. 686–697, May 2013.
[7] W. M. W. Hwu, D. Nandakumar, J. Haldar,
I. C. Atkinson, B. Sutton, Z.-P. Liang, and K. R. Thulborn, “Accelerating MR
image reconstruction on GPUs,” presented at the IEEE International Symposium on
Biomedical Imaging: From Nano to Macro, 2009. ISBI '09, 2009, pp. 1283–1286.
[8] S. S. Stone, J. P.
Haldar, S. C. Tsao, W. M. W. Hwu, B. P. Sutton, and Z. P. Liang, “Accelerating
advanced MRI reconstructions on GPUs,” Journal of Parallel and Distributed
Computing, vol. 68, no. 10, pp. 1307–1318, Oct. 2008.
[9] J. A. Fessler, “Image
Reconstruction Toolbox.” Available at: http://web.eecs.umich.edu/~fessler/code/
[10] C. Sanderson, “Armadillo: An Open Source C++
Linear Algebra Library for Fast Prototyping and Computationally Intensive
Experiments.,” NICTA, 2010. Available at: http://arma.sourceforge.net
[11] Liu C, Moseley ME & Bammer R.
Simultaneous phase correction and SENSE reconstruction for navigated multi-shot
DWI with non-cartesian k-space sampling. Magnetic resonance in medicine
2005:54:1412-1422.