Ryoichi Kose1 and Katsumi Kose2
1MRTechnology Inc., Tsukuba, Japan, 2University of Tsukuba, Tsukuba, Japan
Synopsis
We have developed a GPU-optimized fast 3D MRI simulator for pulse sequence development. We compared simulation and experimental results for gradient-echo images of water phantoms containing an air-filled cylinder or an air-filled sphere. Agreement between simulation and experiment was good when the calculation matrix was at least twice that of the acquired images. Because the processing speed of our simulator ranged from 2.2 to 3.1 TFLOPS, we conclude that our simulator is useful for the development of MRI pulse sequences.
Introduction
With the development of advanced MRI hardware such as high-field superconducting magnets, high-performance gradient systems, and digital RF systems, MRI pulse sequences are becoming increasingly sophisticated and complicated. MRI simulators running on high-performance computing systems are therefore becoming much more important. Many MRI simulators have been reported to date [1-4], but very few are closely coupled with MRI experiments. In this study, we developed a GPU-optimized fast 3D MRI simulator that can directly execute the pulse sequences used in the experiments.
Materials and Methods
A GPU (GeForce GTX TITAN Z, NVIDIA) was used with an 8-core CPU (Core i7-5960X, Intel). The GPU had 5760 CUDA cores, 12 GB of GDDR5 memory, and a theoretical single-precision processing speed of about 8 TFLOPS. The GPU programs were developed using CUDA. The core of the MRI simulator was step-by-step integration of the Bloch equation for the spin isochromats in the voxels, with the time steps optimized to reduce the processing time. The MRI simulator worked directly from the actual MRI pulse sequence, written as a simple text file (Fig. 1). The simulator automatically generates CUDA code for the Bloch simulation from the input datasets (object parameters, system parameters, and the pulse sequence) to optimize the processing speed of the GPU program (Fig. 2).
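To make the integration step concrete, the following is a minimal sketch (not the authors' auto-generated code) of what one time step of the Bloch integration could look like as a CUDA kernel: each thread rotates one isochromat's magnetization about the effective field and then applies relaxation by operator splitting. The structure Iso and the parameters dt, b1x, b1y, and gz are illustrative assumptions.

// Hedged sketch of one Bloch-equation time step for all spin isochromats.
#include <cuda_runtime.h>
#include <math.h>

struct Iso {            // one spin isochromat in a voxel
    float mx, my, mz;   // magnetization components
    float db;           // local off-resonance (rad/s), e.g. near the air cavity
    float r1, r2;       // 1/T1, 1/T2 (1/s)
};

__global__ void blochStep(Iso *s, int n, float dt,
                          float b1x, float b1y,   // RF field (rad/s)
                          float gz)               // gradient-induced offset (rad/s)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;
    Iso p = s[i];

    // Effective field (rad/s) in the rotating frame
    float bx = b1x, by = b1y, bz = p.db + gz;
    float w  = sqrtf(bx * bx + by * by + bz * bz);

    if (w > 0.0f) {
        // Rotate M about the effective field by angle w*dt (Rodrigues' formula)
        float ux = bx / w, uy = by / w, uz = bz / w;
        float c = cosf(w * dt), si = sinf(w * dt);
        float dot = ux * p.mx + uy * p.my + uz * p.mz;
        float mx = p.mx * c + (uy * p.mz - uz * p.my) * si + ux * dot * (1.0f - c);
        float my = p.my * c + (uz * p.mx - ux * p.mz) * si + uy * dot * (1.0f - c);
        float mz = p.mz * c + (ux * p.my - uy * p.mx) * si + uz * dot * (1.0f - c);
        p.mx = mx; p.my = my; p.mz = mz;
    }

    // Relaxation over the same interval (operator splitting)
    float e1 = expf(-p.r1 * dt), e2 = expf(-p.r2 * dt);
    p.mx *= e2;
    p.my *= e2;
    p.mz = 1.0f + (p.mz - 1.0f) * e1;   // equilibrium magnetization normalized to 1

    s[i] = p;
}

In the actual simulator, kernels of this kind are generated per sequence event, so that RF pulses, gradient waveforms, and free-precession intervals each use time steps appropriate to how quickly the effective field changes.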
Experimental data were acquired with a home-built digital MRI system using a 1.5 T horizontal-bore superconducting magnet (room-temperature bore: 280 mm in diameter and 521.5 mm in length, JASTEC). Two cylindrical water (CuSO4 solution) phantoms, one containing an air-filled cylinder (21 mm diameter) and the other an air-filled sphere (25.4 mm diameter), were imaged using 3D gradient-echo sequences (TR = 200 ms, TE = 32 and 48 ms, FOV = (76.8 mm)^3) to visualize the magnetic field distribution around the air (Fig. 3). MRI simulations for the phantoms were performed using the theoretically calculated magnetic field distribution [5].
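For reference, such field maps can be written in closed form. The following is a minimal sketch, under the usual first-order susceptibility approximation (cf. [5]), of the field shift along B0 outside an air-filled sphere and outside an infinite air-filled cylinder with its axis perpendicular to B0. The function names, the geometry (objects centered at the origin, B0 along z, cylinder axis along y), and the susceptibility difference dchi are illustrative assumptions, not the authors' implementation.

// Hedged sketch of analytical field-shift maps used as simulator input.
#include <cuda_runtime.h>
#include <math.h>

// Air-filled sphere of radius a at the origin, B0 along z.
// Outside the sphere: dBz = (dchi/3) * B0 * (a/r)^3 * (3*cos^2(theta) - 1)
__host__ __device__ float sphereDBz(float x, float y, float z,
                                    float a, float dchi, float B0)
{
    float r2 = x * x + y * y + z * z;
    if (r2 <= a * a) return 0.0f;          // air region: no signal anyway
    float r = sqrtf(r2);
    float c = z / r;                        // cos(theta), theta measured from B0
    return (dchi / 3.0f) * B0 * powf(a / r, 3.0f) * (3.0f * c * c - 1.0f);
}

// Infinite air-filled cylinder of radius a, axis along y, B0 along z.
// Outside the cylinder: dBz = (dchi/2) * B0 * (a/rho)^2 * cos(2*phi),
// with rho, phi polar coordinates in the x-z plane (phi = 0 along B0).
__host__ __device__ float cylinderDBz(float x, float z,
                                      float a, float dchi, float B0)
{
    float rho2 = x * x + z * z;
    if (rho2 <= a * a) return 0.0f;
    float cos2phi = (z * z - x * x) / rho2; // cos(2*phi) from cos^2 - sin^2
    return 0.5f * dchi * B0 * (a * a / rho2) * cos2phi;
}

Each voxel's field shift enters the simulation as the off-resonance term of its isochromat, which is what produces the intravoxel dephasing and signal loss studied below.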
Results
Figure 4 shows horizontal cross-sectional planes selected from 3D image datasets of the air-filled cylinder phantom, calculated by the simulator ((a)-(c)) and acquired with the gradient-echo sequence (TE = 48 ms) (f). The matrix size for all of the images was 256×256×16, but the calculation matrix, processing time, and speed were (a) 256×256×16, 5.9 s, and 2.14 TFLOPS; (b) 512×512×16, 17.3 s, and 2.94 TFLOPS; and (c) 1024×1024×16, 65.4 s, and 3.11 TFLOPS. As the figures clearly show, a 256×256 in-plane calculation matrix is not sufficient, whereas 512×512 is sufficient, to describe the signal loss around the cylinder.
Figure 5 shows horizontal cross-sectional planes selected from 3D image datasets of the air-filled sphere phantom, calculated by the simulator ((a)-(c)) and acquired with the gradient-echo sequence (f). The matrix size for all of the images was 256×256×64, but the calculation matrix, processing time, and speed were (a) 256×256×64, 73 s, and 2.78 TFLOPS; (b) 512×512×64, 298.6 s, and 2.72 TFLOPS; and (c) 1024×1024×64, 1184.8 s, and 2.74 TFLOPS. As the figures clearly show, a 256×256 in-plane calculation matrix is not sufficient, whereas 512×512 is sufficient, to describe the signal loss around the sphere.
Discussion
Our simulator uses experimental pulse sequences, numerical phantoms, and system parameters to generate CUDA code optimized for Bloch simulations. The processing speed in our experiments varied from 2.2 to 3.1 TFLOPS depending on the image matrix and other conditions. These speeds are reasonable given that the theoretical peak of our GPU is about 8 TFLOPS. In conclusion, our approach to a fast MRI simulator is useful for the development of MRI pulse sequences.
Acknowledgements
We acknowledge Dr. Tomoyuki Haishi for supporting this work. This work was supported by the Japan Science and Technology Agency.
References
[1] H. Benoit-Cattin, et al. JMR 173 (2005) 97-115.
[2] P. Latta, et al. JMR 203 (2010) 44-51.
[3] T. Stöcker, et al. MRM 64 (2010) 186-193.
[4] C.G. Xanthis, et al. IEEE TMI 33 (2014) 607-616.
[5] K.M. Lüdeke, et al. MRI 3 (1985) 329-343.