GPU-optimized fast 3D MRI simulator
Ryoichi Kose1 and Katsumi Kose2

1MRTechnology Inc., Tsukuba, Japan, 2University of Tsukuba, Tsukuba, Japan

Synopsis

We have developed a GPU-optimized fast 3D MRI simulator for pulse sequence development. We compared simulated and experimental gradient echo images of water phantoms containing an air-filled cylinder or an air-filled sphere. Agreement between simulation and experiment was good when the calculation matrix was at least twice that of the acquired images. Because the simulator sustained processing speeds of 2.2 to 3.1 TFLOPS, we conclude that it is useful for the development of MRI pulse sequences.

Introduction

With the development of advanced MRI hardware such as high-field superconducting magnets, high-performance gradient systems, and digital RF systems, MRI pulse sequences are becoming increasingly sophisticated and complicated. In this situation, MRI simulators running on high-performance computing systems are becoming much more important. Many MRI simulators have been reported to date [1-4], but very few are closely coupled with MRI experiments. In this study, we developed a GPU-optimized fast 3D MRI simulator that directly executes the pulse sequences used in the experiments.

Materials and Methods

A GPU (GeForce GTX TITAN-Z, NVIDIA) was used with an 8-core CPU (Core i7-5960X, Intel). The GPU had 5760 CUDA cores, 12 GB of GDDR5 memory, and a theoretical processing speed of about 8 TFLOPS (single precision). The GPU programs were developed using CUDA. The core of the MRI simulator was step-by-step integration of the Bloch equation for the spin isochromats in the voxels, with the time steps optimized to reduce the processing time. The simulator executed the actual MRI pulse sequence, written in a simple text format (Fig.1). To optimize the processing speed of the GPU program, the simulator automatically generated the CUDA code for the Bloch simulation from the input datasets, that is, the object parameters, the system parameters, and the pulse sequence (Fig.2). Experimental data were acquired with a home-built digital MRI system using a 1.5 T horizontal-bore superconducting magnet (room-temperature bore: 280 mm in diameter and 521.5 mm in length; JASTEC). Two cylindrical water (CuSO4 solution) phantoms, containing an air-filled cylinder (21 mm diameter) and an air-filled sphere (25.4 mm diameter), respectively, were imaged using 3D gradient echo sequences (TR = 200 ms, TE = 32 and 48 ms, FOV = (76.8 mm)³) to visualize the magnetic field distribution around the air inclusions (Fig.3). MRI simulations of the phantoms were performed using theoretically calculated magnetic field distributions [5].
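The abstract does not reproduce the generated CUDA code, but a minimal sketch of one free-precession time step per isochromat, under the assumptions that each thread handles one isochromat and that precession is applied as a rotation followed by exponential relaxation, might look as follows (all identifiers are our own, not the authors'):

#include <cuda_runtime.h>
#include <math.h>

// Magnetization of one spin isochromat.
struct Spin { float mx, my, mz; };

// One time step dt of free precession plus relaxation for n isochromats.
// dB[i] is the precomputed local field offset (e.g., from the analytic
// susceptibility field [5]); gx, gy, gz are the current gradient amplitudes.
__global__ void blochStep(Spin *m, const float *dB,
                          const float *x, const float *y, const float *z,
                          float gx, float gy, float gz,
                          float t1, float t2, float m0, float dt, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;

    const float gamma = 2.675e8f;  // gyromagnetic ratio [rad/s/T]

    // Local off-resonance: gradient field plus susceptibility offset.
    float dw = gamma * (gx * x[i] + gy * y[i] + gz * z[i] + dB[i]);
    float c = cosf(dw * dt), s = sinf(dw * dt);

    // Rotate the transverse magnetization about z, then relax.
    float mx = m[i].mx * c + m[i].my * s;
    float my = -m[i].mx * s + m[i].my * c;
    float e1 = expf(-dt / t1), e2 = expf(-dt / t2);

    m[i].mx = mx * e2;
    m[i].my = my * e2;
    m[i].mz = m0 + (m[i].mz - m0) * e1;
}

Summing mx and my over all isochromats at each sampling time then gives the simulated complex k-space signal; RF pulses would be applied as additional rotations, and runtime code generation allows sequence-specific parameters to be embedded as constants.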

Results

Figure 4 shows horizontal cross sections selected from the 3D image datasets of the air-filled cylinder phantom, calculated by the simulator ((a)-(c)) and acquired with the gradient echo sequence (TE = 48 ms) (f). The matrix size of all images was 256×256×16, while the calculation matrix, processing time, and speed were (a) 256×256×16, 5.9 s, and 2.14 TFLOPS; (b) 512×512×16, 17.3 s, and 2.94 TFLOPS; and (c) 1024×1024×16, 65.4 s, and 3.11 TFLOPS. As the figures clearly show, a 256×256 in-plane calculation matrix is insufficient to describe the signal loss around the cylinder, whereas 512×512 is sufficient; several isochromats per imaging voxel are needed to represent the intra-voxel dephasing. Figure 5 shows horizontal cross sections selected from the 3D image datasets of the air-filled sphere phantom, calculated by the simulator ((a)-(c)) and acquired with the gradient echo sequence (f). The matrix size of all images was 256×256×64, while the calculation matrix, processing time, and speed were (a) 256×256×64, 73 s, and 2.78 TFLOPS; (b) 512×512×64, 298.6 s, and 2.72 TFLOPS; and (c) 1024×1024×64, 1184.8 s, and 2.74 TFLOPS. Again, a 256×256 in-plane calculation matrix is insufficient to describe the signal loss around the sphere, whereas 512×512 is sufficient.
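This behavior is consistent with the analytic field perturbations used as simulation input. For reference, the standard analytic results (cf. [5]) for an object of radius $a$ and internal-external susceptibility difference $\Delta\chi$ in a uniform field $B_0$ along $z$, with the cylinder taken transverse to $B_0$, are

$$\Delta B_z^{\mathrm{sphere}}(r,\theta) = \frac{\Delta\chi}{3}\,B_0\,\frac{a^3}{r^3}\left(3\cos^2\theta - 1\right), \qquad \Delta B_z^{\mathrm{cylinder}}(r,\phi) = \frac{\Delta\chi}{2}\,B_0\,\frac{a^2}{r^2}\cos 2\phi.$$

The steep $r^{-3}$ (sphere) and $r^{-2}$ (cylinder) decay of these perturbations near the air surface explains why the calculation matrix must be finer than the image matrix: the field varies substantially within a single imaging voxel there.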

Discussion

Our simulator used the experimental pulse sequences, numerical phantoms, and system parameters to generate CUDA code optimized for Bloch simulation. The processing speed varied from 2.2 to 3.1 TFLOPS depending on the calculation matrix and other conditions. These speeds are reasonable, given that the theoretical peak of our GPU is about 8 TFLOPS (single precision). In conclusion, our approach to fast MRI simulation is useful for the development of MRI pulse sequences.
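As a back-of-the-envelope check (our own arithmetic on the quoted figures; $N_{\mathrm{step}}$ and $f_{\mathrm{op}}$ are not given in the abstract), the effective throughput is simply the total operation count divided by the wall time:

$$\mathrm{speed} = \frac{N_{\mathrm{voxel}} \times N_{\mathrm{step}} \times f_{\mathrm{op}}}{t_{\mathrm{wall}}},$$

where $N_{\mathrm{voxel}}$ is the number of isochromats, $N_{\mathrm{step}}$ the number of (optimized) time steps, and $f_{\mathrm{op}}$ the floating-point operations per isochromat update. For example, the 1024×1024×16 run of Fig.4(c), at 3.11 TFLOPS over 65.4 s, corresponds to about $2\times10^{14}$ floating-point operations in total.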

Acknowledgements

We thank Dr. Tomoyuki Haishi for supporting this work. This work was supported by the Japan Science and Technology Agency.

References

[1] H. Benoit-Cattin, et al. JMR 173 (2005) 97–115. [2] P. Latta, et al. JMR 203 (2010) 44–51. [3] T. Stöcker, et al. MRM 64 (2010) 186–193. [4] C.G. Xanthis, et al. IEEE TMI 33 (2014) 607–616. [5] K.M. Lüdeke, et al. MRI 3 (1985) 329–343.

Figures

Fig.1. MRI pulse sequence files described in simple text. The left is for a 3D projection sequence; the right is for a 2D multishot spiral sequence. The numbers in the left columns are times in units of 0.1 ms.

Fig.2. Block diagram of the GPU-optimized MRI simulator. The simulator, written in Visual C++, automatically generates the "CUDA source code" from the object parameters, the pulse sequence, and the system parameters, compiles it using the NVIDIA runtime compiler, and outputs an optimized Bloch-simulator executable.
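The runtime-compilation step in Fig.2 could be implemented, for example, with NVRTC and the CUDA driver API; the following sketch is our own illustration under that assumption (the authors' implementation details are not given):

#include <nvrtc.h>
#include <cuda.h>
#include <stdio.h>
#include <stdlib.h>

// Compile a generated CUDA source string to PTX at runtime and return
// a launchable kernel handle. Assumes cuInit() and a current context.
static CUfunction buildKernel(const char *cudaSource, const char *kernelName)
{
    nvrtcProgram prog;
    nvrtcCreateProgram(&prog, cudaSource, "bloch.cu", 0, NULL, NULL);

    const char *opts[] = { "--use_fast_math" };
    if (nvrtcCompileProgram(prog, 1, opts) != NVRTC_SUCCESS) {
        size_t logSize;
        nvrtcGetProgramLogSize(prog, &logSize);
        char *log = (char *)malloc(logSize);
        nvrtcGetProgramLog(prog, log);
        fprintf(stderr, "NVRTC error:\n%s\n", log);
        exit(1);
    }

    size_t ptxSize;
    nvrtcGetPTXSize(prog, &ptxSize);
    char *ptx = (char *)malloc(ptxSize);
    nvrtcGetPTX(prog, ptx);
    nvrtcDestroyProgram(&prog);

    CUmodule module;
    CUfunction kernel;
    cuModuleLoadData(&module, ptx);
    cuModuleGetFunction(&kernel, module, kernelName);
    free(ptx);
    return kernel;
}

Because the object, system, and sequence parameters are known before compilation, the generated source can embed them as compile-time constants, which is one way such runtime code generation improves GPU throughput.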

Fig.3. Water phantoms used in the experiments. The left is the air-filled cylinder phantom; the right is the air-filled sphere phantom. The inner diameters and heights of the containers are 54 mm and 80 mm for the cylinder phantom, and 64 mm and 92 mm for the sphere phantom.

Fig.4. 2D cross sections selected from the 3D image datasets of the air-filled cylinder phantom, calculated by the simulator ((a), (b), and (c)) and measured in the experiment (f). The matrix sizes are those of the numerical phantom used for the calculation and the experiment; the k-space size is identical for all images.

Fig.5. 2D cross sections selected from the 3D image datasets of the air-filled sphere phantom, calculated by the simulator ((a), (b), and (c)) and measured in the experiment (f). The matrix sizes are those of the numerical phantom used for the calculation and the experiment; the k-space size is identical for all images.


