1879

Parallel MRI accelerator: The MPSoC design for real time image reconstruction
Abdul Basit1, Omair Inam1, and Hammad Omer1
1COMSATS UNIVERSITY ISLAMABAD, Islamabad, Pakistan

Synopsis

Keywords: Image Reconstruction, Cardiovascular

Motivation: Real-time MRI requires efficient data acquisition and low-latency image reconstruction along with high temporal resolution. The parallel MRI (pMRI) method GRAPPA offers fast data acquisition.

Goal(s): To overcome the large computational requirements of GRAPPA, which limit its performance in real-time clinical settings.

Approach: This paper presents a novel MPSoC-based hardware accelerator that combines a 32-bit FPGA-based accelerator module with multiple DSP engines and an on-chip ARM processor to provide sufficient computational resources for GRAPPA.

Results: Results on in-vivo cardiac datasets acquired with 18 receiver coils show that the proposed accelerator reconstructs cardiac images at ∼35 frames per second without degrading image quality.

Impact: The proposed accelerator reconstructs 35 frames per second, whereas the CPU-based counterparts reconstruct only 2 frames per second for the same GRAPPA reconstruction setting in our experiments.

Introduction

GRAPPA is a parallel MRI (pMRI) algorithm that reconstructs the missing k-space data of each coil by linearly combining the acquired k-space data [1]. GRAPPA interpolates the missing k-space data in two stages: (i) calibration and (ii) synthesis [2]. During the calibration stage, the GRAPPA weight sets (W) are calculated from the fully sampled auto-calibration signal (ACS) lines. In the synthesis stage, sequential convolution operations between the under-sampled k-space data and the GRAPPA weight sets generate the fully sampled k-space data [3]. The fully sampled k-space data are then transformed into multi-coil images by applying the inverse Fourier transform, and the multi-coil images are combined using their sum-of-squares to generate a composite solution image (Figure-1).
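The two stages can be illustrated with a minimal sketch. The Python code below is not the implementation described in this work; it assumes a toy one-dimensional k-space, an acceleration factor of 2, and a kernel that uses only the two acquired neighbours of each missing line across all coils. The function names (solve, calibrate, synthesize) and the synthetic two-coil signal are hypothetical and serve only to show how weights fitted on ACS lines are reused to fill the missing lines.

```python
# Toy 1D GRAPPA sketch. Assumptions (not from the paper): R = 2, a 2x1 kernel,
# pure-Python linear algebra, and a synthetic two-coil signal.

def solve(a, b):
    """Solve a @ x = b by Gauss-Jordan elimination with partial pivoting.
    a is n x n and b is n x m, both lists of lists of complex numbers."""
    n = len(a)
    m = [list(a[i]) + list(b[i]) for i in range(n)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        p = m[col][col]
        m[col] = [v / p for v in m[col]]
        for r in range(n):
            if r != col:
                f = m[r][col]
                m[r] = [u - f * v for u, v in zip(m[r], m[col])]
    return [row[n:] for row in m]


def calibrate(acs):
    """Calibration: least-squares fit of the weights W from fully sampled ACS lines.
    acs[c][k] is line k of coil c. Returns W with shape (2 * coils) x coils."""
    nc, n = len(acs), len(acs[0])
    src, trg = [], []  # one row per fit: neighbours of line k, and line k itself
    for k in range(1, n - 1):
        src.append([acs[c][k - 1] for c in range(nc)] +
                   [acs[c][k + 1] for c in range(nc)])
        trg.append([acs[c][k] for c in range(nc)])
    size = 2 * nc
    # Normal equations: (SRC^H SRC) W = SRC^H TRG
    gram = [[sum(s[i].conjugate() * s[j] for s in src) for j in range(size)]
            for i in range(size)]
    rhs = [[sum(s[i].conjugate() * t[c] for s, t in zip(src, trg))
            for c in range(nc)] for i in range(size)]
    return solve(gram, rhs)


def synthesize(under, w):
    """Synthesis: fill each missing (odd) line from its acquired neighbours.
    under[c][k] is None where line k was not acquired."""
    nc, n = len(under), len(under[0])
    full = [list(coil) for coil in under]
    for k in range(1, n - 1, 2):
        s = ([under[c][k - 1] for c in range(nc)] +
             [under[c][k + 1] for c in range(nc)])
        for c in range(nc):
            full[c][k] = sum(s[i] * w[i][c] for i in range(2 * nc))
    return full


# Synthetic signal (hypothetical): four point sources seen with different
# coil sensitivities, chosen so that an exact linear relation exists.
phases = [1, -1, 1j, -1j]
sens = [[1.0, 0.2, 0.8, 0.3], [0.3, 0.9, 0.1, 1.0]]
truth = [[sum(a * z ** k for a, z in zip(sens[c], phases)) for k in range(17)]
         for c in range(2)]
under = [[v if k % 2 == 0 else None for k, v in enumerate(coil)] for coil in truth]
weights = calibrate([coil[4:13] for coil in truth])  # central 9 lines act as ACS
recon = synthesize(under, weights)
print(max(abs(recon[c][k] - truth[c][k]) for c in range(2) for k in range(17)))
```

The printed value is the largest deviation from the fully sampled toy signal. In a realistic setting the kernel also spans the readout direction and the fit is only approximate, so the residual depends on noise, kernel size and the number of ACS lines.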
The computational complexity of the GRAPPA reconstruction process rises exponentially with an increase in the number of receiver coils, the GRAPPA kernel size and the acceleration factor. However, both stages of GRAPPA reconstruction contain inherent parallelism that can be exploited by custom-designed hardware [4-6]. In this paper, an MPSoC-based GRAPPA accelerator is implemented on a heterogeneous processing platform (Xilinx Versal VPK120) to enable fast image reconstruction. The device is equipped with a quad-core A53 processing system (PS), digital signal processing (DSP) engines and programmable logic (PL) [7]. The efficacy of the proposed accelerator is validated by experiments on an 18-coil in-vivo cardiac dataset.

Methodology

The block diagram of the proposed accelerator is shown in Figure-2. It consists of an FPGA-based accelerator (CAL) equipped with parallel computational blocks (CBs), which estimate the GRAPPA weight sets without sequential kernel repetition. High-level synthesis (HLS) is adopted to implement the FPGA-based hardware accelerator. Compiler directives, namely HLS UNROLL and HLS PIPELINE, are used in CAL to exploit the inherent parallelism in large-scale matrix-matrix multiplication and inversion. Moreover, an array of 1968 DSP engines performs parallel interpolation of the missing k-space data points in multiple receiver coils.
Arbitration between the FPGA-based CAL and the DSP engines is carried out by the quad-core ARM processor. The arbitrator module ensures the correct sequence of operations in the GRAPPA calibration and synthesis phases, and it also controls the data transfer requests of the FPGA-based hardware accelerator and the DSP engines. A dynamic data hub supported by hierarchical memory banks is implemented for high-speed memory access, and a logarithmic interconnect (LIX) facilitates high-speed data transfers. The direct memory access controller (DMAC) streamlines data transfer, which improves system performance and reliability.
The proposed accelerator starts its operation by transferring the source (SRC) and target (TRG) matrices from the memory module (DDR3). When the SRC and TRG matrices are ready, the arbitrator triggers the FPGA-based hardware accelerator to estimate the GRAPPA weight sets (W). Once W is estimated, the arbitrator triggers the array of DSP engines to interpolate the missing k-space data in the receiver coil array and stores the fully sampled k-space data in the memory module.
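The sequencing role of the arbitrator can be pictured with a small software model. This Python sketch is only an assumption about how such ordering might be enforced; the class and stage names (Arbiter, Stage, LOAD_SRC_TRG and so on) are hypothetical and do not come from the hardware design, which runs on the ARM processor and is not described at this level of detail.

```python
from enum import Enum


class Stage(Enum):
    LOAD_SRC_TRG = 0  # transfer the SRC and TRG matrices from memory
    CALIBRATE = 1     # FPGA-based accelerator estimates the weight sets W
    SYNTHESIZE = 2    # DSP array interpolates the missing k-space data
    STORE = 3         # write the fully sampled k-space back to memory
    DONE = 4


class Arbiter:
    """Hypothetical software model of the arbitrator's ordering rule."""

    def __init__(self):
        self.expected = Stage.LOAD_SRC_TRG
        self.history = []

    def complete(self, stage):
        """Record that `stage` finished and unlock the next one."""
        if self.expected is Stage.DONE or stage is not self.expected:
            raise RuntimeError(
                f"{stage.name} reported while waiting for {self.expected.name}")
        self.history.append(stage.name)
        self.expected = Stage(stage.value + 1)


arb = Arbiter()
for stage in (Stage.LOAD_SRC_TRG, Stage.CALIBRATE, Stage.SYNTHESIZE, Stage.STORE):
    arb.complete(stage)
print(arb.history, arb.expected.name)
```

In this model, a stage reported out of order (for example, synthesis before calibration) raises an error instead of proceeding, which mirrors the requirement that W exists before interpolation starts.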
Later, the ARM processor transforms the fully sampled k-space data into multi-coil images by applying the inverse Fourier transform, computed with a fast Fourier transform (FFT). The multi-coil images are then combined using their sum-of-squares (SOS) to generate a composite solution image.
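These two final steps can be sketched as follows. The Python code is a minimal illustration rather than the ARM implementation: it assumes one-dimensional data, replaces the FFT with a naive inverse discrete Fourier transform that yields the same values, and uses hypothetical function names (idft, sum_of_squares) and made-up two-coil input.

```python
import cmath
import math


def idft(kspace):
    """Naive inverse DFT of one coil's 1D k-space (O(N^2); an FFT gives the same values)."""
    n = len(kspace)
    return [sum(kspace[k] * cmath.exp(2j * cmath.pi * k * x / n) for k in range(n)) / n
            for x in range(n)]


def sum_of_squares(coil_images):
    """Combine per-coil complex images into one magnitude image."""
    return [math.sqrt(sum(abs(img[x]) ** 2 for img in coil_images))
            for x in range(len(coil_images[0]))]


# Made-up input: each coil holds only a DC component, so each coil image is flat.
kspace = [[12, 0, 0, 0], [16, 0, 0, 0]]
images = [idft(coil) for coil in kspace]
composite = sum_of_squares(images)
print(composite)
```

With the values chosen here the two coil images have magnitudes 3 and 4, so every pixel of the composite evaluates to 5. The SOS combination discards phase, which is why only a magnitude image is produced.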

Results and Discussion

The reconstruction times of the MPSoC-based GRAPPA accelerator and the CPU-based counterparts are presented in Table-3. The visual quality of the reconstructed images is compared with the fully sampled reference image in Figure-3. The results show that the proposed accelerator reconstructs ~35 frames/second while maintaining the visual quality of the reconstructed images.
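As a quick check of what these rates mean per frame, the following Python snippet converts the two frame rates quoted in this abstract (about 35 frames/second for the accelerator and 2 frames/second for the CPU-based counterparts) into a per-frame reconstruction time and a speed-up factor. The helper name frame_time_ms is hypothetical, and the measured values in Table-3 take precedence over this arithmetic.

```python
def frame_time_ms(fps):
    """Per-frame reconstruction time in milliseconds for a given frame rate."""
    return 1000.0 / fps


accel_fps = 35  # approximate rate quoted for the proposed accelerator
cpu_fps = 2     # rate quoted for the CPU-based counterparts

speedup = accel_fps / cpu_fps
print(f"accelerator: {frame_time_ms(accel_fps):.1f} ms/frame")
print(f"CPU:         {frame_time_ms(cpu_fps):.1f} ms/frame")
print(f"speed-up:    {speedup:.1f}x")
```

Under these figures the accelerator spends roughly 28.6 ms on each frame against 500 ms on the CPU, a speed-up of about 17.5 times.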

Conclusion

In this paper, a novel MPSoC-based GRAPPA accelerator has been implemented on a Xilinx MPSoC platform (VPK120) with the aim of accelerating GRAPPA reconstruction while maintaining the visual quality of the reconstructed images. The proposed accelerator reconstructs 35 frames per second (FPS), whereas the CPU-based counterparts reconstruct only 2 FPS for the same GRAPPA reconstruction setting.

Acknowledgements

No acknowledgement found.

References

1. Griswold, Mark A., et al. "Generalized autocalibrating partially parallel acquisitions (GRAPPA)." Magnetic Resonance in Medicine 47.6 (2002): 1202-1210.
2. Breuer, Felix A., et al. "General formulation for quantitative G-factor calculation in GRAPPA reconstructions." Magnetic Resonance in Medicine 62.3 (2009): 739-746.
3. Inam, Omair, et al. "Iterative Schemes to Solve Low-Dimensional Calibration Equations in Parallel MR Image Reconstruction with GRAPPA." BioMed Research International 2017 (2017).
4. Basit, Abdul, Omair Inam, and Hammad Omer. "Accelerating GRAPPA reconstruction using SoC design for real-time cardiac MRI." Computers in Biology and Medicine 160 (2023): 107008.
5. Inam, Omair, et al. "GPU accelerated Cartesian GRAPPA reconstruction using CUDA." Journal of Magnetic Resonance 337 (2022): 107175.
6. Inam, Omair, et al. "FPGA-based hardware accelerator for SENSE (a parallel MR image reconstruction method)." Computers in Biology and Medicine 117 (2020): 103598.
7. Cong, Jason, et al. "High-level synthesis for FPGAs: From prototyping to deployment." IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems 30.4 (2011): 473-491.

Figures

Figure-1: GRAPPA reconstruction process; the calibration stage estimates GRAPPA Weight Sets (W) whereas the synthesis stage interpolates the missing k-space data using GRAPPA Weight Sets (W) and the under-sampled k-space data i.e., ACQ.

Figure-2: The proposed accelerator with four modules; (i) ARB, (ii) CAL, (iii) SYN and (iv) MEM

Figure-3: GRAPPA reconstruction results of the 18-coil cardiac dataset at an acceleration factor of 2; (a) Reference image (b) Reconstruction results of CPU-based GRAPPA (c) Reconstruction results of the proposed accelerator

Table 1: Data acquisition details of 18-coil in-vivo cardiac dataset

Table 2: Hardware specifications of the VPK120 and CPU platforms employed for GRAPPA reconstruction

Table 3: GRAPPA reconstruction time for the 18-coil cardiac dataset using 32 ACS lines

Proc. Intl. Soc. Mag. Reson. Med. 32 (2024)
DOI: https://doi.org/10.58530/2024/1879