Abdul Basit¹, Omair Inam¹, and Hammad Omer¹
¹COMSATS University Islamabad, Islamabad, Pakistan
Synopsis
Keywords: Image Reconstruction, Cardiovascular
Motivation: Real-time MRI requires efficient data acquisition and low-latency image reconstruction along with high temporal resolution. The parallel MRI (pMRI) method known as GRAPPA offers an advantage in terms of fast data acquisition.
Goal(s): To overcome the large computational requirements of GRAPPA, which limit its performance in real-time clinical settings.
Approach: This paper presents a novel MPSoC-based hardware accelerator that combines a 32-bit FPGA-based accelerator module with multiple DSP engines and an on-chip ARM processor to provide sufficient computational resources for GRAPPA.
Results: Results on in-vivo cardiac datasets acquired with 18 receiver coils show that the proposed accelerator reconstructs cardiac images at ∼35 frames per second without degrading image quality.
Impact: The proposed accelerator can reconstruct 35 frames per second, whereas the CPU-based counterparts reconstruct only 2 frames per second for a given GRAPPA reconstruction setting in our experiments.
Introduction
GRAPPA is a pMRI algorithm that reconstructs the missing k-space data of each coil by linearly combining the acquired k-space data¹. GRAPPA interpolates the missing k-space data in two separate stages: (i) calibration and (ii) synthesis². During the calibration stage, the GRAPPA weight sets (W) are calculated from the fully sampled auto-calibration signal (ACS) lines. In the synthesis stage, sequential convolution operations are performed between the under-sampled k-space data and the GRAPPA weight sets to generate the fully sampled k-space data³. The fully sampled k-space data is then transformed into multi-coil images by applying the inverse Fourier transform. Finally, the multi-coil images are combined using their sum-of-squares to generate a composite solution image (Figure-1).
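The last two steps of this pipeline (inverse Fourier transform per coil, then sum-of-squares combination) can be sketched in a few lines of Python. This is only an illustrative sketch using the standard library: the function names are hypothetical, a naive O(N²) inverse DFT stands in for the FFT, and k-space centering (shifting) is ignored. It is not the implementation used in this work.

```python
import cmath
import math


def inverse_dft_1d(samples):
    """Naive O(N^2) inverse DFT of a 1-D sequence of complex samples."""
    n = len(samples)
    return [
        sum(samples[k] * cmath.exp(2j * cmath.pi * k * x / n) for k in range(n)) / n
        for x in range(n)
    ]


def inverse_dft_2d(kspace):
    """Separable 2-D inverse DFT: transform the rows, then the columns."""
    rows = [inverse_dft_1d(row) for row in kspace]
    cols = [inverse_dft_1d(list(col)) for col in zip(*rows)]
    # cols is indexed [column][row]; transpose back to [row][column].
    return [list(row) for row in zip(*cols)]


def sum_of_squares(coil_images):
    """Combine per-coil complex images into a single magnitude image."""
    n_rows = len(coil_images[0])
    n_cols = len(coil_images[0][0])
    return [
        [
            math.sqrt(sum(abs(image[r][c]) ** 2 for image in coil_images))
            for c in range(n_cols)
        ]
        for r in range(n_rows)
    ]


def reconstruct(multi_coil_kspace):
    """Fully sampled multi-coil k-space -> composite magnitude image."""
    coil_images = [inverse_dft_2d(kspace) for kspace in multi_coil_kspace]
    return sum_of_squares(coil_images)
```

As a usage example, two 2×2 coils whose k-space contains only a DC term, `[[12, 0], [0, 0]]` and `[[16j, 0], [0, 0]]`, give uniform coil images of 3 and 4j, so `reconstruct` returns a composite image in which every pixel equals 5.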
The computational complexity of the GRAPPA reconstruction process rises sharply with an increase in the number of receiver coils, the GRAPPA kernel size and the acceleration factor. However, both stages of GRAPPA reconstruction contain inherent parallelism that can be exploited by custom-designed hardware⁴⁻⁶. In this paper, an MPSoC-based GRAPPA accelerator is implemented on a heterogeneous processing platform (Xilinx Versal VPK120) to enable fast image reconstruction. The device is equipped with a quad-core A53 processing system (PS), digital signal processing (DSP) engines and programmable logic (PL)⁷. The efficacy of the proposed accelerator is validated through experiments on an 18-coil in-vivo cardiac dataset.
Methodology
The block diagram of the proposed accelerator is shown in Figure-2. It consists of an FPGA-based accelerator (CAL) equipped with parallel computational blocks (CBs) that can estimate the GRAPPA weight sets without performing sequential kernel repetition. In this work, high-level synthesis (HLS) is adopted to implement the FPGA-based hardware accelerator. The compiler directives HLS UNROLL and HLS PIPELINE are used to incorporate design optimizations in CAL that exploit the inherent parallelism in large-scale matrix-matrix multiplications and inversion. Moreover, an array of 1968 DSP engines is used to perform parallel interpolation of the missing k-space data points in multiple receiver coils.
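To make the matrix operations concrete, the sketch below assumes the usual least-squares form of GRAPPA calibration, W = TRG·SRCᴴ·(SRC·SRCᴴ)⁻¹, followed by synthesis as a matrix product of W with the acquired source points. The abstract does not state this equation, so the formula, the matrix orientation (rows of SRC are source points, columns are calibration instances) and all function names are assumptions made for illustration. The code is plain sequential Python, not the HLS design; the independent inner products in `matmul` are the kind of loop that unrolling and pipelining directives typically target.

```python
def conj_transpose(a):
    """Hermitian (conjugate) transpose of a matrix stored as nested lists."""
    return [[a[r][c].conjugate() for r in range(len(a))] for c in range(len(a[0]))]


def matmul(a, b):
    """Matrix product; every output element is an independent inner product."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def solve(a, b):
    """Solve A X = B by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    aug = [list(map(complex, a[i])) + list(map(complex, b[i])) for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("calibration matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        pivot_value = aug[col][col]
        aug[col] = [v / pivot_value for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [vr - factor * vc for vr, vc in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def calibrate(src, trg):
    """Estimate W such that TRG is approximately W * SRC (least squares)."""
    src_h = conj_transpose(src)
    gram = matmul(src, src_h)   # SRC * SRC^H, a Hermitian matrix
    rhs = matmul(trg, src_h)    # TRG * SRC^H
    # W * G = B is equivalent to G * W^H = B^H because G is Hermitian.
    return conj_transpose(solve(gram, conj_transpose(rhs)))


def synthesize(weights, src_columns):
    """Interpolate missing points: one column of source points per target."""
    return matmul(weights, src_columns)
```

For example, with `src = [[1, 2, 0], [0, 1, 1j]]` and targets generated from the weights `[[0.5, -1j]]`, `calibrate` recovers those weights, and `synthesize` applied to the new source column `[[2], [1]]` yields `1 - 1j`.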
The arbitration between the FPGA-based CAL and the DSP engines is carried out by the quad-core ARM processor. The arbitrator module ensures the correct sequence of operations in the GRAPPA calibration and synthesis phases, and it also controls the data transfer requests of the FPGA-based hardware accelerator and the DSP engines. In the proposed MPSoC-based hardware accelerator, a dynamic data hub supported by hierarchical memory banks is implemented for high-speed memory access. Moreover, the logarithmic interconnect (LIX) is used to facilitate high-speed data transfers, and the Direct Memory Access Controller (DMAC) streamlines data transfer, ultimately enhancing system performance and reliability.
The proposed accelerator starts its operation by transferring the source (SRC) and target (TRG) matrices from the memory module, i.e., DDR3. When the SRC and TRG matrices are ready, the arbitrator triggers the FPGA-based hardware accelerator to estimate the GRAPPA weight sets (W). Once the weight sets are estimated, the arbitrator triggers the array of DSP engines to interpolate the missing k-space data in the receiver coil array and stores the fully sampled k-space data in the memory module. The ARM processor then transforms the fully sampled k-space data into multi-coil images by applying the inverse fast Fourier transform (IFFT). The multi-coil images are then combined using their sum-of-squares (SOS) to generate a composite solution image.
Results and Discussion
The reconstruction times of the MPSoC-based GRAPPA accelerator and the CPU-based counterparts are presented in Table-3. Moreover, the visual quality of the reconstructed images is compared with the fully sampled reference image, as shown in Figure-3. The results show that the proposed accelerator is capable of reconstructing ∼35 frames/second while maintaining the visual quality of the reconstructed images.
Conclusion
In this paper, a novel MPSoC-based GRAPPA accelerator has been implemented on a Xilinx MPSoC platform, i.e., VPK120, with the aim of accelerating GRAPPA reconstruction while maintaining the visual quality of the reconstructed images. The proposed accelerator can reconstruct 35 FPS, whereas the CPU-based counterparts reconstruct only 2 FPS for a given GRAPPA reconstruction setting.
Acknowledgements
No acknowledgement found.
References
1. Griswold, Mark A., et al. "Generalized autocalibrating partially parallel acquisitions (GRAPPA)." Magnetic Resonance in Medicine 47.6 (2002): 1202-1210.
2. Breuer, Felix A., et al. "General formulation for quantitative G-factor calculation in GRAPPA reconstructions." Magnetic Resonance in Medicine 62.3 (2009): 739-746.
3. Inam, Omair, et al. "Iterative Schemes to Solve Low-Dimensional Calibration Equations in Parallel MR Image Reconstruction with GRAPPA." BioMed Research International 2017 (2017).
4. Basit, Abdul, Omair Inam, and Hammad Omer. "Accelerating GRAPPA reconstruction using SoC design for real-time cardiac MRI." Computers in Biology and Medicine 160 (2023): 107008.
5. Inam, Omair, et al. "GPU accelerated Cartesian GRAPPA reconstruction using CUDA." Journal of Magnetic Resonance 337 (2022): 107175.
6. Inam, Omair, et al. "FPGA-based hardware accelerator for SENSE (a parallel MR image reconstruction method)." Computers in Biology and Medicine 117 (2020): 103598.
7. Cong, Jason, et al. "High-level synthesis for FPGAs: From prototyping to deployment." IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems 30.4 (2011): 473-491.