Limin Li1 and Alice M Wyrwicz1,2
1Center for Basic Magnetic Resonance Research, Northshore University HealthSystem, Evanston, IL, United States, 2Biomedical Engineering, Northwestern University, Evanston, IL, United States
Synopsis
We report a new parallelized 2D FFT
algorithm suitable for real-time RARE (rapid acquisition with relaxation enhancement)
image reconstruction and describe how to implement the algorithm on a
Field-Programmable Gate Array (FPGA). We present the design and testing of
the resulting FPGA processors and demonstrate their utility in reconstructing RARE images.
Introduction
Hardware processors capable of parallelized 2D
FFT can be used to accelerate image processing in many fields.1-2
However, generic processors are not effective for processing MR data streams
acquired under RARE imaging
sequences because of the unique structure of the raw data. To parallelize 2D
FFT computations for image reconstruction, typically a 2D-matrix dataset is
uniformly partitioned and distributed to multiple processing elements (PEs) that
generate subsets of an image simultaneously, but this approach works only for
data with a regular structure. The raw data of a RARE
image, however, is irregular in the sense that the phase modulation is not
linear in one of the dimensions.3 For real-time processing on an
FPGA, reordering the data prior to the 2D FFT computations leads to unacceptable
processing latencies and inefficient use of on-chip resources. The
irregularity of the RARE data also
complicates data partitioning and distribution during parallel processing. In this
abstract, we describe a new algorithm for parallel implementation of a 2D FFT
and present the design and testing of FPGA processors suitable for real-time RARE image reconstruction.
Design and Implementation
2D FFT computations on an N1 × N2 data matrix can be carried
out through row-column (RC) decomposition: N2 1D FFTs of N1 points each
on the rows of the data matrix, followed by N1 1D FFTs of N2 points each on the
columns of the intermediate matrix produced by the row FFTs. Fig. 1 shows the three
main steps for parallel operation of the 2D FFT with four PEs, P0, P1, P2
and P3. Each PE is
allocated m2 consecutive
rows of data, where m2 =
N2/P and P is the number of PEs. An
input data set is divided into blocks with N1
elements and distributed to P PEs
each consisting of a 1D FFT processor. The data blocks are initially delivered
to the P PEs in an interleaved manner
(A). All PEs perform their own shares of row-wise FFT operations simultaneously
(B). To complete the column-wise FFTs, the intermediate resulting data are read
in order such that data matrix transposition is performed concurrently (C). We
designed the 2D FFT processor architectures for the proposed algorithm
in LabVIEW 2014 and prototyped the processors on an XC7K410T FPGA chip (Xilinx
Inc., San Jose, CA, USA), which sits on an NI USRP-2940R board (National
Instruments, Inc., Austin, TX, USA). We built each PE by connecting NI’s 1D FFT core with input and output FIFO
(first-in-first-out) memory buffers. The data flows through all the PEs are
controlled with proper addresses generated by Address Generation Units (AGUs). Specific
AGUs were developed and used to re-regulate the intermediate resulting data
without additional clock cycles and processing latencies. To facilitate the
implementation, we set the RARE factor
(R) equal to the parallelism factor (P). We developed FPGA processors with P=4 and P=8.
Experimental Verification and Discussion
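The three steps (A) to (C) described under Design and Implementation can be cross-checked in software before they are committed to hardware. The following is a minimal Python sketch, not our LabVIEW/FPGA implementation. It assumes a simple linear segment ordering in which echo e of shot s fills row e*m2 + s (the actual phase-encoding order depends on the sequence), assumes R = P, and runs the PEs one after another rather than concurrently. The names fft1d, fft2d_row_column and dft2d_reference are hypothetical helpers introduced for illustration only.

```python
import cmath


def fft1d(x):
    """Recursive radix-2 FFT; len(x) must be a power of two."""
    n = len(x)
    if n & (n - 1):
        raise ValueError("length must be a power of two")
    if n == 1:
        return list(x)
    even, odd = fft1d(x[0::2]), fft1d(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k], out[k + n // 2] = even[k] + t, even[k] - t
    return out


def fft2d_row_column(data, num_pes):
    """2D FFT of an N2-row by N1-column matrix via row-column decomposition."""
    n2, n1 = len(data), len(data[0])
    if n2 % num_pes:
        raise ValueError("number of rows must be divisible by num_pes")
    m2 = n2 // num_pes  # rows held by each PE
    pe_mem = [[None] * m2 for _ in range(num_pes)]
    # (A) Blocks arrive interleaved. ASSUMPTION: echo e of shot s is row e*m2 + s,
    # so routing echo e to PE e leaves every PE with m2 consecutive rows.
    for shot in range(m2):
        for echo in range(num_pes):
            row = echo * m2 + shot
            # (B) Row-wise FFT inside the PE (sequential here, parallel on the FPGA).
            pe_mem[echo][shot] = fft1d(data[row])
    # (C) Read the intermediate data column by column, i.e. in transposed
    # order, and run the column-wise FFTs.
    out = [[0j] * n1 for _ in range(n2)]
    for c in range(n1):
        column = [pe_mem[p][l][c] for p in range(num_pes) for l in range(m2)]
        spectrum = fft1d(column)
        for r in range(n2):
            out[r][c] = spectrum[r]
    return out


def dft2d_reference(data):
    """Direct 2D DFT, used only as a slow reference for small matrices."""
    n2, n1 = len(data), len(data[0])
    return [[sum(data[r][c] * cmath.exp(-2j * cmath.pi * (k2 * r / n2 + k1 * c / n1))
                 for r in range(n2) for c in range(n1))
             for k1 in range(n1)] for k2 in range(n2)]


if __name__ == "__main__":
    # Small synthetic matrix: N2 = 8 rows, N1 = 4 columns, P = 4 PEs.
    demo = [[complex(r + 2 * c, r - c) for c in range(4)] for r in range(8)]
    got, ref = fft2d_row_column(demo, 4), dft2d_reference(demo)
    err = max(abs(got[r][c] - ref[r][c]) for r in range(8) for c in range(4))
    print("max deviation from direct DFT:", err)
```

In this sketch the interleaved routing in step (A) is what lets each PE end up with consecutive rows without a separate reordering pass; on the FPGA that bookkeeping is the job of the AGUs.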
We tested and evaluated the 2D FFT processors by
reconstructing multi-slice RARE images.
A graphical user interface (GUI) program was developed using the LabVIEW
platform to initialize and interact with the FPGA board. The tests were performed on static data
previously acquired and stored on the PC, typically structured as a
128 × 128 × 16 matrix. However, the data transfer between the FPGA board and the
host PC was managed in such a way that the data flowing into the FPGA appeared
as real-time data streams that would be received during an MRI experiment, as previously described.4
The tests validated that the developed processors were capable of reconstructing
multi-slice RARE images at rates
up to ~3200 frames per second with P=4 and ~4000 frames per second with P=8. Fig. 2 displays the RARE
images selected from the reconstructed multi-slice images of a rabbit brain. For
comparison, the raw dataset with RARE
factor = 8 was also processed with conventional software running on a PC. The
image (Fig. 2C) processed in the PC appears identical to the image (Fig. 2B)
processed with the FPGA, which verifies the effectiveness of our design. A
distinguishing feature of this design is that the AGUs we developed
re-regulate the RARE data without additional clock cycles or processing
latency, and the method can be applied to designing 2D FFT processors
capable of processing raw data acquired with other imaging sequences by
redesigning the AGUs accordingly.
Conclusion
This work demonstrates that our algorithm and
processor architecture design implemented on the FPGA are capable of
parallelized 2D FFT suitable for RARE
image reconstruction. The design method can be applied to 2D FFT hardware
processors suitable for image reconstruction of different pulse sequences as
well.
Acknowledgements
This work was supported by NIH grant R21EB024852.
References
1. Yu CL, Irick K, et al. Multidimensional DFT IP generator for FPGA platforms. IEEE Trans. Circuits Syst. 2011;(58): 755-854.
2. Kee H, Petersen N, et al. Systematic generation of FPGA-based FFT implementations. Proc. Intl Conf on Acoustics, Speech, and Signal Processing, Las Vegas, Nevada, March 2008;1413-1416.
3. Li L, Wyrwicz AM. Modularized architecture of address generation units suitable for real-time processing MR data on an FPGA. Rev Sci Instrum. 2016;87(6):063705.
4. Li L, Wyrwicz AM. Design of an MR image processing module on an FPGA chip. JMR. 2015;255:51-58.