4436

FPGA processors of Parallelized 2D FFT suitable for real-time RARE image reconstruction
Limin Li1 and Alice M Wyrwicz1,2

1Center for Basic Magnetic Resonance Research, Northshore University HealthSystem, Evanston, IL, United States, 2Biomedical Engineering, Northwestern University, Evanston, IL, United States

Synopsis

We report here a new parallelized 2D FFT algorithm suitable for real-time RARE image reconstruction and describe how to implement the algorithm on a Field-Programmable Gate Array (FPGA). We will present the design and testing of these FPGA processors, and demonstrate their utility in reconstructing RARE images.

Introduction

Hardware processors capable of parallelized 2D FFT can be used to accelerate image processing in many fields.1-2 However, generic processors are not effective for processing MR data streams acquired with RARE imaging sequences because of the unique structure of the raw data. To parallelize 2D FFT computations for image reconstruction, a 2D-matrix dataset is typically partitioned uniformly and distributed to multiple processing elements (PEs) that generate subsets of an image simultaneously, but this approach works only for data with a regular structure. The raw data of a RARE image, however, are irregular in the sense that the phase modulation is not linear in one of the dimensions.3 For real-time processing on an FPGA, reordering the data prior to the 2D FFT computations leads to unacceptable processing latencies and inefficient on-chip resource utilization. The irregularity of the RARE data also complicates data partitioning and distribution during parallel processing. In this abstract, we describe a new algorithm for parallel implementation of a 2D FFT and present the design and testing of FPGA processors suitable for real-time RARE image reconstruction.

Design and Implementation

2D FFT computations on an N1×N2 data matrix can be carried out through row-column (RC) decomposition: N2 N1-point 1D FFTs on the rows of the data matrix, followed by N1 N2-point 1D FFTs on the columns of the intermediate matrix produced by the row FFTs. Fig. 1 shows the three main steps of the parallel 2D FFT with four PEs, P0, P1, P2 and P3. Each PE is allocated m2 consecutive rows of data, where m2 = N2/P. An input data set is divided into blocks of N1 elements and distributed to P PEs, each consisting of a 1D FFT processor. The data blocks are initially delivered to the P PEs in an interleaved manner (A). All PEs perform their own shares of the row-wise FFT operations simultaneously (B). To complete the column-wise FFTs, the intermediate data are read out in an order such that the matrix transposition is performed concurrently (C).

We designed the architectures of the 2D FFT processors implementing the proposed algorithm in LabVIEW 2014 and prototyped the processors on an XC7K410T FPGA chip (Xilinx Inc., San Jose, CA, USA), which sits on an NI USRP-2940R board (National Instruments, Inc., Austin, TX, USA). We built each PE by connecting NI’s 1D FFT core with input and output FIFO (first-in-first-out) memory buffers. The data flows through all the PEs are controlled with addresses generated by Address Generation Units (AGUs). Specific AGUs were developed and used to re-regulate the intermediate data without additional clock cycles or processing latency. To facilitate the implementation, we set the RARE factor (R) equal to the parallelism factor (P). We developed FPGA processors with P=4 and P=8.
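The three steps (A to C) of the row-column scheme can be illustrated with a small software model. This is only a sketch under stated assumptions, not the FPGA design: the function names (`fft1d`, `rare_stream`, `deliver`, `parallel_fft2d`, `dft2d_direct`) are hypothetical, the arrival order in `rare_stream` assumes that echo e of each echo train fills row e·m2 + shot (one plausible RARE ordering with R = P, not taken from the abstract), and the PEs run sequentially here whereas they run concurrently in hardware. N1 and N2 are assumed to be powers of two, with both divisible by P.

```python
import cmath

def fft1d(x):
    """Recursive radix-2 FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return list(x)
    even, odd = fft1d(x[0::2]), fft1d(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out

def rare_stream(matrix, p):
    """Yield (row_index, row) in an ASSUMED RARE arrival order with R = P:
    echo e of echo train `shot` carries row e * m2 + shot."""
    m2 = len(matrix) // p
    for shot in range(m2):
        for echo in range(p):
            row_index = echo * m2 + shot
            yield row_index, matrix[row_index]

def deliver(stream, p):
    """Step A: hand arriving blocks to the P PEs in an interleaved manner."""
    pe_rows = [[] for _ in range(p)]
    for t, block in enumerate(stream):
        pe_rows[t % p].append(block)
    return pe_rows

def parallel_fft2d(matrix, p):
    n2, n1 = len(matrix), len(matrix[0])
    pe_rows = deliver(rare_stream(matrix, p), p)
    # Step B: every PE transforms its own m2 rows (concurrent on hardware).
    intermediate = [None] * n2
    for rows in pe_rows:
        for row_index, row in rows:
            intermediate[row_index] = fft1d(row)
    # Step C: read the intermediate data column by column, so the transpose
    # happens during readout, and give each PE m1 = N1/P columns to transform.
    m1 = n1 // p
    result = [[0j] * n1 for _ in range(n2)]
    for pe in range(p):
        for c in range(pe * m1, (pe + 1) * m1):
            column = fft1d([intermediate[r][c] for r in range(n2)])
            for r in range(n2):
                result[r][c] = column[r]
    return result

def dft2d_direct(matrix):
    """Brute-force 2D DFT, used only as a reference for small matrices."""
    n2, n1 = len(matrix), len(matrix[0])
    return [[sum(matrix[r][c]
                 * cmath.exp(-2j * cmath.pi * (k2 * r / n2 + k1 * c / n1))
                 for r in range(n2) for c in range(n1))
             for k1 in range(n1)] for k2 in range(n2)]

# Small demonstration on an 8x8 matrix with P = 4.
demo = [[complex(r * 8 + c, r - c) for c in range(8)] for r in range(8)]
fast, ref = parallel_fft2d(demo, 4), dft2d_direct(demo)
max_error = max(abs(fast[r][c] - ref[r][c]) for r in range(8) for c in range(8))
print("rows held by each PE:",
      [[i for i, _ in rows] for rows in deliver(rare_stream(demo, 4), 4)])
print("max deviation from direct 2D DFT:", max_error)
```

Under the assumed ordering, interleaved delivery leaves each PE holding m2 consecutive rows, which is consistent with the allocation described above. In the hardware design this bookkeeping is handled by the AGUs rather than by explicit index lists.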

Experimental Verification and Discussion

We tested and evaluated the 2D FFT processors by reconstructing multi-slice RARE images. A graphical user interface (GUI) program was developed on the LabVIEW platform to initialize and interact with the FPGA board. The tests were performed on static data previously acquired and stored on the PC, typically structured as a 128×128×16 matrix. However, the data transfer between the FPGA board and the host PC was managed in such a way that the data flowing into the FPGA appeared as real-time data streams that would be received during an MRI experiment, as previously described.4 The tests validated that the developed processors were capable of reconstructing multi-slice RARE images at rates up to ~3200 and ~4000 frames per second with P=4 and P=8, respectively. Fig. 2 displays RARE images selected from the reconstructed multi-slice images of a rabbit brain. For comparison, the raw dataset with RARE factor = 8 was also processed with conventional software running on a PC. The image processed on the PC (Fig. 2C) appears identical to the image processed with the FPGA (Fig. 2B), which verifies the effectiveness of our design. A distinguishing feature of this design is that the AGUs we developed were able to re-regulate the RARE data with great efficiency, and the method can be applied to designing 2D FFT processors for raw data acquired under different imaging sequences by redesigning the AGUs appropriately.
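To put the reported frame rates in perspective, the short calculation below converts them into a per-frame time budget and a sustained sample rate. It assumes that one "frame" is a single 128×128 slice, which the abstract does not state explicitly; the function name `frame_budget` is hypothetical.

```python
def frame_budget(frames_per_second, n1=128, n2=128):
    """Return (time per frame in microseconds, complex samples per second),
    ASSUMING one frame is a single n1 x n2 slice."""
    frame_time_us = 1e6 / frames_per_second
    samples_per_second = frames_per_second * n1 * n2
    return frame_time_us, samples_per_second

for p, fps in ((4, 3200), (8, 4000)):
    t_us, rate = frame_budget(fps)
    print(f"P={p}: {t_us:.1f} us per frame, {rate / 1e6:.1f} M complex samples/s")
# P=4: 312.5 us per frame, 52.4 M complex samples/s
# P=8: 250.0 us per frame, 65.5 M complex samples/s
```

Under this assumption, going from P=4 to P=8 raises the reported rate by a factor of 1.25.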

Conclusion

This work demonstrates that our algorithm and processor architecture, implemented on an FPGA, are capable of parallelized 2D FFT suitable for RARE image reconstruction. The design method can also be applied to 2D FFT hardware processors for reconstructing images acquired with other pulse sequences.

Acknowledgements

This work was supported by NIH grant R21EB024852.

References

1. Yu CL, Irick K, et al. Multidimensional DFT IP generator for FPGA platforms. IEEE Trans. Circuits Syst. 2011;(58): 755-854.
2. Kee H, Petersen N, et al. Systematic generation of FPGA-based FFT implementations. Proc. Intl Conf on Acoustics, Speech, and Signal Processing, Las Vegas, Nevada, March 2008;1413-1416.
3. Li L, Wyrwicz AM. Modularized architecture of address generation units suitable for real-time processing MR data on an FPGA. Rev Sci Instrum. 2016;87(6):063705.
4. Li L, Wyrwicz AM. Design of an MR image processing module on an FPGA chip. JMR. 2015;255:51-58.

Figures

Fig. 1: Data flows and allocation to four PEs during the operations of the parallelized 2D FFT algorithm. N1 = N2 and m1 = m2 = N2/P.

Fig. 2: Images with 128×128 in-plane resolution selected from a multi-slice RARE image set of a rabbit brain. Images A and B, with RARE factor = 4 and 8, respectively, were processed with the FPGA processors; image C was processed with conventional software running on a PC.

Proc. Intl. Soc. Mag. Reson. Med. 26 (2018)