Resource-efficient architecture of FPGA-based 2D FFT processors

Limin Li¹ and Alice M Wyrwicz¹,²

¹Center for Basic MR Research, Northshore University Healthsystem, Evanston, IL, United States, ²Department of Biomedical Engineering, Northwestern University, Evanston, IL, United States

Synopsis

The processing rate for real-time multi-slice image reconstruction on an FPGA can be improved significantly by taking advantage of its parallel processing capability. In particular, multiple 2D FFT processors can be embedded into a single FPGA and run simultaneously. In this abstract, we report a new design of a 2D FFT processor with significantly reduced usage of hardware resources. Test results show that the usage of an important type of resource, the DSP48 slice, can be reduced by up to 50% without degrading processing performance, which implies that more 2D FFT cores can be installed in a single FPGA of a given size.

Introduction

With increasing capacities and decreasing cost, Field-Programmable Gate Arrays (FPGAs) are increasingly used in MR image processing¹⁻⁴. By taking advantage of the FPGA's parallel processing capability, dramatic increases in processing rate can be achieved. In particular, in the case of multi-slice image reconstruction, multiple 2D FFT processors can be embedded into an FPGA and run simultaneously⁴. The higher processing rate is achieved at the expense of hardware resources on the FPGA. It is therefore important that the architecture of the FFT processors is designed so that these resources are used efficiently. In this abstract, we describe the design, implementation and testing of a 2D FFT module in which the utilization efficiency of hardware resources is improved significantly while the same processing performance is maintained.

Design and Implementation

The design was motivated by the fact that MR time-domain baseband signal samples are delivered to FPGA-based FFT processors at rates not exceeding 400 kHz, equivalent to one sample per 2.5 microseconds. An FPGA device is capable of reading in input samples at rates of one sample per 25 ns or faster. When such an FPGA processor is used to process MR data in real time, portions of the processor may therefore be idle for many clock cycles. One strategy for reducing hardware usage is to reuse portions of the processor within a single process. In our design, unlike the previous one³, only one 1D FFT core is employed for both the first and the second FFT computations. Fig. 1 shows the functional block diagram for the architecture of the 2D FFT core module. The 1D FFT core was built on National Instruments' 1D FFT subVI. It has two inputs and two outputs. During the first 1D FFT computations, a 2D slice of data with an N×N matrix streams into the FPGA via a FIFO buffer. The data flow through the 1D FFT core along the path labeled "1" and are routed to an on-chip SRAM for temporary storage. The order in which the data elements are stored is controlled by addresses generated by an Address Generation Unit (AGU1). The first FFT computations are complete once the whole 2D slice is stored in the SRAM. The second FFT starts by reading the input data from the SRAM. During the execution of the second FFT, the data flow along the path labeled "2". The image reconstruction of the 2D slice is complete after this second pass through the 1D FFT core. The timing of every processing step must be accurately controlled so that no data race occurs between the intermediate data and newly arriving data. The design was developed in LabVIEW 2014, and the 2D FFT core was built on the FPGA chips XC6SLX45 and XC7K410T (Xilinx Inc., San Jose, CA, USA), which sit respectively on NI sbRIO-963 and USRP-2940R boards (National Instruments, Inc., Austin, TX, USA).
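The two-pass data flow described above can be sketched in software. The following Python model is only an illustration under stated assumptions, not the LabVIEW FPGA implementation: NumPy's `np.fft.fft` stands in for the 1D FFT core, a flat array stands in for the on-chip SRAM, and the function names and the transposing address scheme in `agu_transposed_address` are hypothetical. It also works out the clock-cycle budget per sample implied by the rates quoted above.

```python
import numpy as np

# Rates quoted in the text: one MR sample per 2.5 us, one FPGA read per 25 ns.
SAMPLE_PERIOD_S = 2.5e-6
CLOCK_PERIOD_S = 25e-9
# Clock cycles available per incoming sample (a lower bound, since the
# FPGA may read "25 ns or faster").
cycles_per_sample = round(SAMPLE_PERIOD_S / CLOCK_PERIOD_S)


def fft_1d_core(line):
    """Stand-in for the single shared 1D FFT core (assumption: NumPy FFT)."""
    return np.fft.fft(line)


def agu_transposed_address(row, col, n):
    """Hypothetical address generator.

    Stores element (row, col) of the row-transformed data at a transposed
    location, so that the second pass reads each column as a contiguous line.
    """
    return col * n + row


def fft_2d_single_core(slice_2d):
    """2D FFT of an N x N slice using one 1D FFT core for both passes."""
    n = slice_2d.shape[0]
    sram = np.empty(n * n, dtype=complex)  # model of the on-chip memory
    cols = np.arange(n)

    # Path "1": each input row goes through the core and is written to
    # memory at the addresses produced by the address generator.
    for row in range(n):
        spectrum = fft_1d_core(slice_2d[row])
        sram[agu_transposed_address(row, cols, n)] = spectrum

    # Path "2": contiguous lines are read back from memory (each one is a
    # column of the intermediate data) and pass through the same core.
    image = np.empty((n, n), dtype=complex)
    for k in range(n):
        line = sram[k * n:(k + 1) * n]
        image[:, k] = fft_1d_core(line)
    return image


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    data = rng.standard_normal((8, 8)) + 1j * rng.standard_normal((8, 8))
    print("cycles available per sample:", cycles_per_sample)
    print("matches np.fft.fft2:",
          np.allclose(fft_2d_single_core(data), np.fft.fft2(data)))
```

In this model the second pass cannot begin until the first pass has written the whole slice to memory, which mirrors the ordering constraint described in the text. The model says nothing about the actual pipelining or timing control on the FPGA.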

Results and Discussion

We tested and evaluated the 2D FFT core module by reconstructing multi-slice images. A graphical user interface (GUI) program was developed on the LabVIEW platform to initialize and start the FPGA board and to manage data transfer between the FPGA boards and a host PC, as previously described³,⁴. In this work, the tests were performed on static data (previously acquired and stored on the PC), typically structured as a 128×128×16 matrix. We validated that the new 2D FFT core was capable of executing multi-slice image reconstruction at rates up to ~1000 frames per second. These processing rates were about the same as those of the 2D FFT processors with the two-1D-FFT-core architecture³. We also evaluated the usage of hardware resources and compared the results with those of the previous design. Four types of predefined resources on the FPGAs show varying degrees of usage reduction (Table 1). Among them, the number of required DSP48 slices is reduced by ~30% on the XC7K410T and by ~50% on the XC6SLX45. A DSP48 is a logic slice specially designed to boost the performance of digital signal processing, and the number of DSP48 slices on a single FPGA is limited. The reduced usage of DSP48 slices therefore makes it possible to fit more 2D FFT cores on the FPGA and to achieve much higher processing rates with processing modules built on a parallel processing architecture⁴.
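The link between DSP48 savings and the number of cores that fit can be made concrete with a little arithmetic. The sketch below assumes, as a simplification, that DSP48 slices are the only limiting resource; in practice other resources such as block RAM or LUTs may run out first. The function names and the slice counts in the second example are hypothetical and are not taken from Table 1.

```python
def core_count_gain(dsp_reduction):
    """Factor by which the number of cores can grow for a given fractional
    reduction in DSP48 slices per core, if DSP48 is the only limit."""
    if not 0.0 <= dsp_reduction < 1.0:
        raise ValueError("dsp_reduction must be in [0, 1)")
    return 1.0 / (1.0 - dsp_reduction)


def cores_that_fit(dsp_budget, dsp_per_core):
    """Whole number of cores that fit in a given DSP48 budget."""
    return dsp_budget // dsp_per_core


if __name__ == "__main__":
    # Approximate reductions reported in the text.
    for device, reduction in (("XC7K410T", 0.30), ("XC6SLX45", 0.50)):
        gain = core_count_gain(reduction)
        print(f"{device}: about {gain:.2f}x as many cores")

    # Hypothetical numbers, for illustration only: a budget of 100 slices,
    # 20 slices per core before and 10 after a 50% reduction.
    print(cores_that_fit(100, 20), "->", cores_that_fit(100, 10))
```

Under this assumption, a ~50% reduction roughly doubles the number of cores, while a ~30% reduction allows about 1.4 times as many.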

Conclusion

This work demonstrates that reusing portions of the logic on the FPGA makes the utilization of hardware resources more efficient. Because the freed resources can host additional processing cores, this design strategy can be used to substantially improve the processing capabilities of a single FPGA.

Acknowledgements

This work was supported by NIH grants R01 NS44617 and 1S10RR15685.

References

1. Dalal IL, Fontaine FL. A reconfigurable FPGA-based 16-channel front-end for MRI. Presented at the 40th Asilomar Conference on Signals, Systems and Computers, Pacific Grove, CA, USA. 2007.
2. Hasan S, et al. FPGA-based architecture for a generalized parallel 2-D MRI filtering algorithm. American J. of Engineering and Applied Sciences. 2012;5(1):25-34.
3. Li L, Wyrwicz AM. Design of an MR image processing module on an FPGA chip. JMR. 2015;255:51-58.
4. Li L, Wyrwicz AM. Parallel MR multi-slice image reconstruction implemented on an FPGA. Presented at the 2015 Minnesota Workshop on High and Ultra-High Field Imaging, Minneapolis, MN, October 2015.

Figures

Fig. 1: Functional block diagram of the 2D FFT processor.

Table 1: Comparison of the resource usage between the two different 2D FFT cores.

Proc. Intl. Soc. Mag. Reson. Med. 24 (2016) 1911