Limin Li1 and Alice M Wyrwicz1,2
1Center for Basic MR Research, Northshore University Healthsystem, Evanston, IL, United States, 2Department of Biomedical Engineering, Northwestern University, Evanston, IL, United States
Synopsis
The processing rate for real-time multi-slice
image reconstruction on an FPGA can be improved significantly by taking
advantage of its parallel processing capability. In particular, multiple 2D FFT
processors can be embedded into a single FPGA and run simultaneously. In this
abstract, we report a new design of a 2D FFT processor with significantly reduced
usage of hardware resources. Test results show that usage of an important type of
resource, the DSP48 slice, can be reduced by up to 50% without degrading processing
performance, which implies that more 2D FFT cores can be installed in a
single FPGA of a given size.
Introduction
With increasing capacity and decreasing cost, Field-Programmable
Gate Arrays (FPGAs) are increasingly used in MR image processing [1-4].
By taking advantage of the FPGA's parallel
processing capability, dramatic increases in processing rate can be achieved. In
particular, in the case of multi-slice image reconstruction, multiple 2D FFT
processors can be embedded into an FPGA and run simultaneously [4]. The
higher processing rate is achieved at the expense of hardware resources on the
FPGA. It is therefore important that the architecture of the FFT processors be designed
so that the resources are used efficiently. In this abstract, we describe
the design, implementation and testing of a 2D FFT module in which the
utilization efficiency of hardware resources is improved significantly while
maintaining the same processing performance.
Design and Implementation
The design was motivated by the fact that MR
time-domain baseband signal samples are delivered to FPGA-based FFT processors
at rates not exceeding 400 kHz, equivalent to one sample per 2.5 microseconds. An
FPGA device is capable of reading in input samples at rates of one sample per 25
ns or faster. When such an FPGA processor is used to process MR data in real
time, portions of the processor may therefore be idle for many clock cycles.
One strategy for reducing hardware usage is to reuse portions of the processor
within a single process. In our design, in contrast to the previous
one [3], only one 1D FFT core is employed for both the first and second FFT
computations. Fig. 1 shows the functional block diagram for the architecture of
the 2D FFT core module. The 1D FFT core was built on National Instruments’ 1D
FFT subVI. It has two inputs and two outputs. When executing the first
1D FFT computations, a 2D slice of data in the form of an NxN matrix streams into the FPGA via a FIFO buffer. The data
flow through the 1D FFT core along the path labeled "1" and are routed
to on-chip SRAM for
temporary storage. The order in which the data elements are stored is controlled by
addresses generated by an Address Generation Unit (AGU1). The first FFT computations are complete
once the whole 2D slice is stored in the SRAM.
The second FFT starts by reading the input data from the SRAM. During the execution of the second FFT, data
flow along the path labeled "2". The image reconstruction of the 2D slice
is complete after the second pass through the 1D FFT core. The timing of every
processing step must be accurately controlled so that no data race
occurs between the intermediate and new data. The design was developed in
LabVIEW 2014 and the 2D FFT core was built on the FPGA chips XC6SLX45 and XC7K410T
(Xilinx Inc., San Jose, CA, USA), which sit on the NI sbRIO-963 and USRP-2940R
boards, respectively (National Instruments, Inc., Austin, TX, USA).
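The two-pass data flow described above can be illustrated in software. The following is a minimal sketch, not the LabVIEW FPGA implementation: it assumes that the address generation amounts to writing each first-pass result into the buffer in transposed order, so that the second pass reads contiguous lines. The function names (fft_1d, fft_2d_single_core) and the flat list standing in for the SRAM are hypothetical, and a forward transform is used for illustration only.

```python
import cmath


def fft_1d(x):
    """Recursive radix-2 FFT; stands in for the single shared 1D FFT core.

    len(x) must be a power of two.
    """
    n = len(x)
    if n == 1:
        return list(x)
    even = fft_1d(x[0::2])
    odd = fft_1d(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out


def fft_2d_single_core(slice_rows):
    """2D FFT of an N x N slice using one 1D FFT routine for both passes."""
    n = len(slice_rows)
    sram = [0j] * (n * n)  # hypothetical stand-in for the on-chip SRAM

    # Pass 1 (path "1"): each incoming row goes through the shared core and
    # is stored at a transposed address (assumed role of the address unit).
    for r, row in enumerate(slice_rows):
        for c, value in enumerate(fft_1d(row)):
            sram[c * n + r] = value

    # Pass 2 (path "2"): contiguous lines of the buffer are the columns of
    # the first-pass result; they go through the same core again.
    result = [[0j] * n for _ in range(n)]
    for c in range(n):
        line = sram[c * n:(c + 1) * n]
        for r, value in enumerate(fft_1d(line)):
            result[r][c] = value
    return result
```

In this sketch the second pass starts only after the whole slice has been written to the buffer, which mirrors the requirement that intermediate and new data must not collide.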
Results and Discussion
We tested and evaluated the 2D FFT core module by
reconstructing multi-slice images. A graphical user interface (GUI) program was
developed using the LabVIEW platform to initialize and start the FPGA boards and
manage data transfer between the FPGA boards and a host PC, as previously
described [3,4]. In this work, the tests were performed on static data
(previously acquired and stored on the PC), typically structured as a
128x128x16 matrix. We verified that the new 2D FFT core was capable of
executing multi-slice image reconstruction at rates up to ~1000 frames per
second. These processing rates were about the same as those of the 2D FFT
processors with the two-1D-FFT-core architecture [3]. We also evaluated
the usage of hardware resources and compared the results with those of the previous
design. Four types of predefined resources on the FPGAs show varying degrees
of usage reduction (Table 1). Among them, the number of required DSP48 slices is
reduced by ~30% on the XC7K410T and ~50% on the XC6SLX45. DSP48 denotes a
logic slice specially designed to boost the performance of digital signal processing.
The number of DSP48 slices on a single FPGA is limited. The reduced usage of
DSP48 slices therefore makes it possible to place more 2D FFT cores on the FPGA and achieve much
higher processing rates with processing modules built on a parallel processing architecture [4].
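The rates quoted in this abstract can be related by simple arithmetic. The sketch below is only an estimate under stated assumptions: the core accepts exactly one sample per 25 ns, each 128x128 slice passes through the single 1D FFT core twice, and pipeline latency and data-transfer overhead are ignored. The variable names are illustrative and do not come from the original design.

```python
# Back-of-envelope timing estimate (an illustration, not a measurement).
MR_SAMPLE_PERIOD_NS = 2500   # 400 kHz MR sample rate: 2.5 microseconds per sample
FPGA_SAMPLE_PERIOD_NS = 25   # assumed: exactly one sample accepted per 25 ns
N = 128                      # slice matrix size (128 x 128)

# FPGA sample slots available for every incoming MR sample.
slots_per_mr_sample = MR_SAMPLE_PERIOD_NS // FPGA_SAMPLE_PERIOD_NS

# Time for one slice to stream through the shared 1D FFT core once, then twice.
samples_per_slice = N * N
one_pass_ns = samples_per_slice * FPGA_SAMPLE_PERIOD_NS
two_pass_ns = 2 * one_pass_ns

# Upper bound on the frame rate of one 2D FFT core under these assumptions.
max_frames_per_second = 1e9 / two_pass_ns

print(slots_per_mr_sample, two_pass_ns, max_frames_per_second)
```

Under these assumptions the FPGA has about 100 sample slots per incoming MR sample, and the two-pass bound is roughly 1220 frames per second, which is of the same order as the ~1000 frames per second measured.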
Conclusion
This work demonstrates that reusing portions
of the logic on the FPGA makes the utilization of hardware resources more efficient. This
design strategy can be used to dramatically improve the processing capabilities
of a single FPGA.
Acknowledgements
This work was supported by NIH grants R01
NS44617 and 1S10RR15685.
References
1. Dalal IL, Fontaine FL. A Reconfigurable FPGA-based 16-Channel Front-End for MRI, presented at the 40th Asilomar Conference on Signals, Systems and Computers, Pacific Grove, CA, USA. (2007).
2. Hasan S, et al. FPGA-based architecture for a generalized parallel 2-D MRI filtering algorithm, American J. of Engineering and Applied Sciences. 2012;5(1):25-34.
3. Li L, Wyrwicz AM. Design of an MR image processing module on an FPGA chip, JMR. 2015;255:51-58.
4. Li L, Wyrwicz AM. Parallel MR multi-slice image reconstruction implemented on an FPGA, presented at 2015 Minnesota Workshop on High and Ultra-High Field Imaging, Minneapolis, MN, October 2015.