An FPGA Based Real-Time Data Processing Structure – Application to Real-Time Array Coil Data Compression

Josip Marjanovic¹, Jonas Reber¹, David Otto Brunner¹, Bertram Jakob Wilm¹,², and Klaas Paul Pruessmann¹

¹Institute for Biomedical Engineering, University and ETH Zurich, Zurich, Switzerland, ²Skope Magnetic Resonance Technologies, Zurich, Switzerland

Synopsis

The data volumes of massively parallel receiver arrays and the latency requirements of real-time applications such as interventional MRI and navigators call for high-speed data preprocessing. First reconstruction steps such as noise pre-whitening, channel combination and compression can be performed efficiently on FPGAs. Here we present a flexible system and software architecture for such tasks and demonstrate its capability by performing real-time coil compression directly in the spectrometer.

Introduction

Novel MRI applications increasingly impose real-time and large-scale computing requirements; examples include navigator-based prospective motion correction, interventional imaging, field stabilization, high-bandwidth imaging and massively parallel receive arrays. Fast imaging in particular benefits greatly from large numbers of receiver channels, but the demand for computing and data handling/storage scales with the channel count, which makes even applications without latency constraints, such as fMRI, a significant computing challenge. Fast processing and reconstruction of MR data has consequently been reported on CPUs, clusters [1], cloud computers [2] and GPUs [3]. However, many numerically costly reconstruction steps could already be performed directly on the acquisition system, very efficiently and with low latency, by means of Field Programmable Gate Arrays (FPGAs). Such operations range from simple numerical type conversions, noise pre-whitening, channel combination and compression, and phase/amplitude extraction to echo alignment and phase/frequency corrections. Unfortunately, on clinical platforms these units and the data flow on them are typically not accessible for generic programming, nor are the required ample computing resources pre-installed. In this work, we present a generically and flexibly programmable real-time data-stream architecture on a custom spectrometer platform that allows fast implementation of FPGA-based signal processing in the data streams of scalable, high-channel-count receivers. The capability is exemplified by performing noise pre-whitening, channel combination and compression in-line, which reduces the data flow and mass-storage throughput without additional CPU load or significant latency.

Methods

The FPGA data architecture is based on unified data streams that can be fed into generic computation blocks. Each block accepts a data stream of a defined bit depth and rate and outputs another stream, potentially with a different depth and rate. By concatenating these blocks, the numerical task can be flexibly configured. Blocks for variable-rate filtering, phase-amplitude extraction and matrix-vector multiplication have been built and precompiled. The spectrometer platform consists of a Kintex 7 (Xilinx, San Jose, USA) FPGA connected to up to 4 in-bore 16-channel receiver modules. The receivers perform configurable filtering and decimation of the receiver data and hand the stream over to the outfield FPGA via high-speed optical data lines. The computing blocks can then be deployed on this FPGA or on further FPGAs (NI FlexRIO®, National Instruments, Austin, USA), to which the data stream can be piped via the FlexRIO interface or PXIe peer-to-peer streaming, all at rates of several Gb/s. For a first demonstration, noise pre-whitening and channel combination were performed together with compression of the receive channels, implemented using a matrix-vector multiplication block. The data were acquired using a clinical 3T 8-channel head coil (Achieva, Philips, Best, Netherlands). Noise scans and coil sensitivity data were acquired beforehand, and the coil combination and compression matrix coefficients were calculated by the PCA approach of [4] prior to scanning. During scanning, the entire channel compression step was handled on the FPGA. T2*-weighted scans of an oil phantom bottle and a healthy human volunteer were acquired with 8 channels and compressed to 3 channels.
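The block concept described above can be illustrated in software. The following is a minimal Python sketch, not the FPGA implementation: it models each block as a function from one sample stream to another and concatenates blocks into a pipeline. The names `matvec_block`, `decimate_block` and `pipeline`, the test samples and the 3×8 matrix `A` are hypothetical placeholders chosen for illustration; in the actual system the coefficients come from the calibration described above.

```python
from functools import reduce
from typing import Callable, Iterable, Iterator, List, Sequence

Sample = List[complex]  # one time point, one complex value per channel
Block = Callable[[Iterable[Sample]], Iterator[Sample]]


def matvec_block(matrix: Sequence[Sequence[complex]]) -> Block:
    """Return a block that multiplies every incoming sample vector by `matrix`."""
    def run(stream: Iterable[Sample]) -> Iterator[Sample]:
        for x in stream:
            if len(x) != len(matrix[0]):
                raise ValueError("channel count does not match matrix width")
            # One output value per matrix row (multiply-accumulate over channels).
            yield [sum(a * v for a, v in zip(row, x)) for row in matrix]
    return run


def decimate_block(factor: int) -> Block:
    """Return a block that keeps every `factor`-th sample (rate change only, no filtering)."""
    def run(stream: Iterable[Sample]) -> Iterator[Sample]:
        for i, x in enumerate(stream):
            if i % factor == 0:
                yield x
    return run


def pipeline(*blocks: Block) -> Block:
    """Concatenate blocks so that the output stream of one feeds the next."""
    def run(stream: Iterable[Sample]) -> Iterator[Sample]:
        return reduce(lambda s, block: block(s), blocks, stream)
    return run


# Hypothetical 3x8 combination matrix (placeholder values, not calibrated coefficients).
A = [
    [1, 1, 1, 0, 0, 0, 0, 0],
    [0, 0, 0, 1, 1, 1, 0, 0],
    [0, 0, 0, 0, 0, 0, 1j, 1j],
]

# Four synthetic time points of 8-channel data.
samples = [[complex(c + 1, t) for c in range(8)] for t in range(4)]

chain = pipeline(decimate_block(2), matvec_block(A))
compressed = list(chain(samples))  # 2 time points, 3 virtual channels each
```

Because the blocks are generators, samples flow through the chain one at a time. This loosely mirrors the streaming behaviour of the hardware pipeline, although it says nothing about its timing or bit depths.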

Results

The receiver acquired scans at a 95% duty cycle and up to 1 MHz bandwidth while performing the channel combination in-line, reducing the data flow and reconstruction effort accordingly. No additional CPU load was present during acquisition, and the calculation of the compression data proceeded in-line with the acquisition of the reference data. Fig. 4 shows the imaging results for the phantom bottle and the volunteer in the uncompressed and compressed cases; the compressed images retain approximately 95% of the SNR with only 3/8 of the data.
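To make the reduction in data flow concrete, the following back-of-the-envelope Python sketch estimates the average stream rate before and after compression. The bandwidth and duty cycle are taken from the text. The sample size of 8 bytes (a complex value stored as two 32-bit numbers) and the assumption of one complex sample per channel per 1/bandwidth are illustrative assumptions, not figures from this work.

```python
def stream_rate_bytes_per_s(channels, bandwidth_hz, bytes_per_sample, duty_cycle):
    """Average data rate, assuming one complex sample per channel per 1/bandwidth."""
    return channels * bandwidth_hz * bytes_per_sample * duty_cycle


BANDWIDTH_HZ = 1_000_000   # from the text: up to 1 MHz
DUTY_CYCLE = 0.95          # from the text
BYTES_PER_SAMPLE = 8       # ASSUMPTION: complex sample stored as 2 x 32 bit

raw_rate = stream_rate_bytes_per_s(8, BANDWIDTH_HZ, BYTES_PER_SAMPLE, DUTY_CYCLE)
compressed_rate = stream_rate_bytes_per_s(3, BANDWIDTH_HZ, BYTES_PER_SAMPLE, DUTY_CYCLE)
ratio = compressed_rate / raw_rate

print(f"raw: {raw_rate / 1e6:.1f} MB/s, compressed: {compressed_rate / 1e6:.1f} MB/s, "
      f"ratio: {ratio:.3f}")
```

Under these assumptions the ratio is 3/8 regardless of the chosen sample size, since the sample size cancels out. Only the absolute rates depend on it.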

Discussion

The proposed FPGA architecture, in software and hardware, allows high-throughput computing tasks to be implemented with comparatively low effort and then executed highly parallelized and with low latency in custom-configured pipelines on the FPGA. Frequently encountered bottlenecks in data flow and processing pipelines, such as backplanes, LAN/WAN links to reconstruction units and mass-storage drives, can thereby be avoided. Furthermore, reconstructions intended to render images in real time on CPUs, clusters, clouds or GPUs can focus on the complex logic tasks, such as FFT and gridding, for which they are designed. This allows cutting-edge sequences and high-channel-count arrays to be used for fast imaging and reconstruction with low latency.

Acknowledgements

NanoTera initiative, Wearable MRI project.

References

1) Borisch E et al. Real-Time High-Throughput Scalable MRI Reconstruction via Cluster Computing, Proc ISMRM p1492 2008.
2) Xue H. Distributed MRI reconstruction using Gadgetron-based cloud computing, MRM 73(3) 2015.
3) Stone S et al. Accelerating Advanced MRI Reconstructions on GPUs, J Parallel Distrib Comput, 2011.
4) Buehrer M et al. Array Compression for MRI With Large Coil Arrays, MRM 57 2007.

Figures

Figure 1: Top: Functional overview of the entire acquisition system. Bottom: Image and overlaid function block overview of the data acquisition and real-time processor unit. The data is acquired by a custom in-field receiver unit and delivered to the processor via multi-gigabit fiber-optic data links. The cores on the FPGA can then process the data of up to 4 such 16-channel receiver modules and hand the data over to a PXIe chassis.

Figure 2: Processing block diagram. Data streams can be piped between different precompiled processing blocks.

Figure 3: Detailed view of the applied matrix-vector multiplier unit acting on the incoming data stream to combine or compress the data originating from different coils.
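As a sketch of the operation such a multiplier unit performs, the combined pre-whitening and compression can be written as a single matrix-vector product per time sample. The symbols below are assumptions introduced for illustration and do not appear in the original: $\Psi$ is the noise covariance matrix estimated from the noise scans, $L$ its Cholesky factor, $x[n]$ the 8-channel sample vector and $U_3$ a matrix holding three leading principal directions of the whitened calibration data. This is a generic formulation; the coefficient calculation of [4] may differ in detail.

```latex
\begin{aligned}
\Psi &= L L^{H}, \\
\tilde{x}[n] &= L^{-1} x[n], \qquad x[n] \in \mathbb{C}^{8}, \\
y[n] &= U_{3}^{H}\,\tilde{x}[n] = A\,x[n], \qquad A = U_{3}^{H} L^{-1} \in \mathbb{C}^{3 \times 8}.
\end{aligned}
```

Since $A$ is computed before scanning, only the product $A\,x[n]$ has to be evaluated in real time.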

Figure 4: Results from compressing, in real time, the output of an 8-channel receive coil acquiring a T2*-weighted FFE at 3T to 3 combined virtual channels. 95% of the SNR is preserved.

Proc. Intl. Soc. Mag. Reson. Med. 24 (2016)

1786