Josip Marjanovic¹, Jonas Reber¹, David Otto Brunner¹, Bertram Jakob Wilm¹,², and Klaas Paul Pruessmann¹
¹Institute for Biomedical Engineering, University and ETH Zurich, Zurich, Switzerland, ²Skope Magnetic Resonance Technologies, Zurich, Switzerland
Synopsis
The data volumes of massively parallel receiver arrays and the
latency requirements of real-time applications such as interventional MRI
and navigators call for high-speed data preprocessing. First steps of the
reconstruction, such as noise pre-whitening, channel combination and channel
compression, can be performed efficiently on FPGAs. Here we present a flexible
system and software architecture for such tasks and demonstrate its capability
by performing real-time coil compression directly in the spectrometer.
Introduction
Novel MRI applications such as navigator-based prospective motion
correction, interventional imaging, field stabilization, high-bandwidth
imaging and massively parallel receive arrays increasingly impose real-time
and large-scale computing requirements. Fast imaging in particular benefits
greatly from large numbers of receiver channels, but the demand for computing
and data handling/storage scales with the channel count, so that even
applications without latency constraints, such as fMRI, become a significant
computing challenge. Fast processing and reconstruction of MR data has
consequently been reported on CPUs, clusters [1], cloud computers [2] and
GPUs [3]. However, many numerically costly reconstruction steps could already
be performed directly on the acquisition system, very efficiently and with
low latency, by means of Field Programmable Gate Arrays (FPGAs). Examples of
such operations range from simple numerical type conversions, noise
pre-whitening, channel combination and compression, and phase/amplitude
extraction to echo alignment and phase/frequency corrections. Unfortunately,
on clinical platforms these units and the dataflow on them are typically not
accessible for generic programming, nor are the required ample computing
resources pre-installed.
In this work, we present a generically and flexibly programmable
real-time data-stream architecture on a custom spectrometer platform that
allows fast implementation of FPGA-based signal processing in the data
streams of scalable, high-channel-count receivers. The capability is
exemplified by performing noise pre-whitening, channel combination and
compression in-line, reducing the data flow and mass storage throughput
without additional CPU load or significant latency.
Methods
The FPGA data architecture is based on unified data
streams that are fed into generic computation blocks. Each of these blocks
accepts a data stream of a defined bit depth and rate and outputs another
stream, potentially with a different bit depth and rate. By concatenating
these blocks, the numerical task can be flexibly configured. Blocks for
variable-rate filtering, phase-amplitude extraction and matrix-vector
multiplication have been built and precompiled.
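The block concept described above can be illustrated in software. The following is a minimal Python sketch, not the FPGA implementation: the names `decimate`, `phase_amplitude` and `chain` are hypothetical, the boxcar average is only a stand-in for the variable-rate filter, and Python generators stand in for the hardware data streams. Each block consumes one stream and yields another, possibly at a different rate or with a different sample format.

```python
import cmath

def decimate(factor):
    """Hypothetical rate-changing block: boxcar-average `factor` samples into one."""
    def run(stream):
        acc, count = None, 0
        for sample in stream:  # sample = list of complex values, one per channel
            acc = list(sample) if acc is None else [a + b for a, b in zip(acc, sample)]
            count += 1
            if count == factor:
                yield [a / factor for a in acc]
                acc, count = None, 0
    return run

def phase_amplitude():
    """Hypothetical format-changing block: complex value -> (amplitude, phase)."""
    def run(stream):
        for sample in stream:
            yield [(abs(v), cmath.phase(v)) for v in sample]
    return run

def chain(blocks, stream):
    """Concatenate blocks: the output stream of one block feeds the next."""
    for block in blocks:
        stream = block(stream)
    return stream

# Two channels, two time points at the input rate.
source = [[1 + 0j, 0 + 2j], [3 + 0j, 0 + 4j]]
result = list(chain([decimate(2), phase_amplitude()], iter(source)))
```

Reordering or swapping the entries of the list passed to `chain` reconfigures the processing task without touching the blocks themselves, which is the property the architecture relies on.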
The spectrometer platform consists of a Kintex 7
(Xilinx, San Jose, USA) FPGA connected to up to 4 in-bore 16-channel receiver
modules. The receivers perform configurable filtering and decimation of the
receiver data and hand the stream over to the outfield FPGA via high-speed
optical data lines. The computing blocks can then be deployed on this FPGA or
on further FPGAs (NI FlexRIO®, National Instruments, Austin, USA), to which
the data stream can be piped via the FlexRIO interface or PXIe peer-to-peer
streaming, all at rates of several Gb/s. For a first demonstration, noise
pre-whitening and channel combination were performed in conjunction with
compression of the receive channels, all implemented using a single
matrix-vector multiplication block.
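Because pre-whitening, combination and compression are all linear operations, they can be folded into one matrix that is applied to every time sample. The Python sketch below illustrates this under stated assumptions: the matrices `W` (8x8 whitening) and `C` (3x8 compression) hold placeholder values rather than calibrated coefficients, and `matvec` and `matmul` are hypothetical helper names.

```python
def matvec(m, v):
    """Multiply matrix m (list of rows) by vector v."""
    return [sum(a * x for a, x in zip(row, v)) for row in m]

def matmul(a, b):
    """Multiply matrix a by matrix b (both lists of rows)."""
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]

N_IN, N_OUT = 8, 3  # physical channels in, compressed channels out

# Placeholder whitening matrix (assumption, not measured noise statistics).
W = [[(0.5 if i == j else 0.0) + (0.25j if j == i + 1 else 0.0)
      for j in range(N_IN)] for i in range(N_IN)]

# Placeholder compression matrix (assumption, not PCA-derived).
C = [[1.0 if j % N_OUT == i else 0.0 for j in range(N_IN)]
     for i in range(N_OUT)]

# Fold both stages into one 3x8 matrix, computed once before scanning.
A = matmul(C, W)

# One time sample across all 8 channels.
sample = [complex(ch + 1, -ch) for ch in range(N_IN)]

two_stage = matvec(C, matvec(W, sample))  # whiten, then compress
one_stage = matvec(A, sample)             # single combined multiplication
```

In this sketch `one_stage` and `two_stage` agree up to rounding, so the streaming path needs only one 3x8 multiplication per time sample, and the output stream carries 3 instead of 8 values per sample.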
The data were acquired with the 8-channel head coil of a clinical 3T
system (Achieva, Philips, Best, Netherlands). Noise scans and coil
sensitivity data were acquired beforehand, and the coil combination and
compression matrix coefficients were calculated by the PCA approach of [4]
prior to scanning. During scanning, the entire channel compression step was
handled on the FPGA.
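For orientation, one generic way to obtain such a matrix is sketched below. This is an assumption-laden illustration of whitening followed by PCA truncation, not necessarily the exact procedure of [4]; the symbols $\Psi$, $L$, $W$, $R$, $U_3$ and $A$ are introduced here for illustration only.

```latex
% Noise covariance from the noise scans (8 x 8), with Cholesky factor L:
\Psi = L L^{H}, \qquad W = L^{-1}
% Whitened reference data: the noise covariance becomes W \Psi W^{H} = I
\tilde{x}(t) = W\,x(t)
% Sample covariance of the whitened reference data and its eigendecomposition:
R = \frac{1}{N}\sum_{t=1}^{N} \tilde{x}(t)\,\tilde{x}(t)^{H} = U \Lambda U^{H},
\qquad \lambda_1 \ge \lambda_2 \ge \dots \ge \lambda_8
% Keep the three dominant eigenvectors U_3 = [u_1\; u_2\; u_3]:
A = U_3^{H}\, W \in \mathbb{C}^{3 \times 8}, \qquad y(t) = A\,x(t)
```

Since the columns of $U_3$ are orthonormal, the noise in the three compressed channels stays white in this sketch, and the 3x8 matrix $A$ is the only quantity that has to be loaded into the matrix-vector multiplication block.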
T2*-weighted scans of an oil phantom bottle
and a healthy human volunteer were acquired with 8 channels and compressed
to 3 channels.
Results
The receiver acquired scans with a 95% duty cycle
and up to 1 MHz bandwidth while performing the channel combination in-line,
reducing the data flow and reconstruction effort accordingly. No additional
CPU load was present during acquisition, and the compression data were
calculated in-line with the acquisition of the reference data.
Fig. 4 shows the imaging
results for the phantom bottle and the volunteer for the uncompressed and the
compressed case; the compressed images retain approximately 95% of the SNR
with only 3/8 of the data.
Discussion
The proposed FPGA architecture, in software and hardware,
allows high-throughput computing tasks to be implemented with comparatively
little effort and then executed highly parallelized and with low latency in
custom-configured pipelines on the FPGA. Frequently encountered bottlenecks
in data flow and processing pipelines, such as backplanes, LAN/WAN links to
reconstruction units and mass storage drivers, can thereby be avoided.
Furthermore, reconstructions intended to render images in real time on CPUs,
clusters, clouds or GPUs can focus on the complex logic tasks, such as FFT
and gridding, for which they are designed. This allows cutting-edge sequences
and high-channel-count arrays to be used for fast imaging and reconstruction
with low latency.
Acknowledgements
NanoTera initiative, Wearable MRI project.
References
1) Borisch E et al, Real-Time High-Throughput Scalable MRI
Reconstruction via Cluster Computing, Proc ISMRM p1492 2008.
2) Xue H, Distributed MRI reconstruction using Gadgetron-based
cloud computing, MRM 73(3) 2015.
3) Stone S. et al, Accelerating Advanced MRI Reconstructions
on GPUs, J Parallel Distrib Comput, 2011.
4) Buehrer M et al, Array Compression for MRI With Large Coil
Arrays, MRM 57 2007.