Manuel A Morales¹, Amine Amyar¹, Siyeop Yoon¹, Jennifer Rodriguez¹, Martin S Maron², Ethan J Rowin², Shiro Nakamori¹, Jiwon Kim³, Robert M Judd⁴, Jonathan W Weinsaft³, Warren J Manning¹, and Reza Nezafat¹
¹BIDMC, Boston, MA, United States, ²Tufts Medical Center, Boston, MA, United States, ³Weill Cornell Medicine, New York, NY, United States, ⁴Duke University, Durham, NC, United States
Synopsis
Keywords: Flow, Cardiovascular
2D and 4D phase-contrast (PC) MRI are routinely used to quantify blood
flow in cardiovascular disease. However, PC has only a modest temporal
resolution compared to echocardiography. We sought to develop a deformation-encoding transformer (DENT)
model for cardiac frame interpolation and evaluate its potential in increasing temporal resolution. DENT
was trained using a large multi-center (centers = 3), multi-vendor (vendors =
3) and multi-field strength (1.5T, 3T) cine MRI dataset (patients = 3178). The
model was successfully applied to 2D/4D-PC MRI without modifications, enabling
a 2-fold gain in temporal resolution.
Background
Cardiac 2D or 4D phase-contrast (PC) MRI is routinely used to evaluate blood
flow and hemodynamics within the cardiovascular system. In ECG-segmented 2D or 4D
PC imaging, there is a trade-off
between scan time and spatiotemporal resolution. Insufficient temporal
resolution leads to imprecise peak velocity measurements. We sought to develop
a Deformation ENcoding Transformer (DENT) for high-frame-rate 2D/4D PC MRI.
Methods
DENT worked in the image domain and generated forward-backward mappings describing
the underlying deformation of multiple cardiac phases. These deformations, together
with a blending mask, were used to synthesize images at a high frame rate.
The input was T = 4 cine images of size W × H collected with temporal resolution ∆t at
t − ∆t, t, t + ∆t and t + 2∆t. The output was the
interpolated image at t + 0.5∆t (Fig. 1a).
An embedding layer extracted F = 32 features per pixel from the inputs. Downsampling layers
reduced the image dimensions by half while doubling the feature dimension.
Inputs to the transformer layers were split into windows. Multiscale attention was
used to learn a spatiotemporal correspondence within each window (Fig. 1b). For spatial attention, inputs
were split into N = W·H·T / M² windows of size M² ×
F along the spatial dimensions (M = 8). For temporal attention, inputs were split into N = W·H
windows of size T × F (Fig. 1c). Decoder layers upsampled the encoder outputs by a factor of 2
while reducing the feature dimension by half. These were passed to the motion
synthesizer block, in which additional convolutional layers upsampled
the input.
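The two window partitions can be illustrated with a short NumPy sketch. This is only an illustration of the window counts and shapes stated above, not the DENT implementation: the (T, H, W, F) memory layout, the non-overlapping tiling, and the function names `spatial_windows` and `temporal_windows` are assumptions made here.

```python
import numpy as np

def spatial_windows(x, M=8):
    """Split a (T, H, W, F) feature map into non-overlapping M x M spatial windows.

    Returns shape (N, M*M, F) with N = W*H*T / M**2.
    Assumes H and W are multiples of M (hypothetical layout, not from the source).
    """
    T, H, W, F = x.shape
    assert H % M == 0 and W % M == 0, "H and W must be multiples of M"
    x = x.reshape(T, H // M, M, W // M, M, F)
    x = x.transpose(0, 1, 3, 2, 4, 5)  # (T, H/M, W/M, M, M, F)
    return x.reshape(-1, M * M, F)

def temporal_windows(x):
    """Split a (T, H, W, F) feature map into one window per pixel.

    Returns shape (N, T, F) with N = W*H.
    """
    T, H, W, F = x.shape
    return x.transpose(1, 2, 0, 3).reshape(H * W, T, F)

# Example with T = 4 frames, a 64 x 64 grid and F = 32 features.
x = np.random.rand(4, 64, 64, 32)
sw = spatial_windows(x)   # N = 64*64*4 / 8**2 = 256 windows of size 64 x 32
tw = temporal_windows(x)  # N = 64*64 = 4096 windows of size 4 x 32
print(sw.shape, tw.shape)
```

Attention would then be computed independently within each window, so the cost scales with the window size (M² or T tokens) rather than with the full image.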
The final outputs of the model were the deformation, scaling components, and blending mask (Fig. 1d).
A new cardiac frame was synthesized from each input
image using the deformation and scaling components
via bilinear interpolation. The blending mask was used to combine the synthesized
images into a single frame at t + 0.5∆t.
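The synthesis step can be sketched in NumPy as a backward warp followed by a mask-weighted sum. Several details here are assumptions rather than the authors' method: the deformation is treated as a per-pixel displacement field in pixels, the scaling component is treated as a per-pixel multiplicative intensity factor, and the blending mask is assumed to sum to 1 across the four inputs. The function names are hypothetical.

```python
import numpy as np

def warp_bilinear(img, flow):
    """Backward-warp a 2D image with bilinear interpolation.

    out[y, x] = img[y + flow[1, y, x], x + flow[0, y, x]], with sampling
    coordinates clamped to the image border.
    """
    H, W = img.shape
    ys, xs = np.meshgrid(np.arange(H), np.arange(W), indexing="ij")
    x = np.clip(xs + flow[0], 0, W - 1)
    y = np.clip(ys + flow[1], 0, H - 1)
    x0 = np.floor(x).astype(int)
    y0 = np.floor(y).astype(int)
    x1 = np.minimum(x0 + 1, W - 1)
    y1 = np.minimum(y0 + 1, H - 1)
    wx, wy = x - x0, y - y0
    top = (1 - wx) * img[y0, x0] + wx * img[y0, x1]
    bottom = (1 - wx) * img[y1, x0] + wx * img[y1, x1]
    return (1 - wy) * top + wy * bottom

def synthesize(frames, flows, scales, mask):
    """Combine T warped inputs into one frame.

    frames: (T, H, W); flows: (T, 2, H, W); scales: (T, H, W);
    mask: (T, H, W), assumed to sum to 1 over T at every pixel.
    """
    warped = np.stack([s * warp_bilinear(f, d)
                       for f, d, s in zip(frames, flows, scales)])
    return (mask * warped).sum(axis=0)
```

Under these assumptions, a zero displacement field and unit scaling reduce the output to a per-pixel weighted sum of the four input frames, with the weights given by the blending mask.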
Multi-center (centers = 3), multi-vendor (GE,
Philips, Siemens), multi-field strength (1.5T, 3T) scans from 3178 patients
(2139 male, 54 ± 16 years) undergoing clinical MRI for various cardiac
indications were used for training. Cine images were collected using a
breath-hold ECG-gated segmented SSFP sequence at 1.5T (n = 1831) and 3T (n = 1347) in short-axis and 2-, 3- and 4-chamber views.
A sample was defined as a cine slice with T ≥ 7
frames; an epoch was one optimization loop across all training samples. First, a
center frame was
randomly selected and used as the ground truth. Second, 4 frames adjacent to the center frame were
selected as inputs, spaced to give either a 2∆t or a 4∆t temporal
resolution (Fig. 2).
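A minimal sketch of this sampling scheme follows. The exact frame indices are not recoverable from the text, so the offsets used here are an assumption chosen to match the model's input layout (inputs at t − ∆t, t, t + ∆t, t + 2∆t, target at t + 0.5∆t): the center frame sits midway between the two middle inputs. Wrapping indices cyclically over the cardiac cycle is also an assumption, as is the function name.

```python
import numpy as np

def sample_training_frames(cine, rng, step=None):
    """Pick a random center frame and 4 surrounding input frames.

    cine: (T, H, W) array with T >= 7.
    step: 1 gives inputs spaced 2 frames apart (2*dt), 2 gives 4 frames (4*dt).
    Offsets and cyclic wrap are assumptions, not taken from the source.
    """
    T = cine.shape[0]
    if step is None:
        step = int(rng.choice([1, 2]))
    c = int(rng.integers(T))                    # random center (ground-truth) frame
    offsets = np.array([-3, -1, 1, 3]) * step   # center lies midway between inputs 2 and 3
    idx = (c + offsets) % T                     # assumed cyclic wrap over the cardiac cycle
    return cine[idx], cine[c]

rng = np.random.default_rng(0)
cine = np.arange(10)[:, None, None] * np.ones((10, 4, 4))  # frame k is filled with value k
inputs, target = sample_training_frames(cine, rng, step=1)
print(inputs[:, 0, 0], target[0, 0])
```

Because the held-out center frame is a real acquired image, the interpolator can be supervised without any separately acquired high-frame-rate data.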
Each cine frame was min-max normalized
prior to processing. Ground-truth and
input images were randomly cropped to 256 × 256. Training ran for 100 epochs
(~200 hours) with a batch size of 10 using AdaMax. The learning rate was initially 2 × 10⁻²
and gradually decayed to 1 × 10⁻⁶.
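The preprocessing can be sketched as follows. This assumes that the same crop window is applied to the inputs and the ground truth so they stay spatially aligned, and that frames are at least 256 pixels in each dimension; the small `eps` guard and the function names are additions made here for illustration.

```python
import numpy as np

def minmax_normalize(frame, eps=1e-8):
    """Rescale one frame to the range [0, 1]; eps guards against constant frames."""
    lo, hi = frame.min(), frame.max()
    return (frame - lo) / (hi - lo + eps)

def random_crop(inputs, target, size=256, rng=None):
    """Crop the same random size x size window from inputs (T, H, W) and target (H, W)."""
    rng = np.random.default_rng() if rng is None else rng
    H, W = target.shape
    assert H >= size and W >= size, "frames must be at least size x size"
    top = int(rng.integers(0, H - size + 1))
    left = int(rng.integers(0, W - size + 1))
    return (inputs[:, top:top + size, left:left + size],
            target[top:top + size, left:left + size])

rng = np.random.default_rng(0)
cine = rng.random((5, 300, 320)) * 1000.0               # synthetic stand-in for cine frames
cine = np.stack([minmax_normalize(f) for f in cine])    # per-frame normalization
crop_in, crop_gt = random_crop(cine[:4], cine[4], size=256, rng=rng)
print(crop_in.shape, crop_gt.shape)
```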
2D and 4D PC MRI datasets from patients
imaged on a 3T Siemens Vida system were extracted to demonstrate the
feasibility of DENT-enabled high-frame-rate flow imaging. Breath-hold ECG-segmented 2D PC
scans were acquired with the following imaging parameters: spatial resolution =
1.9 × 1.9 mm², temporal resolution = 37 ms, GRAPPA rate = 3, number
of phase-encode lines per segment = 4. Navigated free-breathing ECG-segmented 4D
PC scans were acquired (acquisition time = 4.5 min) with the following imaging
parameters: TE/TR = 2.3/4.58 ms, spatial resolution = 2.5 × 2.5 × 2.5 mm³,
acquired temporal resolution = 37 ms, reconstructed number of phases = 25 ms,
vendor-provided WIP compressed sensing rate = 7.2, number of phase-encode lines
per segment = 2.
The encodings (deformation, scaling components, and blending mask) were derived from
the anatomical images of the flow datasets (Fig.
3). These encodings were applied to both the anatomical and flow images to
synthesize a new cardiac frame. For 4D flow, the encodings were applied to flow
images in the x, y and z directions separately. 4D flow data were processed and
visualized in Medis Suite (4.0.50.2).
Results
The trained
DENT model produced both anatomical and flow images at a high frame rate, as
demonstrated for 2D PC (Fig. 4). A
two-fold gain was achieved in 4D flow datasets with simulated and reconstructed
temporal resolutions of 54 and 27 ms, respectively. Our approach resulted in 4D
flow with temporal resolution as low as 14 ms (Fig. 5).
DENT was trained on a large cine-based dataset.
Translational feasibility for PC was demonstrated. However, additional (i.e.,
phantom) studies should be performed to assess the contribution of each
encoding component. For instance, in cases where deformation does not play a
major role (e.g., center of vessels), the synthesized flow values are
essentially a linear combination of the four adjacent frames. The coefficients
of that linear combination are given by the blending mask. Nevertheless, our
study showed that transformer-based frame interpolation has the potential to
increase the temporal resolution of PC imaging.
Conclusion
We trained a transformer-based frame interpolator using a large
multi-center, multi-vendor and multi-field strength dataset and demonstrated its
potential in increasing the temporal resolution of 2D/4D PC MRI.