Xinwei Shi^{1,2}, Kathryn Stevens^{2}, and Brian Hargreaves^{1,2}

Multi-Spectral Imaging (MSI) methods, such as SEMAC and MAVRIC-SL, resolve metal-induced field perturbations by applying additional encoding in the spectral dimension, at the cost of increased scan time. In this work, we introduce a 3D-CNN-based reconstruction to accelerate MSI utilizing spatial-spectral features of aliasing artifacts. We demonstrate in in vivo experiments that the proposed method can accelerate MAVRIC-SL acquisitions by a factor of 3 when used alone, and 17-25 when combined with parallel imaging and half-Fourier acquisition. The 3D-CNN showed significant improvement in image quality compared with parallel imaging and compressed sensing (PI&CS), with negligible additional computation time.

Multi-Spectral Imaging (MSI) techniques, such as SEMAC^{1}
and MAVRIC-SL^{2}, resolve most metal-induced
artifacts by acquiring separate 3D spatial encodings for multiple spectral bins
(similar to slices), at the cost of increased scan time. Various methods have
been explored to accelerate MSI by exploiting correlations between spectral
bins^{3-5}. Model-based reconstruction^{4} and RPCA^{5}
explicitly or implicitly model spectral bin images as the same underlying
magnetization modulated by different RF excitation profiles. They can provide around
20-fold acceleration when combined with parallel imaging (PI) and partial
Fourier sampling (PF). However, they require long reconstruction times due to
computationally expensive iterations.

Deep learning is an emerging
technique in MRI reconstruction with negligible computation time compared to iterative methods^{6-9}. In this work, we introduce a 3D convolutional
neural network (3D-CNN) for accelerating MSI, which learns features of aliasing artifacts
at different scales in the spatial-spectral domain.
We
assess image quality in reader studies, comparing 3D-CNN output from retrospectively
under-sampled data to the fully-sampled reference.

**Reconstruction framework**

The framework of the proposed 3D-CNN-based reconstruction is shown in
Fig. 1a. The under-sampled k-space data of each bin are first processed
separately with PI&CS^{10}. Next, a 3D-CNN is used to remove
residual aliasing artifacts and blurring in the initial bin images. The initial
bin images are first sliced along the readout (x) direction, so that the input of the
network has dimension [y, z, bin]. The output of the network is the residual image,
which has the same dimension. The reconstructed bin images (initial + residual)
are combined to form the final composite image.
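The steps above can be sketched as follows. This is a minimal NumPy illustration, not the trained model: the zero-residual placeholder "network" and the root-sum-of-squares bin combination are assumptions made for the sketch.

```python
import numpy as np

def reconstruct(init_bins, cnn):
    """init_bins: initial PI&CS bin images, shape [x, y, z, bin];
    cnn: maps a [y, z, bin] slice to a residual of the same shape."""
    recon = np.empty_like(init_bins)
    for ix in range(init_bins.shape[0]):   # slice along readout (x)
        yzb = init_bins[ix]                # network input: [y, z, bin]
        recon[ix] = yzb + cnn(yzb)         # reconstructed = initial + residual
    # combine the bins into the composite image; root-sum-of-squares
    # combination is assumed here for illustration
    return np.sqrt(np.sum(recon ** 2, axis=-1))

# toy run with a placeholder "network" that predicts a zero residual
bins = np.random.rand(4, 32, 32, 24)       # toy [x, y, z, bin] volume
composite = reconstruct(bins, lambda s: np.zeros_like(s))
print(composite.shape)  # (4, 32, 32)
```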

**Network architecture**

A 3D U-Net^{11} architecture (Fig. 1b) with a total of 16
convolution layers is used for this study. The 2D U-Net^{12} has been
popular in MRI reconstruction for its ability to learn features at different scales
without losing high-spatial-frequency information^{7,8}. Here the
spectral dimension is treated
equivalently
to the spatial dimensions in convolutions,
downscaling and upscaling operations. The spectral dimension can be preserved
throughout the network by using 3D convolutions, but not 2D convolutions, as
demonstrated in Fig.2.
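The distinction can be illustrated with naive "valid"-mode convolutions in NumPy (toy shapes chosen for illustration; the actual network uses padded convolutions, so spatial sizes are preserved as well):

```python
import numpy as np

def conv3d_valid(vol, kernel):
    """Naive 'valid' 3D convolution with a single filter (no padding)."""
    ky, kz, kb = kernel.shape
    oy = vol.shape[0] - ky + 1
    oz = vol.shape[1] - kz + 1
    ob = vol.shape[2] - kb + 1
    out = np.zeros((oy, oz, ob))
    for i in range(oy):
        for j in range(oz):
            for k in range(ob):
                out[i, j, k] = np.sum(vol[i:i+ky, j:j+kz, k:k+kb] * kernel)
    return out

def conv2d_valid(vol, kernel):
    """Naive 'valid' 2D convolution that treats the bin axis as input
    channels: the kernel spans ALL bins, so the output has no bin axis."""
    ky, kz, _ = kernel.shape
    oy = vol.shape[0] - ky + 1
    oz = vol.shape[1] - kz + 1
    out = np.zeros((oy, oz))
    for i in range(oy):
        for j in range(oz):
            out[i, j] = np.sum(vol[i:i+ky, j:j+kz, :] * kernel)
    return out

rng = np.random.default_rng(0)
vol = rng.random((32, 32, 24))                  # [y, z, bin]
out3d = conv3d_valid(vol, np.ones((3, 3, 3)))   # bin axis survives
out2d = conv2d_valid(vol, np.ones((3, 3, 24)))  # bin axis collapsed
print(out3d.shape)  # (30, 30, 22)
print(out2d.shape)  # (30, 30)
```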

**Experiments**

The 3D-CNN was trained and tested
with MAVRIC-SL
proton-density-weighted scans of 15 volunteers (8 for training, 7 for test) with total-hip-replacement
implants. All scans were performed on GE 3T MRI systems with 24 spectral bins,
2x2 uniform under-sampling and half-Fourier acquisition. Other parameters
include: 32-channel torso array, matrix size=384x256x(24-44), voxel
size=1.0x1.6x4.0mm^{3}. The images reconstructed by bin-by-bin
PI&CS using all acquired data were used as the reference. Outer k-space was
further under-sampled retrospectively by a factor of 5 with complementary Poisson-disc
sampling^{13} (total R=17-25), and reconstructed by PI&CS and
3D-CNN. A total of 3072 training samples were used since y-z slices were reconstructed
separately. Images were evaluated with normalized root-mean-square error (nRMSE),
structural similarity index (SSIM), and scored by an experienced MSK radiologist
using a 5-point scale (from 1 to 5: non-diagnostic; limited; diagnostic; good;
excellent) in three categories: image sharpness, artifacts near metal and overall
image quality.
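As a rough illustration of the quantitative metrics, nRMSE is straightforward, and a simplified single-window SSIM can be written as below. Note that standard SSIM averages over local (typically Gaussian) windows, so this global variant is only a sketch:

```python
import numpy as np

def nrmse(recon, ref):
    """Normalized RMSE: ||recon - ref|| / ||ref||."""
    return np.linalg.norm(recon - ref) / np.linalg.norm(ref)

def global_ssim(x, y, L=1.0):
    """Simplified SSIM computed over the whole image as one window
    (the standard metric averages SSIM over local windows)."""
    c1, c2 = (0.01 * L) ** 2, (0.03 * L) ** 2
    mx, my = x.mean(), y.mean()
    vx, vy = x.var(), y.var()
    cov = ((x - mx) * (y - my)).mean()
    return ((2 * mx * my + c1) * (2 * cov + c2)) / \
           ((mx ** 2 + my ** 2 + c1) * (vx + vy + c2))

ref = np.linspace(0, 1, 64 * 64).reshape(64, 64)
print(nrmse(ref, ref))        # 0.0 for a perfect reconstruction
print(global_ssim(ref, ref))  # 1.0 for a perfect reconstruction
```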

**Results and Discussion**

The 3D-CNN significantly improved the image quality of bin-by-bin PI&CS with negligible computation time (<10s on 1 GPU). The network learns to remove aliasing artifacts effectively, because the encoder/decoder blocks can analyze and synthesize features at multiple spatial-spectral scales. We also experimented with a 2D U-Net architecture, which collapses the spectral dimension at the beginning, and its results were blurrier than those of the 3D U-Net. The 3D U-Net architecture may be useful in other image reconstruction problems to exploit correlations in non-spatial dimensions, such as the temporal dimension.

The bin-by-bin initialization step integrates PI into the proposed reconstruction and reduces the dimensionality of the network input by combining the coil images. Since this step is currently the bottleneck in terms of both computation and image quality, we will explore end-to-end deep learning in future work.

Figure 1. **(a)** Framework of the proposed 3D-CNN-based MSI reconstruction. **(b)** 3D U-Net architecture with two encoder blocks, two decoder blocks, and skip connections
between mirrored layers in encoder and decoder blocks. The feature maps are downscaled/upscaled by 2x in all three dimensions
by each encoder/decoder block. Each encoder block consists of three conv3d-batchNorm-leakyRelu sequences, and 3D max-pooling (maxPool3d) is used for downscaling. Each decoder block consists of one 3D transpose convolution (convT3d) for upscaling, and two conv3d-batchNorm-leakyRelu sequences. The numbers of
filters in each convolution layer are respectively 32, 64, 64 for the three
scales.
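Under the architecture described in the caption, the feature-map size at each scale can be traced with a small helper. The shapes here are illustrative: `y=256, z=32, bin=24` is one plausible input size given the acquisition matrix, not a value stated in the text.

```python
def unet3d_shapes(y, z, b, filters=(32, 64, 64)):
    """Trace [y, z, bin] feature-map sizes through the two encoder blocks
    (each downscales 2x in all three dimensions via maxPool3d), the
    bottleneck, and the two decoder blocks (convT3d upscaling)."""
    shapes = [("input", (y, z, b, 1))]
    dims = (y, z, b)
    for level, f in enumerate(filters[:-1]):            # encoder blocks
        shapes.append((f"encoder{level + 1}", (*dims, f)))
        dims = tuple(d // 2 for d in dims)              # maxPool3d
    shapes.append(("bottleneck", (*dims, filters[-1])))
    for level, f in enumerate(reversed(filters[:-1])):  # decoder blocks
        dims = tuple(d * 2 for d in dims)               # convT3d
        shapes.append((f"decoder{level + 1}", (*dims, f)))
    return shapes

for name, shape in unet3d_shapes(256, 32, 24):
    print(name, shape)
```

The trace confirms that the output returns to the input resolution in all three dimensions, including the spectral one, which is what allows the network to predict a residual of the same shape as its input.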

Figure 2. Demonstration of the difference between 2D and 3D convolutions. The spectral dimension is collapsed in the output of 2D convolution (a),
but is preserved in the output of 3D convolution (b).

Figure 3. Example of
input and output images of 3D-CNN, and error maps compared with the reference images. The arrows point to
locations of hip implants. The first row shows the bin-combined image. The second and third rows show images of two neighboring
bins A and B. The last two rows show the error maps of the bin images (scaled by 5 for better
visualization).

Figure 4. Results in two volunteers with total hip replacements. 3D-CNN-reconstruction
results show sharper details (solid arrows) and suppressed aliasing artifacts (arrowheads)
compared with bin-by-bin PI&CS, and appear much more similar to the
reference images even near the implant(s).

Figure 5. Radiologist scores comparing bin-by-bin PI&CS, 3D-CNN results
(R=17-25) and reference images from standard MAVRIC-SL (R=7-8) for 7 cases. The
scoring used a 5-point scale ranging from 1 (non-diagnostic) to 5 (excellent).
Two-sided Wilcoxon signed rank tests show a significant improvement in sharpness and
overall image quality in 3D-CNN results over bin-by-bin PI&CS (**P<0.01).
There is no significant difference among the three results in artifacts
near metal. There is no significant difference between 3D-CNN results and
reference in image sharpness (P=0.070) or overall image quality (P=0.122).