1873

Benchmarking Accelerated MRI: A Head-to-Head Comparison of Deep Learning Reconstruction and Super-Resolution Techniques

Eric K. Gibbons¹, Zhongnan Fang², Arjun D. Desai³, Christopher M. Sandino³, Garry E. Gold⁴, Brian A. Hargreaves⁴, and Akshay S. Chaudhari⁴
¹Department of Electrical and Computer Engineering, Weber State University, Ogden, UT, United States, ²Lvis Corporation, Palo Alto, CA, United States, ³Department of Electrical Engineering, Stanford University, Stanford, CA, United States, ⁴Department of Radiology, Stanford University, Stanford, CA, United States

Synopsis

Deep learning (DL) can be used to extend compressed sensing (CS) by learning the regularization function in a data-driven manner. In contrast, super-resolution (SR) algorithms transform rapidly acquired low-resolution images into higher-resolution images. This work compares DL-based CS (DLCS) with DL-based SR (DLSR) for accelerated MRI on a test dataset of 50 patients, using conventional image quality metrics and clinically relevant quantitative T₂ relaxation measurements. We demonstrate that DLCS approaches outperform DLSR approaches for accelerated MRI.

Introduction

Deep learning (DL) has been used in research studies for accelerated MR imaging, following two different approaches. One approach extends compressed sensing (CS) with DL to learn the regularization function in a data-driven manner (DLCS). The other uses low-quality to high-quality image-to-image translation, for example transforming rapidly acquired low-resolution images into higher-resolution images using super-resolution (SR) algorithms (DLSR). However, no systematic comparison of DLCS with DLSR has been performed for accelerated MRI. Here, we benchmark these two methods on a test dataset of 50 patients with conventional image quality metrics and clinically relevant quantitative T₂ relaxation measurements.

Methods

Imaging data were acquired with informed consent and IRB approval from 225 subjects (119 male, age: 44±18, weight: 78±19kg) undergoing a diagnostic knee MRI scan. A 3D quantitative double-echo in steady-state (qDESS) sequence was added to their 3T imaging protocol (parameters: matrix=416×512, field-of-view=16×16cm, slice thickness=1.6cm, TE/TR=5.7/17.9ms, bandwidth=±32.5kHz, scan time=5mins). The qDESS scan was also used to calculate quantitative T₂ relaxation time maps according to a previously validated method¹. qDESS images were acquired with 2×1 parallel imaging, which maintained adequate diagnostic quality. Fully-sampled k-space data were synthesized using Autocalibrating Reconstruction for Cartesian imaging (ARC) and served as the ground truth.

CS sparsely samples data across the entire k-space, producing images with incoherent aliasing artifacts, while SR densely samples the center of k-space, producing low-resolution images (Fig. 1). We compared 3x, 6x, and 8x accelerations. For both DLCS and DLSR reconstructions, undersampling was performed in the phase-encode (k_y) and slice-encode (k_z) dimensions. For CS sampling, a 16×16 autocalibration region was fully sampled along with random 2D uniform k-space sampling². For SR sampling, the phase and slice dimensions were down-sampled to a low-frequency center³. Both methods generated input images using zero-filling (ZF) with inverse Fourier transforms and coil sensitivity maps. Since undersampling was performed in (k_y-k_z), 416 slices were available per patient.
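
As a rough illustration of the two sampling schemes described above, the following NumPy sketch builds a CS-style mask (fully sampled 16×16 center plus uniform random samples) and an SR-style mask (a dense low-frequency block), then forms a zero-filled image. The function names, the small 64×48 grid, the equal split of the SR reduction between k_y and k_z, and the single-coil reconstruction are assumptions made for illustration only; the masks used in this work follow refs. 2 and 3, and the actual ZF inputs also use coil sensitivity maps.

```python
import numpy as np

def cs_mask(ny, nz, accel, calib=16, seed=0):
    """Uniform random (ky, kz) mask with a fully sampled calib x calib center."""
    rng = np.random.default_rng(seed)
    mask = np.zeros((ny, nz), dtype=bool)
    y0, z0 = (ny - calib) // 2, (nz - calib) // 2
    mask[y0:y0 + calib, z0:z0 + calib] = True
    # Spend the remaining sample budget uniformly at random outside the center.
    n_target = int(round(ny * nz / accel))
    n_extra = max(n_target - int(mask.sum()), 0)
    candidates = np.flatnonzero(~mask.ravel())
    chosen = rng.choice(candidates, size=n_extra, replace=False)
    mask.flat[chosen] = True
    return mask

def sr_mask(ny, nz, accel):
    """Dense low-frequency block covering about 1/accel of the (ky, kz) plane.

    Assumption: the reduction is split equally between ky and kz.
    """
    scale = 1.0 / np.sqrt(accel)
    cy, cz = int(round(ny * scale)), int(round(nz * scale))
    mask = np.zeros((ny, nz), dtype=bool)
    y0, z0 = (ny - cy) // 2, (nz - cz) // 2
    mask[y0:y0 + cy, z0:z0 + cz] = True
    return mask

def zero_filled(kspace, mask):
    """Zero-filled reconstruction of centered, single-coil k-space."""
    k = np.where(mask, kspace, 0)
    return np.fft.fftshift(np.fft.ifft2(np.fft.ifftshift(k)))

# Hypothetical usage on a random complex "image" (not study data).
rng = np.random.default_rng(1)
image = rng.standard_normal((64, 48)) + 1j * rng.standard_normal((64, 48))
kspace = np.fft.fftshift(np.fft.fft2(np.fft.ifftshift(image)))
zf_cs = zero_filled(kspace, cs_mask(64, 48, accel=6))
zf_sr = zero_filled(kspace, sr_mask(64, 48, accel=4))
```

Both masks retain roughly the same number of k-space samples for a given acceleration; they differ only in where those samples lie, which is what produces incoherent aliasing in one case and loss of resolution in the other.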

Data were split by patient between training (132 scans, 50,000+ slices), validation (45 scans, 18,000+ slices), and testing (50 scans, 20,000+ slices) sets. DLCS reconstruction followed prior validated procedures and used 3D ResNet proximal blocks with 3 slices each². Two state-of-the-art DLSR algorithms (Very Deep SR [VDSR]³ and Enhanced Deep SR [EDSR]⁴) were used to perform patch-wise training with 32×32×32 blocks.
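
To make the role of data consistency in an unrolled DLCS iteration more concrete, the sketch below shows one generic proximal-gradient-style update: a gradient step on the k-space data-fidelity term followed by a proximal operator. This is a simplified single-coil Cartesian model written under stated assumptions, not the pipeline used in this work; the function names are hypothetical, and the `prox` argument stands in for the learned 3D ResNet proximal block, which is not reproduced here.

```python
import numpy as np

def fft2c(x):
    """Centered, orthonormal 2D FFT."""
    return np.fft.fftshift(np.fft.fft2(np.fft.ifftshift(x), norm="ortho"))

def ifft2c(k):
    """Centered, orthonormal 2D inverse FFT."""
    return np.fft.fftshift(np.fft.ifft2(np.fft.ifftshift(k), norm="ortho"))

def data_consistency_step(x, y, mask, step=1.0):
    """Gradient step on 0.5 * ||M F x - y||^2 (single coil, Cartesian)."""
    residual = mask * fft2c(x) - y
    return x - step * ifft2c(mask * residual)

def unrolled_iteration(x, y, mask, prox, step=1.0):
    """One unrolled iteration: data-consistency update, then proximal block."""
    return prox(data_consistency_step(x, y, mask, step))

# Hypothetical usage with an identity "proximal block" and synthetic data.
rng = np.random.default_rng(0)
truth = rng.standard_normal((32, 32)) + 1j * rng.standard_normal((32, 32))
mask = rng.random((32, 32)) < 0.3          # assumed sampling pattern
y = mask * fft2c(truth)                     # acquired (masked) k-space
x1 = unrolled_iteration(np.zeros_like(truth), y, mask, prox=lambda v: v)
```

With a unit step size and an orthonormal transform, a single update from a zero image reproduces the acquired samples exactly at the sampled k-space locations, which is the property an image-to-image SR network does not enforce.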

Quantitative benchmarks were computed with the traditional image quality metrics of peak signal-to-noise ratio (PSNR) and structural similarity (SSIM) of the inferred images against the ground-truth images. Image blurring averaged in the y-z dimension was quantified using the blur metric as described previously⁵. Statistical comparisons were conducted with Kruskal-Wallis tests and Dunn's post hoc tests.
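
For reference, PSNR can be computed along the following lines. This is a minimal sketch that assumes magnitude images and takes the peak as the maximum of the ground-truth image; the exact normalization used in this work is not stated here, and the function name is hypothetical.

```python
import numpy as np

def psnr(reference, estimate):
    """Peak signal-to-noise ratio in dB between two images (magnitudes).

    Assumption: the peak value is the maximum of the reference magnitude.
    """
    ref = np.abs(np.asarray(reference))
    est = np.abs(np.asarray(estimate))
    mse = np.mean((ref - est) ** 2)
    if mse == 0:
        return np.inf
    return 10.0 * np.log10(ref.max() ** 2 / mse)

# Hypothetical example: a uniform offset of 0.1 on a unit-intensity image.
reference = np.ones((8, 8))
estimate = reference + 0.1
value_db = psnr(reference, estimate)
```

Because PSNR depends only on the mean squared error, it is insensitive to where the error is located in the image, which is one reason it is paired with SSIM and the blur metric here.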

Manual segmentations of femoral, tibial, and patellar cartilage were performed on all test-set images to compute tissue-specific cartilage T₂ values in the inferred and ground-truth images. Bland-Altman plots, concordance correlation coefficients, and Student's t-tests were computed to assess systematic differences in T₂ values compared to the reference.
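
The agreement statistics named above can be sketched as follows: Bland-Altman bias with 95% limits of agreement, and Lin's concordance correlation coefficient. The function names and the T₂ values in the usage example are made up for illustration and are not study data; the 1.96 multiplier for the limits of agreement is the conventional choice and is assumed here.

```python
import numpy as np

def bland_altman(reference, estimate):
    """Return (bias, (lower, upper)) with 95% limits of agreement."""
    diff = np.asarray(estimate, dtype=float) - np.asarray(reference, dtype=float)
    bias = diff.mean()
    sd = diff.std(ddof=1)  # sample standard deviation of the differences
    return bias, (bias - 1.96 * sd, bias + 1.96 * sd)

def concordance_cc(reference, estimate):
    """Lin's concordance correlation coefficient."""
    x = np.asarray(reference, dtype=float)
    y = np.asarray(estimate, dtype=float)
    mx, my = x.mean(), y.mean()
    cov = np.mean((x - mx) * (y - my))
    # Population variances (ddof=0), as in Lin's original definition.
    return 2.0 * cov / (x.var() + y.var() + (mx - my) ** 2)

# Hypothetical cartilage T2 values in ms (not study data).
t2_reference = np.array([30.0, 35.0, 40.0, 45.0])
t2_estimate = t2_reference + 2.0   # a constant 2 ms offset
bias, limits = bland_altman(t2_reference, t2_estimate)
ccc = concordance_cc(t2_reference, t2_estimate)
```

In this toy case the Pearson correlation is exactly 1, yet the concordance coefficient falls below 1 because of the constant offset, which is why concordance rather than correlation is the appropriate measure for detecting T₂ bias.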

Results

Axial and sagittal multi-planar reformations for all reconstruction methods in a patient with an arthroscopically-confirmed full-thickness chondral lesion and a meniscus tear show that these pathologies can be seen clearly in the DLCS, VDSR, and EDSR reconstructions (Fig. 2). The DLCS images were the most visually appealing across the three reconstructions and two ZF initializations for all accelerations and image orientations.

The conventional image quality metrics of PSNR and SSIM for the two qDESS echoes and different reconstruction methods (Fig. 4) showed that reconstruction performance degraded as acceleration increased. The DLCS, VDSR, and EDSR methods were comparable for the first qDESS echo, which has a higher baseline SNR, but DLCS had slightly degraded SSIM compared to VDSR and EDSR. In contrast, the blur metric demonstrated that DLCS had significantly lower blurring (sharpest images) than both SR methods at all accelerations (p<0.001). The DLCS blurring rivaled that of the ground-truth images.

While VDSR and EDSR produced PSNR and SSIM values comparable to those of DLCS, they produced error offsets in T₂ values (Fig. 5). DLCS had the lowest Bland-Altman bias and limits of agreement, and the highest concordance correlation coefficients (0.7+), amongst all accelerations and reconstruction methods. Unlike DLCS, both VDSR and EDSR had significantly biased T₂ values for all accelerations (p<0.05).

Discussion

The data presented here demonstrate that the DLCS reconstruction outperforms state-of-the-art VDSR and EDSR reconstruction techniques based on visual inspection, evaluation of the blurring metric, and lack of bias in cartilage T₂ values. We surmise that enforcing data consistency in DLCS maintains T₂ accuracy, while the SR zero-filled images already have large T₂ biases.

When comparing the reconstructions using conventional metrics such as PSNR and SSIM, DLCS had comparable or slightly worse metrics. However, these metrics were created for natural images and have been shown to inadequately depict true MRI quality and diagnostic utility⁶. Thus, using such image-level quality metrics along with clinically relevant metrics such as blurring and T₂ relaxation, which assess pixel-level parametric accuracy, may provide improved methods for evaluating true reconstruction quality.

Conclusion

Using a combination of conventional image quality metrics, blurring metrics, and clinically-relevant quantitative parameter maps, we demonstrate that DLCS approaches outperform DLSR approaches for accelerated MRI.

Acknowledgements

Funding from: R01 AR077604, R01 EB002524, and K24 AR062068 from the NIH; the Precision Health and Integrated Diagnostics (PHIND) Seed Grant from Stanford University; Philips, GE Healthcare.

References

[1] Sveinsson, B., et al. "A simple analytic method for estimating T2 in the knee from DESS." Magnetic Resonance Imaging 38 (2017): 63-70.

[2] Sandino, Christopher M., et al. "Compressed sensing: From research to clinical practice with deep neural networks: Shortening scan times for magnetic resonance imaging." IEEE Signal Processing Magazine 37.1 (2020): 117-127.

[3] Chaudhari, Akshay S., et al. "Super-resolution musculoskeletal MRI using deep learning." Magnetic Resonance in Medicine 80.5 (2018): 2139-2154.

[4] Lim, Bee, et al. "Enhanced deep residual networks for single image super-resolution." Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops. 2017.

[5] Chaudhari, Akshay S., et al. "Utility of deep learning super‐resolution in the context of osteoarthritis MRI biomarkers." Journal of Magnetic Resonance Imaging 51.3 (2020): 768-779.

[6] Knoll, Florian, et al. "Advancing machine learning for MR image reconstruction with an open competition: Overview of the 2019 fastMRI challenge." Magnetic Resonance in Medicine 84.6 (2020): 3054-3070.

Figures

Fig. 1: Proposed network architectures. a) CS and SR sampling masks. b) A single block of the ResBlock 3D element that is used in the DLCS and EDSR networks. c) A single iteration of the DLCS reconstruction pipeline. d) A representation of the VDSR reconstruction pipeline. e) A representation of the EDSR reconstruction scheme.

Fig. 2: qDESS echo 1 reconstructions for a 50-year-old male with cartilage and meniscus lesions. a) A fully-sampled axial slice reconstruction (yellow arrow indicates a grade 3 full-thickness chondral defect in the patella). b) Undersampled reconstructions (columns) with varying acceleration factors (rows) on the same axial slice. c) A fully-sampled sagittal slice reconstruction (green arrow is a complex tear of the posterior horn of the medial meniscus). d) Undersampled reconstructions (columns) with varying acceleration factors (rows) on the same sagittal slice.

Fig. 3: qDESS echo 2 reconstructions for a 50-year-old male with cartilage and meniscus lesions. a) A fully-sampled axial slice reconstruction (yellow arrow indicates a grade 3 full-thickness chondral defect in the patella). b) Undersampled reconstructions (columns) with varying acceleration factors (rows) on the same axial slice. c) A fully-sampled sagittal slice reconstruction (green arrow is a complex tear of the posterior horn of the medial meniscus). d) Undersampled reconstructions (columns) with varying acceleration factors (rows) on the same sagittal slice.

Fig. 4: Violin plots showing the reconstruction performance measured against standard image quality metrics. The blue violin plots are from the first echo images. The red violin plots are from the second echo images. Each row represents a different metric as indicated on the y-axis label. Each column represents a different acceleration. Note: for both peak signal-to-noise ratio (PSNR) and structural similarity (SSIM), higher values are better whereas for the blur metric lower values are better.

Fig. 5: Bland-Altman plots for T₂ data measured from the reference images (noted here as T₂) and ground-truth T₂ (noted here as T'₂). In each of the plots, the blue, green, and red markers indicate x3, x6, and x8 acceleration data. The table on the bottom right shows the bias and the limits of agreement values (in parentheses) along with concordance correlation coefficients for all reconstructions and accelerations. DLCS had the highest T₂ accuracy, while the SR methods (including the CS-ZF initialization) had lower accuracy across all accelerations.

Proc. Intl. Soc. Mag. Reson. Med. 30 (2022)

DOI: https://doi.org/10.58530/2022/1873