1870

SSFD: Self-Supervised Feature Distance Outperforms Conventional MR Image Reconstruction Quality Metrics

Philip M. Adamson¹, Jeffrey Dominic¹, Arjun Desai¹, Christian Bluethgen², Jeff P. Wood³, Ali B. Syed², Robert Boutin², Kathryn J. Stevens², Daniel Spielman², Shreyas Vasanawala², John M. Pauly¹, Akshay S. Chaudhari², and Beliz Gunel¹
¹Department of Electrical Engineering, Stanford University, Palo Alto, CA, United States, ²Department of Radiology, Stanford University, Palo Alto, CA, United States, ³Austin Radiological Association, Austin, TX, United States

Synopsis

Evaluation of accelerated magnetic resonance imaging (MRI) reconstruction methods is imperfect because quantitative image quality metrics (IQMs) are often discordant with radiologist-perceived image quality. Self-supervised learning (SSL) is a deep learning (DL) approach that has become a popular pre-training tool because it captures generalizable, domain-specific feature representations of the underlying data without requiring labels. In this study, we derive a data-driven self-supervised feature distance (SSFD) IQM to assess MR image reconstruction quality. We demonstrate that, on DL-based sparse reconstructions, SSFD correlates more strongly with the image quality perceived by three radiologists than conventional IQMs do.

Introduction

Metrics such as peak signal-to-noise ratio (PSNR) and the structural similarity index measure (SSIM) are routinely used as image quality metrics (IQMs) for MR reconstructions. However, they do not always correlate well with clinical utility for downstream tasks or with radiologist review, the gold standard for image quality assessment^1-3. It is therefore imperative to develop quantitative IQMs with higher concordance with clinically relevant, radiologist-perceived image quality.

Self-supervised learning (SSL) is a deep learning (DL) approach that uses pre-text tasks to generate labels from unlabeled data. SSL has been shown to help models learn semantically meaningful image-level feature representations from unlabeled datasets, motivating its use for calculating a quantitative IQM^4,5. In this study, we propose to learn image-level feature representations of MRI data via the pre-text task of context prediction⁴: a self-supervised model learns to in-paint MR images from which small patches have been randomly masked out. Using image quality ratings from three radiologists, we demonstrate that our proposed metric is a superior IQM to SSIM and PSNR.

Methods

We used the fastMRI proton-density-weighted knee MRI scans, with and without fat saturation, in our study⁶. Reference fully-sampled images were computed from the fully-sampled k-space data using ESPIRiT⁷. For the SSL task, we split the 2D fast spin echo scans into training, validation, and test sets of 778, 195, and 199 volumes, respectively. For DL-based undersampled MR reconstruction, we used a subset of 108 patients for training, 30 patients for validation, and 61 patients withheld for testing and the reader study. Supervised DL reconstruction models were trained to reconstruct 2x, 4x, and 6x accelerated scans using both a U-Net⁸ and an unrolled network⁹ (six models in total), each with complex-valued inputs, 2 convolutions per level, and 4 levels with the number of filters doubling from 32 to 256.
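The experimental grid above can be summarized as a small configuration sketch. It assumes, as the stated ranges suggest, that filter counts double from one level to the next; the names `ReconConfig`, `filter_schedule`, and `SPLITS` are hypothetical and do not come from the authors' code.

```python
from dataclasses import dataclass
from itertools import product
from typing import List


@dataclass(frozen=True)
class ReconConfig:
    """One supervised reconstruction model (hypothetical naming)."""
    architecture: str  # "unet" or "unrolled"
    acceleration: int  # undersampling factor, e.g. 2 for 2x


def filter_schedule(base: int, levels: int) -> List[int]:
    """Filters per level, assuming the count doubles at each level."""
    return [base * 2**level for level in range(levels)]


# 2 architectures x 3 accelerations = 6 reconstruction models.
RECON_CONFIGS = [
    ReconConfig(arch, accel)
    for arch, accel in product(("unet", "unrolled"), (2, 4, 6))
]

RECON_FILTERS = filter_schedule(base=32, levels=4)  # [32, 64, 128, 256]
SSL_FILTERS = filter_schedule(base=20, levels=5)    # [20, 40, 80, 160, 320]

# Patient-level splits for the supervised reconstruction models.
SPLITS = {"train": 108, "val": 30, "test": 61}

# Each test patient contributes one center slice per model to the reader study.
n_reader_images = SPLITS["test"] * len(RECON_CONFIGS)  # 366
```

Under this reading, the last schedule entries reproduce the 256 and 320 filters quoted in the text, and the 61 × 6 grid yields the 366 images rated in the reader study.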

Image corruptions for the context prediction task were generated dynamically during training by placing zero-valued 16x16-pixel patches over 25% of the image area, with patch locations drawn by Poisson variable-density sampling (Fig. 1). A self-supervised U-Net (2 convolutions per level and 5 levels with the number of filters doubling from 20 to 320) was trained to in-paint the masked patches. This model was then truncated after the ReLU activation of a given layer. The center slices of each ground-truth and reconstructed image pair are passed separately through the truncated model, producing two feature-space representations. SSFD is the element-wise mean squared error between these two representations (Fig. 1).
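A minimal NumPy sketch of the two steps above follows. Two assumptions should be flagged: the patch placement uses a simple center-weighted random draw over a 16x16 grid as a stand-in for Poisson variable-density sampling, and `toy_features` (one bank of random 3x3 filters followed by ReLU) is a hypothetical placeholder for the truncated, pre-trained U-Net. Only the final distance, an element-wise mean squared error between feature maps, follows the text directly.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def variable_density_patch_mask(shape, patch=16, fraction=0.25, power=2.0, rng=None):
    """Boolean mask covering `fraction` of the patch grid with patch x patch squares.

    Stand-in for Poisson variable-density sampling (assumption): grid cells
    closer to the image center are more likely to be chosen.
    """
    rng = np.random.default_rng() if rng is None else rng
    gh, gw = shape[0] // patch, shape[1] // patch
    cy = (np.arange(gh) + 0.5) / gh - 0.5
    cx = (np.arange(gw) + 0.5) / gw - 0.5
    r = np.sqrt(cy[:, None] ** 2 + cx[None, :] ** 2)
    density = (1.0 - r / max(r.max(), 1e-12)) ** power + 1e-3  # keep every cell eligible
    p = (density / density.sum()).ravel()
    n_masked = int(round(fraction * gh * gw))
    chosen = rng.choice(gh * gw, size=n_masked, replace=False, p=p)
    mask = np.zeros(shape, dtype=bool)
    for idx in chosen:
        i, j = divmod(int(idx), gw)
        mask[i * patch:(i + 1) * patch, j * patch:(j + 1) * patch] = True
    return mask


def toy_features(img, weights):
    """Hypothetical feature extractor: 3x3 filter bank + ReLU, output (F, H, W)."""
    padded = np.pad(img, 1, mode="reflect")
    windows = sliding_window_view(padded, (3, 3))  # (H, W, 3, 3)
    return np.maximum(np.einsum("hwij,fij->fhw", windows, weights), 0.0)


def ssfd(reference, reconstruction, extract):
    """Element-wise mean squared error between the two feature representations."""
    f_ref = extract(reference)
    f_rec = extract(reconstruction)
    return float(np.mean((f_ref - f_rec) ** 2))


rng = np.random.default_rng(0)
reference = rng.random((64, 64))                  # placeholder for a center slice
mask = variable_density_patch_mask(reference.shape, rng=rng)
corrupted = np.where(mask, 0.0, reference)        # in-painting input with zeroed patches

weights = rng.standard_normal((8, 3, 3))
extract = lambda x: toy_features(x, weights)
noisy_recon = reference + 0.05 * rng.standard_normal(reference.shape)
distance = ssfd(reference, noisy_recon, extract)
```

With the real truncated in-painting network in place of `toy_features`, the same `ssfd` function is zero for identical inputs and grows as the reconstruction's learned features drift from those of the reference.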

To assess the robustness of SSFD relative to a conventional IQM under known image perturbations, we explored the impact of pixel shifts, Gaussian blurring, and additive Gaussian noise on SSFD (9th layer) and SSIM (Fig. 2). Each perturbation was applied with increasing strength to the center slice of each of the 199 scans in the test set until the mean SSIM fell to 0.3. Pixel shifts were applied by rolling pixels in the x-direction, with tricubic interpolation, by 0 to 0.53 mm (1.2 pixels). The standard deviation of the Gaussian blurring kernel varied from 0 to 1.97 mm (4.5 pixels), and the standard deviation of the additive Gaussian noise varied from 0 to 0.8.
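The three perturbations can be sketched in NumPy as below. This is an illustrative substitute, not the authors' pipeline: the sub-pixel roll uses the Fourier shift theorem rather than the tricubic interpolation stated above, the blur is a separable Gaussian with reflected borders, and `PIXEL_MM` (about 0.44 mm) is only inferred from the quoted pairs 0.53 mm ≈ 1.2 px and 1.97 mm ≈ 4.5 px. The noise level is expressed in whatever intensity units the images are normalized to, which the text does not specify.

```python
import numpy as np

PIXEL_MM = 0.44  # assumption: spacing inferred from 0.53 mm / 1.2 px and 1.97 mm / 4.5 px


def fourier_roll_x(img, shift_px):
    """Circularly shift along x (axis 1) by a possibly fractional number of pixels."""
    freqs = np.fft.fftfreq(img.shape[1])            # cycles per pixel
    phase = np.exp(-2j * np.pi * freqs * shift_px)  # linear phase = spatial shift
    return np.real(np.fft.ifft(np.fft.fft(img, axis=1) * phase[None, :], axis=1))


def gaussian_blur(img, sigma_px):
    """Separable Gaussian blur with reflected borders; sigma in pixels."""
    if sigma_px <= 0:
        return img.copy()
    # Cap the radius so reflect padding and 'valid' convolution keep the image size.
    radius = min(int(np.ceil(4 * sigma_px)), (min(img.shape) - 1) // 2)
    x = np.arange(-radius, radius + 1)
    kernel = np.exp(-0.5 * (x / sigma_px) ** 2)
    kernel /= kernel.sum()

    def blur_1d(v):
        return np.convolve(np.pad(v, radius, mode="reflect"), kernel, mode="valid")

    out = np.apply_along_axis(blur_1d, 0, img)
    return np.apply_along_axis(blur_1d, 1, out)


def add_gaussian_noise(img, sigma, rng):
    """Additive zero-mean Gaussian noise with standard deviation `sigma`."""
    return img + rng.normal(0.0, sigma, size=img.shape)


# Sweep strengths spanning the ranges quoted above (mm converted to pixels).
shifts_px = np.linspace(0.0, 0.53, 5) / PIXEL_MM
blur_sigmas_px = np.linspace(0.0, 1.97, 5) / PIXEL_MM
noise_sigmas = np.linspace(0.0, 0.8, 5)
```

In the study, each sweep continued until the mean SSIM over the 199 test slices reached 0.3; an SSIM implementation is deliberately not shown here.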

In a blinded manner, three radiologists rated the center slice of 366 accelerated MR reconstructions (61 patients, each reconstructed with the 6 models described above). The radiologists scored each reconstruction for aliasing artifacts and for diagnostic quality of the cartilage and meniscus on a 1-9 scale: 1 = completely non-diagnostic, 3 = severe corruptions, 5 = diagnostically acceptable, 7 = good quality, 9 = perfect quality. The mean radiologist image quality score (RIQS) was compared against SSFD (7th layer), SSIM, and PSNR by computing the squared Pearson correlation coefficient (R²) and the Spearman rank-order correlation coefficient (SROCC). SROCC measures the monotonicity of the relationship between two variables, which matters because the relationship between IQMs and subjective scores may be non-linear¹⁰. We further assessed each metric as a binary classifier of whether a scan is diagnostically acceptable (RIQS ≥ 5), in terms of the area under the receiver operating characteristic curve (AUROC).
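The agreement statistics named above can be computed with a few lines of NumPy. In this sketch (helper names are hypothetical), R² is the squared Pearson coefficient, SROCC is the Pearson coefficient of average ranks, and AUROC uses the Mann-Whitney rank-sum identity, so tied scores count as half-correct. Because a larger SSFD means a worse reconstruction while larger SSIM and PSNR mean better ones, the scores passed to the classifier must be oriented so that larger values predict rejection (RIQS < 5). The arrays at the end are toy values, not study data.

```python
import numpy as np


def rank_average(x):
    """1-based ranks, with tied values sharing the average of their ranks."""
    x = np.asarray(x, dtype=float)
    order = np.argsort(x, kind="mergesort")
    sorted_x = x[order]
    ranks = np.empty(len(x))
    i = 0
    while i < len(x):
        j = i
        while j + 1 < len(x) and sorted_x[j + 1] == sorted_x[i]:
            j += 1
        ranks[order[i:j + 1]] = 0.5 * (i + j) + 1.0  # mean of ranks i+1 .. j+1
        i = j + 1
    return ranks


def r_squared(x, y):
    """Squared Pearson correlation coefficient."""
    r = np.corrcoef(x, y)[0, 1]
    return float(r * r)


def srocc(x, y):
    """Spearman rank-order correlation: Pearson correlation of the ranks."""
    return float(np.corrcoef(rank_average(x), rank_average(y))[0, 1])


def auroc(scores, positive):
    """Area under the ROC curve, where larger scores should indicate `positive`."""
    scores = np.asarray(scores, dtype=float)
    positive = np.asarray(positive, dtype=bool)
    n_pos, n_neg = int(positive.sum()), int((~positive).sum())
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one positive and one negative example")
    ranks = rank_average(scores)
    u = ranks[positive].sum() - n_pos * (n_pos + 1) / 2.0  # Mann-Whitney U
    return float(u / (n_pos * n_neg))


# Toy values only (not study data): mean reader scores and two metrics.
riqs = np.array([2.0, 3.5, 5.0, 6.5, 8.0, 9.0])
ssfd_vals = np.array([0.9, 0.7, 0.5, 0.3, 0.2, 0.1])        # lower is better
ssim_vals = np.array([0.60, 0.72, 0.70, 0.80, 0.85, 0.90])  # higher is better
rejected = riqs < 5                                         # positive class

auc_ssfd = auroc(ssfd_vals, rejected)   # larger SSFD -> more likely rejected
auc_ssim = auroc(-ssim_vals, rejected)  # negate so larger -> more likely rejected
```

SROCC is signed, so a metric where lower is better (such as SSFD) yields a negative coefficient against RIQS; R² is unaffected by this orientation.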

Results & Discussion

The average SSFD and SSIM under image perturbations (Fig. 2) showed that SSFD increased approximately linearly with Gaussian noise and blurring, whereas SSIM decreased approximately as a decaying exponential. For perturbation strengths producing comparable decreases in SSIM, SSFD was relatively insensitive to pixel shifts, suggesting that SSFD captures more global image quality features that depend less on exact pixel-level correspondence than SSIM and PSNR do.

SSFD achieved the highest correlation with the mean RIQS of the three readers for both aliasing and cartilage/meniscus assessment, in terms of both R² and SROCC, across the 366 images (Fig. 3). An example patient with all 6 reconstruction types and their corresponding IQMs is shown in Figure 4. SSFD also outperformed SSIM and PSNR in terms of AUROC when used as a binary classifier (Fig. 5).

Conclusion

This work introduces the SSFD IQM, based on MR domain-specific feature representations learned from an SSL task. We show that SSFD is more highly correlated with radiologist-perceived diagnostic utility of sparse MR reconstructions than conventional metrics such as SSIM and PSNR.

Acknowledgements

NIH R01 AR077604, R01 EB002524 and K24 AR062068

Radiological Sciences Laboratory Seed Grant from Stanford University

References

1. Allister Mason et al. “Comparison of objective image quality metrics to expert radiologists’ scoring of diagnostic quality of MR images”. In: IEEE transactions on medical imaging 39.4 (2019), pp. 1064–1072.

2. Akshay S Chaudhari et al. “Prospective deployment of deep learning in MRI: A framework for important considerations, challenges, and recommendations for best practices”. In: Journal of Magnetic Resonance Imaging (2020).

3. Florian Knoll et al. “Advancing machine learning for MR image reconstruction with an open competition: Overview of the 2019 fastMRI challenge”. In: Magnetic resonance in medicine 84.6 (2020), pp. 3054–3070.

4. Deepak Pathak et al. “Context encoders: Feature learning by inpainting”. In: Proceedings of the IEEE conference on computer vision and pattern recognition. 2016, pp. 2536–2544.

5. Krishna Chaitanya et al. “Contrastive learning of global and local features for medical image segmentation with limited annotations”. In: 34th Conference on Neural Information Processing Systems (NeurIPS 2020), Vancouver, Canada. (2020).

6. Florian Knoll et al. “fastMRI: A Publicly Available Raw k-Space and DICOM Dataset of Knee Images for Accelerated MR Image Reconstruction Using Machine Learning”. In: Radiol Artif Intell 2.1 (Jan. 2020), e190007. doi: 10.1148/ryai.2020190007.

7. Martin Uecker et al. “ESPIRiT—an eigenvalue approach to autocalibrating parallel MRI: where SENSE meets GRAPPA”. In: Magnetic resonance in medicine 71.3 (2014), pp. 990–1001.

8. Olaf Ronneberger, Philipp Fischer, and Thomas Brox. “U-net: Convolutional networks for biomedical image segmentation”. In: International Conference on Medical image computing and computer-assisted intervention. Springer, Cham, 2015.

9. Christopher M Sandino et al. “Compressed sensing: From research to clinical practice with deep neural networks: Shortening scan times for magnetic resonance imaging”. In: IEEE signal processing magazine 37.1 (2020), pp. 117–127.

10. Hamid R Sheikh, Muhammad F Sabir, and Alan C Bovik. “A statistical evaluation of recent full reference image quality assessment algorithms”. In: IEEE Transactions on image processing 15.11 (2006), pp. 3440–3451.

Figures

Figure 1. SSFD computation: A U-Net model is pre-trained on an in-painting task to fill in image details in masked patches of fully-sampled MR reconstructions. The model is truncated after the ReLU activation of a given layer (the 7th unless otherwise stated) and used to extract a feature-space representation of both the ground-truth MR reconstruction and a DL-based sparsely sampled reconstruction. SSFD is the mean squared error between these two feature representations.

Figure 2. Example images at the 50th percentile of each perturbation range (top); for the pixel shift (roll), the image shown is the difference between the perturbed and original scans. Average SSFD (9th layer, bottleneck) and SSIM, with 95% confidence intervals over the 199-scan test set, as a function of perturbation strength (bottom). For comparable decreases in SSIM, SSFD is less sensitive to pixel shifts than to Gaussian blurring and noise, relaxing the requirement for pixel-level correspondence between scans.

Figure 3. SSFD (7th layer), SSIM, and PSNR versus mean RIQS from 3 radiologists for aliasing and cartilage/meniscus assessment, computed on the center slice for 61 patients, each with 2x (blue), 4x (orange), and 6x (green) accelerated reconstructions from a U-Net (circles) and an unrolled network (X's). In general, RIQS improves at lower accelerations (green to orange to blue) and for unrolled (X) versus U-Net (circle) reconstructions. SSFD achieves the highest correlation with RIQS for both aliasing and cartilage/meniscus assessment in terms of both R² and SROCC.

Figure 4. The six reconstructions for one example patient (left) and plots of their image quality metrics versus mean radiologist image quality score (right). Note the monotonic relationship between SSFD and both RIQS categories, compared with the non-monotonic relationships for SSIM and PSNR in this example. In particular, the 6x unrolled reconstruction (green X) has poor image quality based on RIQS but SSIM and PSNR values comparable to those of the 2x U-Net reconstruction (blue circle). SSFD better captures this difference in image quality.

Figure 5. Receiver operating characteristic curves (true positive rate versus false positive rate) for SSFD (7th layer), SSIM, and PSNR used as simple thresholding binary classifiers to predict whether the mean RIQS is less than 5, i.e., whether a given reconstruction would be rejected by the radiologists. SSFD achieves a higher AUROC than SSIM and PSNR for both aliasing and meniscus/cartilage assessment.

Proc. Intl. Soc. Mag. Reson. Med. 30 (2022)


DOI: https://doi.org/10.58530/2022/1870