Philip M. Adamson1, Jeffrey Dominic1, Arjun Desai1, Christian Bluethgen2, Jeff P. Wood3, Ali B. Syed2, Robert Boutin2, Kathryn J. Stevens2, Daniel Spielman2, Shreyas Vasanawala2, John M. Pauly1, Akshay S. Chaudhari2, and Beliz Gunel1
1Department of Electrical Engineering, Stanford University, Palo Alto, CA, United States, 2Department of Radiology, Stanford University, Palo Alto, CA, United States, 3Austin Radiological Association, Austin, TX, United States
Synopsis
Evaluation of accelerated magnetic resonance imaging (MRI) reconstruction methods is imperfect due to the discordance between quantitative image quality metrics (IQMs) and radiologist-perceived image quality. Self-supervised learning (SSL) is a deep learning (DL) method that has become a popular pre-training tool due to its ability to capture generalizable and domain-specific feature representations of the underlying data without the need for labels. In this study, we derive a data-driven self-supervised feature distance (SSFD) IQM to assess MR image reconstruction quality. We demonstrate that SSFD is more highly correlated with three radiologists' perceived image quality on DL-based sparse reconstructions than conventional IQMs.
Introduction
Metrics such as peak signal-to-noise ratio (PSNR) and structural similarity index measure (SSIM) are routinely used as image quality metrics (IQMs) for MR reconstructions. However, these do not always correlate well with clinical utility for downstream tasks or with radiologist review, the gold standard for image quality assessment1-3. Thus, it is imperative to develop quantitative IQMs with higher concordance to clinically relevant radiologist-perceived image quality.
Self-supervised learning (SSL) is a deep learning (DL) method that uses pre-text tasks to generate labels from unlabeled data. SSL has been shown to help models learn semantically meaningful image-level feature representations from unlabeled datasets, motivating its utility in calculating a quantitative IQM4,5. In this study, we propose to learn image-level feature representations of MRI data via the pre-text task of context prediction4. Here, a self-supervised model learns an in-painting task on MR images, following a pre-text task of randomly masking out small patches from the images. Using image quality ratings from three radiologists, we demonstrate that our proposed metric is a superior IQM to SSIM and PSNR.
Methods
We used the fastMRI proton-density-weighted knee MRI scans with and without fat saturation in our study6. Reference fully-sampled images were computed from the fully-sampled k-space data using ESPIRiT7. For the SSL task, we split the 2D fast spin echo scans into training, validation, and testing splits with 778, 195, and 199 3D scans, respectively. For DL-based undersampled MR reconstruction, we used a subset of 108 patients for training, 30 patients for validation, and 61 patients withheld for testing and the reader study. The supervised DL-reconstruction models were trained to reconstruct 2x, 4x, and 6x accelerated scans using both a U-Net model8 and an unrolled network9, each with complex inputs, 2 convolutions per layer, and 4 levels with 32-256 quadratically increasing filters.
Image corruptions for the context prediction task were generated dynamically during training by placing masked zero-valued image patches of size 16x16 pixels over 25% of the image area via Poisson variable density (Fig. 1). A self-supervised U-Net model (with 2 convolutions per level and 5 levels with 20-320 quadratically increasing filters) was trained to in-paint the masked patches. This model was then truncated after the ReLU activation in a given layer. The center slices of each ground-truth and reconstructed image pair were passed separately through the truncated model, producing two feature-space outputs. SSFD is the element-wise mean squared error between these two feature-space representations (Fig. 1).
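The masking and distance steps described above can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the implementation used in this study: `mask_patches` draws blocks uniformly at random from a regular grid instead of using Poisson variable density, and `toy_features` is a hypothetical stand-in for the truncated self-supervised U-Net. All function and parameter names are illustrative.

```python
import numpy as np

def mask_patches(image, patch=16, fraction=0.25, rng=None):
    """Zero out patch x patch blocks covering about `fraction` of the image.

    Assumption: blocks are drawn uniformly at random from a regular grid.
    The Poisson variable density placement from the text is NOT reproduced.
    """
    rng = np.random.default_rng() if rng is None else rng
    h, w = image.shape
    rows, cols = h // patch, w // patch
    n_masked = int(round(fraction * rows * cols))
    chosen = rng.choice(rows * cols, size=n_masked, replace=False)
    masked = image.copy()
    for idx in chosen:
        r, c = divmod(int(idx), cols)
        masked[r * patch:(r + 1) * patch, c * patch:(c + 1) * patch] = 0
    return masked

def toy_features(image):
    """Hypothetical feature extractor: 2x2 average pooling followed by ReLU.

    In the study this role is played by the truncated in-painting U-Net.
    """
    h, w = image.shape
    cropped = image[:h - h % 2, :w - w % 2]
    pooled = cropped.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))
    return np.maximum(pooled, 0.0)

def ssfd(reference, reconstruction, feature_fn):
    """Element-wise mean squared error between two feature-space outputs."""
    f_ref = np.asarray(feature_fn(reference))
    f_rec = np.asarray(feature_fn(reconstruction))
    return float(np.mean(np.abs(f_ref - f_rec) ** 2))
```

Passing the same image as both arguments to `ssfd` gives a distance of zero, and any feature extractor that maps a 2D slice to an array can be substituted for `toy_features`.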
To compare the robustness of SSFD with that of a conventional IQM under known image perturbations, we examined the impact of pixel shifts, Gaussian blurring, and additive Gaussian noise on SSFD (9th layer) and SSIM (Fig. 2). Each perturbation was applied to the center slice of each of the 199 scans in the test set until a mean SSIM of 0.3 was reached. Pixel shifts were applied by rolling pixels in the x-direction with tricubic interpolation from 0 to 0.53 mm (1.2 pixels). The standard deviation of the Gaussian kernel for blurring varied between 0 and 1.97 mm (4.5 pixels), and the standard deviation of the additive Gaussian noise varied from 0 to 0.8.
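The three perturbations can be approximated with NumPy alone, as in the sketch below. Two substitutions are made for brevity and should be read as assumptions: the sub-pixel shift uses the Fourier shift theorem instead of the interpolation-based rolling described above, and the blur is applied as a circular (periodic) Gaussian filter in the frequency domain. Function names and the pixel-unit parameters are illustrative.

```python
import numpy as np

def shift_x(image, shift_px):
    """Circularly shift along x by a possibly fractional number of pixels.

    Assumption: Fourier shift theorem, not interpolation-based rolling.
    """
    freqs = np.fft.fftfreq(image.shape[1])  # cycles per pixel
    phase = np.exp(-2j * np.pi * freqs * shift_px)
    spectrum = np.fft.fft(image, axis=1) * phase[None, :]
    return np.real(np.fft.ifft(spectrum, axis=1))

def gaussian_blur(image, sigma_px):
    """Blur with a Gaussian of standard deviation sigma_px (periodic edges)."""
    fy = np.fft.fftfreq(image.shape[0])[:, None]
    fx = np.fft.fftfreq(image.shape[1])[None, :]
    # Fourier transform of a unit-area Gaussian with std sigma_px.
    transfer = np.exp(-2.0 * (np.pi * sigma_px) ** 2 * (fx ** 2 + fy ** 2))
    return np.real(np.fft.ifft2(np.fft.fft2(image) * transfer))

def add_noise(image, sigma, rng=None):
    """Add zero-mean Gaussian noise with standard deviation sigma."""
    rng = np.random.default_rng() if rng is None else rng
    return image + rng.normal(0.0, sigma, size=image.shape)
```

A sweep would call each function over a range of strengths (for example, shifts from 0 to 1.2 pixels or blur widths from 0 to 4.5 pixels) and record the metric of interest at each step.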
Three radiologists rated, in a blinded manner, the diagnostic quality of the center slice of 366 accelerated MR reconstructions from 61 patients, each reconstructed with the 6 models described above. The radiologists scored the image reconstructions for aliasing artifacts and diagnostic quality of the cartilage and meniscus on the following 1-9 scale: 1- completely non-diagnostic, 3- severe corruptions, 5- diagnostically acceptable, 7- good quality, 9- perfect quality. The mean radiologist image quality score (RIQS) was compared against SSFD (7th layer), SSIM, and PSNR, from which the squared Pearson correlation coefficient (R2) and Spearman rank order correlation coefficient (SROCC) were computed. SROCC is a measure of the monotonicity between variables, which is relevant because the relationship between IQMs and subjective scores may be non-linear10. We further assessed the ability to use the metrics as a binary classifier of whether a scan is diagnostically acceptable (RIQS >= 5), measured in terms of AUROC.
Results & Discussion
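As a rough illustration of how the agreement measures reported here (R2, SROCC, and AUROC) can be computed from paired metric values and reader scores, the sketch below uses only the Python standard library. The helper names are illustrative, and the AUROC is computed through its rank-sum (Mann-Whitney) form, which is one of several equivalent ways to obtain it.

```python
def average_ranks(values):
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks

def pearson_r(x, y):
    """Pearson correlation coefficient; square it to obtain R2."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5

def srocc(x, y):
    """Spearman rank order correlation: Pearson correlation of the ranks."""
    return pearson_r(average_ranks(x), average_ranks(y))

def auroc(scores, is_positive):
    """Probability that a positive case scores above a negative one.

    Higher scores must indicate the positive class; ties count as one half.
    """
    ranks = average_ranks(scores)
    n_pos = sum(1 for p in is_positive if p)
    n_neg = len(is_positive) - n_pos
    rank_sum = sum(r for r, p in zip(ranks, is_positive) if p)
    return (rank_sum - n_pos * (n_pos + 1) / 2.0) / (n_pos * n_neg)
```

Because SSFD is a distance (lower is better) while SSIM and PSNR are similarities (higher is better), a distance should be negated before being passed to `auroc`, for example `auroc([-d for d in ssfd_values], [s >= 5 for s in riqs_values])`, where both lists are hypothetical per-image values.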
The average SSFD and SSIM metrics under image perturbations (Fig. 2) showed that SSFD increased approximately linearly under Gaussian noise and blurring perturbations, while SSIM decreased approximately as a decaying exponential. SSFD was comparatively insensitive to pixel shifts, indicating that SSFD captures more global image quality features that are less sensitive to exact pixel-level correspondence than SSIM and PSNR.
SSFD achieved the highest correlation with mean RIQS from the three readers for both aliasing and cartilage/meniscus assessment in terms of both R2 and SROCC across the 366 images (Fig. 3). An example patient with all 6 reconstruction types and their corresponding IQMs is shown in Figure 4. SSFD also outperformed SSIM and PSNR in terms of AUROC when used as a binary classifier (Fig. 5).
Conclusion
This work introduces the SSFD IQM based on MR domain-specific feature representations learned from an SSL task. We show that SSFD is more highly correlated with radiologist-perceived diagnostic utility of sparse MR reconstructions than conventional metrics such as SSIM and PSNR.
Acknowledgements
NIH R01 AR077604, R01 EB002524 and K24 AR062068
Radiological Sciences Laboratory Seed Grant from Stanford University
References
1. Allister Mason et al. “Comparison of objective image quality metrics to expert radiologists’ scoring of diagnostic quality of MR images”. In: IEEE transactions on medical imaging 39.4 (2019), pp. 1064–1072.
2. Akshay S Chaudhari et al. “Prospective deployment of deep learning in MRI: A framework for important considerations, challenges, and recommendations for best practices”. In: Journal of Magnetic Resonance Imaging (2020).
3. Florian Knoll et al. “Advancing machine learning for MR image reconstruction with an open competition: Overview of the 2019 fastMRI challenge”. In: Magnetic resonance in medicine 84.6 (2020), pp. 3054–3070.
4. Deepak Pathak et al. “Context encoders: Feature learning by inpainting”. In: Proceedings of the IEEE conference on computer vision and pattern recognition. 2016, pp. 2536–2544.
5. Krishna Chaitanya et al. “Contrastive learning of global and local features for medical image segmentation with limited annotations”. In: 34th Conference on Neural Information Processing Systems (NeurIPS 2020), Vancouver, Canada. (2020).
6. Florian Knoll et al. “fastMRI: A Publicly Available Raw k-Space and DICOM Dataset of Knee Images for Accelerated MR Image Reconstruction Using Machine Learning”. In: Radiol Artif Intell 2.1 (2020), e190007. doi: 10.1148/ryai.2020190007.
7. Martin Uecker et al. “ESPIRiT—an eigenvalue approach to autocalibrating parallel MRI: where SENSE meets GRAPPA”. In: Magnetic resonance in medicine 71.3 (2014), pp. 990–1001.
8. Olaf Ronneberger, Philipp Fischer, and Thomas Brox. “U-net: Convolutional networks for biomedical image segmentation”. In: International Conference on Medical image computing and computer-assisted intervention. Springer, Cham, 2015.
9. Christopher M Sandino et al. “Compressed sensing: From research to clinical practice with deep neural networks: Shortening scan times for magnetic resonance imaging”. In: IEEE signal processing magazine 37.1 (2020), pp. 117–127.
10. Hamid R Sheikh, Muhammad F Sabir, and Alan C Bovik. “A statistical evaluation of recent full reference image quality assessment algorithms”. In: IEEE Transactions on image processing 15.11 (2006), pp. 3440–3451.