Elisa Marchetto1,2,3, Hannah Eichhorn4,5, Daniel Gallichan3, Stefan T. Schwarz6,7,8, Nitesh Shekhrajka9, and Melanie Ganz10,11
1Bernard and Irene Schwartz Center for Biomedical Imaging, Department of Radiology, New York University Grossman School of Medicine, New York, NY, United States, 2Center for Advanced Imaging Innovation and Research (CAI2R), Department of Radiology, New York University Grossman School of Medicine, New York, NY, United States, 3CUBRIC, School of Engineering, Cardiff University, Cardiff, United Kingdom, 4Institute of Machine Learning in Biomedical Imaging, Helmholtz Munich, Munich, Germany, 5School of Computation, Information and Technology, Technical University of Munich, Munich, Germany, 6University Hospitals of Wales, Department of Radiology, Cardiff, United Kingdom, 7CUBRIC, School of Psychology, Cardiff University, Cardiff, United Kingdom, 8University of Nottingham, School of Medicine, Nottingham, United Kingdom, 9University of Iowa Hospitals and Clinics, Iowa City, IA, United States, 10Department of Computer Science, University of Copenhagen, Copenhagen, Denmark, 11Neurobiology Research Unit, Copenhagen University Hospital, Copenhagen, Denmark
Synopsis
Keywords: Data Processing
Motivation: A quantitative evaluation of image quality is crucial in various aspects of MRI, such as developing and validating new image reconstruction and artifact correction techniques. Currently, no image quality metric covers all possible artifacts, making it difficult to choose the right quality measure.
Goal(s): Evaluate the consistency and reliability of image quality metrics in relation to image pre-processing and radiologists' assessment.
Approach: We studied the correlation of ten commonly used quality metrics with radiological evaluations in datasets with and without motion.
Results: SSIM and PSNR had the strongest correlation with observer scores. Among reference-free metrics, Image Entropy and AES consistently showed strong correlations.
Impact: Automatically evaluating the quality of MR images is crucial. Our results show variability in the correlation between image quality metrics and radiologists' scores across datasets, highlighting the need for pre-processing optimization, especially when no reference image is available.
Introduction
Assessing magnetic resonance (MR) image quality is vital for various applications, such as improving image reconstruction and artifact correction methods. In the field of motion correction (MoCo), different image quality metrics are used, some of which require a reference image. However, none of these metrics captures all possible image artifacts, making method comparisons challenging1,2. To understand the clinical relevance of these metrics, we evaluated their correlation with radiological quality assessment on two different datasets from two different sites.
Methods
We included three reference-based image quality metrics (SSIM3, PSNR4 and FSIM5,6) and seven non-reference metrics (Tenengrad7,8, Image Entropy9, AES10,11, NGS6,9, two implementations of Gradient Entropy9,12 and Co-occurrence Entropy10; refer to Figure 1 for details). Data were acquired with two Prisma scanners (Siemens Healthineers, Erlangen, Germany) at two different research institutes: NRU13 (Copenhagen, Denmark) and CUBRIC (Cardiff, UK). Dataset 1 and dataset 2 are summarized in Figure 2.
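As an illustration of the two metric families, the sketch below implements one reference-based metric (PSNR) and one non-reference metric (Image Entropy) on flattened voxel lists. It is not the implementation used in this work: the function names are hypothetical, the PSNR peak is assumed to be the maximum of the reference image, and the entropy follows the commonly used focus-criterion form with absolute intensities, which is an assumption about how negative values are handled.

```python
import math


def psnr(reference, image, peak=None):
    """Peak signal-to-noise ratio (dB) between two equally sized voxel lists."""
    if len(reference) != len(image) or not reference:
        raise ValueError("inputs must be non-empty and of equal length")
    mse = sum((r - v) ** 2 for r, v in zip(reference, image)) / len(reference)
    if mse == 0:
        return math.inf  # identical images
    if peak is None:
        peak = max(reference)  # assumption: peak taken from the reference image
    return 10.0 * math.log10(peak ** 2 / mse)


def image_entropy(image):
    """Entropy focus criterion: -sum(p * ln p), p = |b| / sqrt(sum(b^2)).

    Lower values indicate that image energy is concentrated in fewer voxels.
    Using |b| is an assumption for handling negative intensities.
    """
    b_max = math.sqrt(sum(v * v for v in image))
    if b_max == 0:
        return 0.0
    ratios = (abs(v) / b_max for v in image)
    return -sum(p * math.log(p) for p in ratios if p > 0)
```

PSNR needs a motion-free reference of the same geometry, whereas Image Entropy is computed from the image alone, which is why only the latter can be used when no reference acquisition exists.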
The pre-processing steps were harmonized between the two datasets. The reference MPRAGE images were skull-stripped using BET14 to generate a brain mask. All other images were aligned with the respective reference using FLIRT15 and then multiplied by the brain mask. All reference images were acquired without voluntary motion and without motion correction.
The brain-masked 3D volume was normalized by subtracting the mean and dividing by the standard deviation. As the Co-occurrence Entropy and the FSIM metrics require voxel values between 0 and 255, the images were rescaled to this range before those two metrics were computed. A summary of the pre-processing steps is displayed in Figure 3A.
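The normalization and rescaling steps described above can be sketched as follows. This is a minimal illustration rather than the pipeline used here: the function names are hypothetical, volumes are represented as flattened lists, and it is assumed that the mean, standard deviation, minimum and maximum are computed only over voxels inside the brain mask, using the population standard deviation.

```python
from statistics import fmean, pstdev


def zscore_in_mask(volume, mask):
    """Subtract the mean and divide by the standard deviation inside the mask.

    Assumption: statistics are taken over brain voxels only; voxels outside
    the mask are set to 0.
    """
    inside = [v for v, m in zip(volume, mask) if m]
    mean, std = fmean(inside), pstdev(inside)
    if std == 0:
        raise ValueError("constant intensity inside the mask")
    return [(v - mean) / std if m else 0.0 for v, m in zip(volume, mask)]


def rescale_0_255(volume, mask):
    """Linearly map the intensities inside the mask to the range 0-255."""
    inside = [v for v, m in zip(volume, mask) if m]
    lo, hi = min(inside), max(inside)
    if hi == lo:
        raise ValueError("constant intensity inside the mask")
    return [255.0 * (v - lo) / (hi - lo) if m else 0.0
            for v, m in zip(volume, mask)]


# Toy example: four voxels, the last one lies outside the brain mask.
volume = [10.0, 20.0, 30.0, 99.0]
mask = [1, 1, 1, 0]
normalized = zscore_in_mask(volume, mask)
rescaled = rescale_0_255(normalized, mask)
```

Whether background voxels enter the statistics changes the resulting values, which is one reason non-reference metrics can be sensitive to the choice of mask and normalization.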
Image quality scoring was carried out by two experienced radiologists and two recently graduated radiographers for dataset 1 and by one radiologist for dataset 2. The evaluation was performed using a 1-5 Likert Scale5 as shown in Figure 3B.
The correlation between the image quality metrics and the observers' scores was quantified using the Spearman correlation coefficient16 (corrcoef in Matlab).
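For readers unfamiliar with rank correlation, the sketch below computes Spearman's coefficient as the Pearson correlation of the ranks, with tied values (frequent in 1-5 Likert scores) receiving their average rank. It is an illustrative Python version with hypothetical function names and made-up example values, not the Matlab code used for the analysis.

```python
import math


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next sorted value equals the current one.
        while (end + 1 < len(order)
               and values[order[end + 1]] == values[order[start]]):
            end += 1
        shared = (start + end) / 2.0 + 1.0
        for k in range(start, end + 1):
            ranks[order[k]] = shared
        start = end + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


def spearman(x, y):
    """Spearman correlation: Pearson correlation applied to the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Made-up example: a metric value per image and the observer's Likert score.
metric_values = [0.61, 0.70, 0.74, 0.88, 0.93]
observer_scores = [1, 2, 3, 5, 3]
rho = spearman(metric_values, observer_scores)
```

Because only ranks enter the calculation, the coefficient is unaffected by any monotonic rescaling of the metric, which suits ordinal observer scores.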
The variability between evaluators was calculated using Krippendorff's alpha coefficient17 on dataset 1, with double weighting of the radiologists’ scores. A value of 1 represents perfect agreement among the evaluators.
Results
Statistically significant correlation coefficients between image quality metrics and observer scores (alpha = 0.65) are shown in Figure 3 (for the MPRAGE sequence, both dataset 1 and 2) and Figure 4 (other sequences, dataset 1) with and without MoCo.
The reference-based metrics, SSIM, PSNR and FSIM, outperform the non-reference metrics in terms of stability across datasets and sequence types as well as overall correlation strength. Among the non-reference metrics, Image Entropy and AES show consistent correlation with evaluators across datasets and sequences.
Discussion
Our findings align with prior research2,6,18. To enhance data variability, we included dataset 2 in our comparison, standardizing the pre-processing methods to ensure consistent evaluations. As image quality metrics can be influenced by the nature and intensity of motion artifacts18, pre-processing plays a pivotal role in achieving reliable and reproducible results19. The two datasets exhibit similar results, although inconsistencies are noticeable for the non-reference metrics, particularly Tenengrad, Gradient Entropy, and Co-occurrence Entropy. Some of the variations may be attributed to the absence of rater variability in dataset 2, as the image evaluations were performed by a single radiologist. However, even within dataset 1 alone, these metrics behave inconsistently. Notably, among the non-reference metrics, Image Entropy and AES showed consistent correlations with radiological evaluation across datasets and sequences. This finding is consistent with previous studies that incorporated the AES metric20,21.
We observed a strong dependency on the pre-processing for non-reference metrics. Exploring normalization and brain-masking techniques as part of this pre-processing optimization is crucial and will be part of future work. Additionally, machine learning offers a promising avenue for non-reference image quality assessment, e.g. enabling automated detection of motion artifacts in MRI22, and will be a matter of future studies.
Acknowledgements
This work was performed under the rubric of the Center for Advanced Imaging Innovation and Research (CAI2R, www.cai2r.net), an NIBIB National Center for Biomedical Imaging and Bioengineering (NIH P41 EB017183).
References
Spieker V., Eichhorn H., Hammernik K. et al. Deep Learning for Retrospective Motion Correction in MRI: A Comprehensive Review. IEEE Trans Med Imaging. 2023.
Eichhorn H., Chemnitz-Thomsen S., Vouros E., et al. Evaluating the match of image quality metrics with radiological assessment in a dataset with and without motion artifacts. ISMRM Annual Meeting Proceedings. 2022.
Z. Wang, A. C. Bovik, H. R. Sheikh et al. Image Quality Assessment: From Error Visibility to Structural Similarity. IEEE Trans Image Process. 2004; 13(4), 600-612.
A. Horé and D. Ziou. Image Quality Metrics: PSNR vs. SSIM. 20th International Conference on Pattern Recognition. 2010; 2366-2369
L. Zhang, L. Zhang, X. Mou and D. Zhang. FSIM: A Feature Similarity Index for Image Quality Assessment. IEEE Trans Image Process. 2011; 20(8), 2378-2386.
Marchetto E., Murphy K., Glimberg SL, et al. Robust retrospective motion correction of head motion using navigator-based and markerless motion tracking techniques. Magn Reson Med. 2023; 90(4): 1297-1315.
Kecskemeti S, Samsonov A, Velikina J et al. Robust motion correction strategy for structural MRI in unsedated children demonstrated with three-dimensional radial MPnRAGE. Radiology. 2018; 289:509-516.
Krotkov E. Focusing. Int J Comput Vis. 1988; 1(3):223-237.
McGee K, Manduca A, Felmlee J et al. Image metric-based correction (autocorrection) of motion effects: analysis of image metrics. J Magn Reson Imaging. 2000; 11(2):174-181.
Pannetier N, Stavrinos T, Ng P et al. Quantitative framework for prospective motion correction evaluation. Magn Reson Med. 2016; 75(2):810-816.
Zacà D, Hasson U, Minati L, Jovicich J. Method for retrospective estimation of natural head movement during structural MRI. J Magn Reson Imaging. 2018 Oct;48(4):927-937.
Loktyushin, A., Nickisch, H., Pohmann, R. et al. Blind retrospective motion correction of MR images. Magn. Reson. Med. 2013; 70: 1608-1618.
Ganz M., Eichhorn H. Datasets with and without deliberate head movements for evaluating the performance of markerless prospective motion correction and selective reacquisition in a general clinical protocol for brain MRI. OpenNeuro. Accessed 2023.
S.M. Smith. Fast robust automated brain extraction. Human Brain Mapping. 2002; 17(3):143-155.
Jenkinson, M., Bannister, P., Brady, J. M. et al. Improved Optimisation for the Robust and Accurate Linear Registration and Motion Correction of Brain Images. NeuroImage. 2002; 17(2), 825-841.
Spearman C. The Proof and Measurement of Association between Two Things. Am J Psychol. 1904; 15(1):72–101.
Krippendorff K. Content Analysis: An Introduction to Its Methodology. 2013; 3rd Ed., 221–250.
Mason A., Rioux J., Clarke SE., et al. Comparison of Objective Image Quality Metrics to Expert Radiologists' Scoring of Diagnostic Quality of MR Images. IEEE Trans Med Imaging. 2020;39(4):1064-1072.
Churchill NW, Oder A, Abdi H, et al. Optimizing preprocessing and analysis pipelines for single-subject fMRI. I. Standard temporal motion and physiological noise correction methods. Hum Brain Mapp. 2012;33(3):609-627.
van Niekerk, A., van der Kouwe, A., Meintjes, E. Toward “plug and play” prospective motion correction for MRI by combining observations of the time varying gradient and static vector fields. Magn Reson Med. 2019; 82: 1214–1228.
Laustsen, M., Andersen, M., Xue, R., et al. Tracking of rigid head motion during MRI using an EEG system. Magn Reson Med. 2022; 88: 986-1001.
Küstner T, Liebgott A, Mauch L, et al. Automated reference-free detection of motion artifacts in magnetic resonance images. MAGMA. 2018;31(2):243-256.