3019

Assessing Image Quality Metric Alignment with Radiological Evaluation in Datasets with and without Motion Artifacts

Elisa Marchetto^1,2,3, Hannah Eichhorn^4,5, Daniel Gallichan^3, Stefan T. Schwarz^6,7,8, Nitesh Shekhrajka^9, and Melanie Ganz^10,11
¹Bernard and Irene Schwartz Center for Biomedical Imaging, Department of Radiology, New York University Grossman School of Medicine, New York, NY, United States, ²Center for Advanced Imaging Innovation and Research (CAI2R), Department of Radiology, New York University Grossman School of Medicine, New York, NY, United States, ³CUBRIC, School of Engineering, Cardiff University, Cardiff, United Kingdom, ⁴Institute of Machine Learning in Biomedical Imaging, Helmholtz Munich, Munich, Germany, ⁵School of Computation, Information and Technology, Technical University of Munich, Munich, Germany, ⁶University Hospitals of Wales, Department of Radiology, Cardiff, United Kingdom, ⁷CUBRIC, School of Psychology, Cardiff University, Cardiff, United Kingdom, ⁸University of Nottingham, School of Medicine, Nottingham, United Kingdom, ⁹University of Iowa Hospitals and Clinics, Iowa City, IA, United States, ¹⁰Department of Computer Science, University of Copenhagen, Copenhagen, Denmark, ¹¹Neurobiology Research Unit, Copenhagen University Hospital, Copenhagen, Denmark

Synopsis

Keywords: Data Processing

Motivation: A quantitative evaluation of image quality is crucial in various aspects of MRI, such as developing and validating new image reconstruction and artifact correction techniques. Currently, no image quality metric covers all possible artifacts, making it difficult to choose the right quality measure.

Goal(s): To evaluate the consistency and reliability of image quality metrics with respect to image pre-processing and radiologists' assessments.

Approach: We studied the correlation of ten commonly used quality metrics with radiological evaluations in datasets with and without motion.

Results: SSIM and PSNR had the strongest correlation with observer scores. Among reference-free metrics, Image Entropy and AES consistently showed strong correlations.

Impact: Automatically evaluating the quality of MR images is crucial. Our results show variability across datasets in the correlation between image quality metrics and radiologists' scores, highlighting the need for pre-processing optimization, especially when no reference image is available.

Introduction

Assessing magnetic resonance (MR) image quality is vital for many applications, such as improving image reconstruction and artifact correction methods. In the field of motion correction (MoCo), a variety of image quality metrics are used, some of which require a reference image. However, none of these metrics captures all possible image artifacts, which makes comparisons between methods challenging^1,2. To understand the clinical relevance of these metrics, we evaluated their correlation with radiological quality assessment on two datasets acquired at two different sites.

Methods

We included 3 reference-based image quality metrics (SSIM³, PSNR⁴ and FSIM^5,6) and 7 non-reference metrics (Tenengrad^7,8, Image Entropy⁹, AES^10,11, NGS^6,9, two implementations of Gradient Entropy^9,12, and Co-occurrence Entropy¹⁰; see Figure 1 for details). Data were acquired on two Prisma scanners (Siemens Healthineers, Erlangen, Germany) at two research institutes: NRU¹³ (Copenhagen, Denmark) and CUBRIC (Cardiff, UK). Datasets 1 and 2 are summarized in Figure 2.
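To make the distinction between reference-based and non-reference metrics concrete, the sketch below implements PSNR (which needs a reference) and two entropy metrics (which do not). Figure 1 gives the exact definitions used in this study; the forms here are common textbook variants and are assumptions: the PSNR peak is taken as the reference maximum, the entropy uses intensity magnitudes normalized by B_max = sqrt(sum(B^2)), and gradients are central differences from `np.gradient`. The function names are hypothetical.

```python
import numpy as np


def psnr(image, reference):
    """Peak signal-to-noise ratio in dB (reference-based).

    Assumption: the peak is the maximum intensity of the reference.
    """
    image = np.asarray(image, dtype=float)
    reference = np.asarray(reference, dtype=float)
    mse = np.mean((image - reference) ** 2)
    if mse == 0:
        return float("inf")  # identical images
    return float(10.0 * np.log10(reference.max() ** 2 / mse))


def image_entropy(image):
    """Entropy of normalized intensities, E = -sum(b/b_max * ln(b/b_max)).

    b_max = sqrt(sum(b**2)). Magnitudes |B| are used (assumption), and zero
    voxels (e.g. background outside the brain mask) contribute nothing.
    """
    b = np.abs(np.asarray(image, dtype=float)).ravel()
    b_max = np.sqrt(np.sum(b ** 2))
    p = b[b > 0] / b_max
    return float(-np.sum(p * np.log(p)))


def gradient_entropy(image):
    """The same entropy applied to the gradient magnitude (central differences)."""
    arr = np.asarray(image, dtype=float)
    grads = np.gradient(arr)
    if arr.ndim == 1:  # np.gradient returns a single array for 1-D input
        grads = [grads]
    magnitude = np.sqrt(sum(g ** 2 for g in grads))
    return image_entropy(magnitude)


# Toy volumes, not study data
rng = np.random.default_rng(0)
reference = rng.random((32, 32, 8))
degraded = reference + 0.05 * rng.standard_normal(reference.shape)
print(psnr(degraded, reference), image_entropy(degraded), gradient_entropy(degraded))
```

Note that this entropy is unchanged by a global intensity scaling (the ratio b/b_max cancels it), whereas PSNR compares absolute intensities against the reference.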
The pre-processing steps were harmonized between the two datasets. First, the reference MPRAGE images were skull-stripped using BET¹⁴ to generate a brain mask. All other images were then aligned to their respective reference using FLIRT¹⁵ and multiplied by the brain mask. All reference images were acquired without voluntary motion, and no motion correction was applied to them.
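The order of the masking pipeline (BET on the reference MPRAGE, FLIRT of each image to its reference, then masking) can be sketched as a list of FSL command lines. This is a hedged illustration, not the authors' script: all file names and the output layout are hypothetical, FLIRT is left at its default settings because the abstract does not state the registration options, and the sketch assumes, as Figure 3A implies, that the MPRAGE-derived mask can be applied directly to every registered image. The builder only constructs the commands; it does not run them.

```python
from pathlib import Path


def _stem(path):
    """File name without NIfTI extensions, e.g. 'sub-01_T1w.nii.gz' -> 'sub-01_T1w'."""
    return Path(path).name.split(".")[0]


def fsl_preprocessing_commands(mprage_reference, image_reference_pairs, out_dir="preproc"):
    """Build (without running) FSL command lines for the masking pipeline.

    mprage_reference: motion-free MPRAGE used to derive the brain mask.
    image_reference_pairs: (image, respective motion-free reference) tuples.
    All file names and the output layout are hypothetical.
    """
    out = Path(out_dir)
    brain = out / f"{_stem(mprage_reference)}_brain.nii.gz"
    # `bet <in> <out> -m` additionally writes the binary mask as <out>_mask.nii.gz
    mask = out / f"{_stem(mprage_reference)}_brain_mask.nii.gz"
    commands = [["bet", str(mprage_reference), str(brain), "-m"]]
    for image, reference in image_reference_pairs:
        stem = _stem(image)
        registered = out / f"{stem}_reg.nii.gz"
        # Align the image to its reference (FLIRT defaults; options not stated in the abstract)
        commands.append(["flirt", "-in", str(image), "-ref", str(reference),
                         "-out", str(registered), "-omat", str(out / f"{stem}_reg.mat")])
        # Zero everything outside the brain mask (equivalent to multiplying by a binary mask)
        commands.append(["fslmaths", str(registered), "-mas", str(mask),
                         str(out / f"{stem}_masked.nii.gz")])
    return commands


for cmd in fsl_preprocessing_commands(
        "ref_mprage.nii.gz", [("mprage_motion.nii.gz", "ref_mprage.nii.gz")]):
    print(" ".join(cmd))
```

On a machine with FSL installed, each list could be passed to `subprocess.run(cmd, check=True)` in order.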
The brain-masked 3D volumes were normalized by subtracting the mean and dividing by the standard deviation. Because the Co-occurrence Entropy and FSIM metrics require voxel values between 0 and 255, the images were rescaled to this range before these two metrics were computed. The pre-processing steps are summarized in Figure 3A.
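A minimal NumPy sketch of the two intensity steps, under stated assumptions: the mean and standard deviation are computed over brain voxels only (the abstract does not say whether background voxels are included), NumPy's default population standard deviation (`ddof=0`) is used, and the 0-255 rescaling is a linear min-max mapping. Function names are hypothetical.

```python
import numpy as np


def zscore_in_mask(volume, mask):
    """Z-score the brain voxels; voxels outside the mask are set to 0.

    Assumption: statistics over brain voxels only, population std (ddof=0).
    """
    volume = np.asarray(volume, dtype=float)
    mask = np.asarray(mask, dtype=bool)
    brain = volume[mask]
    out = np.zeros_like(volume)
    out[mask] = (brain - brain.mean()) / brain.std()
    return out


def rescale_to_0_255(volume, mask):
    """Linearly map brain voxels to [0, 255] (min -> 0, max -> 255); background stays 0.

    Assumption: a min-max mapping, applied only before Co-occurrence Entropy and FSIM.
    """
    volume = np.asarray(volume, dtype=float)
    mask = np.asarray(mask, dtype=bool)
    brain = volume[mask]
    low, high = brain.min(), brain.max()
    if high == low:
        raise ValueError("constant intensities inside the mask cannot be rescaled")
    out = np.zeros_like(volume)
    out[mask] = (brain - low) / (high - low) * 255.0
    return out


# Toy volume and mask, not study data
rng = np.random.default_rng(0)
volume = rng.normal(500.0, 80.0, size=(32, 32, 16))
mask = np.zeros(volume.shape, dtype=bool)
mask[8:24, 8:24, 4:12] = True  # toy "brain" region
normalized = zscore_in_mask(volume, mask)
rescaled = rescale_to_0_255(normalized, mask)
```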
Image quality was scored by two experienced radiologists and two recently graduated radiographers for dataset 1, and by one radiologist for dataset 2. The evaluation used a 1-5 Likert scale⁵, as shown in Figure 3B.
The correlation between the image quality metrics and the observers' scores was quantified with the Spearman rank correlation coefficient¹⁶ (corrcoef in Matlab).
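Spearman's rho is Pearson's correlation computed on ranks, so a Pearson routine such as NumPy's `np.corrcoef` (like MATLAB's `corrcoef`) yields rho once both variables are ranked, with tied values (frequent on a 5-point Likert scale) given their average rank. The sketch below shows this, plus a two-sided permutation p-value as one possible significance test; it is not necessarily the test used in this study, and the function names and toy numbers are hypothetical.

```python
import numpy as np


def average_ranks(values):
    """Ranks 1..n, with tied values (common for Likert scores) given their mean rank."""
    values = np.asarray(values, dtype=float)
    ranks = np.empty(values.size)
    ranks[np.argsort(values, kind="mergesort")] = np.arange(1, values.size + 1)
    for v in np.unique(values):
        tied = values == v
        ranks[tied] = ranks[tied].mean()
    return ranks


def spearman_rho(metric_values, scores):
    """Spearman's rho: Pearson's r (np.corrcoef) computed on the average ranks."""
    return float(np.corrcoef(average_ranks(metric_values), average_ranks(scores))[0, 1])


def permutation_p_value(metric_values, scores, n_permutations=10_000, seed=0):
    """Two-sided permutation p-value for rho (one option; not necessarily the authors' test)."""
    rng = np.random.default_rng(seed)
    rank_x, rank_y = average_ranks(metric_values), average_ranks(scores)
    observed = abs(np.corrcoef(rank_x, rank_y)[0, 1])
    # Count shuffles whose |rho| is at least as extreme as the observed one
    exceed = sum(
        abs(np.corrcoef(rank_x, rng.permutation(rank_y))[0, 1]) >= observed - 1e-12
        for _ in range(n_permutations)
    )
    return (exceed + 1) / (n_permutations + 1)


ssim_values = [0.71, 0.93, 0.88, 0.52, 0.97, 0.64]  # toy numbers, not study data
likert_scores = [2, 4, 4, 1, 5, 3]
print(spearman_rho(ssim_values, likert_scores), permutation_p_value(ssim_values, likert_scores))
```

Permuting the precomputed ranks is equivalent to ranking permuted scores, which avoids re-ranking on every shuffle.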
The inter-rater variability was quantified on dataset 1 using Krippendorff's alpha coefficient¹⁷, with the radiologists' scores given double weight. A value of 1 represents perfect agreement among the evaluators.
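As a hedged illustration of the agreement statistic, the sketch below computes Krippendorff's alpha for complete data (every rater scored every image) as 1 - D_o/D_e, using the interval metric (squared score difference). Both the interval metric and the weighting scheme are assumptions: the abstract does not state which distance metric was used, and "double weighting" is modeled here, hypothetically, by duplicating each radiologist's column. The toy scores are invented for illustration.

```python
import numpy as np


def krippendorff_alpha_interval(ratings):
    """Krippendorff's alpha with the interval metric, for complete data.

    ratings: shape (n_images, n_raters); every rater scored every image.
    alpha = 1 - (n - 1) * sum_u [sum_{i != j} (r_ui - r_uj)^2 / (m - 1)]
                / sum_{k != l} (v_k - v_l)^2, with n = n_images * n_raters.
    """
    r = np.asarray(ratings, dtype=float)
    _, m = r.shape
    if m < 2:
        raise ValueError("need at least two raters")
    values = r.ravel()
    n = values.size
    # Observed disagreement: ordered pairs of ratings within each image
    observed = ((r[:, :, None] - r[:, None, :]) ** 2).sum() / (m - 1)
    # Expected disagreement: ordered pairs across all pairable ratings
    expected = ((values[:, None] - values[None, :]) ** 2).sum()
    if expected == 0:
        raise ValueError("alpha is undefined when all ratings are identical")
    return float(1.0 - (n - 1) * observed / expected)


def weight_raters(ratings, repeats):
    """Hypothetical weighting scheme: duplicate rater j's column repeats[j] times."""
    return np.repeat(np.asarray(ratings, dtype=float), repeats, axis=1)


# Toy scores (not study data): radiologist 1, radiologist 2, radiographer 1, radiographer 2
scores = np.array([[4, 4, 5, 4], [2, 3, 2, 2], [5, 5, 4, 5], [1, 2, 2, 1], [3, 3, 4, 3]])
print(krippendorff_alpha_interval(weight_raters(scores, [2, 2, 1, 1])))
```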

Results

Inter-rater agreement on dataset 1 was Krippendorff's alpha = 0.65. Statistically significant correlation coefficients between the image quality metrics and the observer scores are shown in Figure 4 (MPRAGE sequence, datasets 1 and 2) and Figure 5 (other sequences, dataset 1), with and without MoCo.
The reference-based metrics (SSIM, PSNR and FSIM) outperform the non-reference metrics both in stability across datasets and sequence types and in overall correlation strength. Among the non-reference metrics, Image Entropy and AES show consistent correlations with the evaluators' scores across datasets and sequences.

Discussion

Our findings align with prior research^18,2,6. To increase data variability, we included dataset 2 in our comparison and standardized the pre-processing to ensure consistent evaluations. As image quality metrics can be influenced by the nature and intensity of motion artifacts¹⁸, pre-processing plays a pivotal role in achieving reliable and reproducible results¹⁹. The two datasets exhibit similar results, although inconsistencies are noticeable for the non-reference metrics, particularly Tenengrad, Gradient Entropy and Co-occurrence Entropy. Some of these variations may be attributable to the lack of inter-rater variability in dataset 2, where all images were evaluated by a single radiologist. However, these metrics behave inconsistently even within dataset 1 alone. Notably, among the non-reference metrics, Image Entropy and AES showed consistent correlations with radiological evaluation across datasets and sequences, in line with previous studies that used the AES metric^20,21.
We observed a strong dependence of the non-reference metrics on pre-processing. Exploring normalization and brain-masking techniques as part of optimizing this pre-processing is crucial and will be part of future work. Additionally, machine learning offers a promising avenue for non-reference image quality assessment, e.g., by enabling automated detection of motion artifacts in MRI²², and will be explored in future studies.
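One mechanism behind this pre-processing dependence can be shown with a toy example: a gradient-based metric such as Tenengrad is unchanged by an intensity offset but scales with the square of an intensity scaling factor, so choosing z-scoring (division by the standard deviation) versus 0-255 rescaling (division by the intensity range) alone changes its value. This sketch uses central differences (`np.gradient`) in place of the Sobel kernels often used for Tenengrad, and random data rather than MR images; it is not this study's metric implementation.

```python
import numpy as np


def tenengrad(image):
    """Sum of squared gradient magnitudes (central differences stand in for Sobel)."""
    arr = np.asarray(image, dtype=float)
    grads = np.gradient(arr)
    if arr.ndim == 1:  # np.gradient returns a single array for 1-D input
        grads = [grads]
    return float(sum(np.sum(g ** 2) for g in grads))


rng = np.random.default_rng(1)
img = rng.random((16, 16))  # toy image, not study data
base = tenengrad(img)
print(tenengrad(img + 100.0) / base)  # ≈ 1.0: an offset leaves gradients unchanged
print(tenengrad(3.0 * img) / base)    # ≈ 9.0: scaling by 3 multiplies the metric by 9
```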

Acknowledgements

This work was performed under the rubric of the Center for Advanced Imaging Innovation and Research (CAI2R, www.cai2r.net), an NIBIB National Center for Biomedical Imaging and Bioengineering (NIH P41 EB017183).

References

1. Spieker V, Eichhorn H, Hammernik K, et al. Deep Learning for Retrospective Motion Correction in MRI: A Comprehensive Review. IEEE Trans Med Imaging. 2023.
2. Eichhorn H, Chemnitz-Thomsen S, Vouros E, et al. Evaluating the match of image quality metrics with radiological assessment in a dataset with and without motion artifacts. ISMRM Annual Meeting Proceedings. 2022.
3. Wang Z, Bovik AC, Sheikh HR, et al. Image Quality Assessment: From Error Visibility to Structural Similarity. IEEE Trans Image Process. 2004;13(4):600-612.
4. Horé A, Ziou D. Image Quality Metrics: PSNR vs. SSIM. 20th International Conference on Pattern Recognition. 2010:2366-2369.
5. Zhang L, Zhang L, Mou X, Zhang D. FSIM: A Feature Similarity Index for Image Quality Assessment. IEEE Trans Image Process. 2011;20(8):2378-2386.
6. Marchetto E, Murphy K, Glimberg SL, et al. Robust retrospective motion correction of head motion using navigator-based and markerless motion tracking techniques. Magn Reson Med. 2023;90(4):1297-1315.
7. Kecskemeti S, Samsonov A, Velikina J, et al. Robust motion correction strategy for structural MRI in unsedated children demonstrated with three-dimensional radial MPnRAGE. Radiology. 2018;289:509-516.
8. Krotkov E. Focusing. Int J Comput Vis. 1988;1(3):223-237.
9. McGee K, Manduca A, Felmlee J, et al. Image metric-based correction (autocorrection) of motion effects: analysis of image metrics. J Magn Reson Imaging. 2000;11(2):174-181.
10. Pannetier N, Stavrinos T, Ng P, et al. Quantitative framework for prospective motion correction evaluation. Magn Reson Med. 2016;75(2):810-816.
11. Zacà D, Hasson U, Minati L, Jovicich J. Method for retrospective estimation of natural head movement during structural MRI. J Magn Reson Imaging. 2018;48(4):927-937.
12. Loktyushin A, Nickisch H, Pohmann R, et al. Blind retrospective motion correction of MR images. Magn Reson Med. 2013;70:1608-1618.
13. Ganz M, Eichhorn H. Datasets with and without deliberate head movements for evaluating the performance of markerless prospective motion correction and selective reacquisition in a general clinical protocol for brain MRI. OpenNeuro. Accessed 2023.
14. Smith SM. Fast robust automated brain extraction. Hum Brain Mapp. 2002;17(3):143-155.
15. Jenkinson M, Bannister P, Brady JM, et al. Improved Optimisation for the Robust and Accurate Linear Registration and Motion Correction of Brain Images. NeuroImage. 2002;17(2):825-841.
16. Spearman C. The Proof and Measurement of Association between Two Things. Am J Psychol. 1904;15(1):72-101.
17. Krippendorff K. Content Analysis: An Introduction to Its Methodology. 3rd ed. 2013:221-250.
18. Mason A, Rioux J, Clarke SE, et al. Comparison of Objective Image Quality Metrics to Expert Radiologists' Scoring of Diagnostic Quality of MR Images. IEEE Trans Med Imaging. 2020;39(4):1064-1072.
19. Churchill NW, Oder A, Abdi H, et al. Optimizing preprocessing and analysis pipelines for single-subject fMRI. I. Standard temporal motion and physiological noise correction methods. Hum Brain Mapp. 2012;33(3):609-627. doi:10.1002/hbm.21238.
20. van Niekerk A, van der Kouwe A, Meintjes E. Toward "plug and play" prospective motion correction for MRI by combining observations of the time varying gradient and static vector fields. Magn Reson Med. 2019;82:1214-1228.
21. Laustsen M, Andersen M, Xue R, et al. Tracking of rigid head motion during MRI using an EEG system. Magn Reson Med. 2022;88:986-1001.
22. Küstner T, Liebgott A, Mauch L, et al. Automated reference-free detection of motion artifacts in magnetic resonance images. MAGMA. 2018;31(2):243-256.

Figures

Figure 1. Summary of the image quality metrics adopted in this study. The metrics were calculated on the images after applying brain-masking and normalization.

Figure 2. Data acquisition details for datasets 1 and 2. The two datasets were acquired at two different research centers: dataset 1 at NRU (Copenhagen, Denmark) and dataset 2 at CUBRIC (Cardiff, UK).

Figure 3. (A) Images were co-registered to their respective reference image (no voluntary motion, no motion correction) using FLIRT. A brain mask was extracted from the reference MPRAGE volume using BET and applied to the FLIRT-registered images. After normalization, the image quality metrics were computed on the masked volumes. (B) Definition of the Likert-scale image quality scores used in the evaluation by the radiologists and radiographers.

Figure 4. Reference-based metrics show strong correlation with the evaluators’ scores in both datasets 1 and 2. Where no value is displayed, no significant correlation was found (p>0.05). The results for non-reference-based metrics are more nuanced, with only Image Entropy and AES showing consistent results and correlation with the scores.

Figure 5. Reference-based metrics also show high correlation with the evaluators' scores for the other contrasts acquired in dataset 1. Results with motion correction ON and OFF are comparable, indicating the reliability of the reference-based metrics. Among the non-reference metrics, Image Entropy and AES also show high correlation with the observers' scores. The correlations for the remaining non-reference metrics are more variable, underscoring the need for caution in pre-processing, which we hypothesize is the main factor behind these nuanced results.

Proc. Intl. Soc. Mag. Reson. Med. 32 (2024)


DOI: https://doi.org/10.58530/2024/3019