0712

Image quality assessment and longitudinal quality monitoring of clinically-applied AI-based reconstructions in MRI of rectal cancer

Owen Alun White^1,2, Joshua Shur¹, Francesca Castagnoli^1,2, Geoff Charles-Edwards^1,2, Brandon Whitcher^1,2, Erica Scurr¹, Georgina Hopkinson¹, Dow-Mu Koh^1,2, and Jessica M Winfield^1,2
¹MRI Unit, The Royal Marsden NHS Foundation Trust, London, United Kingdom, ²Division of Radiotherapy and Imaging, The Institute of Cancer Research, London, United Kingdom

Synopsis

Keywords: Cancer, Machine Learning/Artificial Intelligence, Image Quality; Quality Control / Quality Assurance; QA/QC; AI/ML image reconstruction

Motivation: With increasing AI adoption in MR-reconstructions, robust quality assessment becomes paramount. This study aims to ensure that AI-techniques meet clinical requirements at implementation and longitudinally.

Goal(s): 1) Compare image quality of AI-imaging with standard techniques in anorectal cancer. 2) Develop longitudinal quality control (QC) assessments capable of detecting changes in AI-reconstructions without resource-intensive evaluations.

Approach: A prospective study involving 40 patients utilised radiologist scoring and quantitative image-quality-metrics (IQMs). Retrospective reconstructions gauged sensitivity of IQMs to reconstruction pipeline changes.

Results: AI-reconstructions demonstrated >50% time savings with improved image quality. Feasibility of quantitative-IQMs for assessing AI-reconstructions is established, providing a practical solution for ongoing QC.

Impact: There is a need to develop QC assessments offering performance monitoring for AI-based reconstructions in diverse clinical settings. The study presents feasible ways to support integration of AI-imaging into clinical practice, including resource-efficient quantitative image quality assessments.

Introduction

AI-based MRI acceleration techniques can improve workflow efficiency and the patient experience by reducing acquisition time whilst maintaining image quality^1-3.

It is crucial for clinical institutions to ensure AI-based techniques provided by MRI manufacturers meet local clinical requirements⁴. Local assessments often employ one-off qualitative studies that compare image quality between AI-based and standard techniques. Although such studies provide initial insights into the technique’s performance, they are resource-intensive and consequently are not practical for subsequent monitoring of performance that may change over time, because of model drift or changes associated with measurement and software updates. Also, the sensitivity of AI-based reconstructions to hardware degradation is unknown.

It is therefore essential to develop methods that can assess changes in the AI-reconstruction, to ensure consistent image quality over time⁵. The aims of this study were: (1) to compare the image quality and acquisition time of AI-based accelerated 2D turbo spin echo (TSE) imaging with standard TSE imaging of the rectum, in particular the rectal wall layer definition required for accurate tumour staging, and to establish whether AI-based techniques can reduce acquisition time by over 50% without compromising image quality; and (2) to develop quantitative quality control (QC) assessments that enable longitudinal evaluation of AI-based reconstruction techniques and detect changes in performance without the need for repeated resource-intensive radiologist assessments.

Methods

In this prospective study, informed consent was obtained from 40 patients with anorectal malignancy and with rectum in situ (i.e. no prior surgical resection) (median age 63 years, range 37-85, 22 male). AI-based reconstruction^3,6 for de-noising (DR-Boost) and super-resolution (DR-Sharp) (Deep Resolve, Siemens Healthcare, Erlangen, Germany) was applied in accelerated sagittal T2w-TSE and small-field-of-view axial T2w-TSE sequences. AI-sequences were acquired in addition to standard-of-care T2w-TSE imaging at matched slice positions. Patients were imaged on 1.5T MAGNETOM Sola (19/40) or 3T MAGNETOM Vida (21/40) scanners (Siemens Healthcare).

Likert scoring assessment
Two radiologists conducted a blinded qualitative analysis using a four-point Likert scale to score four features on each dataset: signal-to-noise ratio (SNR), rectal wall sharpness, rectal wall layer conspicuity, and overall image quality. Differences between median scores for the AI- and standard-series were evaluated using a Wilcoxon test, with p<0.05 considered significant.
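As an illustration of the paired comparison described above, the following is a minimal sketch of a Wilcoxon signed-rank test in plain Python. It rests on assumptions not stated in the abstract: that the paired (signed-rank) form of the test was used, that zero differences are discarded, and that a tie-corrected normal approximation is acceptable for the p-value. The function name and the example scores are hypothetical.

```python
import math

def wilcoxon_signed_rank(x, y):
    """Paired Wilcoxon signed-rank test (sketch).

    Returns (W_plus, W_minus, two_sided_p). Zero differences are dropped and
    the p-value uses a tie-corrected normal approximation (an assumption;
    exact tables are preferable for very small samples).
    """
    diffs = [a - b for a, b in zip(x, y) if a != b]
    n = len(diffs)
    if n == 0:
        raise ValueError("all paired differences are zero")

    # Rank the absolute differences, giving tied values their average rank.
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * n
    tie_term = 0.0
    i = 0
    while i < n:
        j = i
        while j + 1 < n and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        avg_rank = (i + j) / 2 + 1  # mean of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        t = j - i + 1
        tie_term += t ** 3 - t
        i = j + 1

    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    w_minus = sum(r for r, d in zip(ranks, diffs) if d < 0)

    mean = n * (n + 1) / 4
    var = n * (n + 1) * (2 * n + 1) / 24 - tie_term / 48
    z = (min(w_plus, w_minus) - mean) / math.sqrt(var)
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided p-value
    return w_plus, w_minus, p

# Hypothetical Likert scores (1-4) for six paired series, for illustration only.
ai_scores = [4, 4, 3, 4, 3, 4]
standard_scores = [3, 3, 3, 2, 4, 3]
w_plus, w_minus, p_value = wilcoxon_signed_rank(ai_scores, standard_scores)
```

Likert data produce many tied and zero differences, which is why the sketch averages tied ranks and applies the tie correction to the variance. In practice an established routine such as `scipy.stats.wilcoxon` would normally be preferred over a hand-written version.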

Quantitative image quality metrics (IQMs)
IQMs were measured in MATLAB (R2020b, MathWorks, Natick, MA)^7,8 to assess differences between the AI and standard axial images. Mean-squared error (MSE), peak signal-to-noise ratio (pSNR), structural similarity index (SSIM), textural features (entropy), and wavelet decompositions were calculated for a central 200×200-pixel ROI for every slice in the series. Sensitivity of the IQMs to changes in the reconstruction process was evaluated using five patient datasets retrospectively reconstructed (RR) with deliberate modifications (a change to the AI-reconstruction mode denoising strength, and an additional image smoothing filter). IQMs were compared using a Shewhart chart to establish a control limit⁹.
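The full-reference metrics named above can be sketched as follows. This is an illustrative Python version, not the MATLAB code used in the study. The function names are hypothetical, `data_range` is an assumed parameter for the intensity range of the images, and `global_ssim` is a simplified single-window SSIM computed over the whole ROI, whereas standard SSIM implementations average the index over local windows.

```python
import math

def central_roi(image, size=200):
    """Return the central size x size block of a 2D image (list of rows)."""
    rows, cols = len(image), len(image[0])
    if rows < size or cols < size:
        raise ValueError("image is smaller than the requested ROI")
    r0, c0 = (rows - size) // 2, (cols - size) // 2
    return [row[c0:c0 + size] for row in image[r0:r0 + size]]

def _flatten(image):
    return [v for row in image for v in row]

def mse(ref, test):
    """Mean-squared error between two equally sized images."""
    a, b = _flatten(ref), _flatten(test)
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)

def psnr(ref, test, data_range):
    """Peak signal-to-noise ratio in dB; infinite for identical images."""
    err = mse(ref, test)
    if err == 0:
        return math.inf
    return 10.0 * math.log10(data_range ** 2 / err)

def global_ssim(ref, test, data_range):
    """Single-window SSIM over the whole ROI (simplification, see lead-in)."""
    a, b = _flatten(ref), _flatten(test)
    n = len(a)
    mu_a, mu_b = sum(a) / n, sum(b) / n
    var_a = sum((x - mu_a) ** 2 for x in a) / n
    var_b = sum((y - mu_b) ** 2 for y in b) / n
    cov = sum((x - mu_a) * (y - mu_b) for x, y in zip(a, b)) / n
    c1 = (0.01 * data_range) ** 2  # conventional stabilising constants
    c2 = (0.03 * data_range) ** 2
    return ((2 * mu_a * mu_b + c1) * (2 * cov + c2)) / (
        (mu_a ** 2 + mu_b ** 2 + c1) * (var_a + var_b + c2)
    )

# Toy 4x4 "slice" with a 2x2 central ROI, for illustration only.
standard_slice = [[r * 4 + c for c in range(4)] for r in range(4)]
ai_slice = [[v + 1 for v in row] for row in standard_slice]
roi_std = central_roi(standard_slice, size=2)
roi_ai = central_roi(ai_slice, size=2)
metrics = (mse(roi_std, roi_ai), psnr(roi_std, roi_ai, 255), global_ssim(roi_std, roi_ai, 255))
```

Applied per slice with `size=200`, this yields one value of each metric per slice, which can then be summarised per series.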

Results

AI-based reconstructions allowed for median time savings of 56% (2:21 min) and 69% (4:01 min) for the sagittal and axial series, respectively, without a measurable impact on image quality.

Qualitative scores show that the AI-series exhibited better image quality than the standard series (Figure 3). There was no significant difference in the median cohort scores for SNR, or in the median scores of the 1.5T and 3T datasets for any feature.

Figure 4 contains Shewhart plots of MSE, pSNR, and SSIM, comparing AI-based and standard reconstructions for all patients imaged at 3T and for the RR-datasets. The MSE, pSNR, and SSIM values for the RR-datasets fell outside the control limit, showing that these metrics are sensitive to changes in the reconstruction pipeline. The other IQMs (entropy and wavelet decompositions) did not show any difference between the original and RR-datasets.
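The control-limit check can be illustrated with a short sketch. It assumes the conventional Shewhart rule of the baseline mean plus or minus three sample standard deviations; the abstract does not state the multiplier used, so `k=3.0` is an assumption, and the function names and SSIM values below are hypothetical.

```python
import statistics

def control_limits(baseline, k=3.0):
    """Lower and upper control limits: baseline mean +/- k sample SDs."""
    centre = statistics.mean(baseline)
    sd = statistics.stdev(baseline)
    return centre - k * sd, centre + k * sd

def flag_out_of_control(values, limits):
    """Return True for each value that falls outside the control limits."""
    lower, upper = limits
    return [not (lower <= v <= upper) for v in values]

# Hypothetical per-patient SSIM values from the original reconstructions.
baseline_ssim = [0.90, 0.91, 0.89, 0.90, 0.92, 0.88]
limits = control_limits(baseline_ssim)

# Hypothetical new measurements: one routine, one after a pipeline change.
flags = flag_out_of_control([0.91, 0.80], limits)
```

A flagged value would prompt a closer look at the reconstruction pipeline, for example after a software upgrade, without requiring a repeat radiologist scoring study.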

Discussion

This study shows that AI-based reconstructions allowed substantial (50-70%) time savings to be achieved in a clinical setting in high-resolution T2-weighted rectal cancer imaging, with an overall improvement in image quality. Furthermore, it is possible to use quantitative IQMs (MSE, pSNR, SSIM) to detect changes in the reconstruction pipeline of AI-based techniques without resource-intensive image scoring.

Other IQMs were insensitive to changes in the reconstruction; however, an exhaustive assessment of textural features and wavelet transforms was not performed. Further work will explore additional IQMs as potential QC metrics for an ongoing longitudinal evaluation of these imaging protocols in clinical practice, and will investigate correlations between IQMs and radiologist scores to determine clinical significance.

Conclusion

In conclusion, the study indicates that AI-based reconstruction techniques demonstrate similar or improved rectal wall layer definition and overall image quality, with a 50-70% reduction in acquisition times. Quantitative IQMs (MSE, pSNR, and SSIM) detect changes in the reconstruction pipeline, providing ongoing QC in clinical practice without a significant burden on resources.

Acknowledgements

This study represents independent research funded by the National Institute for Health and Care Research (NIHR) Biomedical Research Centre at The Royal Marsden NHS Foundation Trust and The Institute of Cancer Research, London, and by the Royal Marsden Cancer Charity. The views expressed are those of the authors and not necessarily those of the NIHR or the Department of Health and Social Care.

References

1. A. Hosny, C. Parmar, J. Quackenbush, H. J. W. L. Aerts and L. H. Schwartz, "Artificial intelligence in radiology," Nature Reviews Cancer, vol. 18, no. 8, pp. 500-510, 2018.

2. J. H. Thrall, Q. Li, X. Li, C. Cruz, S. Do, K. Dreyer and J. Brink, "Artificial Intelligence and Machine Learning in Radiology: Opportunities, Challenges, Pitfalls, and Criteria for Success," Journal of the American College of Radiology, vol. 15, pp. 504-508, 2017.

3. K. Hammernik, F. Knoll and D. Rueckert, "Deep Learning for Parallel MRI Reconstruction: Overview, Challenges, and Opportunities," MAGNETOM Flash, vol. 4, pp. 10-15, 2019.

4. D. Daye, W. F. Wiggins, M. P. Lungren, T. Alkasab, N. Kottler, B. Allen, C. J. Roth, B. C. Bizzo, K. Durniak, J. A. Brink and C. P. Langlotz, "Implementation of clinical artificial intelligence in radiology: who decides and how?," Radiology, vol. 305, no. 3, pp. 555-563, 2022.

5. D. B. Larson and G. B. Boland, "Imaging Quality Control in the Era of Artificial Intelligence," Journal of the American College of Radiology, vol. 16, pp. 1259-1266, 2019.

6. K. Hammernik, T. Klatzer, E. Kobler, M. P. Recht, D. K. Sodickson, T. Pock and F. Knoll, "Learning a variational network for reconstruction of accelerated MRI data," Magnetic resonance in medicine, vol. 79, no. 6, pp. 3055-3071, 2018.

7. The MathWorks Inc., Image Processing Toolbox, MATLAB 9.9.0.2037887 (R2020b) Update 8, Natick, Massachusetts: The MathWorks Inc., 2020.

8. The MathWorks Inc., Wavelet Toolbox, MATLAB 9.9.0.2037887 (R2020b) Update 8, Natick, Massachusetts: The MathWorks Inc., 2020.

9. A. Simmons, E. Moore and S. C. Williams, "Quality control for functional magnetic resonance imaging using automated data analysis and Shewhart charting," Magnetic Resonance in Medicine, vol. 41, no. 6, pp. 1274-1278, 1999.

Figures

Figure 1: Example T2-weighted TSE images with (left) and without (right) AI-reconstruction techniques for a 70-year-old male patient with rectal adenocarcinoma. The sagittal series contained 40 slices and the axial series contained 24 slices. Acquisition times (min) and reconstructed voxel sizes (mm) are indicated beneath each image. In this example, an overall time saving of 65% was achieved without a measurable loss in image quality.

Figure 2: Acquisition parameters for the T2-weighted TSE acquisitions. Imaging protocols were based on the routine clinical protocol at our institution. Times shown are for a single imaging series, but it should be noted that a typical clinical rectum MRI exam contains 2-4 axial series depending on the clinical indication. Ranges indicate parameters that were altered to ensure appropriate coverage of the patient anatomy, or to meet SAR safety limits, as dictated by clinical requirements. Images were acquired using a 30-channel anterior array and 32-channel posterior spine array.

Figure 3: Summary table and box plots showing the qualitative Likert image scoring results for n = 40 patients. A Wilcoxon test was used to determine whether there was a difference between the median scores of the T2-weighted TSE images acquired with and without AI-based reconstruction techniques. The series with the higher score is shown in bold where the difference in median scores was significant at the 5% level.

Figure 4: Box plots showing quantitative image quality metrics (IQMs) comparing standard vs AI-based reconstructions. The left-hand plots summarise the IQMs calculated on originally reconstructed axial images for all the patients from the 3T dataset (n=21). The RR plots show the IQMs calculated for 5/21 retrospectively reconstructed datasets, which had deliberately induced changes to the AI-based reconstruction, to determine if the IQM could detect changes in the reconstruction pipeline. RR 1) change to the AI-reconstruction denoising strength. RR 2) added smoothing filter.

Figure 5: Images showing the effect of deliberately induced changes to the AI-based reconstruction pipeline on the axial T2-weighted TSE images. RR 1) change to the AI-reconstruction mode denoising strength from “medium” to “high”. RR 2) added image smoothing filter. The deliberate changes applied here reflect potential changes in the AI-based reconstruction that could occur, for example after a software upgrade, which a user would need to be able to detect to ensure consistent image quality is maintained.

Proc. Intl. Soc. Mag. Reson. Med. 32 (2024)

DOI: https://doi.org/10.58530/2024/0712