1206

Efficient Analysis of Myocardial Perfusion MRI with Human-in-the-loop Dynamic Quality Control: Initial Results Using the SCMR Registry
Dilek M. Yalcinkaya1,2, Zhuoan Li1,3, Khalid Youssef4, Bobak Heydari5, Rohan Dharmakumar3,4, Robert Judd6, Orlando Simonetti7, Subha Raman4, and Behzad Sharif1,3,4
1Laboratory for Translational Imaging of Microcirculation, Indiana University School of Medicine (IUSM), Indianapolis, IN, United States, 2Electrical and Computer Engineering, Purdue University, West Lafayette, IN, United States, 3Biomedical Engineering, Purdue University, West Lafayette, IN, United States, 4Krannert Cardiovascular Research Center, IUSM, Indianapolis, IN, United States, 5Stephenson Cardiac Imaging Centre, University of Calgary, Calgary, AB, Canada, 6Intelerad, Raleigh, NC, United States, 7Davis Heart and Lung Research Institute, The Ohio State University, Columbus, OH, United States

Synopsis

Keywords: Analysis/Processing, Segmentation

Motivation: Accurate segmentation of free-breathing (FB) myocardial perfusion (MP) MRI is a labor-intensive yet necessary preprocessing step. A quality control (QC) tool for deep learning (DL)-based segmentation of FB MP MRI is lacking.

Goal(s): To develop a DL-based dynamic QC (dQC) tool for automatic analysis of MP MRI.

Approach: Using the discrepancy between patch-based segmentations, a dQC map is derived and quantified into a dQC metric. The utility of this metric in detecting erroneous segmentations is demonstrated by considering a human-in-the-loop (HiTL) framework.

Results: Referring the dQC-detected timeframes to a human in the loop markedly improved the segmentation results compared to random referral (Dice from 0.767 to 0.781; failed segmentations from 16.2% to 11.3%).

Impact: We propose a dynamic quality control tool for automatic segmentation and analysis of free-breathing myocardial perfusion MRI datasets. Our results show that the proposed approach markedly improves segmentation accuracy when used within a practical and efficient clinician-in-the-loop setting.

Introduction

Free-breathing myocardial perfusion (MP) MRI protocols are preferred over breath-hold exams due to their applicability to a wider range of patient cohorts. However, in cases where non-rigid motion correction (MoCo) fails or is unavailable, manual segmentation of the acquired free-breathing timeframes (200-300 images per patient) can be extremely labor-intensive. Nevertheless, accurate segmentation, which can be performed using deep neural networks (DNNs)1-3, is a necessary preliminary analysis step for quantitative MP studies. In multi-center MRI studies, dataset shifts can occur, partly due to limited labeled training data and variations in the scanner platform4 (mismatch in the scanner vendor/field strength and sequence parameters between training and test datasets), which may cause a previously trained DNN to fail to generalize to specific cases in an external test dataset. To address this, we propose an automatic quality control (QC) technique for identifying “low confidence” segmentations in a dynamic MP MRI image series which, in turn, enables a human-in-the-loop (HiTL) framework to improve the performance of A.I.-guided analysis4,5. Specifically, we: (i) propose a dynamic QC (dQC) tool for assessment of DNN-based analysis of free-breathing MP MRI; and (ii) show the utility of the proposed tool for improving the performance of a HiTL framework on an external dataset from the SCMR Registry.

Methods

A sliding-patch approach3 was used to train a U-Net6 on a training dataset of 120 MP MRI studies from two centers. The data were motion corrected and extensively augmented by simulating various breathing motion patterns. To assess the performance in a multi-center setting, an external dataset of free-breathing 1.5T studies (n=20) from a third medical center was acquired using the SCMR Registry. Figure 1 shows the steps for the proposed dQC tool and its use within a HiTL framework. In the first part (upper half), the DNN is trained by decomposing the free-breathing MP MRI images into spatiotemporal patches. Given that each pixel is in multiple patches, we propose to further utilize this patch-based approach at test time by analyzing the discordance of the DNN-derived dynamic segmentations across overlapping patches. At pixel location $$$(x,y)$$$ and time $$$t$$$, the time-varying dQC map takes the value:
$$M_{x,y}(t) = \mathrm{std}\left(p_{x,y}^1(t), p_{x,y}^2(t), \ldots, p_{x,y}^{|\Gamma_{x,y}|}(t)\right) $$
where $$$\Gamma_{x,y}$$$ denotes the set of space-time patches that include a particular pixel location $$$(x,y)$$$. Further, $$$p_{x,y}^i(t)$$$ denotes the softmax probability of the trained DNN during inference for the $$$i$$$-th patch at time $$$t$$$ and spatial location $$$(x,y)$$$, and $$$\mathrm{std}$$$ is the standard deviation operator. We also define the dQC metric $$$Q(t)$$$ as the ratio of the energy in the dQC map $$$M(t)$$$ to the number of pixels in the corresponding segmentation $$$S(t)$$$: $$$Q(t) = \|M(t)\|_F/\sum_{x,y} S_{x,y}(t)$$$. In the second part of Fig. 1 (bottom half), the dQC metric is used to select the segmentations with low confidence (poor quality) to refer to a human expert for correction (refinement of the endo/epi contours) in the scope of a HiTL-A.I. collaboration experiment. During refinement, the expert was instructed to correct two types of error: (i) anatomically invalid segmentation (e.g., discontiguous contours); (ii) inclusion of blood pool, epicardial fat, or regions outside of the ventricle in the segmented myocardium. Then, two approaches were compared for HiTL correction: (1) random referral of 10% of the timeframes; (2) dQC-guided referral of the top 10% most-uncertain timeframes.
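As an illustration of the dQC map, the metric Q(t), and the dQC-guided selection described above, the following is a minimal NumPy sketch and not the authors' implementation. Several details are assumptions made for the example: the per-patch softmax probabilities are assumed to be re-embedded on the full image grid as an array of shape (patches, time, height, width) with NaN where a patch does not cover a pixel; the population standard deviation (ddof=0) is used; empty segmentations are guarded by a minimum area of one pixel; and the 10% budget is rounded to the nearest frame. The function names `dqc_map`, `dqc_metric`, and `select_referrals` are hypothetical.

```python
import numpy as np


def dqc_map(probs):
    """Per-pixel std of softmax probabilities across overlapping patches.

    probs: float array of shape (P, T, H, W); probs[i, t, y, x] is the
    probability from patch i, or NaN if patch i does not cover (x, y) at t.
    (This layout is an assumption of the sketch.)
    Returns M with shape (T, H, W).
    """
    probs = np.asarray(probs, dtype=float)
    covered = ~np.isnan(probs)
    count = covered.sum(axis=0)            # |Gamma_{x,y}| for each pixel and frame
    safe = np.maximum(count, 1)            # avoid 0/0 where no patch covers a pixel
    mean = np.where(covered, probs, 0.0).sum(axis=0) / safe
    sq_dev = np.where(covered, (probs - mean) ** 2, 0.0)
    return np.sqrt(sq_dev.sum(axis=0) / safe)   # population std (assumed ddof=0)


def dqc_metric(M, seg):
    """Q(t) = ||M(t)||_F / sum_{x,y} S_{x,y}(t) for every timeframe.

    M:   dQC map, shape (T, H, W)
    seg: binary segmentation S, shape (T, H, W)
    """
    M = np.asarray(M, dtype=float)
    seg = np.asarray(seg)
    energy = np.sqrt((M ** 2).sum(axis=(1, 2)))           # Frobenius norm per frame
    area = seg.reshape(seg.shape[0], -1).sum(axis=1).astype(float)
    return energy / np.maximum(area, 1.0)  # guard for empty masks (assumption)


def select_referrals(q, budget=0.10):
    """Indices of the most-uncertain timeframes (largest Q) within the budget."""
    q = np.asarray(q, dtype=float)
    k = max(1, int(round(budget * len(q))))
    return np.sort(np.argsort(q)[::-1][:k])
```

With these helpers, `select_referrals(dqc_metric(dqc_map(probs), seg))` would return the timeframes to send to the expert under a 10% referral budget.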

Results

Figure 2 compares the cumulative performance over the entire 2D+time image series for the two HiTL correction approaches in terms of (A) Dice score and (B) segmentation failure prevalence. The random-referral (naïve) approach resulted in a nearly unchanged Dice compared to baseline (from 0.767 ± 0.042 to 0.768 ± 0.042) and a modest decrease in the overall number of failed segmentations (from 16.2% to 14.4%). In contrast, the proposed dQC-guided referral approach for HiTL correction resulted in a notable increase in Dice (from 0.767 ± 0.042 to 0.781 ± 0.039) and a 30% relative reduction in the number of failed segmentations (from 16.2% to 11.3%). Figure 3 shows the results for two representative free-breathing image series from the external dataset along with the proposed dQC map and metric. In both examples, the proposed dQC metric tracks the segmentation errors by yielding a larger value.
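The comparison above can be sketched in code as follows. This is an illustrative sketch under stated assumptions, not the evaluation pipeline used in this work: it assumes per-frame Dice scores are available both for the automatic segmentation (`auto`) and for the expert-corrected version (`corrected`), pools all frames into one referral budget, and estimates the expected performance of random referral by averaging over repeated random draws (100 runs, as stated in the Figure 2 caption). All function names are hypothetical.

```python
import numpy as np


def dice(a, b):
    """Dice overlap of two binary masks (defined as 1.0 when both are empty)."""
    a, b = np.asarray(a, dtype=bool), np.asarray(b, dtype=bool)
    total = a.sum() + b.sum()
    return 1.0 if total == 0 else 2.0 * np.logical_and(a, b).sum() / total


def budget_size(n_frames, budget=0.10):
    """Number of frames that fit in the referral budget (rounding is assumed)."""
    return max(1, int(round(budget * n_frames)))


def mean_after_referral(auto, corrected, referred):
    """Mean score when only the referred frames take their corrected score."""
    out = np.array(auto, dtype=float)
    out[referred] = np.asarray(corrected, dtype=float)[referred]
    return float(out.mean())


def guided_referral(auto, corrected, q, budget=0.10):
    """Refer the frames with the largest dQC metric q."""
    k = budget_size(len(auto), budget)
    referred = np.argsort(np.asarray(q, dtype=float))[::-1][:k]
    return mean_after_referral(auto, corrected, referred)


def random_referral(auto, corrected, budget=0.10, n_runs=100, seed=0):
    """Monte Carlo estimate of the expected score under random referral."""
    rng = np.random.default_rng(seed)
    n = len(auto)
    k = budget_size(n, budget)
    runs = [
        mean_after_referral(auto, corrected, rng.choice(n, size=k, replace=False))
        for _ in range(n_runs)
    ]
    return float(np.mean(runs))
```

Averaging over many random draws removes the dependence of the baseline on any single lucky or unlucky selection, so the two strategies are compared at the same referral budget.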

Conclusion

We have proposed a human-A.I. collaboration framework powered by a dynamic QC tool for DNN-based segmentation of free-breathing MP MRI datasets and provided preliminary evaluation of its utility in the setting of multi-center studies enabled by the SCMR Registry. With a limited “HiTL referral budget” of 10% of the total number of images, representing a practical clinical scenario, the proposed approach was able to markedly improve the segmentation performance.

Acknowledgements

This work was supported by the NIH awards R01-HL153430 & R01-HL148788, and the Lilly Endowment INCITE award (PI: B. Sharif).

References

1. Scannell CM, et al. Deep-Learning-Based Preprocessing for Quantitative Myocardial Perfusion MRI. JMRI 2020;51(6):1689-96.

2. Xue H, et al. Automated inline analysis of myocardial perfusion MRI with deep learning. Radiology: Artif. Intell. 2020;2(6):e200009.

3. Yalcinkaya DM, et al. Deep learning-based segmentation and uncertainty assessment for automated analysis of myocardial perfusion MRI datasets using patch-level training and advanced data augmentation. Proc of IEEE Eng. in Med & Biol (EMBC) 2021; pp. 4072-78. DOI: 10.1109/EMBC46164.2021.9629581

4. Rajpurkar P, et al. The current and future state of AI interpretation of medical images. N. Engl. J. Med. 2023;388(21):1981-90.

5. Mozannar H, et al. Consistent estimators for learning to defer to an expert. Int Conf on Mach Learn 2020. DOI: 10.48550/arXiv.2006.01862.

6. Ronneberger O, et al. U-Net: Convolutional Networks for Biomedical Image Segmentation. Med Image Comput Comput Assist Interv (MICCAI) 2015. Part III (pp. 234-241). DOI: 10.1007/978-3-319-24574-4_28

Figures

Figure 1. Description of the proposed dynamic quality control (dQC) approach for free-breathing myocardial perfusion MRI datasets and the designed human-A.I. collaboration setup. (Top) The dQC map M(t) is obtained from the discordance of DNN-derived segmentations across overlapping 2D+time patches. The dQC metric Q(t) is derived as the ratio of the energy in M(t) to the area of the corresponding segmentation S(t). (Bottom) Low-confidence contours detected by the dQC metric are referred to the human expert.

Figure 2. Segmentation performance comparison between the proposed dQC-guided referral approach vs. naïve random referral for human-in-the-loop (HiTL) correction in terms of (A): Dice score and (B): segmentation failure prevalence. Unlike random referral, guidance by dQC resulted in a notable improvement in Dice and a 30% reduction in the number of failed segmentations. To calculate an accurate measure of expected performance for the random-referral approach, a total of 100 Monte Carlo runs were carried out.

Figure 3. Two representative free-breathing dynamic myocardial perfusion MRI image series from the external dataset along with segmentation results S(t), dQC maps M(t), and the variation of the dQC metric Q(t) over the duration of the dynamic scan. (A): The maximum of Q(t) was observed at t=22, coinciding with the failed segmentation result (yellow arrow) in this image series. (B): Segmentation errors in the first 6 timeframes (yellow arrows) and around t=16 are accurately reflected by the metric.

Proc. Intl. Soc. Mag. Reson. Med. 32 (2024)
DOI: https://doi.org/10.58530/2024/1206