Dilek M. Yalcinkaya1,2, Zhuoan Li1,3, Khalid Youssef4, Bobak Heydari5, Rohan Dharmakumar3,4, Robert Judd6, Orlando Simonetti7, Subha Raman4, and Behzad Sharif1,3,4
1Laboratory for Translational Imaging of Microcirculation, Indiana University School of Medicine (IUSM), Indianapolis, IN, United States, 2Electrical and Computer Engineering, Purdue University, West Lafayette, IN, United States, 3Biomedical Engineering, Purdue University, West Lafayette, IN, United States, 4Krannert Cardiovascular Research Center, IUSM, Indianapolis, IN, United States, 5Stephenson Cardiac Imaging Centre, University of Calgary, Calgary, AB, Canada, 6Intelerad, Raleigh, NC, United States, 7Davis Heart and Lung Research Institute, The Ohio State University, Columbus, OH, United States
Synopsis
Keywords: Analysis/Processing, Segmentation
Motivation: Accurate segmentation of free-breathing (FB) myocardial perfusion (MP) MRI is a labor-intensive yet necessary preprocessing step. A quality control (QC) tool for deep learning (DL)-based segmentation of FB MP MRI is lacking.
Goal(s): To develop a DL-based dynamic QC (dQC) tool for automatic analysis of MP MRI.
Approach: Using the discrepancy between patch-based segmentations, a dQC map is derived and quantified into a dQC metric. The utility of this metric in detecting erroneous segmentations is demonstrated by considering a human-in-the-loop (HiTL) framework.
Results: Referring the dQC-detected timeframes to a human expert markedly improved the segmentation results compared with a random referral approach.
Impact: We propose a dynamic quality control tool for automatic segmentation and analysis of free-breathing myocardial perfusion MRI datasets. Our results show that the proposed approach markedly improves segmentation accuracy when used within a practical and efficient clinician-in-the-loop setting.
Introduction
Free-breathing myocardial perfusion (MP) MRI protocols are preferred over breath-hold exams due to their applicability to a wider range of patient cohorts. However, in cases where non-rigid motion correction (MoCo) fails or is unavailable, manual segmentation of the acquired free-breathing timeframes (200-300 images per patient) can be extremely labor-intensive. Nevertheless, accurate segmentation, which can be performed using deep neural networks (DNNs)1-3, is a necessary preliminary analysis step for quantitative MP studies. In multi-center MRI studies, dataset shifts can occur, partly due to limited labeled training data and variations in the scanner platform4 (mismatch in the scanner vendor/field strength and sequence parameters between training and test datasets), which may cause a previously trained DNN to fail to generalize to specific cases in an external test dataset. To address this, we propose an automatic quality control (QC) technique for identifying “low confidence” segmentations in a dynamic MP MRI image series which, in turn, enables a human-in-the-loop (HiTL) framework to improve the performance of A.I.-guided analysis4,5. Specifically, we: (i) propose a dynamic QC (dQC) tool for assessment of DNN-based analysis of free-breathing MP MRI; and (ii) show the utility of the proposed tool for improving the performance of a HiTL framework on an external dataset from the SCMR Registry.
Methods
A sliding-patch approach3 was used to train a U-Net6 using a training dataset of 120 MP MRI studies from two centers. The data were motion-corrected and extensively augmented by simulating various breathing motion patterns. To assess the performance in a multi-center setting, an external dataset of free-breathing 1.5T studies (n=20) from a third medical center was acquired using the SCMR Registry. Figure 1 shows the steps for the proposed dQC tool and its use within a HiTL framework. In the first part (upper half), the DNN is trained by decomposing the free-breathing MP MRI images into spatiotemporal patches. Given that each pixel belongs to multiple patches, we propose to further utilize this patch-based approach at test time by analyzing the discordance of the DNN-derived dynamic segmentations across overlapping patches. At pixel location $$$(x,y)$$$ and time $$$t$$$, the time-varying dQC map takes the value:
$$M_{x,y}(t) = \mathrm{std}\left(p_{x,y}^1(t), p_{x,y}^2(t), \ldots, p_{x,y}^{|\Gamma_{x,y}|}(t)\right)$$
where $$$\Gamma_{x,y}$$$ denotes the set of space-time patches that include a particular pixel location $$$(x,y)$$$. Further, $$$p_{x,y}^i(t)$$$ denotes the softmax probability of the trained DNN during inference for the $$$i$$$-th patch at time $$$t$$$ and spatial location $$$(x,y)$$$, and $$$\mathrm{std}$$$ is the standard deviation operator. We also define the dQC metric $$$Q(t)$$$ as the ratio of the energy in the dQC map $$$M(t)$$$ to the number of pixels in the corresponding segmentation $$$S(t)$$$: $$$Q(t) = \|M(t)\|_F/\sum_{x,y} S_{x,y}(t)$$$. In the second part of Fig. 1 (bottom half), the dQC metric is used to select the segmentations with low confidence (poor quality) and refer them to a human expert for correction (refinement of the endo/epi contours) as part of a HiTL-A.I. collaboration experiment. During refinement, the expert was instructed to correct two types of error: (i) anatomically invalid segmentation (e.g., discontiguous contours); (ii) inclusion of blood pool, epicardial fat, or regions outside of the ventricle in the segmented myocardium. Two approaches were then compared for HiTL correction: (1) random referral of 10% of the timeframes; (2) dQC-guided referral of the top 10% most-uncertain timeframes.
Results
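As an illustration of the dQC map, the dQC metric, and the dQC-guided referral described in the Methods, the following is a minimal NumPy sketch rather than the implementation used in this work. Several details are assumptions made only for the example: the function names (`dqc_map`, `dqc_metric`, `refer_top_fraction`) are hypothetical, each patch is taken to span all timeframes and is positioned by a spatial offset, the population standard deviation is used, the segmentation is obtained by thresholding the patch-averaged probability at 0.5, and frames with an empty segmentation are assigned an infinite metric so that they are referred first.

```python
import numpy as np

def dqc_map(patch_probs, offsets, image_shape):
    """Per-pixel std of probabilities across overlapping patches.

    patch_probs: list of arrays (T, ph, pw) with myocardium probabilities.
    offsets:     list of (row, col) top-left corners (assumed layout).
    image_shape: (H, W) of the full frame.
    Returns (M, mean), both of shape (T, H, W).
    """
    T = patch_probs[0].shape[0]
    H, W = image_shape
    count = np.zeros((H, W))      # number of patches covering each pixel
    s1 = np.zeros((T, H, W))      # running sum of probabilities
    s2 = np.zeros((T, H, W))      # running sum of squared probabilities
    for p, (r, c) in zip(patch_probs, offsets):
        ph, pw = p.shape[1:]
        count[r:r + ph, c:c + pw] += 1
        s1[:, r:r + ph, c:c + pw] += p
        s2[:, r:r + ph, c:c + pw] += p ** 2
    n = np.maximum(count, 1)      # avoid division by zero where uncovered
    mean = s1 / n
    var = np.maximum(s2 / n - mean ** 2, 0.0)  # population variance, clamped
    return np.sqrt(var), mean

def dqc_metric(M, S):
    """Q(t) = ||M(t)||_F / (number of segmented pixels in S(t))."""
    energy = np.sqrt((M ** 2).sum(axis=(1, 2)))   # Frobenius norm per frame
    area = S.sum(axis=(1, 2)).astype(float)
    # Assumption: an empty segmentation is treated as maximally uncertain.
    return np.where(area > 0, energy / np.maximum(area, 1.0), np.inf)

def refer_top_fraction(Q, fraction=0.10):
    """Indices of the most-uncertain timeframes (largest Q first)."""
    k = max(1, int(round(fraction * len(Q))))
    return np.argsort(Q)[::-1][:k]
```

In this sketch, a typical call sequence would be `M, mean = dqc_map(...)`, then `S = mean > 0.5`, then `Q = dqc_metric(M, S)`, and finally `refer_top_fraction(Q, 0.10)` to pick the frames sent to the expert. The random-referral baseline would instead draw the same number of frame indices uniformly at random.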
Figure 2 compares the cumulative performance over the entire 2D+time image series for the two HiTL correction approaches in terms of (A) Dice score and (B) segmentation failure prevalence. The random-referral (naïve) approach resulted in a nearly unchanged Dice score compared to baseline (from 0.767 ± 0.042 to 0.768 ± 0.042) and a modest decrease in the overall prevalence of failed segmentations (from 16.2% to 14.4%). In contrast, the proposed dQC-guided referral approach for HiTL correction resulted in a notable increase in the Dice score (from 0.767 ± 0.042 to 0.781 ± 0.039) and a 30% relative reduction in the prevalence of failed segmentations (from 16.2% to 11.3%). Figure 3 shows the results for two representative free-breathing image series from the external dataset along with the proposed dQC map and metric. In both examples, the proposed dQC metric tracks the segmentation errors by yielding a larger value.
Conclusion
We have proposed a human-A.I. collaboration framework powered by a dynamic QC tool for DNN-based segmentation of free-breathing MP MRI datasets and provided preliminary evaluation of its utility in the setting of multi-center studies enabled by the SCMR Registry. With a limited “HiTL referral budget” of 10% of the total number of images, representing a practical clinical scenario, the proposed approach was able to markedly improve the segmentation performance.
Acknowledgements
This work was supported by the NIH awards R01-HL153430 & R01-HL148788, and the Lilly Endowment INCITE award (PI: B. Sharif).
References
1. Scannell CM, et al. Deep-Learning-Based Preprocessing for Quantitative Myocardial Perfusion MRI. JMRI 2020;51(6):1689-96.
2. Xue H, et al. Automated inline analysis of myocardial perfusion MRI with deep learning. Radiology: Artif. Intell. 2020;2(6):e200009.
3. Yalcinkaya DM, et al. Deep learning-based segmentation and uncertainty assessment for automated analysis of myocardial perfusion MRI datasets using patch-level training and advanced data augmentation. Proc of IEEE Eng. in Med & Biol (EMBC) 2021; pp. 4072-78. DOI: 10.1109/EMBC46164.2021.9629581
4. Rajpurkar P, et al. The current and future state of AI interpretation of medical images. N. Engl. J. Med. 2023;388(21):1981-90.
5. Mozannar H, et al. Consistent estimators for learning to defer to an expert. Int Conf on Mach Learn 2020. DOI: 10.48550/arXiv.2006.01862.
6. Ronneberger O, et al. U-Net: Convolutional Networks for Biomedical Image Segmentation. Med Image Comput Comput Assist Interv (MICCAI) 2015. Part III (pp. 234-241). DOI: 10.1007/978-3-319-24574-4_28