Rapid Automatic Quantification of Myocardial Blood Flow in Free-breathing Myocardial Perfusion MRI without the Need for Motion Correction: A Novel Spatio-temporal Deep Learning Approach
Zulma Sandoval1, John Van Dyke1, Prateek Malhotra1, Rohan Dharmakumar1, and Behzad Sharif1,2

1Biomedical Imaging Research Institute, Cedars-Sinai Medical Center, Los Angeles, CA, United States, 2UCLA David Geffen School of Medicine, Los Angeles, CA, United States

Synopsis

It can be argued that the most significant technical impediment to wider clinical adoption of fully-quantitative cardiac perfusion MRI is the lack of a fully-automatic post-processing workflow across all scanner platforms. In this work, we present an initial proof-of-concept based on a deep-learning approach for quantification of myocardial blood flow that eliminates the need for motion correction, hence enabling a rapid and platform-independent post-processing framework. This is achieved by optimizing/training a cascade of deep convolutional neural networks to learn the common spatio-temporal features in a dynamic perfusion image series and use them to jointly detect the myocardial contours across all dynamic frames in the dataset.

Background & Motivation

With recent technical advances, fully-quantitative perfusion cardiac MR (CMR) imaging is being adopted as a potentially superior modality for detection of myocardial ischemia, providing a validated quantitative tool for assessing the presence and severity of ischemic heart disease. It can be argued that the most significant technical impediment to wider clinical adoption of fully-quantitative perfusion CMR is the lack of a robust, rapid, and fully-automatic post-processing workflow across all scanner platforms. On select platforms, retrospective motion correction with non-rigid registration is available, which enables a faster workflow for manual analysis, although the accuracy of motion correction varies significantly depending on the registration technique. A recent work proposed an optimized approach that automatically generates a pixel-wise myocardial blood flow (MBF) map.1 This approach currently requires a manual step (segmentation of the MBF pixel map) to generate, e.g., a global stress MBF value for each myocardial slice, and it requires a customized pulse sequence. Deep convolutional neural networks (CNNs) have recently been applied to segmentation of cine CMR images with the goal of automatic assessment of cardiac function.3,4 In this work, we present the first attempt at applying deep learning for rapid automatic analysis of perfusion CMR datasets.

Purpose

We present an initial proof-of-concept based on a deep-learning approach for quantification of MBF that eliminates the need for motion correction, hence enabling a rapid and platform-independent post-processing framework. This is achieved by optimizing/training a cascade of CNNs to learn the common spatio-temporal features in a dynamic perfusion image series and use them to jointly detect the myocardial contours across all dynamic frames in the dataset.

Methods

Stress/rest perfusion images from 62 patients with suspected/known ischemia and 10 healthy volunteers were analyzed. All subjects underwent free-breathing vasodilator-stress CMR (saturation-recovery FLASH at 3T; contrast dose: 0.05 mmol/kg) with images acquired in 3 short-axis slices over 60 heartbeats. Mean MBF for each slice was quantified by an expert physicist using manual segmentation of the myocardium (endocardial/epicardial contours) in each slice and Fermi deconvolution of the gadolinium concentration time-curves. As shown in Fig. 1, the proposed deep-learning network is composed of a cascade of two CNNs, each with an optimized U-net architecture.2 The first CNN acts as the "heart localizer" by detecting the centroid of the left ventricle (LV) and, as shown in Fig. 2, its output is used to crop all of the image frames to an "LV region-of-interest" that serves as the input to the second CNN. The second CNN processes the 3D stack (2D + time) of image frames for each slice and jointly detects the myocardial borders for all of the first-pass perfusion frames by computing a deep cascade of feature maps from the spatio-temporal information in the dynamic image series (gray boxes in Fig. 1 represent multi-channel feature maps in the optimized U-net architecture). Each of the two CNNs was separately trained/validated using 70 of the available 72 stress/rest perfusion studies (≈ 24,000 images). As described in Fig. 3, the training data was augmented by applying random affine transforms to the original image series; Figure 4 shows an example of an augmented image series. For two patients (not included in the training set), the agreement between automatic and manual segmentation was assessed (Dice score), and the mean per-slice MBF values from the two approaches (3 stress and 3 rest values per patient) were compared using Pearson correlation.
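The abstract does not spell out the deconvolution step, so the following is only a minimal sketch of a standard Fermi-constrained deconvolution: the myocardial concentration curve is modeled as the arterial input function (AIF) convolved with a Fermi impulse response, and MBF is read off as the impulse response at t = 0. All function and variable names (and the initial-guess values) are illustrative assumptions, not the authors' pipeline.

```python
# Sketch of Fermi-model constrained deconvolution for per-slice MBF,
# assuming gadolinium concentration curves have already been extracted
# from the segmented myocardium (c_myo) and the LV blood pool (aif).
import numpy as np
from scipy.optimize import curve_fit

def fermi_response(t, amp, tau0, k):
    """Fermi impulse-response model R(t); MBF is read off as R(0)."""
    return amp / (1.0 + np.exp((t - tau0) / k))

def tissue_model(t, amp, tau0, k, aif, dt):
    """Myocardial curve = AIF convolved with the Fermi impulse response."""
    r = fermi_response(t, amp, tau0, k)
    return dt * np.convolve(aif, r)[: len(t)]

def fit_mbf(t, aif, c_myo):
    """Least-squares fit of the three Fermi parameters (illustrative p0)."""
    dt = t[1] - t[0]
    popt, _ = curve_fit(
        lambda tt, amp, tau0, k: tissue_model(tt, amp, tau0, k, aif, dt),
        t, c_myo, p0=(1.0, 5.0, 2.0),
    )
    return fermi_response(0.0, *popt)  # MBF ~ R(0), up to unit calibration
```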

Results

Figure 5 shows CNN-based automatic vs. expert manual segmentation results for a representative patient (CNN computation time ≈ 0.1 sec/patient). Panel (a) shows the output of the first CNN (mean-squared error of the predicted centroid ≤ 5 pixels in all slices). As shown in panel (b), the optimized CNN cascade generated accurate segmentation of the myocardial borders across different contrast-enhancement phases during free breathing without the need for respiratory motion correction. Automatic and manual segmentations showed good agreement (Dice score for LV myocardium: 0.80), and Pearson correlation analysis of the MBF quantification results showed a strong correlation between fully-automatic processing using the optimized CNN cascade and expert manual processing (r = 0.98, p < 0.001).
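For reference, the two agreement metrics reported above are standard and straightforward to compute; the minimal sketch below shows one way, with the mask and MBF arrays as illustrative placeholders (the dummy values are not the study's data).

```python
# Dice overlap of binary myocardial masks and Pearson correlation of
# per-slice MBF values; array contents here are dummy placeholders.
import numpy as np
from scipy.stats import pearsonr

def dice_score(auto_mask, manual_mask):
    """Dice coefficient between two binary myocardial masks."""
    a, m = auto_mask.astype(bool), manual_mask.astype(bool)
    return 2.0 * np.logical_and(a, m).sum() / (a.sum() + m.sum())

# Example per-slice MBF values (mL/g/min), 3 stress + 3 rest per patient:
mbf_auto = np.array([3.1, 2.8, 3.4, 1.0, 1.1, 0.9])
mbf_manual = np.array([3.0, 2.9, 3.3, 1.1, 1.0, 1.0])
r, p = pearsonr(mbf_auto, mbf_manual)
```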

Conclusion

By leveraging the power of deep neural networks for learning the common spatio-temporal features among thousands of CMR perfusion images, the presented results demonstrate, for the first time, the potential of an optimized cascade of CNNs for automatic MBF quantification in free-breathing perfusion CMR, with strong agreement with expert manual processing. Future work involves refinement of the CNN architecture by incorporating anatomical constraints (e.g., from the cine images) to enable segment-based quantification of MBF.

Acknowledgements

We acknowledge an equipment donation in the form of an "academic GPU grant" from NVIDIA Corp.

References

1. Kellman P et al. Myocardial perfusion cardiovascular magnetic resonance: optimized dual sequence and reconstruction for quantification. J Cardiovasc Magn Reson 2017;19:43. doi: 10.1186/s12968-017-0355-5

2. Ronneberger O et al. U-Net: Convolutional Networks for Biomedical Image Segmentation. Proceedings of Medical Image Computing and Computer-Assisted Intervention – MICCAI 2015. Lecture Notes in Computer Science 2015;9351. doi: 10.1007/978-3-319-24574-4_28

3. Avendi MR et al. Automatic segmentation of the right ventricle from cardiac MRI using a learning-based approach. Magn Reson Med 2017;78(6):2439-2448. doi: 10.1002/mrm.26631

4. Bai W et al. Automated cardiovascular magnetic resonance image analysis with fully convolutional networks. J Cardiovasc Magn Reson 2018;20:65. doi: 10.1186/s12968-018-0471-x

Figures

Figure 1. Schematic representation of the deep-learning architecture for joint spatio-temporal processing of a free-breathing first-pass perfusion image series to detect myocardial contours across all of the contrast enhancement phases without the need for correction of respiratory motion. The network consists of a cascade of two CNNs (U-net architecture). The first CNN acts as the localizer of the left ventricle (LV) by detecting the LV centroid (CNN-based regression). The second CNN jointly processes the 3D stack (2D + time) of dynamic frames for each slice and jointly detects the myocardial borders for all of the first-pass perfusion images.
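The abstract does not specify the exact depth or filter counts of the second U-net. One plausible reading of "jointly processes the 3D stack (2D + time)" is that the T dynamic frames of a slice are presented as the input channels of a 2D U-net, so the network sees the whole spatio-temporal series at once; the PyTorch sketch below illustrates that idea under those assumptions, with illustrative names and layer sizes.

```python
# A minimal "2D+time" U-net sketch: the T first-pass frames of one slice
# are the input channels; the output is one myocardium mask per frame.
# Depth and filter counts are illustrative assumptions.
import torch
import torch.nn as nn

def conv_block(c_in, c_out):
    return nn.Sequential(
        nn.Conv2d(c_in, c_out, 3, padding=1), nn.ReLU(inplace=True),
        nn.Conv2d(c_out, c_out, 3, padding=1), nn.ReLU(inplace=True),
    )

class SpatioTemporalUNet(nn.Module):
    def __init__(self, n_frames=60, base=32):
        super().__init__()
        self.enc1 = conv_block(n_frames, base)
        self.enc2 = conv_block(base, base * 2)
        self.bottleneck = conv_block(base * 2, base * 4)
        self.pool = nn.MaxPool2d(2)
        self.up2 = nn.ConvTranspose2d(base * 4, base * 2, 2, stride=2)
        self.dec2 = conv_block(base * 4, base * 2)
        self.up1 = nn.ConvTranspose2d(base * 2, base, 2, stride=2)
        self.dec1 = conv_block(base * 2, base)
        self.head = nn.Conv2d(base, n_frames, 1)  # one mask per frame

    def forward(self, x):  # x: (batch, T, H, W), H and W divisible by 4
        e1 = self.enc1(x)
        e2 = self.enc2(self.pool(e1))
        b = self.bottleneck(self.pool(e2))
        d2 = self.dec2(torch.cat([self.up2(b), e2], dim=1))
        d1 = self.dec1(torch.cat([self.up1(d2), e1], dim=1))
        return torch.sigmoid(self.head(d1))
```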

Figure 2. Pictorial description of CNN #1 (the heart localizer in Fig. 1), which is trained to predict (via CNN regression) the centroid of the left ventricle (LV) frame by frame. To make CNN #2 easier to train, each image is cropped to a region-of-interest spanning a fixed number of pixels around the LV centroid detected by CNN #1. These cropped frames are then stacked to create the "2D+time" image array that is fed into CNN #2 (see Fig. 1).
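The cropping step between the two CNNs amounts to cutting a fixed-size window around the predicted centroid; a minimal sketch follows, where the window half-size and names are illustrative assumptions.

```python
# Crop a fixed-size LV region-of-interest around the centroid from CNN #1,
# then the (T, h, w) result is the "2D+time" input array for CNN #2.
import numpy as np

def crop_roi(frames, centroid, half_size=48):
    """frames: (T, H, W) image series; centroid: (row, col) from CNN #1."""
    t, h, w = frames.shape
    # keep the window inside the image bounds
    r = int(np.clip(centroid[0], half_size, h - half_size))
    c = int(np.clip(centroid[1], half_size, w - half_size))
    return frames[:, r - half_size:r + half_size, c - half_size:c + half_size]
```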

Figure 3. Schematic description of the training data-augmentation approach using affine transforms (random rotation/translation) applied to the original training datasets. The data was augmented in two steps (a randomly generated small-angle rotation followed by a random shift of the centroid) to mitigate overfitting during CNN training and to help CNN #2 (see Fig. 1) learn to cope with imperfect LV localization.
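A minimal sketch of this two-step augmentation is shown below: one small-angle rotation is drawn per image series and applied identically to every frame, followed by a random shift of the crop centroid so that CNN #2 also sees imperfectly localized hearts. The angle/shift ranges are illustrative assumptions.

```python
# Two-step series-level augmentation: random small-angle rotation, then a
# random centroid shift; the same transform is applied to all T frames.
import numpy as np
from scipy.ndimage import rotate, shift

rng = np.random.default_rng(0)

def augment_series(frames, mask, max_angle=10.0, max_shift=5):
    """frames: (T, H, W) image series; mask: (T, H, W) myocardium labels."""
    angle = rng.uniform(-max_angle, max_angle)           # one angle per series
    dy, dx = rng.integers(-max_shift, max_shift + 1, 2)  # one shift per series
    aug_f = np.stack([shift(rotate(f, angle, reshape=False, order=1),
                            (dy, dx), order=1) for f in frames])
    aug_m = np.stack([shift(rotate(m, angle, reshape=False, order=0),
                            (dy, dx), order=0) for m in mask])  # keep labels binary
    return aug_f, aug_m
```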

Figure 4. Example movie of an augmented image series in the training dataset generated according to the method described in Figure 3.

Figure 5. Segmentation results using the optimized CNN cascade for a representative patient at four different time points in the first-pass perfusion image series. (a): Output of the first CNN; (b): Output of the second CNN (green mask) superimposed on the manual segmentation performed by an expert (orange mask). The optimized CNN cascade generated accurate segmentation of the myocardial borders across different contrast-enhancement phases during free breathing without the need for respiratory motion correction (CNN computation time per patient ≈ 100 milliseconds).
