Zihao He^{1,2}, Anthony G. Christodoulou^{1,3}, Hua Guo^{2}, and Debiao Li^{1,4}

Calculation of the ejection fraction from cardiac cine MR images requires segmenting multiple images of the left ventricle. This process, which is often performed manually, is time-consuming and observer-dependent. In this work, an unsupervised machine learning algorithm, combining hidden Markov random field and optical flow, has been proposed to perform semi-automatic tissue segmentation on T1/T2-weighted low-rank tensor images that have a built-in feature space due to low-rank factorization performed during image reconstruction. The segmentation results then allow automatic EF calculation. Demonstrated results have higher efficiency and similar accuracy compared with manual segmentation, and were stable with respect to different initializations.

Images
were acquired using low-rank tensor-based cardiovascular MR (CMR) multitasking^{6},
resulting in different T1/T2-weighted contrast images at different time points.
In this way, it provides many features to uniquely identify a specific tissue. This
imaging method was selected because the temporal subspace constraint at the
foundation of low-rank tensor CMR multitasking also provides a convenient
feature space for image segmentation, bypassing the need for feature
extraction. By assuming
the values in this feature space follow a Gaussian Mixture Model (GMM), we cluster
the results according to the maximum a posteriori (MAP) criterion. A neighborhood
potential term is incorporated to enforce the spatial and temporal continuity. Edges
are preserved by disregarding the effect from neighboring voxels on a contour^{7}.
The resulting model is a hidden Markov random field (HMRF) model^{8}
defined on a 3-D space (2 image dimensions and a time dimension depicting
cardiac motion). Then the expectation-maximization (EM) algorithm^{9}
is applied to solve the following optimization function.

$$\hat{\bf{Y}} =\arg\underset{\bf{Y}}{\min}\,\left[U(\bf{X}|\bf{Y} ;\Theta)+U(\bf{Y})\right]$$

where **Y** is the configuration of all
clusters (i.e., tissues) that each voxel belongs to, **X** denotes the data in the
feature space and **Θ** denotes the
distribution parameter set (which is time-variant due to blood flow effects).
The two potential terms represent neighborhood potential and probability
potential respectively in the following form:

$$U(\bf{X}|\bf{Y};\Theta)=-\sum_{i} \log{P(\bf{x}_i|y_i;\Theta)}$$

$$U(\bf{Y})=\sum_{c \in C} V_c (\bf{Y})$$

where
**P** is the Gaussian distribution
probability, **C** denotes all the
neighboring pairs and **V _{c}**(

$$\left[\begin{matrix}\bf{I}_x^T\bf{I}_x & \bf{I}_x^T\bf{I}_y \\\bf{I}_x^T\bf{I}_y & \bf{I}_y^T\bf{I}_y\end{matrix}\right]\left[\begin{matrix}v_x \\v_y\end{matrix}\right]=-\left[\begin{matrix}\bf{I}_x^T\bf{I}_t \\\bf{I}_y^T\bf{I}_t\end{matrix}\right]$$

where **I**_{x},
**I**_{y} and **I**_{t} denote the corresponding
derivative vectors.

[1] Steven C. Mitchell, et al. Multistage hybrid active appearance model matching: Segmentation of left and right ventricles in cardiac MR images. IEEE Transactions on Medical Imaging, pages 415–423, 2001.

[2] Honghai Zhang, et al. 4-D cardiac MR image analysis: Left and right ventricular morphology and function. IEEE Transactions on Medical Imaging, pages 350–364, 2010.

[3] J. Lötjönen, et al. Statistical shape model of atria, ventricles and epicardium from short- and long-axis MR images. MICCAI, pages 371–386, 2003.

[4] Zhi-Pei Liang. Spatiotemporal imaging with partially separable functions. IEEE-ISBI, pages 988–991, 2007.

[5] Anthony G. Christodoulou, et al. A general low-rank tensor framework for high-dimensional cardiac imaging: Application to time-resolved T1 mapping. Proceedings of the International Society for Magnetic Resonance in Medicine, page 867, 2016.

[6] Anthony G. Christodoulou, et al. Non-ECG, free-breathing joint myocardial T1-T2 mapping using CMR multitasking. Proceedings of the Society for Cardiovascular Magnetic Resonance, 2017.

[7] John Canny. A computational approach to edge detection. IEEE Transactions on Pattern Analysis and Machine Intelligence, pages 679–698, 1986.

[8] Yongyue Zhang, et al. Segmentation of brain MR images through a hidden Markov random field model and the expectation maximization algorithm. IEEE Transactions on Medical Imaging, pages 45–57, 2001.

[9] A. P. Dempster, et al. Maximum likelihood from incomplete data via the EM algorithm. Journal of the Royal Statistical Society, series B, pages 1–38, 1977.

[10] M. Proesmans, et al. Determination of optical flow and its discontinuities using non-linear diffusion. Proceedings of European Conference on Computer Vision II, pages 295–304, 1994.

[11] Thorvald Julius Sørensen. A method of establishing groups of equal amplitude in plant sociology based on similarity of species content and its application to analyses of the vegetation on Danish commons. Kongelige Danske videnskabernes selskab, pages 1–34, 1948.

Table 1 Overall test results for four datasets with 5 different initializations each. On the bottom left shows the correlation plot. The Pearson r-value squared is 0.934 which shows a high linear dependence of two outcomes from different methods. On the bottom right shows the Bland-Altman comparison. The maximum absolute value of differences in the limits is 3.5%, which shows a high consistency.

Fig.1 Pipeline of the proposed method. One cardiac cycle in each dataset contains 15 frames. The region of interest is cropped manually and several voxels are identified in each tissue of interest from one image at one time point to initialize the tissue clusters. The segmentation algorithm then converges in approximately 1 minute.

Fig.2 On the left, a typical segmentation result of a dataset is demonstrated in odd rows. Different colors represent different tissues. For example, the LV blood pool (used for EF calculation) is dark blue. The manual gold standard of LV is compared in even rows. On the right top is the DICE^{11} accuracy over the course of the cardiac cycle. In the bottom, the blue curve depicts the automatically calculated LV volume, while the red curve depicts the gold standard volume that was segmented manually. The EF calculated by the proposed method was 51.7%; the gold standard EF was 51.4%.

Fig.3 (Left) The trend of calculated EF and (right) average DICE accuracy across all frames, both as a function of the number of T1/T2-weighted features used per frame, all for subject 4. Some features contribute more information than others. In general, both metrics rise quickly for the first few features. The figure suggests accurate results may be achievable with even fewer features.