0453

Voxel-wise Tracking of Grid Tagged Cardiac Images using a Neural Network Trained with Synthetic Data
Michael Loecher1,2, Luigi E Perotti3, and Daniel B Ennis1,2,4,5
1Radiology, Stanford University, Stanford, CA, United States, 2Radiology, Veterans Affairs Health Care System, Palo Alto, CA, United States, 3Mechanical and Aerospace Engineering, University of Central Florida, Orlando, FL, United States, 4Maternal & Child Health Research Institute, Stanford University, Stanford, CA, United States, 5Cardiovascular Institute, Stanford University, Stanford, CA, United States

Synopsis

This work introduces a neural network for tracking myocardial motion in cine grid tagged MR images on a voxel-wise basis. This is achieved with the use of a synthetic training dataset that includes comprehensive motion patterns. Synthetic training allows a known ground truth motion to be included in training. The network was tested against a previous network that tracked only tag line intersections. Displacement and strain maps were generated and compared. The voxel tracking network shows qualitatively better spatial localization of strain and better radial strain values compared to tracking only tag lines.

Introduction

Cine grid tagged cardiac MRI enables the measurement of cardiac displacements and the quantification of cardiac strains, which are important biomarkers of cardiac function and dysfunction. By tracking the tag lines placed on the image, displacements can be measured. However, tracking tag lines can be cumbersome, requiring manual review and intervention. Machine learning has been shown to be a promising technique for motion tracking applied to this problem.1,2

Previously, a comprehensive synthetic data generator for training a neural network to accurately track tag line intersections and compute strain was demonstrated.2 Using synthetic data is promising because it contains a known ground truth, is easy to generate and modify, and is not constrained by regulatory considerations nor biased by inclusion demographics. In this work, we expand upon this methodology to demonstrate that synthetic data training can also enable voxel-wise tracking of the myocardium. This potentially allows for greater spatial localization of displacements, and therefore strain. Our objective was to demonstrate the use of a neural network-based, voxel-wise tracking technique for the effective quantification of local strains from grid tagged images.

Methods

A synthetic data generation algorithm was used to generate time-resolved deforming natural images for training, with known displacement values for all points and timeframes. The data generator then uses a Bloch simulation to produce a cine grid tagged dataset with realistic image quality (Fig 1A). The images and their respective motion paths were used to train an 18-layer ResNet architecture3 modified to include both coordinate convolutions4 and (2+1)D convolutions5, thereby enabling convolutions in both the spatial and temporal dimensions. The network tracks a single point from a surrounding 32x32 voxel patch; each individual point to be tracked is processed by the network to extract its specific motion path (Fig 1B). 10^6 training patches were used to train the network, and inference takes ~1 second to track all voxels in a time-resolved series.
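
The per-point inference scheme described above can be sketched as follows. This is a minimal illustration, not the actual implementation: `extract_patch`, `track_points`, and the zero-motion `dummy_net` are hypothetical stand-ins (the real model is the modified ResNet), and seed points are assumed to lie at least 16 voxels from the image border.

```python
import numpy as np

PATCH = 32      # patch size used by the network (32x32 voxels)
NFRAMES = 25    # time frames in the cine series

def extract_patch(cine, x, y, half=PATCH // 2):
    """Crop a PATCH x PATCH spatial window around (x, y) for every frame.

    cine: (NFRAMES, H, W) array; returns (NFRAMES, PATCH, PATCH).
    Assumes (x, y) is at least `half` voxels from the image border.
    """
    return cine[:, y - half:y + half, x - half:x + half]

def track_points(cine, seeds, net):
    """Run the network once per seed point, as described in the abstract.

    `net` maps a (NFRAMES, PATCH, PATCH) patch to a (NFRAMES, 2) motion
    path (x, y displacement per frame); here it is a stand-in callable.
    """
    return np.stack([net(extract_patch(cine, x, y)) for x, y in seeds])

# Toy demonstration with a dummy "network" that predicts zero motion.
cine = np.random.rand(NFRAMES, 128, 128)
seeds = [(40, 60), (64, 64), (90, 70)]
dummy_net = lambda patch: np.zeros((NFRAMES, 2))
paths = track_points(cine, seeds, dummy_net)
print(paths.shape)  # (3, 25, 2): one motion path per seed point
```

In the voxel-wise setting, `seeds` would simply be every voxel inside the LV mask, which is what makes the per-patch design embarrassingly parallel at inference time.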

The methods were applied in short-axis, mid-LV slices in healthy pediatric subjects (N=9) with IRB approval and consent. Cine grid tagged images were acquired with: TE/TR=2.5ms/4.9ms, flip angle=10°, FOV=260mm×320mm, 110° total tagging flip angle, spatial resolution=1.4mm×1.4mm×8mm, 25 time frames, and an 8-12s breath hold. The LV was manually segmented in the first time frame and all voxels in the mask were used for subsequent automated tracking (Fig 1C). Additionally, the tag line intersections were manually delineated and tracked with a separate network trained only on intersection tracking for comparison (Fig 1D). This approach has been well validated for accurately tracking tag line intersections, as well as for computing strain.2

Tracked points were visualized with displacement vectors, as well as displacement maps, which were generated by linearly interpolating the tracked points onto an imaging grid. Strains were calculated by differentiating the displacement field interpolated using a radial basis function with a Gaussian kernel and shape parameter = 1.4mm (1 pixel) for voxel tracking and 8mm (equivalent to tag spacing) for tag line intersection tracked images.6 Circumferential (Ecc) and radial (Err) strains were investigated as maps, as well as the mean global strain values across the LV ROI.
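
The interpolation-then-differentiation step above can be sketched with a minimal Gaussian RBF interpolator. This is an illustrative reconstruction under stated assumptions, not the authors' code: the tracked points, the synthetic radial-expansion displacement field, the grid spacings, and the small-strain (rather than finite-strain) tensor are all choices made here for demonstration.

```python
import numpy as np

def fit_rbf(pts, vals, eps):
    """Solve for Gaussian RBF weights interpolating scattered samples.

    pts: (N, 2) tracked point positions (mm); vals: (N,) one displacement
    component; eps: shape parameter (1.4 mm for voxel-wise tracking,
    8 mm for intersection tracking in this work). A tiny ridge term
    keeps the kernel matrix well conditioned.
    """
    d2 = ((pts[:, None, :] - pts[None, :, :]) ** 2).sum(-1)
    K = np.exp(-d2 / eps**2)
    return np.linalg.solve(K + 1e-8 * np.eye(len(pts)), vals)

def eval_rbf(pts, w, q, eps):
    """Evaluate the fitted RBF expansion at query points q (M, 2)."""
    d2 = ((q[:, None, :] - pts[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / eps**2) @ w

# Hypothetical tracked displacements on a regular 4 mm grid of points:
# pure radial expansion u(x) = 0.1 * x, for which the true Err = 0.1.
ax = np.arange(-20.0, 21.0, 4.0)
pts = np.stack(np.meshgrid(ax, ax), -1).reshape(-1, 2)
ux, uy = 0.1 * pts[:, 0], 0.1 * pts[:, 1]

eps = 4.0                                   # shape parameter, mm
g = np.arange(-10.0, 11.0, 1.0)             # 1 mm evaluation grid
grid = np.stack(np.meshgrid(g, g), -1).reshape(-1, 2)
Ux = eval_rbf(pts, fit_rbf(pts, ux, eps), grid, eps).reshape(21, 21)
Uy = eval_rbf(pts, fit_rbf(pts, uy, eps), grid, eps).reshape(21, 21)

# Small-strain components via finite differences of the dense field.
dUx_dy, dUx_dx = np.gradient(Ux, 1.0)
dUy_dy, dUy_dx = np.gradient(Uy, 1.0)
Exx, Eyy, Exy = dUx_dx, dUy_dy, 0.5 * (dUx_dy + dUy_dx)

# Project onto radial unit vectors about the LV center (origin here);
# the circumferential projection Ecc is analogous with (-ery, erx).
X, Y = grid[:, 0].reshape(21, 21), grid[:, 1].reshape(21, 21)
r = np.hypot(X, Y) + 1e-12
erx, ery = X / r, Y / r
Err = erx**2 * Exx + 2 * erx * ery * Exy + ery**2 * Eyy
```

The key practical knob is the shape parameter: a narrow kernel (1 pixel) preserves local detail for densely tracked voxels, while a wide kernel (one tag spacing) is needed to bridge the sparse intersection points, which is one mechanism for the blurring seen in the intersection-tracked strain maps.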

Results

Figure 2 shows an animation of the tracking output from the neural network, including the tracked points and the corresponding displacement vectors relative to the reference configuration (t=0). Figure 3 shows an animation of the displacement values interpolated onto an image grid. When tracking the same locations (tag intersections), the two networks differed by RMSE = 0.29mm. Figure 4 shows an example case comparing intersection tracking to voxel-wise tracking. Fig. 4A shows Ecc maps, where similar patterns and values are seen, with qualitatively less blurring in the voxel tracked maps. Fig. 4B shows a global Ecc curve from the same case, where similar values are seen. Fig. 4C shows Err maps from both techniques, and Fig. 4D shows the corresponding global Err curves from both methods. In this example, Err is higher with voxel-wise tracking as compared to intersection tracking. Figures 5A and 5B show strain differences across all subjects. Figs. 5C and 5D show box plots of peak global Ecc and Err for all subjects. In both instances good agreement is seen for Ecc, while voxel-wise tracking reports higher Err values.
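
For the agreement figure quoted above, one plausible definition of the trajectory RMSE (the abstract does not spell out whether it is per-component or Euclidean) is the root-mean-square of the Euclidean distance between matched points over all frames:

```python
import numpy as np

def trajectory_rmse(a, b):
    """RMSE of the Euclidean distance between matched tracked paths.

    a, b: (npts, nframes, 2) arrays of point positions over time.
    This is one plausible definition; the abstract does not specify.
    """
    return float(np.sqrt(np.mean(np.sum((a - b) ** 2, axis=-1))))

# Toy check: paths offset by a constant (0.3, 0.4) mm differ by 0.5 mm.
a = np.zeros((5, 25, 2))
b = a + np.array([0.3, 0.4])
print(trajectory_rmse(a, b))  # ~0.5
```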

Discussion

In this work, we use synthetically generated data to train a neural network for tracking every voxel in the myocardium of grid-tagged images. The use of synthetic data enables the direct and quantitative comparison of alternative strategies (tracking voxels versus intersections) since the ground truth motion is known at all points in the training data, which is otherwise difficult or impossible to obtain with real data. Compared to tracking tag line intersections, voxel-wise tracking allows for better spatial localization of the displacement and strain values. This is seen qualitatively in the strain maps and in the radial strain values. Tracking intersections results in reduced Err, most likely because there are not enough points in the radial direction to estimate it properly. Notably, voxel-wise tracking gives more physiologically reasonable Err values for healthy subjects (20%-40%).7 Further work is needed to compare against other tracking methods and to evaluate a larger patient cohort with cardiac dysfunction.

Acknowledgements

R01 HL131823, R01 HL131975, R01 HL152256

References

[1] Ferdian, Edward, et al. "Fully Automated Myocardial Strain Estimation from Cardiovascular MRI–tagged Images Using a Deep Learning Framework in the UK Biobank." Radiology: Cardiothoracic Imaging 2.1 (2020): e190032.

[2] Loecher, Michael, Luigi E. Perotti, and Daniel B. Ennis. "Cardiac Tag Tracking with Deep Learning Trained with Comprehensive Synthetic Data Generation." ISMRM 28th Annual Meeting, Paris, France (Virtual Conference), 2020.

[3] He, Kaiming, et al. "Deep residual learning for image recognition." Proceedings of the IEEE conference on computer vision and pattern recognition. 2016.

[4] Liu, Rosanne, et al. "An intriguing failing of convolutional neural networks and the coordconv solution." Advances in neural information processing systems 31 (2018): 9605-9616.

[5] Tran, Du, et al. "A closer look at spatiotemporal convolutions for action recognition." Proceedings of the IEEE conference on Computer Vision and Pattern Recognition. 2018.

[6] Bistoquet, Arnaud, John Oshinski, and Oskar Škrinjar. "Myocardial deformation recovery from cine MRI using a nearly incompressible biventricular model." Medical image analysis 12.1 (2008): 69-85.

[7] Scatteia, A., A. Baritussio, and C. Bucciarelli-Ducci. "Strain imaging using cardiac magnetic resonance." Heart failure reviews 22.4 (2017): 465-476.

Figures

Figure 1: A) An overview of the synthetic data generation and the output image patches used for training. B) The Resnet architecture starts with image patches as input and outputs the 2x25 motion displacements (x,y per timeframe). C) and D) show the points tracked for the voxel-wise and intersection techniques, respectively.

Figure 2: Animation of the tracking network output. The left panel shows a cropped image of the input grid tagged data. The middle panel shows the tracked points overlaid on the image throughout the cardiac cycle. The right panel shows the displacement vectors of the tracked points (only 25% of points included for visibility).

Figure 3: Animation of displacement maps generated from the tracked data using the neural network. x displacement is shown on the left, and y displacement on the right.

Figure 4: A) Ecc maps for both tracking methods, where similar values can be seen, with less blurring in the voxel tracked map. B) Corresponding global Ecc curves from (A), where very similar values are seen. C) Err maps from both methods, where voxel tracking yields higher values and less blurring. D) Err curves for this case, where the higher Err of voxel-wise tracking is evident.

Figure 5: A) Mean and standard deviation (continuous line and shaded region) of the difference in Ecc between voxel and intersection tracking for all cases. B) Similar comparison for Err. C) and D) are boxplots of peak global Ecc and Err respectively.

Proc. Intl. Soc. Mag. Reson. Med. 29 (2021)