4096

Accelerating cardiac dynamic imaging with video prediction using deep predictive coding networks
Xue Feng1 and Craig H Meyer1

1Biomedical Engineering, University of Virginia, Charlottesville, VA, United States

Synopsis

Deep predictive coding networks, built on a stacked recurrent convolutional neural network, have shown great success in video prediction because they learn to recognize and analyze the motion patterns of each element in a scene from previous frames. In this study, we adopted this model to predict future frames in cardiac cine images and used a k-space substitution method to improve prediction accuracy. The approach shows promise for accelerating cardiac dynamic imaging.

Introduction

With recent developments in deep convolutional neural networks, future frames in a video sequence can be predicted well using predictive networks 1,2, since these networks learn to recognize and analyze the motion patterns of each element in a scene from previous frames. The deep predictive coding network (PredNet) proposed by Lotter et al. 2 uses a stacked recurrent convolutional neural network (RCNN) to predict future frames and showed state-of-the-art performance compared with other network structures. In this study, we aim to adopt PredNet to predict future frames in cardiac cine images and to explore combining it with k-space sampling to further improve prediction accuracy.

Methods

The network structure is similar to that described in Figure 1 of Ref. 2, with 4 stacked RCNN layers. The convolutional LSTM block in each layer stores information from previous frames. The numbers of feature channels in the layers are 1, 48, 96 and 192, and the image size is reduced by a factor of 2 in each dimension for each layer above. The training and validation data are from the Kaggle Second Annual Data Science Bowl 3, which contains 2D short-axis images covering the left ventricle (LV) from 700 subjects; each subject has 8-12 slices with 30 cardiac phases per slice. 500 subjects were used for training and 200 for validation. The model input is a dynamic image series of 8 consecutive frames randomly extracted from the cardiac cycle, and the output is the predicted next frame given all previous frames (e.g., frame 6 predicted from frames 1-5). The root mean square error between the ground truth frame and the predicted frame is used as the loss function during training. During validation, all 30 cardiac phases are fed into the model to generate predicted next frames. To evaluate the model, each predicted frame is compared with both the corresponding ground truth frame and the previous frame, since simply copying the previous frame is the most straightforward prediction method.

To study the effect of using under-sampled k-space data, a simulation was performed: the ground truth frame and the predicted frame were transformed to k-space, and the k-space lines of the predicted frame were substituted with the corresponding lines from the ground truth frame according to the sampling pattern, which was 4x uniform undersampling in this experiment. The updated prediction was then obtained by an inverse FFT of the resulting k-space data. Note that when this operation is applied to the simple prediction of copying the previous frame, it is equivalent to view-sharing reconstruction, since the un-acquired k-space data are copied directly from a previous frame.
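
As an illustration only, the sketch below builds a much-simplified stacked ConvLSTM next-frame predictor in Keras with the 48/96/192 feature channels mentioned above, trained with an RMSE loss. It omits the per-layer 2x2 downsampling and the error-propagation units of the actual PredNet architecture, and the 128x128 frame size is an assumed example rather than a detail from this study.

```python
import tensorflow as tf

def build_next_frame_predictor(frame_shape=(128, 128, 1)):
    # Input: a sequence of previous frames, shape (time, height, width, 1)
    frames = tf.keras.Input(shape=(None, *frame_shape))
    # Stacked convolutional LSTM layers with 48, 96 and 192 feature channels
    x = tf.keras.layers.ConvLSTM2D(48, 3, padding="same", return_sequences=True)(frames)
    x = tf.keras.layers.ConvLSTM2D(96, 3, padding="same", return_sequences=True)(x)
    x = tf.keras.layers.ConvLSTM2D(192, 3, padding="same", return_sequences=False)(x)
    # Map the final hidden state to a single-channel predicted next frame
    next_frame = tf.keras.layers.Conv2D(1, 3, padding="same")(x)
    model = tf.keras.Model(frames, next_frame)
    # Root mean square error between predicted and ground-truth next frame
    rmse = lambda y_true, y_pred: tf.sqrt(tf.reduce_mean(tf.square(y_true - y_pred)))
    model.compile(optimizer="adam", loss=rmse)
    return model
```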

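A minimal NumPy sketch of the k-space substitution step, assuming real-valued images and 4x uniform undersampling along the phase-encode (row) direction; the function and argument names are illustrative, not from the original implementation.

```python
import numpy as np

def kspace_substitution(predicted, ground_truth, acceleration=4, offset=0):
    # Transform both real-valued frames to k-space (centered 2D FFT)
    k_pred = np.fft.fftshift(np.fft.fft2(predicted))
    k_true = np.fft.fftshift(np.fft.fft2(ground_truth))

    # Uniform undersampling mask along the phase-encode (row) direction
    mask = np.zeros(predicted.shape[0], dtype=bool)
    mask[offset::acceleration] = True

    # Replace the "acquired" lines of the prediction with ground-truth data
    k_pred[mask, :] = k_true[mask, :]

    # Back to image space; magnitude is taken because zero phase is assumed
    return np.abs(np.fft.ifft2(np.fft.ifftshift(k_pred)))
```

Calling the same function with the previous frame in place of the model prediction, e.g. kspace_substitution(previous_frame, ground_truth), reproduces the view-sharing reconstruction used for comparison.
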
Results

Figure 1 shows the results for frame 10, at systole. The top row shows the ground truth frame 10, frame 9, the view-sharing reconstruction, the predicted frame 10 and the updated prediction using the k-space substitution method; the bottom row shows the corresponding differences from the ground truth frame. During systole, when cardiac motion is fastest, there are significant changes around the myocardium between frame 9 and frame 10. The view-sharing reconstruction updates the cardiac shape but suffers from aliasing. With model prediction alone, the mismatch in the LV myocardium is greatly reduced, because the model learned the movement patterns of the myocardium and predicted its deformation from the previous frames. With k-space substitution, the differences from the ground truth are further minimized.

Discussions and Conclusion

Through training on a large number of cardiac cine images, the PredNet model is able to learn movement patterns at the pixel level and predict the next frame in an image series from previous frames. This prediction can be further improved with k-space sampling by simply substituting the acquired k-space data. However, this study has several limitations. Due to the lack of original k-space data, the sampling step assumes zero phase in all images. In practice, since only magnitude image series were available for training, one solution for complex data is to run the trained model separately on the real and imaginary parts of the image series and combine the results before k-space substitution (a brief sketch of this step is given at the end of this section). Furthermore, because the training and validation data contain minimal breathing artifacts, it is unclear whether the model can track global heart motion during breathing. However, given the success of PredNet in predicting the movements of real-life objects, and since the breathing motion pattern is much simpler than myocardial deformation, the model is likely to learn breathing motion from only a small amount of training data. In conclusion, PredNet is a promising tool for fast and reliable accelerated dynamic imaging.
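
As a sketch of the real/imaginary workaround mentioned above, assuming the trained real-valued predictor is available as a callable (model_predict is a placeholder, not part of PredNet), a complex image series could be handled as follows:

```python
import numpy as np

def predict_complex_frame(model_predict, complex_frames):
    # Run the real-valued predictor on the real and imaginary parts separately
    real_pred = model_predict(np.real(complex_frames))
    imag_pred = model_predict(np.imag(complex_frames))
    # Recombine into a complex-valued prediction before k-space substitution
    return real_pred + 1j * imag_pred
```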

Acknowledgements

No acknowledgement found.

References

1. Mathieu M, Couprie C, LeCun Y. Deep multi-scale video prediction beyond mean square error. arXiv:1511.05440 [cs.LG]. 2015.

2. Lotter W, Kreiman G, Cox D. Deep predictive coding networks for video prediction and unsupervised learning. arXiv:1605.08104 [cs.LG]. 2016.

3. Kaggle Second Annual Data Science Bowl. https://www.kaggle.com/c/second-annual-data-science-bowl

Figures

Figure 1. The top row shows the ground truth frame 10, frame 9, the view-sharing reconstruction, the predicted frame 10 and the updated prediction with k-space substitution; the bottom row shows the corresponding differences from the ground truth frame. There are significant changes around the myocardium between frame 9 and frame 10. The view-sharing reconstruction updates the cardiac shape but suffers from aliasing. With model prediction alone, the mismatch in the LV myocardium is greatly reduced. K-space substitution further reduces the differences from the ground truth.

Proc. Intl. Soc. Mag. Reson. Med. 26 (2018)